Posted to commits@tvm.apache.org by tq...@apache.org on 2019/11/26 05:31:29 UTC

[incubator-tvm-test] branch asf-site updated (707b965 -> a98e52d)

This is an automated email from the ASF dual-hosted git repository.

tqchen pushed a change to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-tvm-test.git.


    omit 707b965  temp remove checks
    omit fbf8b84  Update vta.md
    omit dcb25bc  Jenkins (#45)
    omit c032023  Update 2019-05-30-pytorch-frontend.md
    omit 01064e8  Update 2019-05-30-pytorch-frontend.md
    omit 0118472  Add writeup on PyTorch frontend (#44)
    omit f981763  remove html suffix in blog (#43)
    omit 4f9511a  allow blog to not end with html (#42)
    omit 75a4960  Fix typo in 2019-04-30-opt-cuda-quantized.md (#41)
    omit 95d61a0  [Post] Optimizing quantized operators on CUDA (#40)
    omit b51385d  Images for the TVM developer docs on the InferBound pass. (#39)
    omit f6cc543  Update 2019-03-18-tvm-apache-announcement.md
    omit 22fd734  Update 2019-03-18-tvm-apache-announcement.md
    omit 8413f7c  Update tweet feed (#38)
    omit 3e6287f  [POST] Apache ANN (#37)
    omit b523602  	* [Golang] images rework (#36)
    omit 07ce486  Update 2019-01-19-Golang.md
    omit 688e76a  Update and rename 2019-01-08-Golang.md to 2019-01-19-Golang.md
    omit b22df17  [BLOG] Golang blog (#35)
    omit e7de55b  Update 2018-12-18-lowprecision-conv.md
    omit 66d1605  Adding captions  (#34)
    omit 038916e  Update 2018-12-18-lowprecision-conv.md
    omit 90e8eff  Update 2018-12-18-lowprecision-conv.md
    omit 474f6bd  Update 2018-12-18-lowprecision-conv.md
    omit a89e653  Change date (#32)
    omit ea0a2f6  Low precision blogpost (#31)
    omit 7c26cc4  add eps logo (#30)
    omit 22bed38  add conf (#29)
    omit 0f9403e  add conference (#28)
    omit 77497b1  add file to relay (#27)
    omit 9d5ca73  Change location (#26)
    omit 2818c7f  add relay figure (#25)
    omit ecd6831  [ml-in-tees] Fix DP image link (#24)
    omit a27ea14  ok
    omit 4b88bd0  minor
    omit 450f124  ok
    omit efcab16  fix typo
    omit ccdf698  update
    omit a7ad4a4  update
    omit e074fcf  add ann list
    omit 113433f  Add post about TVM in TEEs (#23)
    omit a14d55d  fix typo (#22)
    omit c3ada80   Add blog about auto tuning for all hardware platforms  (#21)
    omit bb90303  add to sampl
    omit 2f2a001  add link to C API for dlpack (#20)
    omit 79bb186  fix some typos in blog post (#19)
    omit e7d8339  Update 2018-08-10-DLPack-Bridge.md
    omit 9ecd299  Update 2018-08-10-DLPack-Bridge.md
    omit 4675f78  Rename 2018-08-10-PyTorch-DLPack.md to 2018-08-10-DLPack-Bridge.md
    omit a1e59a7  DLPackBridge blog post (#18)
    omit a40fb6b  added link to blog post (#17)
    omit 853e2e6  VTA announcement + blogpost (#16)
    omit 72e8c3f  add twitter (#15)
    omit 1f4e5d4  update about (#14)
    omit 2a5fb51  Switch links to https (#13)
    omit 50a922d  Switch links to https (#12)
    omit df74c41  Update about.md
    omit 8f1b177  font size tweak (#11)
    omit b4c24b1  update the homepage (#10)
    omit de6c7c8  Update CNAME
    omit bfd2bc4  Update CNAME
    omit 2f44ab2  Update CNAME
    omit 1333db1  Update CNAME
    omit 52ed6c3  Update CNAME
    omit 9aaae9e  Update font style. (#9)
    omit e88ee3e  add cname
    omit 92635bd  trigger
    omit e4416b2  update
    omit cc377f7  update homepage
    omit a0710d1  update
    omit 07ecb74  Update community link
    omit a646c0a  update logo and community guide
    omit ddae83d  update
    omit 42b9065  add square
    omit 94a5fad  add logo
    omit 940a96a  fix
    omit 6c3de5c  add coauthor info (#8)
    omit a68337b  minor tweak
    omit 378d70e  blog for optimize batch matmul in transformer model (#7)
    omit c8e6639  ack emscripten
    omit af6e700  update
    omit f3e284e  Add OpenGL/WebGL post. (#6)
    omit 774da36  update
    omit cc504d3  update
    omit 9ca9c9d  fix typo
    omit a799ab6  add landing image
    omit 70f7c0e  updates
    omit e92f48d  fix
    omit c4deb94  blog for optimization on mali gpu (#5)
    omit 8384f83  fix
    omit a4dc773  blog for android rpc introduction (#4)
    omit b4f71b2  ROCm backend blog post (#3)
    omit cde5e0f  explicit mention apache in mxnet
    omit add7566  More acks
    omit a4aaf0e  Fix minor languages
    omit 17e410e  [BLOG] NNVM components
    omit 116f0e4  Add project acknowledgement
    omit 6bbd23c  Add Affliation of xianyi
    omit 4dbc980  add depthconv tutorial blog (#1)
    omit 9965542  [STYLE] Not pad first line
    omit e58e514  [STYLE] Wrap code content
    omit be18f3d  Merge pull request #2 from thinxer/patch-1
    omit c850bce  Update 2017-08-17-tvm-release-announcement.markdown
    omit 0fb7b52  [CSS] correct highlighter
    omit dd73e31  [BLOG] Initial annoucement
     new a98e52d  Build at Mon Nov 25 21:30:46 PST 2019

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (707b965)
            \
             N -- N -- N   refs/heads/asf-site (a98e52d)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .gitignore                                         |   3 +
 .gitmodules                                        |   3 -
 2017/08/17/tvm-release-announcement.html           | 280 ++++++++
 ...s-with-TVM-A-Depthwise-Convolution-Example.html | 735 +++++++++++++++++++
 2017/10/06/nnvm-compiler-announcement.html         | 235 +++++++
 ...s-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.html | 378 ++++++++++
 2017/11/08/android-rpc-introduction.html           | 384 ++++++++++
 2018/01/16/opt-mali-gpu.html                       | 730 +++++++++++++++++++
 2018/03/12/webgl.html                              | 272 +++++++
 2018/03/23/nmt-transformer-optimize.html           | 418 +++++++++++
 2018/07/12/vta-release-announcement.html           | 294 ++++++++
 2018/08/10/DLPack-Bridge.html                      | 295 ++++++++
 2018/10/03/auto-opt-all.html                       | 550 +++++++++++++++
 2018/10/09/ml-in-tees.html                         | 272 +++++++
 2018/12/18/lowprecision-conv.html                  | 317 +++++++++
 2019/01/19/Golang.html                             | 326 +++++++++
 2019/03/18/tvm-apache-announcement.html            | 179 +++++
 2019/04/29/opt-cuda-quantized.html                 | 300 ++++++++
 2019/05/30/pytorch-frontend.html                   | 258 +++++++
 404.html                                           |   1 -
 Jenkinsfile                                        | 124 ----
 README.md                                          |   5 +-
 Rakefile                                           | 306 --------
 _config.yml                                        |  34 -
 _includes/JB/analytics                             |  20 -
 _includes/JB/analytics-providers/gauges            |  13 -
 _includes/JB/analytics-providers/getclicky         |  12 -
 _includes/JB/analytics-providers/google            |  13 -
 _includes/JB/analytics-providers/google-universal  |   9 -
 _includes/JB/analytics-providers/mixpanel          |  11 -
 _includes/JB/analytics-providers/piwik             |  10 -
 _includes/JB/categories_list                       |  37 -
 _includes/JB/comments                              |  18 -
 _includes/JB/comments-providers/disqus             |  15 -
 _includes/JB/comments-providers/duoshuo            |  14 -
 _includes/JB/comments-providers/facebook           |   9 -
 _includes/JB/comments-providers/intensedebate      |   6 -
 _includes/JB/comments-providers/livefyre           |   6 -
 _includes/JB/feedburner                            |   3 -
 _includes/JB/file_exists                           |  26 -
 _includes/JB/gist                                  |  19 -
 _includes/JB/is_production                         |  43 --
 _includes/JB/liquid_raw                            |  32 -
 _includes/JB/pages_list                            |  47 --
 _includes/JB/posts_collate                         |  55 --
 _includes/JB/setup                                 |  31 -
 _includes/JB/sharing                               |   9 -
 _includes/JB/sort_collection                       |  81 ---
 _includes/JB/tags_list                             |  33 -
 _includes/custom/page_list                         |   4 -
 _includes/themes/custom-twitter/default.html       |  60 --
 _includes/themes/custom-twitter/index.html         |   1 -
 _includes/themes/custom-twitter/page.html          |   9 -
 _includes/themes/custom-twitter/post.html          |  27 -
 _includes/themes/custom-twitter/settings.yml       |   2 -
 _layouts/default.html                              |   6 -
 _layouts/index.html                                |   7 -
 _layouts/page.html                                 |   7 -
 _layouts/post.html                                 |   7 -
 _plugins/debug.rb                                  |  38 -
 .../2017-08-17-tvm-release-announcement.markdown   | 167 -----
 ...ors-with-TVM-A-Depthwise-Convolution-Example.md | 428 -----------
 .../2017-10-06-nnvm-compiler-announcement.markdown |  93 ---
 ...PUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.md | 274 --------
 _posts/2017-11-08-android-rpc-introduction.md      | 249 -------
 _posts/2018-01-16-opt-mali-gpu.md                  | 451 ------------
 _posts/2018-03-12-webgl.md                         | 113 ---
 .../2018-03-23-nmt-transformer-optimize.markdown   | 204 ------
 .../2018-07-12-vta-release-announcement.markdown   | 153 ----
 _posts/2018-08-10-DLPack-Bridge.md                 | 150 ----
 _posts/2018-10-03-auto-opt-all.md                  | 198 ------
 _posts/2018-10-09-ml-in-tees.md                    | 121 ----
 _posts/2018-12-18-lowprecision-conv.md             | 169 -----
 _posts/2019-01-19-Golang.md                        | 167 -----
 _posts/2019-03-18-tvm-apache-announcement.md       |  24 -
 _posts/2019-04-30-opt-cuda-quantized.md            | 147 ----
 _posts/2019-05-30-pytorch-frontend.md              | 108 ---
 about.html                                         | 171 +++++
 about.md                                           |  30 -
 .../bootstrap/css/bootstrap.2.2.2.min.css          | 782 ---------------------
 .../bootstrap/img/glyphicons-halflings-white.png   | Bin 8777 -> 0 bytes
 .../bootstrap/img/glyphicons-halflings.png         | Bin 12799 -> 0 bytes
 .../themes/custom-twitter/css/1.4.0/bootstrap.css  | 356 ----------
 assets/themes/custom-twitter/css/style.css         | 413 -----------
 atom.xml                                           |  28 -
 blog.html                                          | 327 ++++++++-
 categories.html                                    |  20 -
 community.html                                     | 233 ++++++
 community.md                                       |  66 --
 images/community/alicloud.png                      | Bin 0 -> 20301 bytes
 images/community/aws.png                           | Bin 0 -> 32934 bytes
 images/community/cornell.svg                       | 232 ++++++
 images/community/huawei.png                        | Bin 0 -> 7913 bytes
 images/community/intel.png                         | Bin 0 -> 4489 bytes
 images/community/microsoft.png                     | Bin 0 -> 24164 bytes
 images/community/oasislabs.png                     | Bin 0 -> 37771 bytes
 images/community/octoml.svg                        |   1 +
 images/community/ucberkeley.png                    | Bin 0 -> 32071 bytes
 images/community/ucla.png                          | Bin 0 -> 45279 bytes
 images/community/uwcse.png                         | Bin 0 -> 19344 bytes
 images/community/xilinx.png                        | Bin 0 -> 12702 bytes
 index.html                                         |  23 -
 rss.xml                                            |  28 -
 scripts/task_build_website.sh                      |   4 -
 serve_local.sh                                     |   3 -
 sitemap.txt                                        |   8 -
 tags.html                                          |  20 -
 tvm                                                |   1 -
 vta.html                                           | 175 +++++
 vta.md                                             |  34 -
 110 files changed, 7356 insertions(+), 6254 deletions(-)
 delete mode 100644 .gitmodules
 create mode 100644 2017/08/17/tvm-release-announcement.html
 create mode 100644 2017/08/22/Optimize-Deep-Learning-GPU-Operators-with-TVM-A-Depthwise-Convolution-Example.html
 create mode 100644 2017/10/06/nnvm-compiler-announcement.html
 create mode 100644 2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.html
 create mode 100644 2017/11/08/android-rpc-introduction.html
 create mode 100644 2018/01/16/opt-mali-gpu.html
 create mode 100644 2018/03/12/webgl.html
 create mode 100644 2018/03/23/nmt-transformer-optimize.html
 create mode 100644 2018/07/12/vta-release-announcement.html
 create mode 100644 2018/08/10/DLPack-Bridge.html
 create mode 100644 2018/10/03/auto-opt-all.html
 create mode 100644 2018/10/09/ml-in-tees.html
 create mode 100644 2018/12/18/lowprecision-conv.html
 create mode 100644 2019/01/19/Golang.html
 create mode 100644 2019/03/18/tvm-apache-announcement.html
 create mode 100644 2019/04/29/opt-cuda-quantized.html
 create mode 100644 2019/05/30/pytorch-frontend.html
 delete mode 100644 404.html
 delete mode 100644 Jenkinsfile
 delete mode 100644 Rakefile
 delete mode 100644 _config.yml
 delete mode 100644 _includes/JB/analytics
 delete mode 100644 _includes/JB/analytics-providers/gauges
 delete mode 100644 _includes/JB/analytics-providers/getclicky
 delete mode 100644 _includes/JB/analytics-providers/google
 delete mode 100644 _includes/JB/analytics-providers/google-universal
 delete mode 100644 _includes/JB/analytics-providers/mixpanel
 delete mode 100755 _includes/JB/analytics-providers/piwik
 delete mode 100644 _includes/JB/categories_list
 delete mode 100644 _includes/JB/comments
 delete mode 100644 _includes/JB/comments-providers/disqus
 delete mode 100644 _includes/JB/comments-providers/duoshuo
 delete mode 100644 _includes/JB/comments-providers/facebook
 delete mode 100644 _includes/JB/comments-providers/intensedebate
 delete mode 100644 _includes/JB/comments-providers/livefyre
 delete mode 100644 _includes/JB/feedburner
 delete mode 100644 _includes/JB/file_exists
 delete mode 100644 _includes/JB/gist
 delete mode 100644 _includes/JB/is_production
 delete mode 100644 _includes/JB/liquid_raw
 delete mode 100644 _includes/JB/pages_list
 delete mode 100644 _includes/JB/posts_collate
 delete mode 100644 _includes/JB/setup
 delete mode 100644 _includes/JB/sharing
 delete mode 100644 _includes/JB/sort_collection
 delete mode 100644 _includes/JB/tags_list
 delete mode 100644 _includes/custom/page_list
 delete mode 100644 _includes/themes/custom-twitter/default.html
 delete mode 100644 _includes/themes/custom-twitter/index.html
 delete mode 100644 _includes/themes/custom-twitter/page.html
 delete mode 100644 _includes/themes/custom-twitter/post.html
 delete mode 100644 _includes/themes/custom-twitter/settings.yml
 delete mode 100644 _layouts/default.html
 delete mode 100644 _layouts/index.html
 delete mode 100644 _layouts/page.html
 delete mode 100644 _layouts/post.html
 delete mode 100644 _plugins/debug.rb
 delete mode 100644 _posts/2017-08-17-tvm-release-announcement.markdown
 delete mode 100644 _posts/2017-08-22-Optimize-Deep-Learning-GPU-Operators-with-TVM-A-Depthwise-Convolution-Example.md
 delete mode 100644 _posts/2017-10-06-nnvm-compiler-announcement.markdown
 delete mode 100644 _posts/2017-10-30-Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.md
 delete mode 100644 _posts/2017-11-08-android-rpc-introduction.md
 delete mode 100644 _posts/2018-01-16-opt-mali-gpu.md
 delete mode 100644 _posts/2018-03-12-webgl.md
 delete mode 100644 _posts/2018-03-23-nmt-transformer-optimize.markdown
 delete mode 100644 _posts/2018-07-12-vta-release-announcement.markdown
 delete mode 100644 _posts/2018-08-10-DLPack-Bridge.md
 delete mode 100644 _posts/2018-10-03-auto-opt-all.md
 delete mode 100644 _posts/2018-10-09-ml-in-tees.md
 delete mode 100644 _posts/2018-12-18-lowprecision-conv.md
 delete mode 100644 _posts/2019-01-19-Golang.md
 delete mode 100644 _posts/2019-03-18-tvm-apache-announcement.md
 delete mode 100644 _posts/2019-04-30-opt-cuda-quantized.md
 delete mode 100644 _posts/2019-05-30-pytorch-frontend.md
 create mode 100644 about.html
 delete mode 100644 about.md
 delete mode 100644 assets/themes/custom-twitter/bootstrap/css/bootstrap.2.2.2.min.css
 delete mode 100644 assets/themes/custom-twitter/bootstrap/img/glyphicons-halflings-white.png
 delete mode 100644 assets/themes/custom-twitter/bootstrap/img/glyphicons-halflings.png
 delete mode 100644 assets/themes/custom-twitter/css/1.4.0/bootstrap.css
 delete mode 100644 assets/themes/custom-twitter/css/style.css
 delete mode 100755 atom.xml
 delete mode 100644 categories.html
 create mode 100644 community.html
 delete mode 100644 community.md
 create mode 100644 images/community/alicloud.png
 create mode 100644 images/community/aws.png
 create mode 100644 images/community/cornell.svg
 create mode 100644 images/community/huawei.png
 create mode 100644 images/community/intel.png
 create mode 100644 images/community/microsoft.png
 create mode 100644 images/community/oasislabs.png
 create mode 100644 images/community/octoml.svg
 create mode 100644 images/community/ucberkeley.png
 create mode 100644 images/community/ucla.png
 create mode 100644 images/community/uwcse.png
 create mode 100644 images/community/xilinx.png
 delete mode 100644 index.html
 delete mode 100755 rss.xml
 delete mode 100755 scripts/task_build_website.sh
 delete mode 100755 serve_local.sh
 delete mode 100644 sitemap.txt
 delete mode 100644 tags.html
 delete mode 160000 tvm
 create mode 100644 vta.html
 delete mode 100644 vta.md


[incubator-tvm-test] 01/01: Build at Mon Nov 25 21:30:46 PST 2019

Posted by tq...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

tqchen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-tvm-test.git

commit a98e52dfa604c5cbfff11690499ae17680600d8b
Author: tqchen <tq...@gmail.com>
AuthorDate: Mon Nov 25 21:30:46 2019 -0800

    Build at Mon Nov 25 21:30:46 PST 2019
---
 .gitignore                                         |   5 +
 2017/08/17/tvm-release-announcement.html           | 280 ++++++++
 ...s-with-TVM-A-Depthwise-Convolution-Example.html | 735 +++++++++++++++++++
 2017/10/06/nnvm-compiler-announcement.html         | 235 +++++++
 ...s-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.html | 378 ++++++++++
 2017/11/08/android-rpc-introduction.html           | 384 ++++++++++
 2018/01/16/opt-mali-gpu.html                       | 730 +++++++++++++++++++
 2018/03/12/webgl.html                              | 272 +++++++
 2018/03/23/nmt-transformer-optimize.html           | 418 +++++++++++
 2018/07/12/vta-release-announcement.html           | 294 ++++++++
 2018/08/10/DLPack-Bridge.html                      | 295 ++++++++
 2018/10/03/auto-opt-all.html                       | 550 +++++++++++++++
 2018/10/09/ml-in-tees.html                         | 272 +++++++
 2018/12/18/lowprecision-conv.html                  | 317 +++++++++
 2019/01/19/Golang.html                             | 326 +++++++++
 2019/03/18/tvm-apache-announcement.html            | 179 +++++
 2019/04/29/opt-cuda-quantized.html                 | 300 ++++++++
 2019/05/30/pytorch-frontend.html                   | 258 +++++++
 404.html                                           |   1 -
 CNAME                                              |   1 +
 README.md                                          |  13 +-
 Rakefile                                           | 306 --------
 _config.yml                                        | 138 ----
 _includes/JB/analytics                             |  20 -
 _includes/JB/analytics-providers/gauges            |  13 -
 _includes/JB/analytics-providers/getclicky         |  12 -
 _includes/JB/analytics-providers/google            |  13 -
 _includes/JB/analytics-providers/google-universal  |   9 -
 _includes/JB/analytics-providers/mixpanel          |  11 -
 _includes/JB/analytics-providers/piwik             |  10 -
 _includes/JB/categories_list                       |  37 -
 _includes/JB/comments                              |  18 -
 _includes/JB/comments-providers/disqus             |  15 -
 _includes/JB/comments-providers/duoshuo            |  14 -
 _includes/JB/comments-providers/facebook           |   9 -
 _includes/JB/comments-providers/intensedebate      |   6 -
 _includes/JB/comments-providers/livefyre           |   6 -
 _includes/JB/feedburner                            |   3 -
 _includes/JB/file_exists                           |  26 -
 _includes/JB/gist                                  |  19 -
 _includes/JB/is_production                         |  43 --
 _includes/JB/liquid_raw                            |  32 -
 _includes/JB/pages_list                            |  47 --
 _includes/JB/posts_collate                         |  55 --
 _includes/JB/setup                                 |  31 -
 _includes/JB/sharing                               |   9 -
 _includes/JB/sort_collection                       |  81 ---
 _includes/JB/tags_list                             |  33 -
 _includes/custom/page_list                         |   4 -
 _includes/themes/custom-twitter/default.html       |  64 --
 _includes/themes/custom-twitter/index.html         |   6 -
 _includes/themes/custom-twitter/page.html          |   9 -
 _includes/themes/custom-twitter/post.html          |  40 --
 _includes/themes/custom-twitter/settings.yml       |   2 -
 _layouts/default.html                              |   6 -
 _layouts/index.html                                |   7 -
 _layouts/page.html                                 |   7 -
 _layouts/post.html                                 |   7 -
 _plugins/debug.rb                                  |  38 -
 _site/404.html                                     |   1 -
 _site/archive.html                                 | 149 ----
 .../bootstrap/css/bootstrap.2.2.2.min.css          | 782 ---------------------
 .../bootstrap/img/glyphicons-halflings-white.png   | Bin 8777 -> 0 bytes
 .../bootstrap/img/glyphicons-halflings.png         | Bin 12799 -> 0 bytes
 .../themes/custom-twitter/css/1.4.0/bootstrap.css  | 356 ----------
 _site/assets/themes/custom-twitter/css/style.css   |  69 --
 _site/assignments.html                             | 142 ----
 _site/atom.xml                                     |  16 -
 _site/categories.html                              | 157 -----
 _site/index.html                                   | 178 -----
 _site/rss.xml                                      |  15 -
 _site/schedule.html                                | 235 -------
 _site/serve_local.sh                               |   3 -
 _site/sitemap.txt                                  |  10 -
 _site/tags.html                                    | 158 -----
 about.html                                         | 171 +++++
 archive.html                                       |  10 -
 .../bootstrap/css/bootstrap.2.2.2.min.css          | 782 ---------------------
 .../bootstrap/img/glyphicons-halflings-white.png   | Bin 8777 -> 0 bytes
 .../bootstrap/img/glyphicons-halflings.png         | Bin 12799 -> 0 bytes
 .../themes/custom-twitter/css/1.4.0/bootstrap.css  | 356 ----------
 assets/themes/custom-twitter/css/style.css         |  69 --
 assignments.md                                     |  11 -
 atom.xml                                           |  28 -
 blog.html                                          | 327 +++++++++
 categories.html                                    |  20 -
 community.html                                     | 233 ++++++
 images/android_rpc/app.png                         | Bin 0 -> 254593 bytes
 images/android_rpc/app_error.png                   | Bin 0 -> 278170 bytes
 images/android_rpc/arch.png                        | Bin 0 -> 47147 bytes
 images/android_rpc/flow1.png                       | Bin 0 -> 22077 bytes
 images/android_rpc/flow2.png                       | Bin 0 -> 16710 bytes
 images/autotune-all/amd.png                        | Bin 0 -> 51573 bytes
 images/autotune-all/arm.png                        | Bin 0 -> 52362 bytes
 images/autotune-all/autotvm.png                    | Bin 0 -> 96078 bytes
 images/autotune-all/mali.png                       | Bin 0 -> 40033 bytes
 images/autotune-all/nvidia.png                     | Bin 0 -> 63880 bytes
 images/autotune-all/overview.png                   | Bin 0 -> 100527 bytes
 images/community/alicloud.png                      | Bin 0 -> 20301 bytes
 images/community/aws.png                           | Bin 0 -> 32934 bytes
 images/community/cornell.svg                       | 232 ++++++
 images/community/huawei.png                        | Bin 0 -> 7913 bytes
 images/community/intel.png                         | Bin 0 -> 4489 bytes
 images/community/microsoft.png                     | Bin 0 -> 24164 bytes
 images/community/oasislabs.png                     | Bin 0 -> 37771 bytes
 images/community/octoml.svg                        |   1 +
 images/community/ucberkeley.png                    | Bin 0 -> 32071 bytes
 images/community/ucla.png                          | Bin 0 -> 45279 bytes
 images/community/uwcse.png                         | Bin 0 -> 19344 bytes
 images/community/xilinx.png                        | Bin 0 -> 12702 bytes
 images/cuda-quantized/benchmark.svg                | 678 ++++++++++++++++++
 images/cuda-quantized/conv2d.png                   | Bin 0 -> 166087 bytes
 images/cuda-quantized/workflow.png                 | Bin 0 -> 76962 bytes
 images/depthconv_tutorial/GPU_memory_hierarchy.png | Bin 0 -> 218921 bytes
 images/depthconv_tutorial/bank_conflicts.png       | Bin 0 -> 55248 bytes
 images/depthconv_tutorial/conv_and_depthconv.png   | Bin 0 -> 189241 bytes
 images/depthconv_tutorial/no_tiling.png            | Bin 0 -> 117108 bytes
 images/depthconv_tutorial/tf_compare.png           | Bin 0 -> 44260 bytes
 images/depthconv_tutorial/tiling.png               | Bin 0 -> 83580 bytes
 .../vthread_and_strided_pattern.png                | Bin 0 -> 45323 bytes
 images/docs/inferbound/gatherbound.png             | Bin 0 -> 25803 bytes
 images/docs/inferbound/gatherbound_problem.png     | Bin 0 -> 7684 bytes
 images/docs/inferbound/inferbound_phases.png       | Bin 0 -> 33197 bytes
 images/docs/inferbound/inferbound_traversal.png    | Bin 0 -> 14670 bytes
 images/docs/inferbound/passupdomain_div.png        | Bin 0 -> 9869 bytes
 images/docs/inferbound/passupdomain_min.png        | Bin 0 -> 12572 bytes
 images/docs/inferbound/passupdomain_nodiv.png      | Bin 0 -> 12907 bytes
 images/docs/inferbound/passupdomain_problem.png    | Bin 0 -> 17530 bytes
 images/docs/inferbound/relations.png               | Bin 0 -> 23321 bytes
 images/docs/inferbound/stage_graph.png             | Bin 0 -> 13909 bytes
 images/docs/inferbound/union.png                   | Bin 0 -> 3135 bytes
 images/golang/TVM-Golang-Blog.png                  | Bin 0 -> 25895 bytes
 images/golang/TVM-Golang-Flow.png                  | Bin 0 -> 95181 bytes
 images/logo/tvm-banner-left-objs-white.svg         | 380 ++++++++++
 images/logo/tvm-banner-right-objs-white.svg        | 205 ++++++
 images/logo/tvm-logo-small-black.png               | Bin 0 -> 13941 bytes
 images/logo/tvm-logo-small.png                     | Bin 0 -> 13281 bytes
 images/logo/tvm-logo-square.png                    | Bin 0 -> 3453 bytes
 images/logo/tvm-logo.eps                           | Bin 0 -> 16890811 bytes
 images/logo/tvm-logo.png                           | Bin 0 -> 135763 bytes
 images/low-precision/binary-dotproduct.png         | Bin 0 -> 9661 bytes
 images/low-precision/bitpack.png                   | Bin 0 -> 28898 bytes
 images/low-precision/bitserial-dotproduct.png      | Bin 0 -> 21066 bytes
 images/low-precision/rasp-conv-2.png               | Bin 0 -> 31496 bytes
 images/low-precision/rasp-conv.png                 | Bin 0 -> 56752 bytes
 images/low-precision/workflow.png                  | Bin 0 -> 52660 bytes
 images/low-precision/x86-conv.png                  | Bin 0 -> 58302 bytes
 images/main/stack_tvmlang.png                      | Bin 0 -> 191187 bytes
 images/main/tvm-stack.png                          | Bin 0 -> 201190 bytes
 images/nmt-transformer/batch-matmul-bar-charts.png | Bin 0 -> 16477 bytes
 images/nmt-transformer/batchmatmul.png             | Bin 0 -> 126410 bytes
 images/nmt-transformer/model_arch.png              | Bin 0 -> 150011 bytes
 images/nnvm/nnvm_compiler_code.png                 | Bin 0 -> 150107 bytes
 images/nnvm/nnvm_compiler_stack.png                | Bin 0 -> 145657 bytes
 images/nnvm/nnvm_deploy.png                        | Bin 0 -> 174628 bytes
 images/nnvm/nnvm_k80_result.png                    | Bin 0 -> 40529 bytes
 images/nnvm/nnvm_rasp_result.png                   | Bin 0 -> 45314 bytes
 images/opengl/comparison.png                       | Bin 0 -> 48992 bytes
 images/opengl/opengl-benchmark.png                 | Bin 0 -> 14876 bytes
 images/opengl/webgl-flow.png                       | Bin 0 -> 37250 bytes
 images/opt-mali/end2end.png                        | Bin 0 -> 108291 bytes
 images/opt-mali/mali-arch.png                      | Bin 0 -> 27153 bytes
 images/pytorch-dlpack/dlpack.png                   | Bin 0 -> 590212 bytes
 images/relay/dataflow.png                          | Bin 0 -> 23618 bytes
 images/relay/dataflow_vs_func.png                  | Bin 0 -> 72025 bytes
 images/relay/let_scope.png                         | Bin 0 -> 20807 bytes
 images/release/code_highlevel.png                  | Bin 0 -> 144900 bytes
 images/release/computational_graph.png             | Bin 0 -> 39760 bytes
 images/release/end_to_end_stack.png                | Bin 0 -> 125810 bytes
 images/release/gap.png                             | Bin 0 -> 364712 bytes
 images/release/gpu_mobilenet.png                   | Bin 0 -> 63162 bytes
 images/release/nnvm_gap.png                        | Bin 0 -> 321147 bytes
 images/release/resnet_rasp.png                     | Bin 0 -> 128945 bytes
 images/release/tvm_backends.png                    | Bin 0 -> 91358 bytes
 images/release/tvm_dsl.png                         | Bin 0 -> 117338 bytes
 images/release/tvm_flexible.png                    | Bin 0 -> 192781 bytes
 images/release/tvm_rpc.png                         | Bin 0 -> 172825 bytes
 images/rocm/butterfly.png                          | Bin 0 -> 270847 bytes
 images/rocm/cat.png                                | Bin 0 -> 140740 bytes
 images/rocm/rocm_workflow.png                      | Bin 0 -> 182323 bytes
 images/rocm/tvm_rocm_overview.png                  | Bin 0 -> 182111 bytes
 images/sgx/dp.png                                  | Bin 0 -> 47041 bytes
 images/sgx/dpnn.png                                | Bin 0 -> 41091 bytes
 images/sgx/sgx.png                                 | Bin 0 -> 87507 bytes
 images/sgx/tvmfits.png                             | Bin 0 -> 79224 bytes
 index.md                                           |  37 -
 rss.xml                                            |  28 -
 schedule.md                                        |  35 -
 serve_local.sh                                     |   3 -
 sitemap.txt                                        |   8 -
 tags.html                                          |  20 -
 vta.html                                           | 175 +++++
 192 files changed, 8643 insertions(+), 4886 deletions(-)

diff --git a/.gitignore b/.gitignore
index b25c15b..86f470d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1 +1,6 @@
 *~
+_site
+.DS_Store
+.*
+website.tgz
+scripts
diff --git a/2017/08/17/tvm-release-announcement.html b/2017/08/17/tvm-release-announcement.html
new file mode 100644
index 0000000..206004b
--- /dev/null
+++ b/2017/08/17/tvm-release-announcement.html
@@ -0,0 +1,280 @@
+
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <title>TVM: An End to End IR Stack for Deploying Deep Learning Workloads on Hardware Platforms</title>
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
+</head>
+
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            
+            
+            
+
+
+
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/community">Community</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/about">About</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/vta">VTA</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      
+      	
+      	<li><a href="https://tvm.ai/blog">Blog</a></li>
+      	
+      
+    
+  
+
+
+
+
+            <li> <a href="https://sampl.cs.washington.edu/tvmconf">TVM Conference</a></li>
+            <li> <a href="https://docs.tvm.ai/tutorials/">Tutorials</a></li>
+            <li> <a href="https://docs.tvm.ai">Docs</a></li>
+            <li> <a href="https://github.com/dmlc/tvm/">Github</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
+<div class="container">
+<div class="content">
+  <div class="row">
+    <div class="span14">
+      <h1>TVM: An End to End IR Stack for Deploying Deep Learning Workloads on Hardware Platforms </h1>
+      <p class="post-meta">
+        <time datetime="2017-08-17T12:00:00-07:00" itemprop="datePublished">
+          Aug 17, 2017
+        </time>
+        
+      </p>
+      <p class="post-meta">
+        </p>
+    </br>
+    <p style="text-align: center">Tianqi Chen(project lead), Thierry Moreau(hardware stack), Ziheng Jiang†(graph compilation), Haichen Shen(gpu optimization)</p>
+<p style="text-align: center">Advisors: Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy</p>
+<p style="text-align: center">Paul G. Allen School of Computer Science &amp; Engineering, University of Washington</p>
+<p style="text-align: center">DMLC open-source community</p>
+<p style="text-align: center">†Amazon Web Service</p>
+
+<p>Deep learning has become ubiquitous and indispensable. Part of this revolution has been fueled by scalable deep learning systems, such as TensorFlow, MXNet, Caffe and PyTorch. Most existing systems are optimized for a narrow range of server-class GPUs, and require significant effort to be deployed on other platforms such as mobile phones, IoT devices and specialized accelerators (FPGAs, ASICs). As the number of deep learning frameworks and hardware backends increases, we propose a unifie [...]
+
+<p style="text-align: center"><img src="/images/release/gap.png" alt="image" width="512px" /></p>
+<p>We are excited to announce the launch of TVM as a solution to this problem. TVM is a novel framework that can:</p>
+
+<ul>
+  <li>Represent and optimize common deep learning computation workloads for CPUs, GPUs and other specialized hardware</li>
+  <li>Automatically transform the computation graph to minimize memory utilization, optimize data layout and fuse computation patterns</li>
+  <li>Provide an end-to-end compilation from existing front-end frameworks down to bare-metal hardware, all the way up to browser-executable JavaScript.</li>
+</ul>
+
+<p>With the help of TVM, we can easily run deep learning workloads on mobile phones, embedded devices and even the browser with little additional effort. TVM also provides a unified optimization framework for deep learning workloads on a multitude of hardware platforms, including specialized accelerators that rely on novel computational primitives.</p>
+
+<p style="text-align: center"><img src="/images/release/end_to_end_stack.png" alt="image" width="512px" /></p>
+
+<p>We adopt a common philosophy from the compiler community and provide two intermediate representation layers to efficiently lower high-level deep learning algorithms down to a multitude of hardware back-ends.</p>
+
+<p>In today’s release, we open-source the TVM package, which contains optimization primitives for x86, ARM, OpenCL, Metal, CUDA and JavaScript. We are actively working on adding support for specialized hardware acceleration and Nvidia’s GEMM-optimized Volta architecture.</p>
+
+<h2 id="technical-details">Technical Details</h2>
+
+<p>The goal of the TVM stack is to provide a reusable toolchain to compile high-level neural network descriptions from deep learning framework frontends down to low-level machine code for multiple hardware backends. Taking Apache MXNet as a front-end example, the following code snippet demonstrates how TVM can be used to compile a high-level description of a deep learning model to an optimized executable module tailored to the target hardware.</p>
+
+<p style="text-align: center"><img src="/images/release/code_highlevel.png" alt="image" width="800px" /></p>
+
+<p>The challenge lies in enabling support for multiple hardware back-ends while keeping compute, memory and energy footprints at their lowest. We borrow wisdom from the compiler community in order to bridge the gap between the multitude of deep learning frameworks and hardware back-ends: we build a two-level intermediate layer composed of NNVM, a high-level intermediate representation (IR) for task scheduling and memory management, and TVM, an expressive low-level IR for optimizing compu [...]
+
+<p>The first level of the stack is a computation-graph-based representation. A computation graph is a directed acyclic graph that represents computation as nodes and dataflow dependencies as edges. This representation is very powerful: it allows us to bake operation attributes into the computation graph and specify transformation rules to iteratively optimize a computation graph. This is a common approach taken by most of the existing deep learning frameworks, including the NNVM graph rep [...]
+
+<p style="text-align: center"><img src="/images/release/computational_graph.png" alt="image" width="300px" /></p>
+
+<p>A lot of powerful optimizations can be supported by the graph optimization framework. For example, we provide a sublinear memory optimization functionality that allows users to train a 1000-layer ImageNet ResNet on a single GPU.</p>
+
+<p style="text-align: center"><img src="/images/release/nnvm_gap.png" alt="image" width="512px" /></p>
+
+<p>However, we find that the computation-graph-based IR alone is not enough to solve the challenge of supporting different hardware backends. The reason is that a single graph operator like convolution or matrix multiplication may be mapped and optimized in very different ways for each hardware back-end. These hardware-specific optimizations can vary drastically in terms of memory layout, parallelization threading patterns, caching access patterns and choice of hardware primitives. [...]
+
+<p>We build a low-level representation to solve this problem. This representation is based on index formulas, with additional support for recurrent computation.</p>
+
+<p style="text-align: center"><img src="/images/release/tvm_dsl.png" alt="image" width="700px" /></p>
+
+<p>The low-level IR adopts principles from existing image processing languages like Halide or Darkroom to formulate an expressive deep learning DSL. TVM builds low-level optimizations inspired by loop transformation tools like Loopy and polyhedra-based analysis. We also draw inspiration from the dataflow description languages used in deep learning frameworks like MXNet, TensorFlow and Theano. The algorithms described in TVM are then processed in a scheduling phase to apply transformations th [...]
+
+<p style="text-align: center"><img src="/images/release/tvm_backends.png" alt="image" width="600px" /></p>
+
+<p>TVM includes standard transformation primitives commonly found in CPU optimization frameworks. More importantly, TVM incorporates novel optimization primitives targeted at GPUs, by exploiting thread cooperation patterns, data layout transformations and powerful new compute primitives. Using TVM in combination with NNVM provides a rich opportunity to optimize deep learning workloads across the software stack, enabling joint compute graph-level and operator-level optimizations.</p>
+
+<h3 id="multi-language-and-platform-support">Multi-language and Platform Support</h3>
+
+<p>One of the many strengths of TVM lies in its rich support for multiple platforms and languages. We present two components of the framework: the compiler stack, which contains complete optimization libraries to produce optimized machine code, and the runtime, which is lightweight and offers the portability required to deploy compiled modules on different platforms.</p>
+
+<p style="text-align: center"><img src="/images/release/tvm_flexible.png" alt="image" width="600px" /></p>
+
+<p>TVM currently supports Python and C++ interfaces to the embedded compiler stack. We design the framework with maximum reuse in mind, so that compiler stack improvements can be applied interchangeably between the Python and C++ components.</p>
+
+<p>We also provide a lightweight runtime that can directly run TVM-compiled code in languages such as JavaScript, Java, Python and C++ on platforms including Android, iOS, Raspberry Pi and web browsers.</p>
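
A minimal sketch of that portability, assuming a module fadd produced by tvm.build and the era's tvm.module loader; "deploy.so" is a hypothetical file name:

    # export the compiled module as a shared library, then reload it
    fadd.export_library("deploy.so")
    loaded = tvm.module.load("deploy.so")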
+
+<h3 id="remote-deployment-and-execution">Remote Deployment and Execution</h3>
+
+<p style="text-align: center"><img src="/images/release/tvm_rpc.png" alt="image" width="500px" /></p>
+
+<p>TVM supports cross-compiling for, and testing on, embedded devices with TVM RPC, a lightweight interface for deploying and executing TVM cross-compiled modules on a remote embedded device. This gives TVM users a familiar high-level Python interface for compiling, optimizing and testing deep learning algorithms remotely on various low-level embedded devices.</p>
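
A minimal sketch of the RPC workflow, assuming a device running the TVM RPC server at a made-up address and the hypothetical "deploy.so" module from above:

    from tvm import rpc

    # connect to the RPC server on the embedded device (address is assumed)
    remote = rpc.connect("192.168.0.10", 9090)

    # upload the cross-compiled module, load it on the device, pick a context
    remote.upload("deploy.so")
    f = remote.load_module("deploy.so")
    ctx = remote.cpu(0)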
+
+<h2 id="performance">Performance</h2>
+
+<p>TVM is still in an early stage of development and we can expect more improvements to come, but we have started to see very promising results, which are discussed in this section.</p>
+
+<p>TVM gives us the flexibility to explore the rich optimization space of various deep learning kernels, for multiple hardware platforms. For instance, TVM allows us to tailor data layout and fused pattern requirements for the kernels and platforms that we most care about. Please note that the baseline libraries are created for more general-purpose problems, while TVM’s optimized kernels are heavily tuned for the workloads we evaluated via an auto-tuning process. TVM serves as a bridge [...]
+
+<p>The results listed in this section are still work in progress, and there is room for improvement.</p>
+
+<h3 id="raspberry-pi">Raspberry Pi</h3>
+
+<p>In the first part of the results, we compared the TVM CPU schedule to NNPACK on a Raspberry Pi 3B executing a ResNet workload. Due to limited time, we used TVM to implement direct convolution, while NNPACK was used to perform Winograd convolution for 3x3 kernels.</p>
+
+<p style="text-align: center"><img src="/images/release/resnet_rasp.png" alt="image" width="500px" /></p>
+
+<p>We find that with TVM’s auto-tuned kernels, we obtain performance similar to the hand-optimized kernels in NNPACK in the Raspberry Pi experiments.</p>
+
+<h3 id="gpu-results">GPU Results</h3>
+<p><strong>Author Credit</strong> These benchmarks and corresponding schedule optimizations were created by our contributors: <a href="http://www.ece.ucdavis.edu/~laurawly/">Leyuan Wang</a> (AWS/UCDavis), <a href="http://huyuwei.github.io">Yuwei Hu</a> (TuSimple) and Weitang Liu (AWS/UCDavis). They deserve all the credit.</p>
+
+<p>As a proof of concept, we created an end-to-end compilation pipeline that can compile MXNet models down to TVM execution graphs. We apply optimizations within and between graph nodes by automatically fusing operators together and letting TVM generate the fused kernels.
+We benchmarked the MobileNet ImageNet workload and discuss the results below:</p>
+
+<p style="text-align: center"><img src="/images/release/gpu_mobilenet.png" alt="image" width="600px" /></p>
+
+<p>We find that TVM outperforms our baseline method in terms of speed. More interestingly, kernel fusion brings additional speedup. It is worth mentioning that TVM generates all the optimized GPU kernels on its own, without relying on external libraries like cuDNN.</p>
+
+<p>We are working on more experiments and will release new results as they are obtained.</p>
+
+<h2 id="open-source-effort">Open Source Effort</h2>
+<p>TVM started as a research project of the Paul G. Allen School of Computer Science &amp; Engineering at the University of Washington. The TVM stack is designed to support <a href="https://github.com/dmlc/dlpack">DLPack</a>, a consensus tensor data structure adopted by multiple major deep learning frameworks. We have received early contributions from UW, AWS, Qihoo 360, Facebook, HKUST, TuSimple, UCDavis and SJTU, as well as members of the DMLC open-source community and the DLPack initiative. Going forward, the pro [...]
+
+<h2 id="acknowledgement">Acknowledgement</h2>
+<p>This project wouldn’t have been possible without our early contributors. We would like to thank Yizhi Liu (Qihoo 360), Yuwei Hu (TuSimple),
+Xingjian Shi (HKUST), Leyuan Wang (AWS/UCDavis), Nicolas Vasilache (Facebook), Jian Weng (UCLA), Weitang Liu (AWS/UCDavis), Edward Z. Yang (Facebook),
+Lianmin Zheng (SJTU), Qiao Zhang (UW), William Moses (Facebook/MIT) and Hu Shiwen. The authors would also like to thank Xianyi Zhang (PerfXLab) for helpful discussions.</p>
+
+<p>We also learnt a lot from the following projects when building TVM.</p>
+<ul>
+  <li><a href="https://github.com/halide/Halide">Halide</a>: TVM uses <a href="https://github.com/dmlc/HalideIR">HalideIR</a> as data structure for
+arithematic simplification and low level lowering. HalideIR is derived from Halide.
+We also learns from Halide when implementing the lowering pipeline in TVM.</li>
+  <li><a href="https://github.com/inducer/loopy">Loopy</a>: use of integer set analysis and its loop transformation primitives.</li>
+  <li><a href="https://github.com/Theano/Theano">Theano</a>: the design inspiration of symbolic scan operator for recurrence.</li>
+</ul>
+
+<h2 id="source-code">Source code</h2>
+<ul>
+  <li>Github page can be found here: <a href="https://github.com/dmlc/tvm">https://github.com/dmlc/tvm</a></li>
+  <li>TVM is <a href="https://github.com/dmlc/dlpack">DLPack</a> compatible, which makes it easy to support frameworks
+that adopt the standard, such as MXNet, PyTorch, Caffe2 and tiny-dnn.</li>
+</ul>
+
+    </div>
+  </div>
+</div>
+</div>
+
+
+    
+
+
+
+  </body>
+</html>
+
diff --git a/2017/08/22/Optimize-Deep-Learning-GPU-Operators-with-TVM-A-Depthwise-Convolution-Example.html b/2017/08/22/Optimize-Deep-Learning-GPU-Operators-with-TVM-A-Depthwise-Convolution-Example.html
new file mode 100644
index 0000000..feb4ce3
--- /dev/null
+++ b/2017/08/22/Optimize-Deep-Learning-GPU-Operators-with-TVM-A-Depthwise-Convolution-Example.html
@@ -0,0 +1,735 @@
+
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <title>Optimize Deep Learning GPU Operators with TVM: A Depthwise Convolution Example</title>
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
+</head>
+
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            
+            
+            
+
+
+
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/community">Community</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/about">About</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/vta">VTA</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      
+      	
+      	<li><a href="https://tvm.ai/blog">Blog</a></li>
+      	
+      
+    
+  
+
+
+
+
+            <li> <a href="https://sampl.cs.washington.edu/tvmconf">TVM Conference</a></li>
+            <li> <a href="https://docs.tvm.ai/tutorials/">Tutorials</a></li>
+            <li> <a href="https://docs.tvm.ai">Docs</a></li>
+            <li> <a href="https://github.com/dmlc/tvm/">Github</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
+<div class="container">
+<div class="content">
+  <div class="row">
+    <div class="span14">
+      <h1>Optimize Deep Learning GPU Operators with TVM: A Depthwise Convolution Example </h1>
+      <p class="post-meta">
+        <time datetime="2017-08-22T00:00:00-07:00" itemprop="datePublished">
+          Aug 22, 2017
+        </time>
+        
+        • <span itemprop="author" itemscope itemtype="http://schema.org/Person">
+          <span itemprop="name">Yuwei Hu</span>
+        </span>
+        
+      </p>
+      <p class="post-meta">
+        </p>
+    </br>
+    <p>Efficient deep learning operators are at the core of deep learning systems.
+Usually these operators are hard to optimize and require great effort from HPC experts.
+<a href="https://github.com/dmlc/tvm">TVM</a>, an end-to-end tensor IR/DSL stack, makes this much easier.</p>
+
+<p>This blog teaches you how to write high-performance GPU operator kernels with the help of TVM.
+We use depthwise convolution (i.e. <a href="http://docs.tvmlang.org/api/python/topi.html#topi.nn.depthwise_conv2d_nchw">topi.nn.depthwise_conv2d_nchw</a>) as an example,
+and demonstrate how we can improve over the already hand-optimized CUDA kernel in TensorFlow.
+Our final version is 2x-4x faster than the optimized kernel in tf-1.2 under different workloads, and 3x-7x faster with operator fusion enabled.
+Below is the result tested on a GTX 1080, with filter size = [1, 256, 3, 3], stride = [1, 1], padding = ‘SAME’:</p>
+
+<p style="text-align: center"><img src="/images/depthconv_tutorial/tf_compare.png" alt="image" width="95%" /></p>
+
+<h2 id="introduction-to-depthwise-convolution">Introduction to Depthwise Convolution</h2>
+
+<p>Depthwise convolution is an important building block of modern architectures, such as Xception [1] and MobileNet [2].
+It’s an effective method to reduce the computational complexity of deep neural networks.</p>
+
+<p style="text-align: center"><img src="/images/depthconv_tutorial/conv_and_depthconv.png" alt="image" width="80%" /></p>
+
+<p style="text-align: center">source: <a href="http://machinethink.net/blog/googles-mobile-net-architecture-on-iphone/">http://machinethink.net/blog/googles-mobile-net-architecture-on-iphone/</a></p>
+
+<p>In TVM, depthwise convolution can be declared as:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># padding stage
+</span><span class="n">PaddedInput</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">compute</span><span class="p">(</span>
+    <span class="p">(</span><span class="n">batch</span><span class="p">,</span> <span class="n">in_channel</span><span class="p">,</span> <span class="n">height_after_pad</span><span class="p">,</span> <span class="n">width_after_pad</span><span class="p">),</span>
+    <span class="k">lambda</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">:</span> <span class="n">tvm</span><span class="o">.</span><span class="n">select</span><span class="p">(</span>
+        <span class="n">tvm</span><span class="o">.</span><span class="nb">all</span><span class="p">(</span><span class="n">i</span> <span class="o">&gt;=</span> <span class="n">pad_top</span><span class="p">,</span> <span class="n">i</span> <span class="o">-</span> <span class="n">pad_top</span> <span class="o">&lt;</span> <span class="n">in_height</span><span class="p">,</span> <span class="n">j</span> <span class="o">&gt;=</span> <span class="n">pad_left</span><span class="p">,</span [...]
+        <span class="n">Input</span><span class="p">[</span><span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">i</span> <span class="o">-</span> <span class="n">pad_top</span><span class="p">,</span> <span class="n">j</span> <span class="o">-</span> <span class="n">pad_left</span><span class="p">],</span> <span class="n">tvm</span><span class="o">.</span><span class="n">const</span><span class="p">(</span><span class="mf">0.0 [...]
+    <span class="n">name</span><span class="o">=</span><span class="s">"PaddedInput"</span><span class="p">)</span>
+<span class="c1"># depthconv stage
+</span><span class="n">di</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">filter_height</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'di'</span><span class="p">)</span>
+<span class="n">dj</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">filter_width</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'dj'</span><span class="p">)</span>
+<span class="n">Output</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">compute</span><span class="p">(</span>
+    <span class="p">(</span><span class="n">batch</span><span class="p">,</span> <span class="n">out_channel</span><span class="p">,</span> <span class="n">out_height</span><span class="p">,</span> <span class="n">out_width</span><span class="p">),</span>
+    <span class="k">lambda</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">:</span> <span class="n">tvm</span><span class="o">.</span><span class="nb">sum</span><span class="p">(</span>
+        <span class="n">PaddedInput</span><span class="p">[</span><span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="o">/</span><span class="n">channel_multiplier</span><span class="p">,</span> <span class="n">i</span><span class="o">*</span><span class="n">stride_h</span> <span class="o">+</span> <span class="n">di</span><span class="p">,</span> <span class="n">j</span><span class="o">*</span><span class="n">stride_w</span> <span class="o">+</span> <sp [...]
+        <span class="n">axis</span><span class="o">=</span><span class="p">[</span><span class="n">di</span><span class="p">,</span> <span class="n">dj</span><span class="p">]),</span>
+    <span class="n">name</span><span class="o">=</span><span class="s">'DepthwiseConv2d'</span><span class="p">)</span>
+</code></pre></div></div>
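+
+<p>To experiment with this declaration, one can wire it up with placeholders and inspect the naive lowered IR. The following is a minimal sketch assuming the 2017-era tvm Python API used above (the shape values are arbitrary examples):</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import tvm
+
+batch, in_channel, in_height, in_width = 1, 256, 32, 32
+filter_height = filter_width = 3
+channel_multiplier = 1
+
+Input = tvm.placeholder((batch, in_channel, in_height, in_width), name='Input')
+Filter = tvm.placeholder((in_channel, channel_multiplier, filter_height, filter_width), name='Filter')
+# ... declare PaddedInput and Output exactly as above ...
+
+s = tvm.create_schedule(Output.op)
+print(tvm.lower(s, [Input, Filter, Output], simple_mode=True))
+</code></pre></div></div>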
+
+<h2 id="general-gpu-optimization-guidelines">General GPU Optimization Guidelines</h2>
+
+<p>This part briefly covers three concepts to keep in mind when optimizing CUDA code: data reuse, shared memory, and bank conflicts.
+If you already know them, you may skip this part.</p>
+
+<h3 id="data-reuse">Data Reuse</h3>
+<p>In modern computing architectures, the cost of loading data from memory is much higher than that of a single floating point computation [3].
+Because of that, we always want to reuse input data after it is loaded into registers or shared memory (cache).</p>
+
+<p>There are two forms of data reuse in depthwise convolution: filter reuse and input reuse. Filter reuse happens as the filter slides over the input channel and is applied many times.
+Input reuse is realized through tiling; let’s take 3x3 depthwise convolution as an example:</p>
+
+<p style="text-align: center"><img src="/images/depthconv_tutorial/no_tiling.png" alt="image" width="70%" /></p>
+
+<p>Without tiling, each thread computes 1 output element and loads 3x3 input data. 16 threads together have 9x16 loads.</p>
+
+<p style="text-align: center"><img src="/images/depthconv_tutorial/tiling.png" alt="image" width="70%" /></p>
+
+<p>With tiling, each thread computes 2x2 output elements and loads 4x4 input data. 4 threads together have 16x4 loads.</p>
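+
+<p>The same arithmetic generalizes to other tile sizes. A tiny helper (hypothetical, purely for illustration) reproduces the two load counts above:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def global_loads(out_h, out_w, tile_h, tile_w, k=3):
+    # each thread computes a tile_h x tile_w output tile, which needs a
+    # (tile_h + k - 1) x (tile_w + k - 1) input patch (stride 1)
+    n_threads = (out_h // tile_h) * (out_w // tile_w)
+    return n_threads * (tile_h + k - 1) * (tile_w + k - 1)
+
+print(global_loads(4, 4, 1, 1))  # no tiling: 16 threads x 9 loads = 144
+print(global_loads(4, 4, 2, 2))  # 2x2 tiles:  4 threads x 16 loads = 64
+</code></pre></div></div>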
+
+<h3 id="shared-memory-and-bank-conflicts">Shared Memory and Bank Conflicts</h3>
+<p>Shared memory can be seen as a programmer-managed cache on the GPU. It is on-chip and much faster than global memory.</p>
+
+<p style="text-align: center"><img src="/images/depthconv_tutorial/GPU_memory_hierarchy.png" alt="image" width="256px" /></p>
+
+<p>Shared memory is allocated per block. It’s common practice to load data from global memory into shared memory, and then all threads in the block read data from shared memory.</p>
+
+<p>The size of shared memory is limited (usually 48KB), so we must be careful not to overflow it.
+Besides, allocating too much shared memory to one block limits the number of active blocks per multiprocessor.</p>
+
+<p>Another performance issue with shared memory is bank conflicts. Shared memory is divided into equally sized memory modules (banks) that can be accessed simultaneously,
+however, if multiple threads access the same memory bank (causing bank conflicts), the accesses will be serialized, thus decreasing the effective bandwidth.</p>
+
+<p>Shared memory banks are organized such that successive addresses are assigned to successive banks.
+To avoid bank conflicts, it’s better that successive threads access successive memory addresses, as illustrated below (each color represents one shared memory bank):</p>
+
+<p style="text-align: center"><img src="/images/depthconv_tutorial/bank_conflicts.png" alt="image" width="95%" /></p>
+
+<p>For more details on shared memory and bank conflicts, please refer to <a href="https://devblogs.nvidia.com/parallelforall/using-shared-memory-cuda-cc/">this Nvidia blog post</a>.</p>
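+
+<p>For intuition, GPUs of this generation expose 32 banks of 4-byte words (an assumption that holds for the GTX 1080 used below), so the bank a word falls into is a simple function of its address. This toy model shows why consecutive threads reading consecutive float32 words are conflict-free while a stride-32 pattern is not:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>NUM_BANKS, WORD = 32, 4  # 32 banks of 4-byte words (assumed)
+
+def bank(byte_addr):
+    return (byte_addr // WORD) % NUM_BANKS
+
+# 32 consecutive threads, consecutive float32 words: 32 distinct banks
+print(len({bank(4 * t) for t in range(32)}))       # 32, conflict-free
+# stride-32 access: every thread hits bank 0, a 32-way conflict
+print(len({bank(4 * 32 * t) for t in range(32)}))  # 1
+</code></pre></div></div>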
+
+<p>Ok, now let’s start optimizing depthwise convolution in TVM.</p>
+
+<h2 id="schedule-optimization">Schedule Optimization</h2>
+
+<h3 id="compute-paddedinput-inline-to-save-memory-allocation">Compute PaddedInput Inline to Save Memory Allocation</h3>
+<p>As we saw in the declaration above, padding is declared explicitly as a separate stage. We compute it inline to avoid redundant memory allocation:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">s</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">create_schedule</span><span class="p">(</span><span class="n">Output</span><span class="o">.</span><span class="n">op</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">PaddedInput</span><span class="p">]</span><span class="o">.</span><span class="n">compute_inline</span><span class="p">()</span>
+</code></pre></div></div>
+
+<h3 id="divide-one-large-channel-into-smaller-blocks">Divide One Large Channel into Smaller Blocks</h3>
+<p>One straightforward schedule for depthwise convolution is to let one CUDA block take care of one input channel and the corresponding filters, loading them into shared memory and then computing:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">IS</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">cache_read</span><span class="p">(</span><span class="n">PaddedInput</span><span class="p">,</span> <span class="s">"shared"</span><span class="p">,</span> <span class="p">[</span><span class="n">DepthwiseConv2d</span><span class="p">])</span>
+<span class="n">FS</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">cache_read</span><span class="p">(</span><span class="n">Filter</span><span class="p">,</span> <span class="s">"shared"</span><span class="p">,</span> <span class="p">[</span><span class="n">DepthwiseConv2d</span><span class="p">])</span>
+<span class="n">block_y</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"blockIdx.y"</span><span class="p">)</span>
+<span class="n">block_x</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"blockIdx.x"</span><span class="p">)</span>
+<span class="c1"># bind the dimension of batch (N in NCHW) with block_y
+</span><span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">Output</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">block_y</span><span class="p">)</span>
+<span class="c1"># bind the dimension of channel (C in NCHW) with block_x
+</span><span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">Output</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">block_x</span><span class="p">)</span>
+</code></pre></div></div>
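+
+<p>The snippet above creates the shared-memory stages <code class="highlighter-rouge">IS</code> and <code class="highlighter-rouge">FS</code> but does not show where they are computed. A hedged sketch of the remaining wiring (not the exact schedule from the source code linked at the end) would attach them inside each block so that the tiles are loaded cooperatively:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># attach the cached reads at the per-block (channel) axis so each block
+# stages its own channel and filter in shared memory once
+s[IS].compute_at(s[Output], Output.op.axis[1])
+s[FS].compute_at(s[Output], Output.op.axis[1])
+# the copy loops of IS/FS would then be bound to threadIdx.y / threadIdx.x
+# so the load is spread across all threads in the block
+</code></pre></div></div>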
+
+<p>We test the average time cost of 1000 runs on a GTX 1080 and compare with <a href="https://www.tensorflow.org/versions/r0.12/api_docs/python/nn/convolution#depthwise_conv2d">depthwise_conv2d in tensorflow</a>.
+Here is the result:</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: center">Input</th>
+      <th style="text-align: center">Filter</th>
+      <th style="text-align: center">stride</th>
+      <th style="text-align: center">tf-1.2 SAME pad (us)</th>
+      <th style="text-align: center">TVM SAME pad (us)</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: center">[1, 256, 21, 21]</td>
+      <td style="text-align: center">[256, 1, 3, 3]</td>
+      <td style="text-align: center">[1, 1]</td>
+      <td style="text-align: center">16.1</td>
+      <td style="text-align: center">9.1</td>
+    </tr>
+    <tr>
+      <td style="text-align: center">[1, 256, 32, 32]</td>
+      <td style="text-align: center">[256, 1, 3, 3]</td>
+      <td style="text-align: center">[1, 1]</td>
+      <td style="text-align: center">34.8</td>
+      <td style="text-align: center">14.5</td>
+    </tr>
+    <tr>
+      <td style="text-align: center">[1, 256, 64, 64]</td>
+      <td style="text-align: center">[256, 1, 3, 3]</td>
+      <td style="text-align: center">[1, 1]</td>
+      <td style="text-align: center">130.9</td>
+      <td style="text-align: center">98.9</td>
+    </tr>
+    <tr>
+      <td style="text-align: center">[1, 256, 96, 96]</td>
+      <td style="text-align: center">[256, 1, 3, 3]</td>
+      <td style="text-align: center">[1, 1]</td>
+      <td style="text-align: center">251.6</td>
+      <td style="text-align: center">387.4</td>
+    </tr>
+  </tbody>
+</table>
+
+<p>As we can see, this schedule performs well with small channel sizes like 21 x 21 or 32 x 32; however, its performance drops severely once the channel size grows beyond 64 x 64.
+One main reason is that too much shared memory allocated to one block limits the number of active blocks per multiprocessor.</p>
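+
+<p>A quick back-of-the-envelope check (assuming float32 and the 48KB budget mentioned earlier) shows why:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># shared memory needed to stage one padded input channel (3x3 filter, SAME pad)
+for size in [21, 32, 64, 96]:
+    bytes_needed = (size + 2) * (size + 2) * 4  # float32
+    print(size, round(bytes_needed / 1024.0, 1), "KB")
+# a 96 x 96 channel needs ~37.5 KB, nearly the whole 48 KB budget,
+# so at most one such block can be resident per multiprocessor
+</code></pre></div></div>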
+
+<p>We modify the schedule to divide one large channel into smaller blocks. For example, one channel (64 x 64 or 96 x 96) is divided into blocks of 32 x 32,
+and one cuda block takes care of one 32 x 32 block:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">blocking_h</span> <span class="o">=</span> <span class="mi">32</span>
+<span class="n">blocking_w</span> <span class="o">=</span> <span class="mi">32</span>
+<span class="c1"># split the dimension of height (H in NCHW)
+</span><span class="n">bx1</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">Output</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <sp [...]
+<span class="c1"># split the dimension of width (W in NCHW)
+</span><span class="n">bx2</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">Output</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span><span class="p">[</span><span class="mi">3</span><span class="p">],</span> <sp [...]
+<span class="c1"># assign one 32 x 32 block to one cuda block
+</span><span class="n">by</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">fuse</span><span class="p">(</span><span class="n">Output</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">Output</span><span class="o">.</span>< [...]
+<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">by</span><span class="p">,</span> <span class="n">block_y</span><span class="p">)</span>
+<span class="n">bx</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">fuse</span><span class="p">(</span><span class="n">bx1</span><span class="p">,</span> <span class="n">bx2</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">bx</span><span class="p">,</span> <span class="n">block_x</span><span class="p">)</span>
+</code></pre></div></div>
+
+<p>Here is the new result:</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: center">Input</th>
+      <th style="text-align: center">[blocking_h, blocking_w]</th>
+      <th style="text-align: center">tf-1.2 SAME pad (us)</th>
+      <th style="text-align: center">TVM SAME pad (us)</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: center">[1, 256, 64, 64]</td>
+      <td style="text-align: center">[32, 32]</td>
+      <td style="text-align: center">130.9</td>
+      <td style="text-align: center">63.4</td>
+    </tr>
+    <tr>
+      <td style="text-align: center">[1, 256, 96, 96]</td>
+      <td style="text-align: center">[32, 32]</td>
+      <td style="text-align: center">251.6</td>
+      <td style="text-align: center">132.5</td>
+    </tr>
+  </tbody>
+</table>
+
+<p>Our blocking strategy works! For 64 x 64 channel size, it brings 1.6x acceleration (98.9us -&gt; 63.4us); for 96 x 96 channel size, it brings 2.9x acceleration (387.4us -&gt; 132.5us).</p>
+
+<h3 id="tuning-parameters-of-thread-numbers">Tuning Parameters of Thread Numbers</h3>
+
+<p>How should we schedule the workload, say a 32 x 32 channel, among the threads of one CUDA block? Intuitively, it should be like this:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">num_thread_y</span> <span class="o">=</span> <span class="mi">8</span>
+<span class="n">num_thread_x</span> <span class="o">=</span> <span class="mi">8</span>
+<span class="n">thread_y</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">num_thread_y</span><span class="p">),</span> <span class="s">"threadIdx.y"</span><span class="p">)</span>
+<span class="n">thread_x</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">num_thread_x</span><span class="p">),</span> <span class="s">"threadIdx.x"</span><span class="p">)</span>
+<span class="n">ty</span><span class="p">,</span> <span class="n">yi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">h_dim</span><span class="p">,</span> <span class="n">nparts</span><span class="o">=</span><span class="n">num_thread_y</span><span class="p">)</span>
+<span class="n">tx</span><span class="p">,</span> <span class="n">xi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">w_dim</span><span class="p">,</span> <span class="n">nparts</span><span class="o">=</span><span class="n">num_thread_x</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">reorder</span><span class="p">(</span><span class="n">ty</span><span class="p">,</span> <span class="n">tx</span><span class="p">,</span> <span class="n">yi</span><span class="p">,</span> <span class="n">xi</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">ty</span><span class="p">,</span> <span class="n">thread_y</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">tx</span><span class="p">,</span> <span class="n">thread_x</span><span class="p">)</span>
+</code></pre></div></div>
+
+<p>There are two parameters in the schedule: <code class="highlighter-rouge">num_thread_y</code> and <code class="highlighter-rouge">num_thread_x</code>. How do we determine their optimal combination?
+Well, let’s first do some experiments. Below is the result with Filter = [256, 1, 3, 3] and stride = [1, 1]:</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: center">Case</th>
+      <th style="text-align: center">Input</th>
+      <th style="text-align: center">num_thread_y</th>
+      <th style="text-align: center">num_thread_x</th>
+      <th style="text-align: center">TVM SAME pad (us)</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: center">1</td>
+      <td style="text-align: center">[1, 256, 32, 32]</td>
+      <td style="text-align: center">8</td>
+      <td style="text-align: center">32</td>
+      <td style="text-align: center">9.7</td>
+    </tr>
+    <tr>
+      <td style="text-align: center">2</td>
+      <td style="text-align: center">[1, 256, 32, 32]</td>
+      <td style="text-align: center">4</td>
+      <td style="text-align: center">32</td>
+      <td style="text-align: center">8.8</td>
+    </tr>
+    <tr>
+      <td style="text-align: center">3</td>
+      <td style="text-align: center">[1, 256, 32, 32]</td>
+      <td style="text-align: center">1</td>
+      <td style="text-align: center">32</td>
+      <td style="text-align: center">17.7</td>
+    </tr>
+    <tr>
+      <td style="text-align: center">4</td>
+      <td style="text-align: center">[1, 256, 32, 32]</td>
+      <td style="text-align: center">32</td>
+      <td style="text-align: center">1</td>
+      <td style="text-align: center">32.5</td>
+    </tr>
+  </tbody>
+</table>
+
+<p>There are many interesting observations in the results above:</p>
+
+<ul>
+  <li>
+    <p>Case 2 is faster than case 1. In case 2, each thread computes an 8x1 tile in the output, which corresponds to a 10x3 tile in the input.
+It has better data reuse than case 1’s 4x1 tile.</p>
+  </li>
+  <li>
+    <p>Case 3 is slower than case 2. This is because in case 3, the workload per thread is too large, incurring a high cost of reads from local memory.</p>
+  </li>
+  <li>
+    <p>Case 4 is slower than case 3. It’s because <code class="highlighter-rouge">num_thread_x = 32</code> ensures no bank conflicts, while <code class="highlighter-rouge">num_thread_y = 32</code> doesn’t.</p>
+  </li>
+</ul>
+
+<p>To summarize what we learn from the observations above:</p>
+
+<ul>
+  <li>A large tile is good for data reuse, but bad for local memory reads.</li>
+  <li>The influence of <code class="highlighter-rouge">num_thread_y</code> and <code class="highlighter-rouge">num_thread_x</code> on bank conflicts is asymmetric.</li>
+  <li>Finding the optimal combination of <code class="highlighter-rouge">num_thread_y</code> and <code class="highlighter-rouge">num_thread_x</code> means striking a balance between efficient shared memory access (avoiding bank conflicts), data reuse, and local memory reads.</li>
+</ul>
+
+<p>Pretty tricky. So, what exactly should we do to find the optimal combination? The answer is brute-force search.
+We can pass <code class="highlighter-rouge">num_thread_y</code> and <code class="highlighter-rouge">num_thread_x</code> as arguments to the schedule function, and try all possible combinations to find the optimal one. This can be done easily in TVM:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">schedule_depthwise_conv2d</span><span class="p">(</span><span class="o">...</span><span class="p">,</span> <span class="n">num_thread_y</span><span class="o">=</span><span class="mi">8</span><span class="p">,</span> <span class="n">num_thread_x</span><span class="o">=</span><span class="mi">8</span><span class="p">):</span>
+    <span class="n">num_thread_y</span> <span class="o">=</span> <span class="n">num_thread_y</span>
+    <span class="n">num_thread_x</span> <span class="o">=</span> <span class="n">num_thread_x</span>
+    <span class="n">do_schedule_as_usual</span>
+    <span class="k">return</span> <span class="n">schedule</span>
+
+<span class="n">min_time_cost</span> <span class="o">=</span> <span class="n">inf</span>
+<span class="k">for</span> <span class="n">num_thread_y</span><span class="p">,</span> <span class="n">num_thread_x</span> <span class="ow">in</span> <span class="n">all_possible_combinations</span><span class="p">:</span>
+    <span class="n">schedule</span> <span class="o">=</span> <span class="n">schedule_depthwise_conv2d</span><span class="p">(</span><span class="o">...</span><span class="p">,</span> <span class="n">num_thread_y</span><span class="o">=</span><span class="n">num_thread_y</span><span class="p">,</span> <span class="n">num_thread_x</span><span class="o">=</span><span class="n">num_thread_x</span><span class="p">)</span>
+    <span class="n">time_cost</span> <span class="o">=</span> <span class="n">test_depthwise_conv2d</span><span class="p">(</span><span class="o">...</span><span class="p">,</span> <span class="n">schedule</span><span class="p">)</span>
+    <span class="k">if</span> <span class="n">time_cost</span> <span class="o">&lt;</span> <span class="n">min_time_cost</span><span class="p">:</span>
+        <span class="n">min_time_cost</span> <span class="o">=</span> <span class="n">time_cost</span>
+        <span class="n">optimal_combination</span> <span class="o">=</span> <span class="p">[</span><span class="n">num_thread_y</span><span class="p">,</span> <span class="n">num_thread_x</span><span class="p">]</span>
+</code></pre></div></div>
+
+<p>In fact, it can be seen as a simple auto scheduler.</p>
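+
+<p>For completeness, a concrete (hypothetical) driver for this search could look as follows, reusing the <code class="highlighter-rouge">schedule_depthwise_conv2d</code> and <code class="highlighter-rouge">test_depthwise_conv2d</code> sketched above:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import itertools
+
+candidates = [1, 2, 4, 8, 16, 32]
+best_cost, best_combo = float("inf"), None
+for ty, tx in itertools.product(candidates, candidates):
+    if ty * tx &gt; 1024:  # CUDA caps the number of threads per block
+        continue
+    sch = schedule_depthwise_conv2d(..., num_thread_y=ty, num_thread_x=tx)
+    cost = test_depthwise_conv2d(..., sch)  # measured average runtime
+    if cost &lt; best_cost:
+        best_cost, best_combo = cost, (ty, tx)
+print("optimal:", best_combo)
+</code></pre></div></div>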
+
+<h3 id="vthread-and-strided-patterns">Vthread and Strided Patterns</h3>
+<p>Vthread (virtual thread) in TVM is introduced to support strided patterns. We can use it this way:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">num_vthread_y</span> <span class="o">=</span> <span class="mi">2</span>
+<span class="n">num_vthread_x</span> <span class="o">=</span> <span class="mi">2</span>
+<span class="n">num_thread_y</span> <span class="o">=</span> <span class="mi">8</span>
+<span class="n">num_thread_x</span> <span class="o">=</span> <span class="mi">8</span>
+<span class="n">thread_vy</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">num_vthread_y</span><span class="p">),</span> <span class="s">"vthread"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"vy"</span><span class="p">)</span>
+<span class="n">thread_vx</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">num_vthread_x</span><span class="p">),</span> <span class="s">"vthread"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"vx"</span><span class="p">)</span>
+<span class="n">thread_y</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">num_thread_y</span><span class="p">),</span> <span class="s">"threadIdx.y"</span><span class="p">)</span>
+<span class="n">thread_x</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">num_thread_x</span><span class="p">),</span> <span class="s">"threadIdx.x"</span><span class="p">)</span>
+<span class="c1"># split the dimension of height (H in NCHW) twice
+</span><span class="n">tvy</span><span class="p">,</span> <span class="n">vyi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">h_dim</span><span class="p">,</span> <span class="n">nparts</span><span class="o">=</span><span class="n">num_vthread_y</span><span class="p">)</span>
+<span class="n">ty</span><span class="p">,</span> <span class="n">yi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">vyi</span><span class="p">,</span> <span class="n">nparts</span><span class="o">=</span><span class="n">num_thread_y</span><span class="p">)</span>
+<span class="c1"># split the dimension of width (W in NCHW) twice
+</span><span class="n">tvx</span><span class="p">,</span> <span class="n">vxi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">w_dim</span><span class="p">,</span> <span class="n">nparts</span><span class="o">=</span><span class="n">num_vthread_x</span><span class="p">)</span>
+<span class="n">tx</span><span class="p">,</span> <span class="n">xi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">vxi</span><span class="p">,</span> <span class="n">nparts</span><span class="o">=</span><span class="n">num_thread_x</span><span class="p">)</span>
+<span class="c1"># bind thread and vthread respectively
+</span><span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">tvy</span><span class="p">,</span> <span class="n">thread_vy</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">tvx</span><span class="p">,</span> <span class="n">thread_vx</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">ty</span><span class="p">,</span> <span class="n">thread_y</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">tx</span><span class="p">,</span> <span class="n">thread_x</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">reorder</span><span class="p">(</span><span class="n">tvy</span><span class="p">,</span> <span class="n">tvx</span><span class="p">,</span> <span class="n">ty</span><span class="p">,</span> <span class="n">tx</span><span class="p">,</span> <span class="n">yi</span><span class="p">,</span> <span class="n">xi</span><span class="p">)</span>
+</code></pre></div></div>
+
+<p>Let’s print the IR to see what vthread does:</p>
+
+<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* Input = [1, 1, 32, 32], Filter = [1, 1, 3, 3], stride = [1, 1], padding = 'SAME' */</span>
+<span class="n">produce</span> <span class="n">DepthwiseConv2d</span> <span class="p">{</span>
+  <span class="c1">// attr [iter_var(blockIdx.y, , blockIdx.y)] thread_extent = 1</span>
+  <span class="c1">// attr [iter_var(blockIdx.x, , blockIdx.x)] thread_extent = 1</span>
+  <span class="c1">// attr [iter_var(threadIdx.y, Range(min=0, extent=8), threadIdx.y)] thread_extent = 8</span>
+  <span class="c1">// attr [iter_var(threadIdx.x, Range(min=0, extent=8), threadIdx.x)] thread_extent = 8</span>
+  <span class="k">for</span> <span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">inner</span><span class="p">.</span><span class="n">inner</span><span class="p">.</span><span class="n">inner</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span> <span class="p">{</span>
+    <span class="k">for</span> <span class="p">(</span><span class="n">j</span><span class="p">.</span><span class="n">inner</span><span class="p">.</span><span class="n">inner</span><span class="p">.</span><span class="n">inner</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span> <span class="p">{</span>
+      <span class="n">DepthwiseConv2d</span><span class="p">[((((((((</span><span class="n">blockIdx</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="n">blockIdx</span><span class="p">.</span><span class="n">x</span><span class="p">)</span><span class="o">*</span><span class="mi">16</span><span class="p">)</span> <span class="o">+</span> <span class="n">threadIdx</span><span class="p">.</span><span class="n">y</span><span class="p">)</span><spa [...]
+      <span class="n">DepthwiseConv2d</span><span class="p">[(((((((((</span><span class="n">blockIdx</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="n">blockIdx</span><span class="p">.</span><span class="n">x</span><span class="p">)</span><span class="o">*</span><span class="mi">16</span><span class="p">)</span> <span class="o">+</span> <span class="n">threadIdx</span><span class="p">.</span><span class="n">y</span><span class="p">)</span><sp [...]
+      <span class="n">DepthwiseConv2d</span><span class="p">[(((((((((</span><span class="n">blockIdx</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="n">blockIdx</span><span class="p">.</span><span class="n">x</span><span class="p">)</span><span class="o">*</span><span class="mi">16</span><span class="p">)</span> <span class="o">+</span> <span class="n">threadIdx</span><span class="p">.</span><span class="n">y</span><span class="p">)</span><sp [...]
+      <span class="n">DepthwiseConv2d</span><span class="p">[(((((((((</span><span class="n">blockIdx</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="n">blockIdx</span><span class="p">.</span><span class="n">x</span><span class="p">)</span><span class="o">*</span><span class="mi">16</span><span class="p">)</span> <span class="o">+</span> <span class="n">threadIdx</span><span class="p">.</span><span class="n">y</span><span class="p">)</span><sp [...]
+      <span class="k">for</span> <span class="p">(</span><span class="n">di</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="p">{</span>
+        <span class="k">for</span> <span class="p">(</span><span class="n">dj</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="p">{</span>
+          <span class="n">DepthwiseConv2d</span><span class="p">[((((((((</span><span class="n">blockIdx</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="n">blockIdx</span><span class="p">.</span><span class="n">x</span><span class="p">)</span><span class="o">*</span><span class="mi">16</span><span class="p">)</span> <span class="o">+</span> <span class="n">threadIdx</span><span class="p">.</span><span class="n">y</span><span class="p">)</span> [...]
+          <span class="n">DepthwiseConv2d</span><span class="p">[(((((((((</span><span class="n">blockIdx</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="n">blockIdx</span><span class="p">.</span><span class="n">x</span><span class="p">)</span><span class="o">*</span><span class="mi">16</span><span class="p">)</span> <span class="o">+</span> <span class="n">threadIdx</span><span class="p">.</span><span class="n">y</span><span class="p">)</span [...]
+          <span class="n">DepthwiseConv2d</span><span class="p">[(((((((((</span><span class="n">blockIdx</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="n">blockIdx</span><span class="p">.</span><span class="n">x</span><span class="p">)</span><span class="o">*</span><span class="mi">16</span><span class="p">)</span> <span class="o">+</span> <span class="n">threadIdx</span><span class="p">.</span><span class="n">y</span><span class="p">)</span [...]
+          <span class="n">DepthwiseConv2d</span><span class="p">[(((((((((</span><span class="n">blockIdx</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="n">blockIdx</span><span class="p">.</span><span class="n">x</span><span class="p">)</span><span class="o">*</span><span class="mi">16</span><span class="p">)</span> <span class="o">+</span> <span class="n">threadIdx</span><span class="p">.</span><span class="n">y</span><span class="p">)</span [...]
+        <span class="p">}</span>
+      <span class="p">}</span>
+    <span class="p">}</span>
+  <span class="p">}</span>
+<span class="p">}</span>
+</code></pre></div></div>
+
+<p>Without vthread (just set to 1), the IR is:</p>
+
+<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* Input = [1, 1, 32, 32], Filter = [1, 1, 3, 3], stride = [1, 1], padding = 'SAME' */</span>
+<span class="n">produce</span> <span class="n">DepthwiseConv2d</span> <span class="p">{</span>
+  <span class="c1">// attr [iter_var(blockIdx.y, , blockIdx.y)] thread_extent = 1</span>
+  <span class="c1">// attr [iter_var(blockIdx.x, , blockIdx.x)] thread_extent = 1</span>
+  <span class="c1">// attr [iter_var(threadIdx.y, Range(min=0, extent=8), threadIdx.y)] thread_extent = 8</span>
+  <span class="c1">// attr [iter_var(threadIdx.x, Range(min=0, extent=8), threadIdx.x)] thread_extent = 8</span>
+  <span class="k">for</span> <span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">inner</span><span class="p">.</span><span class="n">inner</span><span class="p">.</span><span class="n">inner</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span> <span class="p">{</span>
+    <span class="k">for</span> <span class="p">(</span><span class="n">j</span><span class="p">.</span><span class="n">inner</span><span class="p">.</span><span class="n">inner</span><span class="p">.</span><span class="n">inner</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span> <span class="p">{</span>
+      <span class="n">DepthwiseConv2d</span><span class="p">[((((((((</span><span class="n">blockIdx</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="n">blockIdx</span><span class="p">.</span><span class="n">x</span><span class="p">)</span><span class="o">*</span><span class="mi">8</span><span class="p">)</span> <span class="o">+</span> <span class="n">threadIdx</span><span class="p">.</span><span class="n">y</span><span class="p">)</span><span [...]
+      <span class="k">for</span> <span class="p">(</span><span class="n">di</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="p">{</span>
+        <span class="k">for</span> <span class="p">(</span><span class="n">dj</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="p">{</span>
+          <span class="n">DepthwiseConv2d</span><span class="p">[((((((((</span><span class="n">blockIdx</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="n">blockIdx</span><span class="p">.</span><span class="n">x</span><span class="p">)</span><span class="o">*</span><span class="mi">8</span><span class="p">)</span> <span class="o">+</span> <span class="n">threadIdx</span><span class="p">.</span><span class="n">y</span><span class="p">)</span>< [...]
+        <span class="p">}</span>
+      <span class="p">}</span>
+    <span class="p">}</span>
+  <span class="p">}</span>
+<span class="p">}</span>
+</code></pre></div></div>
+
+<p>As we can see, when <code class="highlighter-rouge">num_vthread_y = 2</code> and <code class="highlighter-rouge">num_vthread_x = 2</code>, the 32 x 32 channel is divided into four sub-channels of 16 x 16.
+Each thread computes four output elements at a time, one element in one sub-channel.</p>
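+
+<p>The index arithmetic behind this follows directly from the two splits: the row index decomposes as <code class="highlighter-rouge">i = vy*16 + ty*2 + yi</code>. A short check (illustrative only) shows the strided pattern each thread covers:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># rows handled by thread ty when num_vthread_y=2, num_thread_y=8, 32 rows
+def rows_for_thread(ty):
+    return [vy * 16 + ty * 2 + yi for vy in range(2) for yi in range(2)]
+
+print(rows_for_thread(0))  # [0, 1, 16, 17]: one pair in each 16-row sub-channel
+</code></pre></div></div>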
+
+<p>Below is the result with Filter = [256, 1, 3, 3], stride = [1, 1], blocking_h = 32, blocking_w = 32:</p>
+
+<style>
+table th:nth-of-type(1) {
+    width: 120px;
+}
+table th:nth-of-type(2) {
+    width: 120px;
+}
+</style>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: center">Case</th>
+      <th style="text-align: center">Input</th>
+      <th style="text-align: center">num_thread_y, num_thread_x</th>
+      <th style="text-align: center">num_vthread_y, num_vthread_x</th>
+      <th style="text-align: center">TVM SAME pad (us)</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: center">1</td>
+      <td style="text-align: center">[1, 256, 96, 96]</td>
+      <td style="text-align: center">8, 8</td>
+      <td style="text-align: center">1, 1</td>
+      <td style="text-align: center">132.5</td>
+    </tr>
+    <tr>
+      <td style="text-align: center">2</td>
+      <td style="text-align: center">[1, 256, 96, 96]</td>
+      <td style="text-align: center">8, 8</td>
+      <td style="text-align: center">1, 4</td>
+      <td style="text-align: center">103.1</td>
+    </tr>
+    <tr>
+      <td style="text-align: center">3</td>
+      <td style="text-align: center">[1, 256, 96, 96]</td>
+      <td style="text-align: center">4, 32</td>
+      <td style="text-align: center">1, 1</td>
+      <td style="text-align: center">95.9</td>
+    </tr>
+    <tr>
+      <td style="text-align: center">4</td>
+      <td style="text-align: center">[1, 256, 96, 96]</td>
+      <td style="text-align: center">8, 16</td>
+      <td style="text-align: center">1, 2</td>
+      <td style="text-align: center">90.9</td>
+    </tr>
+  </tbody>
+</table>
+
+<p>Case 2 is faster than case 1. This is because in case 2 <code class="highlighter-rouge">num_thread_x=8</code> and <code class="highlighter-rouge">num_vthread_x=4</code> together ensure that consecutive threads access consecutive memory addresses,
+thus avoiding bank conflicts, as illustrated below (each color represents one thread’s workload):</p>
+
+<p style="text-align: center"><img src="/images/depthconv_tutorial/vthread_and_strided_pattern.png" alt="image" width="90%" /></p>
+
+<p>In theory, cases 3 and 4 should be equally fast, since they have the same workload per thread and both enjoy efficient shared memory access. Somehow case 4 is just a little faster.</p>
+
+<p>Still remember tensorflow’s speed? It was 251.6us, and now TVM is 2.8x faster. The journey was 387.4us -&gt; 132.5us -&gt; 95.9us -&gt; 90.9us: blocking helps the most, tuning thread numbers saves 37us, and vthread saves an additional 5us.</p>
+
+<p>In fact, TVM can be dramatically faster than tensorflow when the kernel size or channel_multiplier is large (because of more filter reuse):</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: center">Input</th>
+      <th style="text-align: center">Filter</th>
+      <th style="text-align: center">stride</th>
+      <th style="text-align: center">tf-1.2 SAME pad (us)</th>
+      <th style="text-align: center">TVM SAME pad (us)</th>
+      <th style="text-align: center">How faster is TVM</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: center">[1, 256, 96, 96]</td>
+      <td style="text-align: center">[256, 1, 3, 3]</td>
+      <td style="text-align: center">[1, 1]</td>
+      <td style="text-align: center">251.6</td>
+      <td style="text-align: center">90.9</td>
+      <td style="text-align: center">2.8x</td>
+    </tr>
+    <tr>
+      <td style="text-align: center">[1, 256, 96, 96]</td>
+      <td style="text-align: center">[256, 1, 5, 5]</td>
+      <td style="text-align: center">[1, 1]</td>
+      <td style="text-align: center">597.6</td>
+      <td style="text-align: center">128.9</td>
+      <td style="text-align: center">4.6x</td>
+    </tr>
+    <tr>
+      <td style="text-align: center">[1, 256, 96, 96]</td>
+      <td style="text-align: center">[256, 2, 3, 3]</td>
+      <td style="text-align: center">[1, 1]</td>
+      <td style="text-align: center">659.9</td>
+      <td style="text-align: center">143.7</td>
+      <td style="text-align: center">4.6x</td>
+    </tr>
+    <tr>
+      <td style="text-align: center">[1, 256, 96, 96]</td>
+      <td style="text-align: center">[256, 2, 5, 5]</td>
+      <td style="text-align: center">[1, 1]</td>
+      <td style="text-align: center">1203.9</td>
+      <td style="text-align: center">170.5</td>
+      <td style="text-align: center">7.1x</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="operator-fusion">Operator Fusion</h2>
+
+<p>One typical optimization we can do in deep learning is operator fusion: computing multiple operators together in a single kernel without saving intermediate results back to global memory.
+TVM supports this out of the box.</p>
+
+<p>Consider a common pattern in neural networks: <code class="highlighter-rouge">depthwise_conv2d</code> + <code class="highlighter-rouge">scale_shift</code> + <code class="highlighter-rouge">relu</code>. We can fuse the three operators into one, by slightly modifying the original schedule:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">DepthwiseConv2d</span> <span class="o">=</span> <span class="n">topi</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">depthwise_conv2d</span><span class="p">(</span><span class="n">Input</span><span class="p">,</span> <span class="n">Filter</span><span class="p">,</span> <span class="n">stride</span><span class="p">,</span> <span [...]
+<span class="n">ScaleShift</span> <span class="o">=</span> <span class="n">topi</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">scale_shift</span><span class="p">(</span><span class="n">DepthwiseConv2d</span><span class="p">,</span> <span class="n">Scale</span><span class="p">,</span> <span class="n">Shift</span><span class="p">)</span>
+<span class="n">Relu</span> <span class="o">=</span> <span class="n">topi</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">relu</span><span class="p">(</span><span class="n">ScaleShift</span><span class="p">)</span>
+
+<span class="n">Output</span> <span class="o">=</span> <span class="n">Relu</span> <span class="c1"># is no longer DepthwiseConv2d
+</span><span class="n">s</span><span class="p">[</span><span class="n">ScaleShift</span><span class="p">]</span><span class="o">.</span><span class="n">compute_inline</span><span class="p">()</span> <span class="c1"># this line fuses ScaleShift, explicitly
+</span><span class="n">s</span><span class="p">[</span><span class="n">DepthwiseConv2d</span><span class="p">]</span><span class="o">.</span><span class="n">set_scope</span><span class="p">(</span><span class="s">"local"</span><span class="p">)</span> <span class="c1"># this line fuses DepthwiseConv2d, implicitly
+</span><span class="n">schedule</span><span class="p">(</span><span class="n">Output</span><span class="p">)</span> <span class="c1"># schedule for Output the same way we schedule for DepthwiseConv2d as discussed above
+</span><span class="n">s</span><span class="p">[</span><span class="n">DepthwiseConv2d</span><span class="p">]</span><span class="o">.</span><span class="n">compute_at</span><span class="p">(</span><span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">],</span> <span class="n">tx</span><span class="p">)</span> <span class="c1"># tx is the inner most axis, bound to threadIdx.x
+</span></code></pre></div></div>
+
+<p>It generates IR like this:</p>
+
+<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* Input = [1, 1, 32, 32], Filter = [1, 1, 3, 3], stride = [1, 1], padding = 'SAME' */</span>
+<span class="n">produce</span> <span class="n">Relu</span> <span class="p">{</span>
+  <span class="c1">// attr [iter_var(blockIdx.y, , blockIdx.y)] thread_extent = 1</span>
+  <span class="c1">// attr [DepthwiseConv2d] storage_scope = "local"</span>
+  <span class="n">allocate</span> <span class="n">DepthwiseConv2d</span><span class="p">[</span><span class="n">float32</span> <span class="o">*</span> <span class="mi">1</span> <span class="o">*</span> <span class="mi">1</span> <span class="o">*</span> <span class="mi">4</span> <span class="o">*</span> <span class="mi">4</span><span class="p">]</span>
+  <span class="c1">// attr [iter_var(blockIdx.x, , blockIdx.x)] thread_extent = 1</span>
+  <span class="c1">// attr [iter_var(threadIdx.y, Range(min=0, extent=8), threadIdx.y)] thread_extent = 8</span>
+  <span class="c1">// attr [iter_var(threadIdx.x, Range(min=0, extent=8), threadIdx.x)] thread_extent = 8</span>
+  <span class="n">produce</span> <span class="n">DepthwiseConv2d</span> <span class="p">{</span>
+    <span class="k">for</span> <span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span> <span class="p">{</span>
+      <span class="k">for</span> <span class="p">(</span><span class="n">j</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span> <span class="p">{</span>
+        <span class="n">DepthwiseConv2d</span><span class="p">[((</span><span class="n">i</span><span class="o">*</span><span class="mi">4</span><span class="p">)</span> <span class="o">+</span> <span class="n">j</span><span class="p">)]</span> <span class="o">=</span> <span class="mf">0.000000</span><span class="n">f</span>
+        <span class="k">for</span> <span class="p">(</span><span class="n">di</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="p">{</span>
+          <span class="k">for</span> <span class="p">(</span><span class="n">dj</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="p">{</span>
+            <span class="n">DepthwiseConv2d</span><span class="p">[((</span><span class="n">i</span><span class="o">*</span><span class="mi">4</span><span class="p">)</span> <span class="o">+</span> <span class="n">j</span><span class="p">)]</span> <span class="o">=</span> <span class="p">(</span><span class="n">DepthwiseConv2d</span><span class="p">[((</span><span class="n">i</span><span class="o">*</span><span class="mi">4</span><span class="p">)</span> <span class="o">+</span> <span c [...]
+          <span class="p">}</span>
+        <span class="p">}</span>
+      <span class="p">}</span>
+    <span class="p">}</span>
+  <span class="p">}</span>
+  <span class="k">for</span> <span class="p">(</span><span class="n">i2</span><span class="p">.</span><span class="n">inner</span><span class="p">.</span><span class="n">inner</span><span class="p">.</span><span class="n">inner</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span> <span class="p">{</span>
+    <span class="k">for</span> <span class="p">(</span><span class="n">i3</span><span class="p">.</span><span class="n">inner</span><span class="p">.</span><span class="n">inner</span><span class="p">.</span><span class="n">inner</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span> <span class="p">{</span>
+      <span class="n">Relu</span><span class="p">[((((((((</span><span class="n">blockIdx</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="n">blockIdx</span><span class="p">.</span><span class="n">x</span><span class="p">)</span><span class="o">*</span><span class="mi">8</span><span class="p">)</span> <span class="o">+</span> <span class="n">threadIdx</span><span class="p">.</span><span class="n">y</span><span class="p">)</span><span class="o"> [...]
+    <span class="p">}</span>
+  <span class="p">}</span>
+<span class="p">}</span>
+</code></pre></div></div>
+
+<p>As we can see, each thread computes <code class="highlighter-rouge">scale_shift</code> and <code class="highlighter-rouge">relu</code> before writing the result of <code class="highlighter-rouge">depthwise_conv2d</code> to global memory. The fused operator is as fast as a single <code class="highlighter-rouge">depthwise_conv2d</code>.
+Below is the result with Input = [1, 256, 96, 96], Filter = [256, 1, 3, 3], stride = [1, 1], padding = ‘SAME’:</p>
+
+<ul>
+  <li>tf-1.2 <code class="highlighter-rouge">depthwise_conv2d</code>: 251.6 us</li>
+  <li>tf-1.2 <code class="highlighter-rouge">depthwise_conv2d</code> + <code class="highlighter-rouge">scale_shift</code> + <code class="highlighter-rouge">relu</code> (separate): 419.9 us</li>
+  <li>TVM <code class="highlighter-rouge">depthwise_conv2d</code>: 90.9 us</li>
+  <li>TVM <code class="highlighter-rouge">depthwise_conv2d + scale_shift + relu</code> (fused): 91.5 us</li>
+</ul>
+
+<p>The advantage of operator fusion is obvious.</p>
+
+<p>This is not the end; TVM can do operator fusion in an even smarter way. You may refer to <a href="https://github.com/dmlc/tvm/issues/215">this issue</a> and read the source code provided below.</p>
+
+<h2 id="show-me-the-code">Show me the code</h2>
+<ul>
+  <li>Declare: <a href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/nn/convolution.py">https://github.com/dmlc/tvm/blob/master/topi/python/topi/nn/convolution.py</a></li>
+  <li>Schedule: <a href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/depthwise_conv2d.py">https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/depthwise_conv2d.py</a></li>
+  <li>Test: <a href="https://github.com/dmlc/tvm/blob/master/topi/recipe/conv/depthwise_conv2d_test.py">https://github.com/dmlc/tvm/blob/master/topi/recipe/conv/depthwise_conv2d_test.py</a></li>
+</ul>
+
+<h2 id="acknowledgements">Acknowledgements</h2>
+<p>The author thanks Tianqi Chen for his helpful advice and inspiring discussion.</p>
+
+<h2 id="bio">Bio</h2>
+<p><a href="https://Huyuwei.github.io">Yuwei Hu</a> is an intern in <a href="http://tusimple.ai/">Tusimple</a>’s HPC group.
+He is experiencing a gap year after obtaining a bachelor’s degree in electrical engineering from Beihang University.</p>
+
+<h2 id="references">References</h2>
+<p>[1] <a href="https://arxiv.org/abs/1610.02357">Xception: Deep Learning with Depthwise Separable Convolutions</a></p>
+
+<p>[2] <a href="https://arxiv.org/abs/1704.04861">MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications</a></p>
+
+<p>[3] <a href="http://norvig.com/21-days.html#answers">Approximate timing for various operations on a typical PC</a></p>
+
+    </div>
+  </div>
+</div>
+</div>
+
+
+    
+
+
+
+  </body>
+</html>
+
diff --git a/2017/10/06/nnvm-compiler-announcement.html b/2017/10/06/nnvm-compiler-announcement.html
new file mode 100644
index 0000000..eeb55ee
--- /dev/null
+++ b/2017/10/06/nnvm-compiler-announcement.html
@@ -0,0 +1,235 @@
+
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <title>NNVM Compiler: Open Compiler for AI Frameworks</title>
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
+</head>
+
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            <li><a href="https://tvm.ai/community">Community</a></li>
+            <li><a href="https://tvm.ai/about">About</a></li>
+            <li><a href="https://tvm.ai/vta">VTA</a></li>
+            <li><a href="https://tvm.ai/blog">Blog</a></li>
+            <li> <a href="https://sampl.cs.washington.edu/tvmconf">TVM Conference</a></li>
+            <li> <a href="https://docs.tvm.ai/tutorials/">Tutorials</a></li>
+            <li> <a href="https://docs.tvm.ai">Docs</a></li>
+            <li> <a href="https://github.com/dmlc/tvm/">Github</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
+<div class="container">
+<div class="content">
+  <div class="row">
+    <div class="span14">
+      <h1>NNVM Compiler: Open Compiler for AI Frameworks </h1>
+      <p class="post-meta">
+        <time datetime="2017-10-06T08:30:00-07:00" itemprop="datePublished">
+          Oct 6, 2017
+        </time>
+        
+      </p>
+      <p class="post-meta">
+        </p>
+    <br/>
+    <p style="text-align: center">Paul G. Allen School of Computer Science &amp; Engineering, University of Washington</p>
+<p style="text-align: center">Amazon Web Service AI team</p>
+<p style="text-align: center">DMLC open-source community</p>
+
+<p>Deep learning has become ubiquitous and indispensable. We are seeing a rising need to deploy deep learning workloads on many kinds of platforms, such as mobile phones, GPUs, IoT devices and specialized accelerators.  Last month, we announced the TVM stack to close the gap between deep learning frameworks and performance- or efficiency-oriented hardware backends.  The TVM stack makes it easy to build an end-to-end compilation pipeline for a deep learning framework.  However, we think it would be even better to have a unified solution that works for all frameworks.</p>
+
+<p>Today, the UW Allen School and the AWS AI team, together with other contributors, are excited to announce the release of NNVM compiler, an open deep learning compiler that compiles front-end framework workloads directly to hardware backends. We built it using the two-level intermediate representation (IR) in the TVM stack.
+The reader is welcome to refer to the <a href="http://www.tvmlang.org/2017/08/17/tvm-release-announcement.html">original TVM announcement</a> for more technical details about the TVM stack. With the help of the TVM stack, NNVM compiler can:</p>
+
+<ul>
+  <li>Represent and optimize common deep learning workloads in a high-level graph IR;</li>
+  <li>Transform the computation graph to minimize memory utilization, optimize data layout, and fuse computation patterns for different hardware backends;</li>
+  <li>Present an end-to-end compilation pipeline from front-end deep learning frameworks to bare-metal hardware.</li>
+</ul>
+
+<p style="text-align: center"><img src="/images/nnvm/nnvm_compiler_stack.png" alt="image" width="612px" /></p>
+
+<p>The NNVM compiler can directly take models from deep learning frameworks such as Apache MXNet.
+It also supports model exchange formats such as ONNX and CoreML. ONNX support enables NNVM to compile deep learning models from PyTorch, Caffe2 and CNTK.
+The CoreML frontend enables deployment of CoreML models to non-iOS devices.</p>
+
+<p style="text-align: center"><img src="/images/nnvm/nnvm_compiler_code.png" alt="image" width="712px" /></p>
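+
+<p>To make this concrete, here is a minimal sketch of the compilation flow in Python, assuming the NNVM/TVM APIs of this release; the model choice, target and input shape are illustrative rather than prescriptive:</p>
+
+<figure class="highlight"><pre><code class="language-python" data-lang="python">import tvm
+import nnvm.frontend
+import nnvm.compiler
+from tvm.contrib import graph_runtime
+from mxnet.gluon.model_zoo.vision import get_model
+
+# load a pretrained Gluon model and convert it to an NNVM graph
+net = get_model("resnet18_v1", pretrained=True)
+sym, params = nnvm.frontend.from_mxnet(net)
+
+# compile the graph for a concrete hardware target
+shape = {"data": (1, 3, 224, 224)}
+graph, lib, params = nnvm.compiler.build(
+    sym, target="cuda", shape=shape, params=params)
+
+# deploy the compiled module with the lightweight graph runtime
+module = graph_runtime.create(graph, lib, tvm.gpu(0))
+module.set_input(**params)</code></pre></figure>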
+
+<h2 id="separation-of-optimization-and-deployment">Separation of Optimization and Deployment</h2>
+
+<p style="text-align: center"><img src="/images/nnvm/nnvm_deploy.png" alt="image" width="512px" /></p>
+
+<p>NNVM compiler applies graph-level and tensor-level optimizations and optimizes them jointly to get the best performance. Our approach differs from existing deep learning frameworks, which package graph optimization together with the deployment runtime.  NNVM compiler adopts the conventional wisdom from compilers and separates the optimizations from the actual deployment runtime. This approach offers substantial optimization while keeping the runtime lightweight. The compiled module depends only on a minimal TVM runtime, which takes around 300KB when deployed on a Raspberry Pi or mobile devices.</p>
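+
+<p>Concretely, compilation produces a small set of self-contained artifacts that the runtime can load without any of the optimization machinery. A minimal sketch, assuming the <code class="highlighter-rouge">graph</code>, <code class="highlighter-rouge">lib</code> and <code class="highlighter-rouge">params</code> returned by <code class="highlighter-rouge">nnvm.compiler.build</code>, with hypothetical file names:</p>
+
+<figure class="highlight"><pre><code class="language-python" data-lang="python">import nnvm.compiler
+
+# compiled operators, as a shared library loadable by the TVM runtime
+lib.export_library("deploy.so")
+
+# serialized computation graph
+with open("deploy.json", "w") as f:
+    f.write(graph.json())
+
+# trained weights
+with open("deploy.params", "wb") as f:
+    f.write(nnvm.compiler.save_param_dict(params))</code></pre></figure>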
+
+<h2 id="performance">Performance</h2>
+
+<p>NNVM compiler is still under active development, and we can expect more improvements to come, but we have already started to see promising results.
+We benchmarked its performance and compared it against Apache MXNet on two typical hardware configurations: an ARM CPU on a Raspberry Pi and an Nvidia GPU on AWS. Despite the radical architectural differences between these two chips, we can use the same infrastructure and only need to change the schedule for each type of hardware.</p>
+
+<h3 id="nvidia-gpu">Nvidia GPU</h3>
+
+<p>GPU benchmarks and schedules were contributed by Leyuan Wang (AWS/UCDavis) and Yuwei Hu (TuSimple). We compared the NNVM compiler against Apache MXNet with CUDA 8 and cuDNN 7 as the backend on an Nvidia K80. This is a very strong baseline, as Apache MXNet turns on auto-tuning to select the best kernel from cuDNN. We also used the optimized depthwise kernel in MXNet to optimize the MobileNet workload.</p>
+
+<p style="text-align: center"><img src="/images/nnvm/nnvm_k80_result.png" alt="image" width="400px" /></p>
+
+<p>As can be seen, NNVM compiler generates code that outperforms Apache MXNet on the K80. These improvements are due to the joint graph-level and kernel-level optimizations. It is worth noting that NNVM compiler generates all the optimized GPU kernels on its own, without relying on external libraries like cuDNN.</p>
+
+<h3 id="raspberry-pi-3b">Raspberry Pi 3b</h3>
+
+<p>The Raspberry Pi compilation stack was contributed by Ziheng Jiang (AWS/FDU).
+We compared NNVM compiler against Apache MXNet with OpenBLAS and NNPACK.
+We explored the setups to get the best performance out of MXNet: we turned on Winograd convolution in NNPACK for 3x3 convolutions, enabled multi-threading, and disabled the additional scheduler thread (so all threads are used by NNPACK).</p>
+
+<p style="text-align: center"><img src="/images/nnvm/nnvm_rasp_result.png" alt="image" width="400px" /></p>
+
+<p>As can be seen, the code generated by NNVM compiler is two times faster on ResNet18.
+The gap on MobileNet is mainly due to the lack of a depthwise convolution kernel in existing CPU DNN libraries; NNVM compiler benefits from generating efficient ARM code directly.</p>
+
+<h2 id="acknowledgement">Acknowledgement</h2>
+<p>This project wouldn’t have been possible without our early contributors in the DMLC community.
+We would like to specially thank Yuwei Hu (TuSimple), Leyuan Wang (AWS/UCDavis), Joshua Z. Zhang (AWS)
+and Xingjian Shi (HKUST) for their early contributions to the project. We would also like to thank all the contributors
+to the TVM stack.</p>
+
+<p>We also learned a lot from the following projects when building NNVM compiler.</p>
+<ul>
+  <li><a href="https://github.com/Theano/Theano">Theano</a>: possibly the earliest compiler for deep learning.</li>
+  <li><a href="https://github.com/halide/Halide">Halide</a>: TVM uses <a href="https://github.com/dmlc/HalideIR">HalideIR</a> as the data structure for
+arithmetic simplification and low-level lowering. HalideIR is derived from Halide.
+We also learned from Halide when implementing the lowering pipeline in TVM.</li>
+  <li><a href="https://github.com/inducer/loopy">Loopy</a>: use of integer set analysis and its loop transformation primitives.</li>
+</ul>
+
+<h2 id="links">Links</h2>
+<ul>
+  <li>Github page of NNVM Compiler: <a href="https://github.com/dmlc/nnvm">https://github.com/dmlc/nnvm</a></li>
+  <li>Github page of TVM: <a href="https://github.com/dmlc/tvm">https://github.com/dmlc/tvm</a></li>
+  <li><a href="https://news.cs.washington.edu/2017/10/06/allen-school-and-aws-team-up-on-new-nnvm-compiler-for-deep-learning-frameworks/">UW Allen school blog about NNVM compiler</a></li>
+  <li><a href="https://aws.amazon.com/blogs/ai/introducing-nnvm-compiler-a-new-open-end-to-end-compiler-for-ai-frameworks/">AWS blogpost about NNVM compiler</a></li>
+</ul>
+
+    </div>
+  </div>
+</div>
+</div>
+
+
+    
+
+
+
+  </body>
+</html>
+
diff --git a/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.html b/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.html
new file mode 100644
index 0000000..d0a36c0
--- /dev/null
+++ b/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.html
@@ -0,0 +1,378 @@
+
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <title>Bringing AMDGPUs to TVM Stack and NNVM Compiler with ROCm</title>
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
+</head>
+
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            
+            
+            
+
+
+
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/community">Community</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/about">About</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/vta">VTA</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      
+      	
+      	<li><a href="https://tvm.ai/blog">Blog</a></li>
+      	
+      
+    
+  
+
+
+
+
+            <li> <a href="https://sampl.cs.washington.edu/tvmconf">TVM Conference</a></li>
+            <li> <a href="https://docs.tvm.ai/tutorials/">Tutorials</a></li>
+            <li> <a href="https://docs.tvm.ai">Docs</a></li>
+            <li> <a href="https://github.com/dmlc/tvm/">Github</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
+<div class="container">
+<div class="content">
+  <div class="row">
+    <div class="span14">
+      <h1>Bringing AMDGPUs to TVM Stack and NNVM Compiler with ROCm </h1>
+      <p class="post-meta">
+        <time datetime="2017-10-30T00:00:00-07:00" itemprop="datePublished">
+          Oct 30, 2017
+        </time>
+        
+      </p>
+      <p class="post-meta">
+        </p>
+    <br/>
+    <p style="text-align: center">Aditya Atluri, Advanced Micro Devices, Inc.</p>
+<p style="text-align: center">Masahiro Masuda, Ziosoft, Inc.</p>
+
+<p>We are pleased to announce a new GPU backend for the TVM stack - the ROCm backend for AMD GPUs. If you are not familiar with TVM, you can refer to <a href="http://tvmlang.org/2017/08/17/tvm-release-announcement.html">the earlier announcement</a> first. In short, the TVM stack is an end-to-end compilation stack to deploy deep learning workloads to all hardware backends. Today’s announcement focuses on the code generator support for AMD GPUs. Specifically, we developed a new GPU code generator for AMD GPUs, which emits native GPU code via LLVM’s AMDGPU backend.</p>
+
+<p style="text-align: center"><img src="/images/rocm/tvm_rocm_overview.png" alt="image" width="90%" /></p>
+
+<p>The TVM stack is developed by an open source community under the Apache-2.0 license. ROCm backend support was done with help from the community. Aditya first implemented the codegen and runtime. He was later joined by Masahiro. Masahiro’s full-time job is not related to TVM or AMD GPUs; nonetheless, TVM got him excited, and he has been involved in fixing bugs, resolving all failing unit tests, and adding math function support to the codegen.</p>
+
+<h2 id="rocm-stack">ROCm stack</h2>
+
+<p>Radeon Open Compute (ROCm) is an open-source initiative by AMD to leverage the compute power of current and future generation GPUs. The ROCm software stack is a great tool to express and run the most commonly used GPU programming models and achieve peak performance. Not only is ROCm an open-source stack, it is an open stack, which means all the ISA and hardware features are well documented and programmable by developers. Developers can experiment with different programming models and try out multiple ways to achieve peak performance.</p>
+
+<p>TVM leverages the open-source nature of the ROCm stack by using the LLVM AMDGPU backend code generator. TVM translates from its intermediate representation (IR) to LLVM intermediate representation. This is where the openness of the ROCm stack takes over: TVM’s LLVM AMDGPU CodeGen pass converts LLVM IR into GPU assembly and object code, which is later invoked to run the whole network, a group of layers, or a single layer.</p>
+
+<p>On the ROCm stack there is no virtual ISA; you get exactly what you ask for, no less and no more. Hence, one can schedule operations in a kernel at the granularity of a single instruction, without worrying about instruction reordering and other optimizations you did not ask for.</p>
+
+<h2 id="using-nnvm-compiler-with-rocm-backend">Using NNVM Compiler with ROCm backend</h2>
+
+<p>Thanks to the TVM stack, we can directly compile models from popular deep learning frameworks such as MXNet and PyTorch into AMD GPU assembly using NNVM compiler, today. With the ROCm backend, the generic workflow becomes as follows.</p>
+
+<p style="text-align: center"><img src="/images/rocm/rocm_workflow.png" alt="image" width="90%" /></p>
+
+<p>We have put together working examples of compiling models from MXNet and PyTorch with NNVM and running them on AMD GPUs with the ROCm backend. More frameworks are supported via the NNVM compiler stack. The repository is available <a href="https://github.com/ROCmSoftwarePlatform/nnvm-rocm">here</a>.</p>
+
+<p>The script <a href="https://github.com/ROCmSoftwarePlatform/nnvm-rocm/blob/master/mxnet_imagenet_inference.py">mxnet_imagenet_inference.py</a> demonstrates ImageNet inference on AMD GPUs with a recently introduced MXNet-Gluon model. It does the following (sketched in code after the list):</p>
+
+<ul>
+  <li>Loads the ResNet 50 model from <a href="https://mxnet.incubator.apache.org/versions/master/api/python/gluon/model_zoo.html">the Gluon model zoo</a></li>
+  <li>Converts the Gluon ResNet 50 model to the NNVM graph format, using <code class="highlighter-rouge">nnvm.frontend.from_mxnet(...)</code></li>
+  <li>Compiles and executes the graph with the ROCm backend</li>
+</ul>
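+
+<p>A minimal sketch of those three steps, assuming the NNVM/TVM APIs of this release (the input shape and random input are illustrative; the actual script also handles image preprocessing):</p>
+
+<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np
+import tvm
+import nnvm.frontend
+import nnvm.compiler
+from tvm.contrib import graph_runtime
+from mxnet.gluon.model_zoo.vision import get_model
+
+# step 1: load ResNet 50 from the Gluon model zoo
+net = get_model("resnet50_v1", pretrained=True)
+
+# step 2: convert it to an NNVM graph
+sym, params = nnvm.frontend.from_mxnet(net)
+
+# step 3: compile for the ROCm backend and execute
+shape = {"data": (1, 3, 224, 224)}
+graph, lib, params = nnvm.compiler.build(
+    sym, target="rocm", shape=shape, params=params)
+module = graph_runtime.create(graph, lib, tvm.rocm(0))
+module.set_input(**params)
+module.set_input("data", np.random.uniform(size=shape["data"]).astype("float32"))
+module.run()
+out = module.get_output(0, tvm.nd.empty((1, 1000)))</code></pre></figure>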
+
+<p>The example comes with an image of the following cat.</p>
+
+<p style="text-align: center"><img src="/images/rocm/cat.png" alt="image" /></p>
+
+<p>Running our network, it predicts this image as “tiger cat”, among 1000 categories.</p>
+
+<figure class="highlight"><pre><code class="language-plain" data-lang="plain">$ python mxnet_imagenet_inference.py
+Testing model resnet50_v1
+x (1, 3, 224, 224)
+TVM prediction top-1: 282 tiger cat</code></pre></figure>
+
+<p>The script <a href="https://github.com/ROCmSoftwarePlatform/nnvm-rocm/blob/master/advanced_superres_onnx.py">advanced_superres_onnx.py</a> gives an example of loading a model trained with PyTorch. The model is stored in the <a href="https://onnx.ai/">ONNX</a> format. In this example, our network takes a low-resolution image as input and outputs a 4x high-resolution image. We refer the reader to <a href="https://arxiv.org/abs/1609.04802">the original paper</a> for the details of the problem setup and the network architecture.</p>
+
+<p>In order to use models in the ONNX format with NNVM, we first use <a href="https://github.com/onnx/onnx">the ONNX library</a> to load the ONNX model into a protocol buffer object. We can then use <code class="highlighter-rouge">nnvm.frontend.from_onnx(...)</code> to obtain an equivalent NNVM graph. With an NNVM graph in hand, we can follow the generic workflow of compilation and graph execution outlined above.</p>
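+
+<p>A minimal sketch of this conversion step, with a hypothetical file name for the exported model:</p>
+
+<figure class="highlight"><pre><code class="language-python" data-lang="python">import onnx
+import nnvm.frontend
+
+# load the ONNX model into a protocol buffer object,
+# then convert it to an equivalent NNVM graph
+onnx_model = onnx.load("super_resolution.onnx")
+sym, params = nnvm.frontend.from_onnx(onnx_model)</code></pre></figure>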
+
+<p style="text-align: center"><img src="/images/rocm/butterfly.png" alt="image" /></p>
+
+<p>The input to the network is the 64 x 64 image on the left, and it outputs the 256 x 256 image on the right. In the middle is a 256 x 256 image obtained simply by resizing the input image with bicubic interpolation. The network outputs an image of far better quality.</p>
+
+<p>The input images are taken from the original paper, and they are available <a href="https://twitter.app.box.com/s/lcue6vlrd01ljkdtdkhmfvk7vtjhetog">here</a>.</p>
+
+<h2 id="a-note-on-performance">A Note on Performance</h2>
+
+<p>The current ROCm support focuses on functionality coverage. We have already seen promising performance results by simply adopting existing TVM schedules for the CUDA backend. For example, you can try running <a href="https://github.com/dmlc/tvm/blob/master/topi/recipe/gemm/cuda_gemm_square.py">the gemm test script</a> in the TVM repository and see the result. For the two types of cards we tested, the current gemm recipe for square matrix multiplication (not yet specifically optimized for AMD GPUs) already achieves a reasonable fraction of peak.
+This is already a promising start, as it is very hard to optimize performance to get to peak, and we
+did not yet apply AMD GPU specific optimizations.
+We are starting to look at performance optimization, and we expect more improvements to come.</p>
+
+<h2 id="walkthrough-of-rocm-backend">Walkthrough of ROCm backend</h2>
+
+<p>In the following part of this article, we focus on explaining how to use the ROCm backend when working with TVM directly. All you need to do is build your TVM function under the target “rocm” and create a runtime context for it. Here, we show an example of ROCm backend usage, following the ‘Vector Add Example’ in TVM’s <a href="http://docs.tvmlang.org/tutorials/get_started.html#vector-add-example">getting started tutorial</a>.</p>
+
+<p>We start by setting up a compute operation and a schedule for the vector add kernel. This step is independent of the backend.</p>
+
+<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">absolute_import</span><span class="p">,</span> <span class="n">print_function</span>
+<span class="kn">import</span> <span class="nn">tvm</span>
+<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
+
+<span class="n">n</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">var</span><span class="p">(</span><span class="s">"n"</span><span class="p">)</span>
+<span class="n">A</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">placeholder</span><span class="p">((</span><span class="n">n</span><span class="p">,),</span> <span class="n">name</span><span class="o">=</span><span class="s">'A'</span><span class="p">)</span>
+<span class="n">B</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">placeholder</span><span class="p">((</span><span class="n">n</span><span class="p">,),</span> <span class="n">name</span><span class="o">=</span><span class="s">'B'</span><span class="p">)</span>
+<span class="n">C</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">compute</span><span class="p">(</span><span class="n">A</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">i</span><span class="p">:</span> <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="n">B</span><span class= [...]
+<span class="n">s</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">create_schedule</span><span class="p">(</span><span class="n">C</span><span class="o">.</span><span class="n">op</span><span class="p">)</span>
+<span class="n">bx</span><span class="p">,</span> <span class="n">tx</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">C</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">C</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">fact [...]
+<span class="n">s</span><span class="p">[</span><span class="n">C</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">bx</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"blockIdx.x"</span><span class="p">))</span>
+<span class="n">s</span><span class="p">[</span><span class="n">C</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">tx</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"threadIdx.x"</span><span class="p">))</span></code></pre></figure>
+
+<p>Next, to use the ROCm backend, we build our kernel under the “rocm” target. This causes TVM to use our new code generator. We also need a runtime context for the ROCm backend.</p>
+
+<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">target</span> <span class="o">=</span> <span class="s">"rocm"</span>
+<span class="n">fadd_rocm</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">build</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">[</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">],</span> <span class="n">target</span><span class="p">,</span> <span class="n">target_host</span><span class="o">=</span [...]
+<span class="n">ctx</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">rocm</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span></code></pre></figure>
+
+<p>After building the kernel and setting up a runtime context, we can launch our vector add kernel.</p>
+
+<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">n</span> <span class="o">=</span> <span class="mi">1024</span>
+<span class="n">a</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">nd</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="n">n</span><span class="p">)</span><span class="o">.</span><span class= [...]
+<span class="n">b</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">nd</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="n">n</span><span class="p">)</span><span class="o">.</span><span class= [...]
+<span class="n">c</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">nd</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">C</span><span class="o">.</span><span class="n">dtype</span><span class=" [...]
+
+<span class="n">fadd_rocm</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span>
+<span class="n">np</span><span class="o">.</span><span class="n">testing</span><span class="o">.</span><span class="n">assert_allclose</span><span class="p">(</span><span class="n">c</span><span class="o">.</span><span class="n">asnumpy</span><span class="p">(),</span> <span class="n">a</span><span class="o">.</span><span class="n">asnumpy</span><span class="p">()</span> <span class="o">+</span> <span class="n">b</span><span class="o">.</span><span class="n">asnumpy</span><span class="p" [...]
+
+<p>We can view the LLVM IR that TVM generates in the following way:</p>
+
+<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">dev_module</span> <span class="o">=</span> <span class="n">fadd_rocm</span><span class="o">.</span><span class="n">imported_modules</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
+<span class="k">print</span><span class="p">(</span><span class="n">dev_module</span><span class="o">.</span><span class="n">get_source</span><span class="p">(</span><span class="s">"llvm"</span><span class="p">))</span></code></pre></figure>
+
+<p>You should see something like this:</p>
+
+<figure class="highlight"><pre><code class="language-llvm" data-lang="llvm"><span class="c1">; ModuleID = 'myadd__kernel0'</span>
+<span class="err">sour</span><span class="k">c</span><span class="err">e_filename</span> <span class="p">=</span> <span class="s">"myadd__kernel0"</span>
+<span class="k">target</span> <span class="k">datalayout</span> <span class="p">=</span> <span class="s">"e-p:32:32-p1:64:64-p2:64:64-p3:32:32-p4:64:64-p5:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64"</span>
+<span class="k">target</span> <span class="k">triple</span> <span class="p">=</span> <span class="s">"amdgcn-amd-amdhsa-hcc"</span>
+
+
+<span class="c1">; Function Attrs: nounwind</span>
+<span class="k">define</span> <span class="k">dllexport</span> <span class="err">amdgpu_ker</span><span class="k">ne</span><span class="err">l</span> <span class="kt">void</span> <span class="vg">@myadd__kernel0</span><span class="p">(</span><span class="kt">float</span> <span class="k">add</span><span class="err">rspa</span><span class="k">c</span><span class="err">e</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="k">noalias</span> <span clas [...]
+<span class="nl">entry:</span>
+  <span class="nv">%4</span> <span class="p">=</span> <span class="k">tail</span> <span class="k">call</span> <span class="kt">i32</span> <span class="vg">@llvm.amdgcn.workgroup.id.x</span><span class="p">()</span>
+  <span class="nv">%5</span> <span class="p">=</span> <span class="k">tail</span> <span class="k">call</span> <span class="kt">i32</span> <span class="vg">@llvm.amdgcn.workitem.id.x</span><span class="p">()</span>
+  <span class="nv">%6</span> <span class="p">=</span> <span class="k">add</span> <span class="k">nsw</span> <span class="kt">i32</span> <span class="nv">%3</span><span class="p">,</span> <span class="m">-127</span>
+  <span class="nv">%7</span> <span class="p">=</span> <span class="k">ashr</span> <span class="kt">i32</span> <span class="nv">%6</span><span class="p">,</span> <span class="m">6</span>
+  <span class="nv">%8</span> <span class="p">=</span> <span class="k">icmp</span> <span class="k">slt</span> <span class="kt">i32</span> <span class="nv">%4</span><span class="p">,</span> <span class="nv">%7</span>
+  <span class="k">br</span> <span class="kt">i1</span> <span class="nv">%8</span><span class="p">,</span> <span class="kt">label</span> <span class="nv">%if_then</span><span class="p">,</span> <span class="kt">label</span> <span class="nv">%if_else</span>
+
+
+<span class="nl">if_then:</span>                                          <span class="c1">; preds = %entry</span>
+  <span class="nv">%9</span> <span class="p">=</span> <span class="k">shl</span> <span class="k">nsw</span> <span class="kt">i32</span> <span class="nv">%4</span><span class="p">,</span> <span class="m">6</span>
+  <span class="k">br</span> <span class="kt">label</span> <span class="nv">%if_end.sink.split</span>
+
+
+<span class="nl">if_end.sink.split:</span>                                <span class="c1">; preds = %if_else, %if_then</span>
+  <span class="nv">%.pre-phi</span> <span class="p">=</span> <span class="k">phi</span> <span class="kt">i32</span> <span class="p">[</span> <span class="nv">%21</span><span class="p">,</span> <span class="nv">%if_else</span> <span class="p">],</span> <span class="p">[</span> <span class="nv">%9</span><span class="p">,</span> <span class="nv">%if_then</span> <span class="p">]</span>
+  <span class="nv">%10</span> <span class="p">=</span> <span class="k">add</span> <span class="k">nsw</span> <span class="kt">i32</span> <span class="nv">%.pre-phi</span><span class="p">,</span> <span class="nv">%5</span>
+  <span class="nv">%11</span> <span class="p">=</span> <span class="k">add</span> <span class="k">nsw</span> <span class="kt">i32</span> <span class="nv">%.pre-phi</span><span class="p">,</span> <span class="nv">%5</span>
+  <span class="nv">%12</span> <span class="p">=</span> <span class="k">sext</span> <span class="kt">i32</span> <span class="nv">%11</span> <span class="k">to</span> <span class="kt">i64</span>
+  <span class="nv">%13</span> <span class="p">=</span> <span class="k">getelementptr</span> <span class="k">inbounds</span> <span class="kt">float</span><span class="p">,</span> <span class="kt">float</span> <span class="k">add</span><span class="err">rspa</span><span class="k">c</span><span class="err">e</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%2</span><span class="p">,</span> <span class="kt">i64</span> <span class="nv">%12</span>
+  <span class="nv">%14</span> <span class="p">=</span> <span class="k">load</span> <span class="kt">float</span><span class="p">,</span> <span class="kt">float</span> <span class="k">add</span><span class="err">rspa</span><span class="k">c</span><span class="err">e</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%13</span><span class="p">,</span> <span class="k">align</span> <span class="m">4</span><span class="p">,</span> <span class="nv" [...]
+  <span class="nv">%15</span> <span class="p">=</span> <span class="k">getelementptr</span> <span class="k">inbounds</span> <span class="kt">float</span><span class="p">,</span> <span class="kt">float</span> <span class="k">add</span><span class="err">rspa</span><span class="k">c</span><span class="err">e</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%1</span><span class="p">,</span> <span class="kt">i64</span> <span class="nv">%12</span>
+  <span class="nv">%16</span> <span class="p">=</span> <span class="k">load</span> <span class="kt">float</span><span class="p">,</span> <span class="kt">float</span> <span class="k">add</span><span class="err">rspa</span><span class="k">c</span><span class="err">e</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%15</span><span class="p">,</span> <span class="k">align</span> <span class="m">4</span><span class="p">,</span> <span class="nv" [...]
+  <span class="nv">%17</span> <span class="p">=</span> <span class="k">fadd</span> <span class="kt">float</span> <span class="nv">%14</span><span class="p">,</span> <span class="nv">%16</span>
+  <span class="nv">%18</span> <span class="p">=</span> <span class="k">sext</span> <span class="kt">i32</span> <span class="nv">%10</span> <span class="k">to</span> <span class="kt">i64</span>
+  <span class="nv">%19</span> <span class="p">=</span> <span class="k">getelementptr</span> <span class="k">inbounds</span> <span class="kt">float</span><span class="p">,</span> <span class="kt">float</span> <span class="k">add</span><span class="err">rspa</span><span class="k">c</span><span class="err">e</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%0</span><span class="p">,</span> <span class="kt">i64</span> <span class="nv">%18</span>
+  <span class="k">store</span> <span class="kt">float</span> <span class="nv">%17</span><span class="p">,</span> <span class="kt">float</span> <span class="k">add</span><span class="err">rspa</span><span class="k">c</span><span class="err">e</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%19</span><span class="p">,</span> <span class="k">align</span> <span class="m">4</span><span class="p">,</span> <span class="nv">!tbaa</span> <span clas [...]
+  <span class="k">br</span> <span class="kt">label</span> <span class="nv">%if_end</span>
+
+
+<span class="nl">if_end:</span>                                           <span class="c1">; preds = %if_end.sink.split, %if_else</span>
+  <span class="k">ret</span> <span class="kt">void</span>
+
+
+<span class="nl">if_else:</span>                                          <span class="c1">; preds = %entry</span>
+  <span class="nv">%20</span> <span class="p">=</span> <span class="k">sub</span> <span class="k">nsw</span> <span class="kt">i32</span> <span class="nv">%3</span><span class="p">,</span> <span class="nv">%5</span>
+  <span class="nv">%21</span> <span class="p">=</span> <span class="k">shl</span> <span class="k">nsw</span> <span class="kt">i32</span> <span class="nv">%4</span><span class="p">,</span> <span class="m">6</span>
+  <span class="nv">%22</span> <span class="p">=</span> <span class="k">icmp</span> <span class="k">slt</span> <span class="kt">i32</span> <span class="nv">%21</span><span class="p">,</span> <span class="nv">%20</span>
+  <span class="k">br</span> <span class="kt">i1</span> <span class="nv">%22</span><span class="p">,</span> <span class="kt">label</span> <span class="nv">%if_end.sink.split</span><span class="p">,</span> <span class="kt">label</span> <span class="nv">%if_end</span><span class="p">,</span> <span class="nv">!prof</span> <span class="nv">!12</span>
+<span class="p">}</span></code></pre></figure>
+
+<p>We can also view the GPU assembly that the ROCm backend generates. This is the real code that runs on your GPU.</p>
+
+<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">print</span><span class="p">(</span><span class="n">dev_module</span><span class="o">.</span><span class="n">get_source</span><span class="p">(</span><span class="s">"asm"</span><span class="p">))</span></code></pre></figure>
+
+<p>The assembly should look something like this, omitting unnecessary details:</p>
+
+<figure class="highlight"><pre><code class="language-plain" data-lang="plain">        s_load_dword s1, s[4:5], 0x18
+        v_mov_b32_e32 v2, -1
+        v_mov_b32_e32 v1, 0
+        s_waitcnt lgkmcnt(0)
+        s_add_i32 s0, s1, 0xffffff81
+        s_ashr_i32 s0, s0, 6
+        s_cmp_ge_i32 s6, s0
+        s_cbranch_scc0 BB0_2
+        v_sub_i32_e32 v1, vcc, s1, v0
+        s_lshl_b32 s0, s6, 6
+        v_cmp_lt_i32_e32 vcc, s0, v1
+        v_mov_b32_e32 v2, 0
+        v_cndmask_b32_e64 v1, 0, -1, vcc
+BB0_2:
+        v_cmp_ne_u32_e32 vcc, 0, v2
+        v_cndmask_b32_e64 v2, 0, 1, vcc
+        v_cmp_ne_u32_e32 vcc, 1, v2
+        s_and_b64 vcc, exec, vcc
+        s_cbranch_vccnz BB0_4
+        s_lshl_b32 s0, s6, 6
+        v_mov_b32_e32 v1, -1
+BB0_4:
+        v_cmp_ne_u32_e32 vcc, 0, v1
+        v_mov_b32_e32 v1, s0
+        s_and_saveexec_b64 s[0:1], vcc
+        s_xor_b64 s[0:1], exec, s[0:1]
+        s_cbranch_execz BB0_6
+BB0_5:
+        s_load_dwordx2 s[2:3], s[4:5], 0x0
+        s_load_dwordx2 s[6:7], s[4:5], 0x8
+        v_add_i32_e32 v0, vcc, v1, v0
+        s_load_dwordx2 s[4:5], s[4:5], 0x10
+        v_ashrrev_i32_e32 v1, 31, v0
+        v_lshlrev_b64 v[0:1], 2, v[0:1]
+        s_waitcnt lgkmcnt(0)
+        v_add_i32_e32 v2, vcc, s4, v0
+        v_mov_b32_e32 v3, s5
+        v_addc_u32_e32 v3, vcc, v3, v1, vcc
+        flat_load_dword v2, v[2:3]
+        v_add_i32_e32 v4, vcc, s6, v0
+        v_mov_b32_e32 v3, s7
+        v_addc_u32_e32 v5, vcc, v3, v1, vcc
+        flat_load_dword v4, v[4:5]
+        v_mov_b32_e32 v3, s3
+        v_add_i32_e32 v0, vcc, s2, v0
+        v_addc_u32_e32 v1, vcc, v3, v1, vcc
+        s_waitcnt vmcnt(0) lgkmcnt(0)
+        v_add_f32_e32 v2, v2, v4
+        flat_store_dword v[0:1], v2
+BB0_6:
+        s_or_b64 exec, exec, s[0:1]
+        s_endpgm</code></pre></figure>
+
+<h2 id="links">Links</h2>
+
+<ul>
+  <li>Github page of NNVM Compiler: <a href="https://github.com/dmlc/nnvm">https://github.com/dmlc/nnvm</a></li>
+  <li>Github page of TVM: <a href="https://github.com/dmlc/tvm">https://github.com/dmlc/tvm</a></li>
+  <li>Examples of ROCm backend with NNVM: <a href="https://github.com/ROCmSoftwarePlatform/nnvm-rocm">https://github.com/ROCmSoftwarePlatform/nnvm-rocm</a></li>
+</ul>
+
+    </div>
+  </div>
+</div>
+</div>
+
+
+    
+
+
+
+  </body>
+</html>
+
diff --git a/2017/11/08/android-rpc-introduction.html b/2017/11/08/android-rpc-introduction.html
new file mode 100644
index 0000000..5eae692
--- /dev/null
+++ b/2017/11/08/android-rpc-introduction.html
@@ -0,0 +1,384 @@
+
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <title>Remote Profile and Test Deep Learning Cross Compilation on Mobile Phones with TVM RPC</title>
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
+</head>
+
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            
+            
+            
+
+
+
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/community">Community</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/about">About</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/vta">VTA</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      
+      	
+      	<li><a href="https://tvm.ai/blog">Blog</a></li>
+      	
+      
+    
+  
+
+
+
+
+            <li> <a href="https://sampl.cs.washington.edu/tvmconf">TVM Conference</a></li>
+            <li> <a href="https://docs.tvm.ai/tutorials/">Tutorials</a></li>
+            <li> <a href="https://docs.tvm.ai">Docs</a></li>
+            <li> <a href="https://github.com/dmlc/tvm/">Github</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
+<div class="container">
+<div class="content">
+  <div class="row">
+    <div class="span14">
+      <h1>Remote Profile and Test Deep Learning Cross Compilation on Mobile Phones with TVM RPC </h1>
+      <p class="post-meta">
+        <time datetime="2017-11-08T00:00:00-08:00" itemprop="datePublished">
+          Nov 8, 2017
+        </time>
+        
+        • <span itemprop="author" itemscope itemtype="http://schema.org/Person">
+          <span itemprop="name">Yizhi Liu</span>
+        </span>
+        
+      </p>
+      <p class="post-meta">
+        </p>
+    <br/>
+<p>The TVM stack is an end-to-end compilation stack to deploy deep learning workloads to all hardware backends.
+Thanks to the NNVM compiler support in the TVM stack, we can now directly take model descriptions from deep learning frameworks and compile them to bare-metal code.
+An impressive feature of TVM is its ability to deploy computation workloads on different platforms, such as GPUs and mobile phones (with more hardware backends to come).</p>
+
+<p>However, when we want to test and profile cross compilation, it is hard to test different computation workloads on a heterogeneous device such as a Raspberry Pi or a mobile phone.
+In order to optimize a computation task, one has to edit the code on the development PC, compile, deploy to the device, test, then modify the code again to see whether it runs faster. The workflow looks like this:</p>
+
+<p style="text-align: center"><img src="/images/android_rpc/flow1.png" alt="image" width="50%" /></p>
+
+<p>Is there any way to speed up this process?</p>
+
+<p>Today we introduce an approach to deploy and test TVM workloads on Android phones. We developed a TVM runtime for Java and built an Android APP upon it. The Android APP takes a shared library as input and runs compiled functions on the mobile phone. Thus our workflow simplifies to:</p>
+
+<p style="text-align: center"><img src="/images/android_rpc/flow2.png" alt="image" width="50%" /></p>
+
+<p>With the help of the TVM RPC, one can build TVM functions and NDArrays on a remote device. The ability to cross-compile to different platforms makes it easy to develop on one platform and test on another.</p>
+
+<p>The process is illustrated as follows:</p>
+
+<p style="text-align: center"><img src="/images/android_rpc/arch.png" alt="image" width="70%" /></p>
+
+<h2 id="run-tvm-app-on-android-phone">Run TVM APP on Android Phone</h2>
+
+<p>You can find the Android RPC APP in <a href="https://github.com/dmlc/tvm/tree/master/apps/android_rpc">apps/android_rpc</a>. Please follow the instructions to build it for your Android device. Once the APK is built, sign it using <code class="highlighter-rouge">apps/android_rpc/dev_tools</code> and install it on the phone. The APP looks like this:</p>
+
+<p style="text-align: center"><img src="/images/android_rpc/app.png" alt="image" width="25%" />
+<img src="/images/android_rpc/app_error.png" alt="image" width="25%" /></p>
+
+<p>Usually we cannot start a standalone server on a mobile phone; instead, we start a proxy server and use our app to connect to it.</p>
+
+<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python <span class="nt">-m</span> tvm.exec.rpc_proxy
+</code></pre></div></div>
+
+<h2 id="create-ndarray-on-the-phone">Create NDArray on the Phone</h2>
+
+<p>Now we can connect to the proxy server from the laptop:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">tvm.contrib</span> <span class="kn">import</span> <span class="n">rpc</span>
+<span class="n">remote</span> <span class="o">=</span> <span class="n">rpc</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="s">"0.0.0.0"</span><span class="p">,</span> <span class="mi">9090</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="s">"android"</span><span class="p">)</span>
+</code></pre></div></div>
+
+<p>This gives us a handler <code class="highlighter-rouge">remote</code> which we can use to communicate with the mobile phone. For instance, the following lines create a 1024x1024 matrix on the phone’s GPU:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">A</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">nd</span><span class="o">.</span><span class="n">array</span><span class="p">(</span>
+	<span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="mi">1024</span><span class="p">,</span> <span class="mi">1024</span><span class="p">))</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">dtype</span><span class="p">),</span>
+	<span class="n">ctx</span> <span class="o">=</span> <span class="n">remote</span><span class="o">.</span><span class="n">cl</span><span class="p">(</span><span class="mi">0</span><span class="p">))</span>
+</code></pre></div></div>
+
+<p>When <code class="highlighter-rouge">A.asnumpy()</code> is called from the laptop, the matrix <code class="highlighter-rouge">A </code>will be copied to phone’s RAM and then transfer to the laptop through the proxy server. The TVM RPC interface is transparent to users.</p>
+
+<h2 id="gemm-matrix-multiplication-on-the-phone">GEMM (Matrix Multiplication) on the Phone</h2>
+
+<p>Now we are going to introduce how to test matrix multiplication on an Android phone. First, let’s define a very simple GEMM schedule:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">tvm</span>
+<span class="k">def</span> <span class="nf">gemm</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">bn</span><span class="p">):</span>
+    <span class="n">A</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">placeholder</span><span class="p">((</span><span class="n">N</span><span class="p">,</span> <span class="n">N</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'A'</span><span class="p">)</span>
+    <span class="n">B</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">placeholder</span><span class="p">((</span><span class="n">N</span><span class="p">,</span> <span class="n">N</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'B'</span><span class="p">)</span>
+    <span class="n">k</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">N</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'k'</span><span class="p">)</span>
+
+    <span class="n">C</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">compute</span><span class="p">(</span>
+        <span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">N</span><span class="p">),</span>
+        <span class="k">lambda</span> <span class="n">ii</span><span class="p">,</span> <span class="n">jj</span><span class="p">:</span> <span class="n">tvm</span><span class="o">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">A</span><span class="p">[</span><span class="n">ii</span><span class="p">,</span> <span class="n">k</span><span class="p">]</span> <span class="o">*</span> <span class="n">B</span><span class="p">[</span><span class="n">k</span><span cla [...]
+        <span class="n">name</span><span class="o">=</span><span class="s">'C'</span><span class="p">)</span>
+
+    <span class="n">s</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">create_schedule</span><span class="p">(</span><span class="n">C</span><span class="o">.</span><span class="n">op</span><span class="p">)</span>
+
+    <span class="n">block_x</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"blockIdx.x"</span><span class="p">)</span>
+    <span class="n">thread_x</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"threadIdx.x"</span><span class="p">)</span>
+
+    <span class="n">bo</span><span class="p">,</span> <span class="n">bi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">C</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">C</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n"> [...]
+    <span class="n">to</span><span class="p">,</span> <span class="n">ti</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">C</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">C</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n"> [...]
+    <span class="n">s</span><span class="p">[</span><span class="n">C</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">bi</span><span class="p">,</span> <span class="n">block_x</span><span class="p">)</span>
+    <span class="n">s</span><span class="p">[</span><span class="n">C</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">ti</span><span class="p">,</span> <span class="n">thread_x</span><span class="p">)</span>
+
+    <span class="k">print</span><span class="p">(</span><span class="n">tvm</span><span class="o">.</span><span class="n">lower</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">[</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">],</span> <span class="n">simple_mode</span><span class="o">=</span><span class="bp">True</span><span class="p">))</span>
+
+    <span class="k">return</span> <span class="n">tvm</span><span class="o">.</span><span class="n">build</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">[</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">],</span>
+    	<span class="s">"opencl"</span><span class="p">,</span>
+    	<span class="n">target_host</span><span class="o">=</span><span class="s">"llvm -target=arm64-linux-android"</span><span class="p">,</span>
+    	<span class="n">name</span><span class="o">=</span><span class="s">"gemm_gpu"</span><span class="p">)</span>
+</code></pre></div></div>
+
+<p>There’s nothing special except the last line. Here we set the target to ‘opencl’, since that is the compute language our Mali GPU supports. Note that we set <code class="highlighter-rouge">target_host</code> to ‘<code class="highlighter-rouge">llvm -target=arm64-linux-android</code>’; the right value depends on your Android phone’s architecture. We tested on a Samsung Galaxy S6 Edge, which has a Mali-T760 GPU. Here is the CPU info for this phone,</p>
+
+<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>adb shell
+shell@zenltechn:/ <span class="nv">$ </span><span class="nb">cat</span> /proc/cpuinfo
+Processor	: AArch64 Processor rev 2 <span class="o">(</span>aarch64<span class="o">)</span>
+processor	: 0
+processor	: 1
+processor	: 2
+processor	: 3
+processor	: 4
+processor	: 5
+processor	: 6
+processor	: 7
+Features	: fp asimd aes pmull sha1 sha2 crc32
+CPU implementer	: 0x41
+CPU architecture: AArch64
+CPU variant	: 0x0
+CPU part	: 0xd03
+CPU revision	: 2
+
+Hardware	: SAMSUNG Exynos7420
+</code></pre></div></div>
+
+<p>Please refer to <a href="https://clang.llvm.org/docs/CrossCompilation.html#target-triple">target triple</a> to learn about the compile options for LLVM.</p>
+
+<p>We use <code class="highlighter-rouge">tvm.contrib.ndk</code> to build the shared library for the Android system,</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">tvm.contrib</span> <span class="kn">import</span> <span class="n">rpc</span><span class="p">,</span> <span class="n">util</span><span class="p">,</span> <span class="n">ndk</span>
+<span class="n">N</span> <span class="o">=</span> <span class="mi">1024</span>
+<span class="n">f</span> <span class="o">=</span> <span class="n">gemm</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">bn</span> <span class="o">=</span> <span class="mi">256</span><span class="p">)</span>
+<span class="n">temp</span> <span class="o">=</span> <span class="n">util</span><span class="o">.</span><span class="n">tempdir</span><span class="p">()</span>
+<span class="n">path_dso</span> <span class="o">=</span> <span class="n">temp</span><span class="o">.</span><span class="n">relpath</span><span class="p">(</span><span class="s">"gemm_gpu.so"</span><span class="p">)</span>
+<span class="n">f</span><span class="o">.</span><span class="n">export_library</span><span class="p">(</span><span class="n">path_dso</span><span class="p">,</span> <span class="n">ndk</span><span class="o">.</span><span class="n">create_shared</span><span class="p">)</span>
+</code></pre></div></div>
+
+<p><code class="highlighter-rouge">ndk.create_shared</code> reads the environment variable <code class="highlighter-rouge">TVM_NDK_CC</code> to find the compiler &amp; linker for the Android device. We can easily use NDK to generate standalone toolchain for our device. For example, the following commands generate standalone compilers and linkers for ARM64 Android devices.</p>
+
+<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd</span> /opt/android-ndk/build/tools/
+./make-standalone-toolchain.sh <span class="nt">--platform</span><span class="o">=</span>android-24 <span class="nt">--use-llvm</span> <span class="nt">--arch</span><span class="o">=</span>arm64 <span class="nt">--install-dir</span><span class="o">=</span>/opt/android-toolchain-arm64
+</code></pre></div></div>
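+
+<p>Then point <code class="highlighter-rouge">TVM_NDK_CC</code> at the toolchain’s compiler so <code class="highlighter-rouge">ndk.create_shared</code> can find it. A minimal sketch, assuming the install directory used above (the exact binary name may differ across NDK versions):</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import os
+
+# assumed path, based on the --install-dir chosen above
+os.environ["TVM_NDK_CC"] = "/opt/android-toolchain-arm64/bin/aarch64-linux-android-g++"
+</code></pre></div></div>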
+
+<p>If everything goes right, we’ve got the shared library ‘gemm_gpu.so’. Now let’s upload it to the mobile phone, have the phone load the module, and get a remote handler,</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">remote</span> <span class="o">=</span> <span class="n">rpc</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="s">"0.0.0.0"</span><span class="p">,</span> <span class="mi">9090</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="s">"android"</span><span class="p">)</span>
+
+<span class="n">remote</span><span class="o">.</span><span class="n">upload</span><span class="p">(</span><span class="n">path_dso</span><span class="p">)</span>
+<span class="n">f</span> <span class="o">=</span> <span class="n">remote</span><span class="o">.</span><span class="n">load_module</span><span class="p">(</span><span class="s">"gemm_gpu.so"</span><span class="p">)</span>
+</code></pre></div></div>
+
+<p>Create the remote arrays and print the running time,</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ctx</span> <span class="o">=</span> <span class="n">remote</span><span class="o">.</span><span class="n">cl</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
+
+<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
+<span class="n">a_np</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">N</span><span class="p">))</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="s">"float32"</span>< [...]
+<span class="n">b_np</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">N</span><span class="p">))</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="s">"float32"</span>< [...]
+
+<span class="n">a</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">nd</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">a_np</span><span class="p">,</span> <span class="n">ctx</span><span class="p">)</span>
+<span class="n">b</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">nd</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">b_np</span><span class="p">,</span> <span class="n">ctx</span><span class="p">)</span>
+<span class="n">c</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">nd</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">N</span><span class="p">,</span> <span class="n">N</span><span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="s">"float32"</span><span  [...]
+
+<span class="n">time_f</span> <span class="o">=</span> <span class="n">f</span><span class="o">.</span><span class="n">time_evaluator</span><span class="p">(</span><span class="n">f</span><span class="o">.</span><span class="n">entry_name</span><span class="p">,</span> <span class="n">ctx</span><span class="p">,</span> <span class="n">number</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
+<span class="n">cost</span> <span class="o">=</span> <span class="n">time_f</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span>
+<span class="k">print</span><span class="p">(</span><span class="s">'</span><span class="si">%</span><span class="s">g secs/op, </span><span class="si">%</span><span class="s">g GFLOPS'</span> <span class="o">%</span> <span class="p">(</span><span class="n">cost</span><span class="p">,</span> <span class="n">ngflops</span><span class="p">(</span><span class="n">N</span><span class="p">)</span> <span class="o">/</span> <span class="n">cost</span><span class="p">))</span>
+</code></pre></div></div>
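+
+<p>Here <code class="highlighter-rouge">ngflops(N)</code> converts the measured time into throughput. If it is not already defined in your script, a minimal sketch using the standard operation count of an NxN GEMM:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def ngflops(N):
+    # an NxN matrix product performs N**3 multiplies and N**3 adds
+    return 2.0 * N * N * N / 1e9
+</code></pre></div></div>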
+
+<p>Now we can verify the results on the PC:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">np</span><span class="o">.</span><span class="n">testing</span><span class="o">.</span><span class="n">assert_almost_equal</span><span class="p">(</span>
+	<span class="n">c</span><span class="o">.</span><span class="n">asnumpy</span><span class="p">(),</span>
+	<span class="n">a_np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">b_np</span><span class="p">),</span>
+	<span class="n">decimal</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
+</code></pre></div></div>
+
+<p>In the example above, we develop and cross-compile a binary for our mobile phone. Through the proxy server, the binary is uploaded to the phone and run inside its JVM. This approach makes it easy to develop and test different computation workloads on Android.</p>
+
+<h2 id="java-runtime-for-tvm">Java Runtime for TVM</h2>
+
+<p>The Android app is built on top of the Java runtime, which provides minimal support for TVM Function and NDArray. Here’s an example of registering a function in tvm4j:</p>
+
+<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">Function</span> <span class="n">func</span> <span class="o">=</span> <span class="nc">Function</span><span class="o">.</span><span class="na">convertFunc</span><span class="o">(</span><span class="k">new</span> <span class="nc">Function</span><span class="o">.</span><span class="na">Callback</span><span class="o">()</span> <span class="o">{</span>
+      <span class="nd">@Override</span> <span class="kd">public</span> <span class="nc">Object</span> <span class="nf">invoke</span><span class="o">(</span><span class="nc">TVMValue</span><span class="o">...</span> <span class="n">args</span><span class="o">)</span> <span class="o">{</span>
+        <span class="nc">StringBuilder</span> <span class="n">res</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">StringBuilder</span><span class="o">();</span>
+        <span class="k">for</span> <span class="o">(</span><span class="nc">TVMValue</span> <span class="n">arg</span> <span class="o">:</span> <span class="n">args</span><span class="o">)</span> <span class="o">{</span>
+          <span class="n">res</span><span class="o">.</span><span class="na">append</span><span class="o">(</span><span class="n">arg</span><span class="o">.</span><span class="na">asString</span><span class="o">());</span>
+        <span class="o">}</span>
+        <span class="k">return</span> <span class="n">res</span><span class="o">.</span><span class="na">toString</span><span class="o">();</span>
+      <span class="o">}</span>
+    <span class="o">});</span>
+<span class="nc">TVMValue</span> <span class="n">res</span> <span class="o">=</span> <span class="n">func</span><span class="o">.</span><span class="na">pushArg</span><span class="o">(</span><span class="s">"Hello"</span><span class="o">).</span><span class="na">pushArg</span><span class="o">(</span><span class="s">" "</span><span class="o">).</span><span class="na">pushArg</span><span class="o">(</span><span class="s">"World!"</span><span class="o">).</span><span class="na">invoke</span [...]
+<span class="n">assertEquals</span><span class="o">(</span><span class="s">"Hello World!"</span><span class="o">,</span> <span class="n">res</span><span class="o">.</span><span class="na">asString</span><span class="o">());</span>
+<span class="n">res</span><span class="o">.</span><span class="na">release</span><span class="o">();</span>
+<span class="n">func</span><span class="o">.</span><span class="na">release</span><span class="o">();</span>
+</code></pre></div></div>
+
+<p>As we have seen in the GEMM part, one can build a shared library in Python and execute it in Java:</p>
+
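+<p>For reference, a minimal Python sketch that could produce such an <code class="highlighter-rouge">add_cpu.so</code> (the actual build script is not shown in this post):</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import tvm
+
+n = tvm.var("n")
+A = tvm.placeholder((n,), name="A")
+B = tvm.placeholder((n,), name="B")
+C = tvm.compute(A.shape, lambda i: A[i] + B[i], name="C")
+
+s = tvm.create_schedule(C.op)
+fadd = tvm.build(s, [A, B, C], "llvm", name="add")
+fadd.export_library("add_cpu.so")
+</code></pre></div></div>
+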
+<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">ml.dmlc.tvm.Module</span><span class="o">;</span>
+<span class="kn">import</span> <span class="nn">ml.dmlc.tvm.NDArray</span><span class="o">;</span>
+<span class="kn">import</span> <span class="nn">ml.dmlc.tvm.TVMContext</span><span class="o">;</span>
+
+<span class="kn">import</span> <span class="nn">java.io.File</span><span class="o">;</span>
+<span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
+
+<span class="kd">public</span> <span class="kd">class</span> <span class="nc">LoadAddFunc</span> <span class="o">{</span>
+  <span class="kd">public</span> <span class="kd">static</span> <span class="kt">void</span> <span class="nf">main</span><span class="o">(</span><span class="nc">String</span><span class="o">[]</span> <span class="n">args</span><span class="o">)</span> <span class="o">{</span>
+    <span class="nc">String</span> <span class="n">loadingDir</span> <span class="o">=</span> <span class="n">args</span><span class="o">[</span><span class="mi">0</span><span class="o">];</span>
+    <span class="nc">Module</span> <span class="n">fadd</span> <span class="o">=</span> <span class="nc">Module</span><span class="o">.</span><span class="na">load</span><span class="o">(</span><span class="n">loadingDir</span> <span class="o">+</span> <span class="nc">File</span><span class="o">.</span><span class="na">separator</span> <span class="o">+</span> <span class="s">"add_cpu.so"</span><span class="o">);</span>
+
+    <span class="nc">TVMContext</span> <span class="n">ctx</span> <span class="o">=</span> <span class="nc">TVMContext</span><span class="o">.</span><span class="na">cpu</span><span class="o">();</span>
+
+    <span class="kt">long</span><span class="o">[]</span> <span class="n">shape</span> <span class="o">=</span> <span class="k">new</span> <span class="kt">long</span><span class="o">[]{</span><span class="mi">2</span><span class="o">};</span>
+    <span class="nc">NDArray</span> <span class="n">arr</span> <span class="o">=</span> <span class="nc">NDArray</span><span class="o">.</span><span class="na">empty</span><span class="o">(</span><span class="n">shape</span><span class="o">,</span> <span class="n">ctx</span><span class="o">);</span>
+    <span class="n">arr</span><span class="o">.</span><span class="na">copyFrom</span><span class="o">(</span><span class="k">new</span> <span class="kt">float</span><span class="o">[]{</span><span class="mi">3</span><span class="n">f</span><span class="o">,</span> <span class="mi">4</span><span class="n">f</span><span class="o">});</span>
+    <span class="nc">NDArray</span> <span class="n">res</span> <span class="o">=</span> <span class="nc">NDArray</span><span class="o">.</span><span class="na">empty</span><span class="o">(</span><span class="n">shape</span><span class="o">,</span> <span class="n">ctx</span><span class="o">);</span>
+
+    <span class="n">fadd</span><span class="o">.</span><span class="na">entryFunc</span><span class="o">().</span><span class="na">pushArg</span><span class="o">(</span><span class="n">arr</span><span class="o">).</span><span class="na">pushArg</span><span class="o">(</span><span class="n">arr</span><span class="o">).</span><span class="na">pushArg</span><span class="o">(</span><span class="n">res</span><span class="o">).</span><span class="na">invoke</span><span class="o">();</span>
+    <span class="nc">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="nc">Arrays</span><span class="o">.</span><span class="na">toString</span><span class="o">(</span><span class="n">res</span><span class="o">.</span><span class="na">asFloatArray</span><span class="o">()));</span>
+
+    <span class="n">arr</span><span class="o">.</span><span class="na">release</span><span class="o">();</span>
+    <span class="n">res</span><span class="o">.</span><span class="na">release</span><span class="o">();</span>
+    <span class="n">fadd</span><span class="o">.</span><span class="na">release</span><span class="o">();</span>
+  <span class="o">}</span>
+<span class="o">}</span>
+</code></pre></div></div>
+
+<p>Once you have built the TVM library following the <a href="http://docs.tvmlang.org/how_to/install.html">Installation Guide</a>, run</p>
+
+<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make jvmpkg
+make jvminstall
+</code></pre></div></div>
+
+<p>This will compile, package, and install tvm4j in your local Maven repository. Please refer to <a href="https://github.com/dmlc/tvm/tree/master/jvm">tvm4j</a> for more information.</p>
+
+<h2 id="remote-profile-and-test-on-iphoneipad">Remote Profile and Test on iPhone/iPad</h2>
+
+<p>Besides the Android RPC application, we also provide an <a href="https://github.com/dmlc/tvm/tree/master/apps/ios_rpc">iOS RPC app</a>, through which we can easily profile and test TVM computation workloads on an iPhone or iPad. It works almost the same way as on Android, except that Xcode and an iOS device are required.</p>
+
+    </div>
+  </div>
+</div>
+</div>
+
+
+    
+
+
+
+  </body>
+</html>
+
diff --git a/2018/01/16/opt-mali-gpu.html b/2018/01/16/opt-mali-gpu.html
new file mode 100644
index 0000000..06ec2a7
--- /dev/null
+++ b/2018/01/16/opt-mali-gpu.html
@@ -0,0 +1,730 @@
+
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <title>Optimizing Mobile Deep Learning on ARM GPU with TVM</title>
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
+</head>
+
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            <li><a href="https://tvm.ai/community">Community</a></li>
+            <li><a href="https://tvm.ai/about">About</a></li>
+            <li><a href="https://tvm.ai/vta">VTA</a></li>
+            <li><a href="https://tvm.ai/blog">Blog</a></li>
+            <li> <a href="https://sampl.cs.washington.edu/tvmconf">TVM Conference</a></li>
+            <li> <a href="https://docs.tvm.ai/tutorials/">Tutorials</a></li>
+            <li> <a href="https://docs.tvm.ai">Docs</a></li>
+            <li> <a href="https://github.com/dmlc/tvm/">Github</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
+<div class="container">
+<div class="content">
+  <div class="row">
+    <div class="span14">
+      <h1>Optimizing Mobile Deep Learning on ARM GPU with TVM </h1>
+      <p class="post-meta">
+        <time datetime="2018-01-16T00:00:00-08:00" itemprop="datePublished">
+          Jan 16, 2018
+        </time>
+        
+        • <span itemprop="author" itemscope itemtype="http://schema.org/Person">
+          <span itemprop="name">Lianmin Zheng</span>
+        </span>
+        
+      </p>
+      <p class="post-meta">
+        </p>
+    <br/>
+    <p>With the great success of deep learning, the demand for
+deploying deep neural networks to mobile devices is growing rapidly.
+As on desktop platforms, utilizing the GPU on mobile devices
+can benefit both inference speed and energy efficiency. However, most
+existing deep learning frameworks do not support mobile GPUs very well.
+The difficulty lies in the difference between mobile and desktop GPU
+architectures, which means special effort is required to optimize for
+mobile GPUs. This non-trivial extra work eventually results in poor support
+for mobile GPUs in most deep learning frameworks.</p>
+
+<p>TVM addresses the difficulty of deploying to different hardware by
+introducing a unified IR stack, with which optimization for different
+hardware can be done easily.  In this post, we show how we use
+<a href="http://tvmlang.org/2017/08/17/tvm-release-announcement.html">TVM</a>/<a href="http://tvmlang.org/2017/10/06/nnvm-compiler-announcement.html">NNVM</a> to
+generate efficient kernels for the ARM Mali GPU and do end-to-end compilation.
+In our test on a Mali-T860 MP4, compared with the
+<a href="https://developer.arm.com/technologies/compute-library">Arm Compute Library</a>,
+our method is 1.4x faster on VGG-16 and 2.2x faster on MobileNet.
+Both graph-level and operator-level optimizations contribute
+to this speedup.</p>
+
+<p style="text-align: center"><img src="/images/opt-mali/end2end.png" alt="image" width="95%" /></p>
+
+<center> Figure. Inference Speed of Different Backends on ImageNet</center>
+<p></p>
+
+<h1 id="mali-midgrad-gpu">Mali Midgrad GPU</h1>
+<p>We use the Firefly-RK3399 with a Mali-T860 MP4 as our test environment,
+so we mainly focus on the Mali T8xx series below.</p>
+
+<h2 id="architecture">Architecture</h2>
+<p>Figure 1 gives an overview of the Mali architecture on the T860 and T880.
+The GPUs are scalable up to 16 coherent shader cores. Inside each
+shader core, there are 2 or 3 arithmetic pipelines, 1 load/store pipeline
+and 1 texture pipeline (the so-called TriPipe). The ALU in each arithmetic
+pipeline has four 128-bit vector units and one scalar unit.</p>
+
+<p>We use OpenCL for GPU computing. When mapping to the OpenCL model, each
+shader core executes one or several work groups. Each shader core supports
+up to 384 concurrently executing threads. Each work item in OpenCL
+typically maps to a single thread on a Mali GPU.
+The Mali GPUs use a VLIW (Very Long Instruction Word) architecture.
+Each instruction word contains multiple operations. The Mali GPUs
+also use SIMD, so that most arithmetic instructions operate on
+multiple data elements simultaneously. <sup>[1]</sup></p>
+
+<center> <img width="50%" src="/images/opt-mali/mali-arch.png" /> </center>
+<center> Figure 1. Mali T860 and T880 (source <sup>[2]</sup>) </center>
+
+<h2 id="difference-with-nvidias-gpus">Difference with NVIDIA’s GPUs</h2>
+<p>Here are some differences that we should concern when writing OpenCL
+code for Mali GPUs, compared with writing for NVIDIA’s GPUs.</p>
+<ul>
+  <li>Mali GPUs use an unified global memory. In NVIDIA’s GPUs, we usually
+copy data to shared memory, because NVIDIA’s GPUs have physically
+separate global memory, shared memory and register. In Mali, this copy
+does not improve performance and can be removed. Besides, Mali GPUs
+usually share the global memory with CPU, so there is no need for copying
+between CPU and GPU.</li>
+  <li>Mali Midgrad GPUs are based on SIMD (Single Instruction Multiple Data)
+and need explicit vectorization. In NVIDIA CUDA, parallelism is
+achieved by SIMT (Single Instruction Multiple Thread), which does
+not require explicit vectorization. But also notice that the newer
+Mali Bitfrost GPUs are based on quad-style vectorization and does not
+require explicit vectorization.</li>
+  <li>All threads in Mali GPUs have individual program counters. It means
+the <code class="highlighter-rouge">warp size</code> is 1, so that branch divergence is not a major problem.</li>
+</ul>
+
+<h1 id="optimization--convolution-as-example">Optimization : Convolution as Example</h1>
+<p>The convolution layer is the core of most deep neural networks and
+takes most of the computation time. So we take the convolution layer
+as an example to demonstrate how common optimization techniques like
+packing, tiling, unrolling and vectorization are applied in TVM.</p>
+
+<h2 id="im2col-with-gemm">Im2Col with GEMM</h2>
+<p>A well-known algorithm for the convolution layer is <a href="https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/">im2col</a>,
+which converts the little 3D input cubes into columns of a matrix and
+performs a GEMM. The advantage of this method is the easy utilization of
+highly optimized BLAS libraries. However, the memory redundancy
+(9x memory for a 3x3 kernel) is awful.</p>
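+
+<p>To make the redundancy concrete, below is a minimal NumPy sketch of im2col for stride 1 and no dilation (an illustrative helper, not code from TVM): each input pixel gets copied into up to KH x KW columns, hence roughly 9x memory for a 3x3 kernel.</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
+
+def im2col(x, KH, KW):
+    # x: (CI, H, W) already-padded input
+    CI, H, W = x.shape
+    OH, OW = H - KH + 1, W - KW + 1
+    cols = np.empty((CI * KH * KW, OH * OW), dtype=x.dtype)
+    row = 0
+    for c in range(CI):
+        for i in range(KH):
+            for j in range(KW):
+                cols[row] = x[c, i:i+OH, j:j+OW].reshape(-1)
+                row += 1
+    # convolution then becomes a GEMM with the (CO, CI*KH*KW) weight matrix
+    return cols
+</code></pre></div></div>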
+
+<h2 id="spatial-packing">Spatial Packing</h2>
+<p>Instead, we adopt a direct method to calculate the convolution and apply the
+optimization techniques step by step. A convolution layer in VGG-16
+is used as the tuning case; its configuration is listed below.
+We assume the batch size is 1 for inference.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th>Input Shape</th>
+      <th>Output Shape</th>
+      <th>Kernel Size</th>
+      <th>Stride</th>
+      <th>Padding</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>56x56x256</td>
+      <td>56x56x256</td>
+      <td>3x3</td>
+      <td>(1, 1)</td>
+      <td>(1, 1)</td>
+    </tr>
+  </tbody>
+</table>
+
+<p>As a baseline, we also list the performance of this layer in
+Arm Compute Library.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th>Kernel</th>
+      <th>Cost (second)</th>
+      <th>GFLOPS</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>GEMM method in ARMComputeLib</td>
+      <td>0.1821</td>
+      <td>20.3111</td>
+    </tr>
+  </tbody>
+</table>
+
+<h3 id="declare-the-computation-tiling-and-packing">Declare the computation: tiling and packing</h3>
+<p>Tiling and packing are two methods intended for better memory access.
+Tiling separates the whole computation into small blocks for better
+data reuse.  Packing re-lays out the input matrices according to the
+tiling so that we can access the memory sequentially, which reduces
+the cache miss rate.</p>
+
+<p>We do tiling on the width dimension of the input image and the CO dimension
+of the filter matrix.  This is described by <code class="highlighter-rouge">tvm.compute</code> below.</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># set tiling factor
+</span><span class="n">VH</span> <span class="o">=</span> <span class="mi">1</span>
+<span class="n">VW</span> <span class="o">=</span> <span class="n">VC</span> <span class="o">=</span> <span class="mi">4</span>
+
+<span class="c1"># get input shape
+</span> <span class="n">_</span><span class="p">,</span> <span class="n">CI</span><span class="p">,</span> <span class="n">IH</span><span class="p">,</span> <span class="n">IW</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">shape</span>
+<span class="n">CO</span><span class="p">,</span> <span class="n">CI</span><span class="p">,</span> <span class="n">KH</span><span class="p">,</span> <span class="n">KW</span> <span class="o">=</span> <span class="n">kernel</span><span class="o">.</span><span class="n">shape</span>
+<span class="n">TH</span> <span class="o">=</span> <span class="n">IH</span> <span class="o">+</span> <span class="mi">2</span> <span class="o">*</span> <span class="n">H_PAD</span>
+<span class="n">TW</span> <span class="o">=</span> <span class="n">IW</span> <span class="o">+</span> <span class="mi">2</span> <span class="o">*</span> <span class="n">W_PAD</span>
+
+<span class="c1"># calc output shape
+</span><span class="n">OH</span> <span class="o">=</span> <span class="p">(</span><span class="n">IH</span> <span class="o">+</span> <span class="mi">2</span><span class="o">*</span><span class="n">H_PAD</span> <span class="o">-</span> <span class="n">KH</span><span class="p">)</span> <span class="o">//</span> <span class="n">H_STR</span> <span class="o">+</span> <span class="mi">1</span>
+<span class="n">OW</span> <span class="o">=</span> <span class="p">(</span><span class="n">IW</span> <span class="o">+</span> <span class="mi">2</span><span class="o">*</span><span class="n">W_PAD</span> <span class="o">-</span> <span class="n">KW</span><span class="p">)</span> <span class="o">//</span> <span class="n">W_STR</span> <span class="o">+</span> <span class="mi">1</span>
+
+<span class="c1"># data shape after packing
+</span><span class="n">dvshape</span> <span class="o">=</span> <span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">TH</span> <span class="o">//</span> <span class="p">(</span><span class="n">VH</span><span class="o">*</span><span class="n">H_STRIDE</span><span class="p">),</span> <span class="n">TW</span> <span class="o">//</span> <span class="p">(</span><span class="n">VW</span><span class="o">*</span><span class="n">W_STRIDE</span><span class="p">), [...]
+
+<span class="c1"># kernel shape after packing
+</span><span class="n">kvshape</span> <span class="o">=</span> <span class="p">(</span><span class="n">CO</span> <span class="o">//</span> <span class="n">VC</span><span class="p">,</span> <span class="n">CI</span><span class="p">,</span> <span class="n">KH</span><span class="p">,</span> <span class="n">KW</span><span class="p">,</span> <span class="n">VC</span><span class="p">)</span>
+
+<span class="n">ovshape</span> <span class="o">=</span> <span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">CO</span> <span class="o">//</span> <span class="n">VC</span><span class="p">,</span> <span class="n">OH</span> <span class="o">//</span> <span class="n">VH</span><span class="p">,</span> <span class="n">OW</span> <span class="o">//</span> <span class="n">VW</span><span class="p">,</span> <span class="n">VH</span><span class="p">,</span> <span c [...]
+<span class="n">oshape</span> <span class="o">=</span> <span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">CO</span><span class="p">,</span> <span class="n">OH</span><span class="p">,</span> <span class="n">OW</span><span class="p">)</span>
+
+<span class="c1"># define packing
+</span><span class="n">data_vec</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">compute</span><span class="p">(</span><span class="n">dvshape</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">n</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">vh</span><span class="p">, [...]
+    <span class="n">data_pad</span><span class="p">[</span><span class="n">n</span><span class="p">][</span><span class="n">ci</span><span class="p">][</span><span class="n">h</span><span class="o">*</span><span class="n">VH</span><span class="o">*</span><span class="n">H_STRIDE</span><span class="o">+</span><span class="n">vh</span><span class="p">][</span><span class="n">w</span><span class="o">*</span><span class="n">VW</span><span class="o">*</span><span class="n">W_STRIDE</span><spa [...]
+
+<span class="n">kernel_vec</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">compute</span><span class="p">(</span><span class="n">kvshape</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">co</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">kh</span><span class="p">,</span> <span class="n">kw</span><span class="p">,</span> <span class="n">vc</span><span class="p">:</span>
+    <span class="n">kernel</span><span class="p">[</span><span class="n">co</span><span class="o">*</span><span class="n">VC</span><span class="o">+</span><span class="n">vc</span><span class="p">][</span><span class="n">ci</span><span class="p">][</span><span class="n">kh</span><span class="p">][</span><span class="n">kw</span><span class="p">],</span> <span class="n">name</span><span class="o">=</span><span class="s">'kernel_vec'</span><span class="p">)</span>
+
+<span class="c1"># define convolution
+</span><span class="n">ci</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">CI</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'ci'</span><span class="p">)</span>
+<span class="n">kh</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">KH</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'kh'</span><span class="p">)</span>
+<span class="n">kw</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">KW</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'kw'</span><span class="p">)</span>
+
+<span class="n">conv</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">compute</span><span class="p">(</span><span class="n">ovshape</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">n</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <sp [...]
+    <span class="n">tvm</span><span class="o">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">data_vec</span><span class="p">[</span><span class="n">n</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">vh</span><span class="o">*</span><span class="n">H_STRIDE</span><span class="o">+</span><span class="n">kh</span><span  [...]
+            <span class="n">kernel_vec</span><span class="p">[</span><span class="n">co</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">kh</span><span class="p">,</span> <span class="n">kw</span><span class="p">,</span> <span class="n">vc</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">out_dtype</span><span class="p">),</span>
+            <span class="n">axis</span><span class="o">=</span><span class="p">[</span><span class="n">ci</span><span class="p">,</span> <span class="n">kh</span><span class="p">,</span> <span class="n">kw</span><span class="p">]),</span> <span class="n">name</span><span class="o">=</span><span class="s">'conv'</span><span class="p">)</span>
+
+<span class="c1"># unpack to correct layout
+</span><span class="n">output</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">compute</span><span class="p">(</span><span class="n">oshape</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">n</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">:</span>
+                     <span class="n">conv</span><span class="p">[</span><span class="n">n</span><span class="p">][</span><span class="n">co</span><span class="o">//</span><span class="n">VC</span><span class="p">][</span><span class="n">h</span><span class="o">/</span><span class="n">VH</span><span class="p">][</span><span class="n">w</span><span class="o">//</span><span class="n">VW</span><span class="p">][</span><span class="n">h</span><span class="o">%</span><span class="n">VH</span>< [...]
+                     <span class="n">name</span><span class="o">=</span><span class="s">'output_unpack'</span><span class="p">,</span> <span class="n">tag</span><span class="o">=</span><span class="s">'direct_conv_output'</span><span class="p">)</span>
+</code></pre></div></div>
+
+<p>We can inspect the defined IR by</p>
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">print</span><span class="p">(</span><span class="n">tvm</span><span class="o">.</span><span class="n">lower</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">[</span><span class="n">data</span><span class="p">,</span> <span class="n">kernel</span><span class="p">,</span> <span class="n">output</span><span class="p">],</span> <span [...]
+</code></pre></div></div>
+<p>We show only the convolution part here.</p>
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>produce conv {
+  for (co, 0, 64) {
+    for (h, 0, 56) {
+      for (w, 0, 14) {
+        for (vw.init, 0, 4) {
+          for (vc.init, 0, 4) {
+            conv[((((((((co*56) + h)*14) + w)*4) + vw.init)*4) + vc.init)] = 0.000000f
+          }
+        }
+        for (ci, 0, 256) {
+          for (kh, 0, 3) {
+            for (kw, 0, 3) {
+              for (vw, 0, 4) {
+                for (vc, 0, 4) {
+                  conv[((((((((co*56) + h)*14) + w)*4) + vw)*4) + vc)] = (conv[((((((((co*56) + h)*14) + w)*4) + vw)*4) + vc)] + (data_vec[(((((((((h*14) + w)*256) + ci)*3) + kh)*6) + kw) + vw)]*kernel_vec[((((((((co*256) + ci)*3) + kh)*3) + kw)*4) + vc)]))
+                }
+              }
+            }
+          }
+        }
+      }
+    }
+  }
+}
+</code></pre></div></div>
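+
+<p>Note how the innermost <code class="highlighter-rouge">vw</code> and <code class="highlighter-rouge">vc</code> loops correspond to the tiling factors <code class="highlighter-rouge">VW</code> and <code class="highlighter-rouge">VC</code>; the unrolling and vectorization steps below target exactly these loops.</p>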
+
+<h3 id="kernel-1-bind-thread">Kernel 1: bind thread</h3>
+<p>In TVM, we first declare the computation and then <em>schedule</em> it.
+This mechanism decouples the algorithm from the implementation details. (This idea
+comes from <a href="http://halide-lang.org/">Halide</a>.)</p>
+
+<p>The following schedule simply binds axes to GPU threads, so that
+our code can run on the Mali GPU.</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># helper function for binding thread
+</span><span class="k">def</span> <span class="nf">tile_and_bind3d</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">tensor</span><span class="p">,</span> <span class="n">z</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">z_factor</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">y_factor</span><span cla [...]
+    <span class="s">""" tile and bind 3d """</span>
+    <span class="n">y_factor</span> <span class="o">=</span> <span class="n">y_factor</span> <span class="ow">or</span> <span class="n">z_factor</span>
+    <span class="n">x_factor</span> <span class="o">=</span> <span class="n">x_factor</span> <span class="ow">or</span> <span class="n">y_factor</span>
+    <span class="n">zo</span><span class="p">,</span> <span class="n">zi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">z</span><span class="p">,</span> <span class="n">z_factor</span><span class="p">)</span>
+    <span class="n">yo</span><span class="p">,</span> <span class="n">yi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">y_factor</span><span class="p">)</span>
+    <span class="n">xo</span><span class="p">,</span> <span class="n">xi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">x_factor</span><span class="p">)</span>
+    <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">zo</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"blockIdx.z"</span><span class="p">))</span>
+    <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">zi</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"threadIdx.z"</span><span class="p">))</span>
+    <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">yo</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"blockIdx.y"</span><span class="p">))</span>
+    <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">yi</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"threadIdx.y"</span><span class="p">))</span>
+    <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">xo</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"blockIdx.x"</span><span class="p">))</span>
+    <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">xi</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"threadIdx.x"</span><span class="p">))</span>
+
+<span class="c1"># set tunable parameter
+</span><span class="n">num_thread</span> <span class="o">=</span> <span class="mi">8</span>
+
+<span class="c1"># schedule data packing
+</span><span class="n">_</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">vw</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">data_vec</span><span class="p">]</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class [...]
+<span class="n">tile_and_bind3d</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">data_vec</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
+
+<span class="c1"># schedule kernel packing
+</span><span class="n">co</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">kh</span><span class="p">,</span> <span class="n">kw</span><span class="p">,</span> <span class="n">vc</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">]</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span>
+<span class="n">tile_and_bind</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">kernel_vec</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
+
+<span class="c1"># schedule conv
+</span><span class="n">_</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">vw</span><span class="p">,</span> <span class="n">vc</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n" [...]
+<span class="n">kc</span><span class="p">,</span> <span class="n">kh</span><span class="p">,</span> <span class="n">kw</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">reduce_axis</span>
+
+<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">reorder</span><span class="p">(</span><span class="n">_</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">kc</span><span class="p">,</span> <span class="n">kh< [...]
+<span class="n">tile_and_bind3d</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">conv</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">num_thread</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
+
+<span class="n">_</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">oh</span><span class="p">,</span> <span class="n">ow</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">output</span><span class="p">]</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span>
+<span class="n">tile_and_bind3d</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">output</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">oh</span><span class="p">,</span> <span class="n">ow</span><span class="p">,</span> <span class="n">num_thread</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
+</code></pre></div></div>
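+
+<p>The schedule above also calls <code class="highlighter-rouge">tile_and_bind</code>, the 2D analogue of <code class="highlighter-rouge">tile_and_bind3d</code>, whose definition is not shown in this post. A minimal sketch following the same pattern:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def tile_and_bind(s, tensor, y, x, y_factor, x_factor=None):
+    """ tile and bind 2d """
+    x_factor = x_factor or y_factor
+    yo, yi = s[tensor].split(y, y_factor)
+    xo, xi = s[tensor].split(x, x_factor)
+    s[tensor].bind(yo, tvm.thread_axis("blockIdx.y"))
+    s[tensor].bind(yi, tvm.thread_axis("threadIdx.y"))
+    s[tensor].bind(xo, tvm.thread_axis("blockIdx.x"))
+    s[tensor].bind(xi, tvm.thread_axis("threadIdx.x"))
+</code></pre></div></div>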
+
+<p>With this schedule, our code now runs, but the performance is still terrible.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th>Kernel</th>
+      <th>Cost (second)</th>
+      <th>GFLOPS</th>
+      <th>speedup</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>GEMM method in ARMComputeLib</td>
+      <td>0.1821</td>
+      <td>20.3111</td>
+      <td>1x</td>
+    </tr>
+    <tr>
+      <td>Kernel 1: simple bind</td>
+      <td>5.6154</td>
+      <td>0.6588</td>
+      <td>0.03x</td>
+    </tr>
+  </tbody>
+</table>
+
+<h3 id="kernel-2-unrolling">Kernel 2: unrolling</h3>
+<p>Loop unrolling can reduce the instructions for loop control, reduce
+branch penalties and hide memory read latency.
+In TVM, this can be done easily by calling <code class="highlighter-rouge">s[tensor].unroll(axis)</code>.</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># set tunable parameter
+</span><span class="n">num_thread</span> <span class="o">=</span> <span class="mi">8</span>
+
+<span class="c1"># schedule data packing
+</span><span class="n">_</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">vw</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">data_vec</span><span class="p">]</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class [...]
+<span class="n">tile_and_bind3d</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">data_vec</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
+
+<span class="s">"""!! ADD UNROLL HERE !!"""</span>
+<span class="n">s</span><span class="p">[</span><span class="n">data_vec</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">vw</span><span class="p">)</span>
+
+<span class="c1"># schedule kernel packing
+</span><span class="n">co</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">kh</span><span class="p">,</span> <span class="n">kw</span><span class="p">,</span> <span class="n">vc</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">]</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span>
+<span class="n">tile_and_bind</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">kernel_vec</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
+
+<span class="s">"""!! ADD UNROLL HERE !!"""</span>
+<span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">kh</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">kw</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">vc</span><span class="p">)</span>
+
+<span class="c1"># schedule conv
+</span><span class="n">_</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">vw</span><span class="p">,</span> <span class="n">vc</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n" [...]
+<span class="n">kc</span><span class="p">,</span> <span class="n">kh</span><span class="p">,</span> <span class="n">kw</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">reduce_axis</span>
+
+<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">reorder</span><span class="p">(</span><span class="n">_</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">kc</span><span class="p">,</span> <span class="n">kh< [...]
+<span class="n">tile_and_bind3d</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">conv</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">num_thread</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
+
+<span class="s">"""!! ADD UNROLL HERE !!"""</span>
+<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">kh</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">kw</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">vw</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">vc</span><span class="p">)</span>
+
+<span class="n">_</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">oh</span><span class="p">,</span> <span class="n">ow</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">output</span><span class="p">]</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span>
+<span class="n">tile_and_bind3d</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">output</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">oh</span><span class="p">,</span> <span class="n">ow</span><span class="p">,</span> <span class="n">num_thread</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
+</code></pre></div></div>
+
+<table>
+  <thead>
+    <tr>
+      <th>Kernel</th>
+      <th>Cost (second)</th>
+      <th>GFLOPS</th>
+      <th>speedup</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>GEMM method in ARMComputeLib</td>
+      <td>0.1821</td>
+      <td>20.3111</td>
+      <td>1x</td>
+    </tr>
+    <tr>
+      <td>Kernel 1: simple bind</td>
+      <td>5.6154</td>
+      <td>0.6588</td>
+      <td>0.03x</td>
+    </tr>
+    <tr>
+      <td>Kernel 2: + unrolling</td>
+      <td>0.3707</td>
+      <td>9.9796</td>
+      <td>0.49x</td>
+    </tr>
+  </tbody>
+</table>
+
+<h3 id="kernel3-vectorization">Kernel3: vectorization</h3>
+<p>As mentioned before, we need to do vectorization explictly
+ in order to achieve the best performance on Mali GPU.</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># set tunable parameter
+</span><span class="n">num_thread</span> <span class="o">=</span> <span class="mi">8</span>
+
+<span class="c1"># schedule data packing
+</span><span class="n">_</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">vw</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">data_vec</span><span class="p">]</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class [...]
+<span class="n">tile_and_bind3d</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">data_vec</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
+
+<span class="c1"># unroll
+</span><span class="n">s</span><span class="p">[</span><span class="n">data_vec</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">vw</span><span class="p">)</span>
+
+<span class="c1"># schedule kernel packing
+</span><span class="n">co</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">kh</span><span class="p">,</span> <span class="n">kw</span><span class="p">,</span> <span class="n">vc</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">]</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span>
+<span class="n">tile_and_bind</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">kernel_vec</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
+
+<span class="c1"># unroll
+</span><span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">kh</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">kw</span><span class="p">)</span>
+<span class="s">"""!! VECTORIZE HERE !!"""</span>
+<span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">]</span><span class="o">.</span><span class="n">vectorize</span><span class="p">(</span><span class="n">vc</span><span class="p">)</span>
+
+<span class="c1"># schedule conv
+</span><span class="n">_</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">vw</span><span class="p">,</span> <span class="n">vc</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n" [...]
+<span class="n">kc</span><span class="p">,</span> <span class="n">kh</span><span class="p">,</span> <span class="n">kw</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">reduce_axis</span>
+
+<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">reorder</span><span class="p">(</span><span class="n">_</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">kc</span><span class="p">,</span> <span class="n">kh< [...]
+<span class="n">tile_and_bind3d</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">conv</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">num_thread</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
+
+<span class="c1"># unroll
+</span><span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">kh</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">kw</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">vw</span><span class="p">)</span>
+<span class="s">"""!! VECTORIZE HERE !!"""</span>
+<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">vectorize</span><span class="p">(</span><span class="n">vc</span><span class="p">)</span>
+
+<span class="n">_</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">oh</span><span class="p">,</span> <span class="n">ow</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">output</span><span class="p">]</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span>
+<span class="n">tile_and_bind3d</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">output</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">oh</span><span class="p">,</span> <span class="n">ow</span><span class="p">,</span> <span class="n">num_thread</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
+</code></pre></div></div>
+
+<table>
+  <thead>
+    <tr>
+      <th>Kernel</th>
+      <th>Cost (second)</th>
+      <th>GFLOPS</th>
+      <th>Speedup</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>GEMM method in Arm Compute Library</td>
+      <td>0.1821</td>
+      <td>20.3111</td>
+      <td>1x</td>
+    </tr>
+    <tr>
+      <td>Kernel 1: simple bind</td>
+      <td>5.6154</td>
+      <td>0.6588</td>
+      <td>0.03x</td>
+    </tr>
+    <tr>
+      <td>Kernel 2: + unrolling</td>
+      <td>0.3707</td>
+      <td>9.9796</td>
+      <td>0.49x</td>
+    </tr>
+    <tr>
+      <td>Kernel 3: + vectorization</td>
+      <td>0.1304</td>
+      <td>28.3679</td>
+      <td>1.40x</td>
+    </tr>
+  </tbody>
+</table>
+
+<h3 id="how-to-set-the-tunable-parameter">How to set the tunable parameter</h3>
+<p>Some of the tunable parameters above can be calculated directly.
+For the vectorized dimension <code class="highlighter-rouge">VC</code>, we should fill the 128-bit register,
+so it can be set as 128/32=4 for float32 and 128/16=8 for float16.</p>
+
+<p>But more often we cannot determine the optimal value analytically, due to
+complicated runtime behavior. So we use grid search in TVM. It can be
+done extremely efficiently, since we write Python code against TVM’s high-level
+IR rather than direct OpenCL code.</p>
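+
+<p>As a rough illustration, the search itself is only a few lines of Python.
+The sketch below is ours, not part of the schedule above: the candidate values
+and the <code class="highlighter-rouge">build_and_measure</code> helper are
+hypothetical stand-ins for building the schedule with the given parameters and
+timing it on the device.</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import itertools
+
+def build_and_measure(num_thread, vh, vw, vc):
+    # placeholder: build the schedule above with these parameters,
+    # run it on the Mali device, and return the cost in seconds
+    return abs(num_thread - 8) + abs(vh - 2) + abs(vw - 2)  # dummy cost
+
+candidates = itertools.product(
+    [1, 2, 4, 8, 16],  # num_thread
+    [1, 2],            # VH
+    [1, 2, 4],         # VW
+    [4],               # VC = 128 / 32 for float32
+)
+best = min(candidates, key=lambda p: build_and_measure(*p))
+print("best (num_thread, VH, VW, VC):", best)
+</code></pre></div></div>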
+
+<h3 id="the-generated-opencl-code">The generated OpenCL code</h3>
+<p>We can view the generated OpenCL code by</p>
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">print</span><span class="p">(</span><span class="n">func</span><span class="o">.</span><span class="n">imported_modules</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">get_source</span><span class="p">())</span>
+</code></pre></div></div>
+<p>The OpenCL code is too long to paste here, and it is hard to read due
+to heavy unrolling. If you are interested, you can view it
+<a href="https://github.com/merrymercy/tvm-mali/blob/master/data/kernels.cl">here</a>.</p>
+
+<h1 id="end-to-end-benchmarking">End-to-End Benchmarking</h1>
+<p>In this section, we compare end-to-end performance across
+different backends on some popular deep neural networks.
+Our test environment is:</p>
+
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Firefly-RK3399 4G
+CPU: dual-core Cortex-A72 + quad-core Cortex-A53
+GPU: Mali-T860MP4
+
+Arm Compute Library : v17.12
+MXNet: v1.0.1
+Openblas: v0.2.18
+</code></pre></div></div>
+
+<p>We use NNVM and TVM to do end-to-end compilation.</p>
+
+<h2 id="performance">Performance</h2>
+
+<p style="text-align: center"><img src="/images/opt-mali/end2end.png" alt="image" width="95%" /></p>
+
+<center> Figure 2. Inference Speed of Different Backends on ImageNet</center>
+<p></p>
+
+<p>As shown in Figure 2, we test the inference speed on ImageNet.
+On Firefly-RK3399, the Mali GPU can be 2x ~ 4x faster than the 6-core big.LITTLE CPU.
+Our end-to-end pipeline is 1.4x ~ 2.2x faster than Arm Compute Library.
+We tried both the GEMM-based and direct methods of convolution in
+Arm Compute Library; the GEMM method was always faster than the direct method
+in these test cases, so we only plot the results of the GEMM method.</p>
+
+<p>Some results, like resnet18 on Arm Compute Library, are missing from Figure 2.
+This is because the graph runtime of Arm Compute Library currently does not
+support skip connections and has a poor Neon implementation of
+depthwise convolution. This also reflects the advantage of the NNVM
+software stack.</p>
+
+<h2 id="half-precision-performance">Half-Precision Performance</h2>
+<p>Deep neural networks are usually tolerant of reduced precision, especially
+for inference on mobile devices. Using low-precision arithmetic
+can make inference much faster. We also test half-precision
+floating point on the Mali GPU.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th>Model</th>
+      <th>Backend</th>
+      <th>Time Cost per Image (second)</th>
+      <th>Speedup over FP32</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>vgg16</td>
+      <td>ACL-mali</td>
+      <td>0.9694</td>
+      <td>1.69x</td>
+    </tr>
+    <tr>
+      <td>vgg16</td>
+      <td>TVM-mali</td>
+      <td>0.6896</td>
+      <td><strong>1.87x</strong></td>
+    </tr>
+    <tr>
+      <td>MobileNet 1.0</td>
+      <td>TVM-mali</td>
+      <td>0.0479</td>
+      <td>1.60x</td>
+    </tr>
+    <tr>
+      <td>ResNet18</td>
+      <td>TVM-mali</td>
+      <td>0.1183</td>
+      <td>1.73x</td>
+    </tr>
+  </tbody>
+</table>
+
+<center> Table 1. Inference Speed of FP16 on ImageNet</center>
+<p></p>
+
+<p>In theory, FP16 both doubles peak compute and halves memory consumption,
+thereby doubling the speed. But it needs good input shapes for
+longer vectorization and fine-tuning of some parameters.</p>
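+
+<p>For reference, switching to FP16 in TVM only requires changing the dtype of
+the compute declaration; the schedule stays the same, except that
+<code class="highlighter-rouge">VC</code> grows from 4 to 8 lanes. A minimal
+sketch (the names and the toy computation are ours):</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import tvm
+
+n = tvm.var("n")
+# declare inputs in float16 instead of the default float32
+A = tvm.placeholder((n,), dtype="float16", name="A")
+B = tvm.placeholder((n,), dtype="float16", name="B")
+C = tvm.compute((n,), lambda i: A[i] + B[i], name="C")
+</code></pre></div></div>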
+
+<h2 id="further-work-on-mobile-devices">Further Work on Mobile Devices</h2>
+<p>We should admit that there is still some room for improvement,
+mainly at the graph level, such as model compression and weight pre-layout.
+Future improvements in NNVM will try to solve these problems.</p>
+
+<h1 id="show-me-the-code">Show me the code</h1>
+
+<ul>
+  <li><a href="https://github.com/merrymercy/tvm-mali">End-to-End benchmark</a></li>
+  <li><a href="https://github.com/dmlc/tvm/tree/master/topi/python/topi/mali">Convolution and Depthwise Convolution Schedule</a></li>
+</ul>
+
+<h1 id="bio--acknowledgement">Bio &amp; Acknowledgement</h1>
+<p><a href="https://lmzheng.net">Lianmin Zheng</a> is an undergraduate
+student at SJTU Apex lab.  He is interested in machine learning
+and building computer system.</p>
+
+<p>The author thanks
+<a href="https://homes.cs.washington.edu/~tqchen/">Tianqi Chen</a> for his helpful
+advice and <a href="https://github.com/yzhliu">Yizhi Liu</a> for his earlier work.</p>
+
+<h1 id="reference">Reference</h1>
+<p>[1] <a href="https://developer.arm.com/docs/100614/0302">ARM Mali GPU OpenCL Developer Guide</a>
+[2] <a href="https://developer.arm.com/">ARM Developer</a></p>
+
+    </div>
+  </div>
+</div>
+</div>
+
+
+    
+
+
+
+  </body>
+</html>
+
diff --git a/2018/03/12/webgl.html b/2018/03/12/webgl.html
new file mode 100644
index 0000000..a295bc0
--- /dev/null
+++ b/2018/03/12/webgl.html
@@ -0,0 +1,272 @@
+
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <title>Compiling Deep Learning Models to WebGL with TVM</title>
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
+</head>
+
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            
+            
+            
+
+
+
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/community">Community</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/about">About</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/vta">VTA</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      
+      	
+      	<li><a href="https://tvm.ai/blog">Blog</a></li>
+      	
+      
+    
+  
+
+
+
+
+            <li> <a href="https://sampl.cs.washington.edu/tvmconf">TVM Conference</a></li>
+            <li> <a href="https://docs.tvm.ai/tutorials/">Tutorials</a></li>
+            <li> <a href="https://docs.tvm.ai">Docs</a></li>
+            <li> <a href="https://github.com/dmlc/tvm/">Github</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
+<div class="container">
+<div class="content">
+  <div class="row">
+    <div class="span14">
+      <h1>Compiling Deep Learning Models to WebGL with TVM </h1>
+      <p class="post-meta">
+        <time datetime="2018-03-12T00:00:00-07:00" itemprop="datePublished">
+          Mar 12, 2018
+        </time>
+        
+        • <span itemprop="author" itemscope itemtype="http://schema.org/Person">
+          <span itemprop="name">Zhixun Tan</span>
+        </span>
+        
+      </p>
+      <p class="post-meta">
+        </p>
+    </br>
+    <p>Now TVM comes with a brand-new OpenGL/WebGL backend!
+This blog post explains what it is, and what you can achieve with it.</p>
+
+<h1 id="the-openglwebgl-backend">The OpenGL/WebGL Backend</h1>
+
+<p>TVM already targets many backends covering a variety of platforms: CPU, GPU,
+mobile devices, and so on. This time we are adding another backend: OpenGL/WebGL.</p>
+
+<p>OpenGL/WebGL enables us to leverage the GPU in an environment which does not
+have CUDA installed. It is, for the time being, the only way of using the GPU
+inside a browser.</p>
+
+<p>This new backend allows us to use OpenGL/WebGL in 3 different ways:</p>
+<ul>
+  <li><strong>Local OpenGL</strong>:
+We can compile a deep learning model into OpenGL and directly
+run it on the local machine, entirely using Python.</li>
+  <li><strong>WebGL with RPC</strong>:
+We can compile a deep learning model into WebGL and export
+it as a shared library via Emscripten, with JavaScript host code and WebGL device code. Then
+we can deploy that library through RPC onto a TVM JavaScript runtime system
+running inside a browser.</li>
+  <li><strong>WebGL with static library</strong>:
+We can compile a deep learning model into WebGL,
+link it with the TVM JavaScript runtime system and export the entire package.
+Then we can run the model in a web page on a browser, with no dependency.
+The detailed flow is described in figure 1.</li>
+</ul>
+
+<p>We rely on Emscripten and its fastcomp LLVM backend to generate the JavaScript code.</p>
+
+<p style="text-align: center"><img src="/images/opengl/webgl-flow.png" alt="image" width="65%" /><br />
+Figure 1</p>
+
+<p>See <a href="https://github.com/dmlc/nnvm/blob/master/tutorials/from_mxnet_to_webgl.py">here</a>
+for examples of all three of them.</p>
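+
+<p>As a minimal sketch of the local OpenGL flow (the toy computation and names
+are ours; see the linked tutorial for the real end-to-end examples), compiling
+for the new backend is mostly a matter of choosing the target:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import tvm
+
+n = tvm.var("n")
+A = tvm.placeholder((n,), name="A")
+B = tvm.compute((n,), lambda i: A[i] + 1.0, name="B")
+
+s = tvm.create_schedule(B.op)
+s[B].opengl()  # bind the stage to OpenGL's fragment-shader execution model
+fadd = tvm.build(s, [A, B], target="opengl", name="myadd")
+</code></pre></div></div>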
+
+<h1 id="how-is-this-different-from-x">How is this Different from X?</h1>
+
+<p>Running a neural network on a browser isn’t an entirely new thing.
+Andrej Karpathy’s <a href="https://cs.stanford.edu/people/karpathy/convnetjs/">ConvNetJS</a>
+and Google’s <a href="https://deeplearnjs.org/">DeepLearning.JS</a> are examples of that.</p>
+
+<p>So what’s unique about TVM with WebGL? The big difference is that the op kernels
+in TVM are automatically compiled, not handwritten. As shown in Figure 2, TVM
+utilizes a unified AST to define kernels, and compiles them to code for different
+platforms.</p>
+
+<p style="text-align: center"><img src="/images/opengl/comparison.png" alt="" width="50%" /><br />
+Figure 2</p>
+
+<p>This means that:</p>
+<ul>
+  <li>To deploy your existing model to WebGL, you don’t need to write a lot of
+additional code. The NNVM/TVM model definition is the same for all targets, so
+you just need to compile it to a new target.</li>
+  <li>To add a new op kernel, you only need to define it in TVM once, instead of
+implementing it once for every target. You don’t need to know how to write
+GLSL code to add a new op kernel to WebGL!</li>
+</ul>
+
+<h1 id="benchmark">Benchmark</h1>
+
+<p>Here we perform a benchmark for a typical workload: image classification using
+resnet18.</p>
+
+<p>I’m using my <a href="https://www.asus.com/us/Laptops/N76VZ/">5-year-old laptop</a> which
+has an Intel® Core™ i7-3610QM (4 cores / 8 threads), and a GTX 650M.</p>
+
+<p>In this benchmark, we download a resnet18 model from the Gluon model zoo, and
+perform end-to-end classification on a cat image. We only measure the model
+execution time (without model/input/parameter loading), and each model is run
+100 times to get an average. The results are shown in figure 3.</p>
+
+<p style="text-align: center"><img src="/images/opengl/opengl-benchmark.png" alt="image" /><br />
+Figure 3</p>
+
+<p>The benchmark is run in 4 different settings:</p>
+<ul>
+  <li><strong>CPU (LLVM)</strong>: The model is compiled into LLVM IR and JIT’ed. Therefore, it is
+run entirely on the CPU.</li>
+  <li><strong>OpenCL</strong>: The model is compiled into OpenCL. There is still some glue code
+compiled to LLVM, which is responsible for setting up and launching OpenCL
+kernels. Then we run it on the local machine.</li>
+  <li><strong>OpenGL</strong>: Same as OpenCL, but compiled to OpenGL.</li>
+  <li><strong>WebGL</strong>: The glue code is compiled to LLVM, and transformed to JavaScript using
+Emscripten’s Fastcomp LLVM backend.
+The device code is compiled to WebGL. We execute the model in Firefox.</li>
+</ul>
+
+<p>From the results above we can observe that the TVM OpenGL backend has
+performance similar to OpenCL. More interestingly, the WebGL version inside the browser
+isn’t significantly slower than desktop OpenGL. Considering that the host code
+is JavaScript, this is quite surprising. This might be because
+Emscripten generates <a href="http://asmjs.org/">asm.js</a>, which enables dramatic
+optimizations in Firefox.</p>
+
+<p>This is a first step toward automatic compilation of deep learning models
+into the web browser. We expect more performance improvements as we bring
+more optimizations into the TVM stack.</p>
+
+<h2 id="show-me-the-code">Show me the Code</h2>
+<ul>
+  <li>Checkout <a href="https://github.com/dmlc/nnvm/blob/master/tutorials/from_mxnet_to_webgl.py">this complete code example</a>.</li>
+</ul>
+
+<h2 id="acknowledgement">Acknowledgement</h2>
+<p>We thank the developers of Emscripten for providing the fastcomp toolchain and for their help during development.</p>
+
+    </div>
+  </div>
+</div>
+</div>
+
+
+    
+
+
+
+  </body>
+</html>
+
diff --git a/2018/03/23/nmt-transformer-optimize.html b/2018/03/23/nmt-transformer-optimize.html
new file mode 100644
index 0000000..e2a99c6
--- /dev/null
+++ b/2018/03/23/nmt-transformer-optimize.html
@@ -0,0 +1,418 @@
+
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <title>Bringing TVM into TensorFlow for Optimizing Neural Machine Translation on GPU</title>
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
+</head>
+
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            
+            
+            
+
+
+
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/community">Community</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/about">About</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/vta">VTA</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      
+      	
+      	<li><a href="https://tvm.ai/blog">Blog</a></li>
+      	
+      
+    
+  
+
+
+
+
+            <li> <a href="https://sampl.cs.washington.edu/tvmconf">TVM Conference</a></li>
+            <li> <a href="https://docs.tvm.ai/tutorials/">Tutorials</a></li>
+            <li> <a href="https://docs.tvm.ai">Docs</a></li>
+            <li> <a href="https://github.com/dmlc/tvm/">Github</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
+<div class="container">
+<div class="content">
+  <div class="row">
+    <div class="span14">
+      <h1>Bringing TVM into TensorFlow for Optimizing Neural Machine Translation on GPU </h1>
+      <p class="post-meta">
+        <time datetime="2018-03-23T00:00:00-07:00" itemprop="datePublished">
+          Mar 23, 2018
+        </time>
+        
+      </p>
+      <p class="post-meta">
+        </p>
+    </br>
+    <h2 id="author">Author</h2>
+
+<p>This is a guest blog post contributed by Alibaba Group’s Machine Translation Platform team and PAI-Blade team.</p>
+
+<h2 id="background">Background</h2>
+
+<p>Neural Machine Translation (NMT) is an end-to-end approach for automating translation, with the potential to overcome the weaknesses of conventional phrase-based translation systems. Recently, Alibaba Group has been working on deploying an NMT service for global e-commerce.</p>
+
+<p>Currently we are exploiting Transformer [1] as the major backbone in our NMT system, since it is more friendly to efficient offline training and delivers on-par (or even higher) precision compared with classical RNN/LSTM-based models. Although Transformer is friendly to the offline training phase, as it breaks the dependencies across time steps, it is not very efficient for online inference. In our production environment, it has been found that the inference speed of the initial version of Transformer [...]
+One particular challenge we observed is that batch matmul is a major performance hot-spot in Transformer, and the current implementation in cuBLAS is not well optimized.</p>
+
+<p style="text-align: center"><img src="/images/nmt-transformer/model_arch.png" alt="image" width="40%" /></p>
+
+<p>The results below show that the TVM-generated kernel (with schedule optimization) brings at least a <b><em>13X</em></b> speed-up for batch matmul computation, and a further speed-up with operator fusion enabled.</p>
+
+<p style="text-align: center"><img src="/images/nmt-transformer/batch-matmul-bar-charts.png" alt="image" width="45%" /></p>
+
+<h2 id="batch-matmul">Batch Matmul</h2>
+
+<h3 id="why-batch-matmul">Why batch matmul</h3>
+<p>In Transformer, batch matmul is widely used in the computation of multi-head attention. Using batch matmul, multiple heads in the attention layer can run in parallel, which can help improve the computation efficiency of the hardware.</p>
+
+<p style="text-align: center"><img src="/images/nmt-transformer/batchmatmul.png" alt="image" width="90%" /></p>
+
+<p>We conducted a thorough profiling of the Transformer model in the inference phase, which shows that batch matmul computations contribute up to ~30% of GPU kernel execution time. Using nvprof [2] to do some first-principles analysis of cuBLAS’s batch matmul kernel, it is clearly indicated that the current implementation is quite under-performing, and several interesting phenomena can be observed.</p>
+
+<h3 id="what-is-batch-matmul">What is batch matmul</h3>
+<p>Typically, a batch matmul computation performs the matrix-matrix multiplication over a batch of matrices. The batch is considered to be “uniform”, i.e. all instances have the same dimensions (M, N, K), leading dimensions (lda, ldb, ldc) and transpositions for their respective A, B and C matrices.</p>
+
+<p>Batch matmul computation can be described more concretely as follows:</p>
+
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>void BatchedGemm(input A, input B, output C, M, N, K, batch_dimension) {
+  for (int i = 0; i &lt; batch_dimension; ++i)  {
+    DoGemm(A[i],B[i],C[i],M,K,N)
+  }
+}
+</code></pre></div></div>
+
+<h4 id="batch-matmul-shapes">Batch matmul shapes</h4>
+
+<p>In language translation tasks, the shapes of the batch matmuls are significantly smaller than those of normal matmul computations in other workloads. The shapes in Transformer are related to the length of the input sentences and the number of decoder steps. Normally, they are smaller than 30.</p>
+
+<p>As for the batch dimension, it is a fixed number given a certain inference batch size. For instance, if 16 is used as the batch size with a beam size of 4, the batch dimension is 16 * 4 * #head (the number of heads in multi-head attention, which is usually 8). The matrix dimensions M, K, N are within the range of [1, max decode length] or [1, max encode length].</p>
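+
+<p>For example, with the numbers above (our own arithmetic), the batch
+dimension works out to the 512 that appears in the profiling table below:</p>
+
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>batch_size, beam_size, num_heads = 16, 4, 8
+print(batch_size * beam_size * num_heads)  # 512
+</code></pre></div></div>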
+
+<h3 id="performance-issue-of-cublas-batch-matmul">Performance issue of cuBLAS’ batch matmul</h3>
+
+<p>First, we make a theoretical FLOPs analysis of the batch matmul kernels. The results are quite interesting: all the batch matmuls have limited computational intensity (less than 1 TFLOP).</p>
+
+<p>Then we profile the cuBLAS performance of batch matmul with multiple shapes through nvprof. The table below shows some of the metrics obtained on an NVIDIA M40 GPU with CUDA 8.0.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th>input shape <br /> [batch, M, N, K]</th>
+      <th>kernel</th>
+      <th>theoretical FLOPs</th>
+      <th>nvprof observed FLOPs</th>
+      <th>theoretical FLOPs / <br /> observed FLOPs</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>[512, 17, 17, 128]</td>
+      <td><strong>maxwell_sgemmBatched_128x128_raggedMn_tn</strong></td>
+      <td>18939904</td>
+      <td>2155872256</td>
+      <td>0.87%</td>
+    </tr>
+    <tr>
+      <td>[512, 1, 17, 128]</td>
+      <td><strong>maxwell_sgemmBatched_128x128_raggedMn_tn</strong></td>
+      <td>1114112</td>
+      <td>2155872256</td>
+      <td>0.052%</td>
+    </tr>
+    <tr>
+      <td>[512, 17, 1, 128]</td>
+      <td><strong>maxwell_sgemmBatched_128x128_raggedMn_tn</strong></td>
+      <td>1114112</td>
+      <td>2155872256</td>
+      <td>0.052%</td>
+    </tr>
+    <tr>
+      <td>[512, 30, 30, 128]</td>
+      <td><strong>maxwell_sgemmBatched_128x128_raggedMn_tn</strong></td>
+      <td>58982400</td>
+      <td>2155872256</td>
+      <td>2.74%</td>
+    </tr>
+  </tbody>
+</table>
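+
+<p>The “theoretical FLOPs” column here is simply batch * M * N * K
+multiply-accumulates per kernel call. A quick sanity check (our own arithmetic)
+reproduces the first two rows of the table:</p>
+
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def theoretical_flops(batch, M, N, K):
+    # one multiply-accumulate per (m, n, k) element, per batch instance
+    return batch * M * N * K
+
+print(theoretical_flops(512, 17, 17, 128))  # 18939904
+print(theoretical_flops(512, 1, 17, 128))   # 1114112
+</code></pre></div></div>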
+
+<p>Even with different shapes (varying in M, N, K), all the <strong>maxwell_sgemmBatched_128x128_raggedMn_tn</strong> calls execute the same number of FLOPs, which is much larger than the theoretical value. It can be inferred that all these different shapes may be padded to a certain shape. Among all these shapes, even in the best case, the theoretical FLOPs account for only 2.74% of the actually executed FLOPs, <em>so most of the computation is quite redundant</em>. Similarly, the ca [...]
+
+<p><b>It is obvious that cuBLAS’ batch matmul implementation is far from efficient. Thus we use TVM to generate efficient batch matmul kernels for our NMT workloads.</b></p>
+
+<h2 id="batch-matmul-computation">Batch matmul computation</h2>
+
+<p>In TVM, a general batch matmul computation can be declared as:</p>
+
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># computation representation
+A = tvm.placeholder((batch, M, K), name='A')
+B = tvm.placeholder((batch, K, N), name='B')
+k = tvm.reduce_axis((0, K), 'k')
+C = tvm.compute((batch, M, N),
+         lambda b, y, x: tvm.sum(A[b, y, k] * B[b, k, x], axis = k),
+         name = 'C')
+</code></pre></div></div>
+
+<h2 id="schedule-optimization">Schedule optimization</h2>
+
+<p>After declaring the computation, we need to devise our own schedule carefully to squeeze out the performance potential.</p>
+
+<h3 id="tuning-parameters-of-blockthread-numbers">Tuning parameters of block/thread numbers</h3>
+
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  # thread indices
+  block_y = tvm.thread_axis("blockIdx.y")
+  block_x = tvm.thread_axis("blockIdx.x")
+  thread_y = tvm.thread_axis((0, num_thread_y), "threadIdx.y")
+  thread_x = tvm.thread_axis((0, num_thread_x), "threadIdx.x")
+  thread_yz = tvm.thread_axis((0, vthread_y), "vthread", name="vy")
+  thread_xz = tvm.thread_axis((0, vthread_x), "vthread", name="vx")
+
+  # block partitioning
+  BB, FF, MM, PP = s[C].op.axis
+  BBFF = s[C].fuse(BB, FF)
+  MMPP = s[C].fuse(MM, PP)
+  by, ty_block = s[C].split(BBFF, factor = num_thread_y * vthread_y)
+  bx, tx_block = s[C].split(MMPP, factor = num_thread_x * vthread_x)
+  s[C].bind(by, block_y)
+  s[C].bind(bx, block_x)
+  vty, ty = s[C].split(ty_block, nparts = vthread_y)
+  vtx, tx = s[C].split(tx_block, nparts = vthread_x)
+  s[C].reorder(by, bx, vty, vtx, ty, tx)
+  s[C].bind(ty, thread_y)
+  s[C].bind(tx, thread_x)
+  s[C].bind(vty, thread_yz)
+  s[C].bind(vtx, thread_xz)
+</code></pre></div></div>
+<p>We fuse the outer dimensions of the batch matmul, i.e. the BB and FF axes of the op, normally known as the “batch” dimension in batch matmul computation. Then we split the outer and the inner dimensions by a factor of (<code class="highlighter-rouge">num_thread * vthread</code>).</p>
+
+<p>A strided pattern is not needed in batch matmul, so the virtual thread numbers (<code class="highlighter-rouge">vthread_y</code> and <code class="highlighter-rouge">vthread_x</code>) are both set to 1.</p>
+
+<h4 id="finding-the-best-combination-of-number_thread">Finding the best combination of number_thread</h4>
+
+<p>The results below are obtained on an NVIDIA M40 GPU device with CUDA 8.0.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th>Input Shape [batch,features,M,N,K]</th>
+      <th>num_thread_y, num_thread_x</th>
+      <th>num_vthread_y, num_vthread_x</th>
+      <th>Time(us)</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>[64,8,1,17,128]</td>
+      <td>8,1</td>
+      <td>32,1</td>
+      <td>37.62</td>
+    </tr>
+    <tr>
+      <td>[64,8,1,17,128]</td>
+      <td>4,1</td>
+      <td>32,1</td>
+      <td>39.30</td>
+    </tr>
+    <tr>
+      <td>[64,8,1,17,128]</td>
+      <td>1,1</td>
+      <td>32,1</td>
+      <td>38.82</td>
+    </tr>
+    <tr>
+      <td>[64,8,1,17,128]</td>
+      <td>1,1</td>
+      <td>256,1</td>
+      <td>41.95</td>
+    </tr>
+    <tr>
+      <td>[64,8,1,17,128]</td>
+      <td>32,1</td>
+      <td>1,1</td>
+      <td>94.61</td>
+    </tr>
+  </tbody>
+</table>
+
+<p>As learned from <a href="http://tvmlang.org/2017/08/22/Optimize-Deep-Learning-GPU-Operators-with-TVM-A-Depthwise-Convolution-Example.html">past experience</a>, the way to find the best combination of <code class="highlighter-rouge">num_thread_y</code> and <code class="highlighter-rouge">num_thread_x</code> is through brute-force search. After a brute-force search, the best combination for the current shape can be found, which in the current computation is <code class="highlighter-rouge">nu [...]
+
+<h2 id="fuse-batch-matmul-with-other-operations">Fuse batch matmul with other operations</h2>
+
+<p>Normally, “black-box” cuBLAS library calls act as the boundary of the commonly used “op fusion” optimization tactics. However, with the generated efficient batch matmul kernel, the fusion boundary can easily be broken: more than just element-wise operations can be fused, so further performance improvements can be obtained.</p>
+
+<p>It is observed from the computation graph that a batch matmul is always followed by a <em>broadcast add</em> operation or a <em>transpose</em> operation. By fusing the “add” or “transpose” operation with batch matmul, kernel launch overhead and redundant memory access time can be reduced.</p>
+
+<p>Batch matmul and broadcast add fusion computation can be declared as follows:</p>
+
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># computation representation
+A = tvm.placeholder((batch_size, features, M, K), name='A')
+# the shape of B is (N, K) rather than (K, N) because B is transposed in this fusion pattern
+B = tvm.placeholder((batch_size, features, N, K), name='B')
+ENTER = tvm.placeholder((batch_size, 1, M, N), name = 'ENTER')
+k = tvm.reduce_axis((0, K), 'k')
+C = tvm.compute(
+           (batch_size, features, M, N),
+           lambda yb, yf, m, x: tvm.sum(A[yb, yf, m, k] * B[yb, yf, x, k], axis = k),
+           name = 'C')
+D = topi.broadcast_add(C, ENTER)
+</code></pre></div></div>
+
+<p>Batch matmul and transpose fusion computation can be declared as:</p>
+
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># computation representation
+A = tvm.placeholder((batch_size, features, M, K), name='A')
+B = tvm.placeholder((batch_size, features, K, N), name='B')
+k = tvm.reduce_axis((0, K), 'k')
+C = tvm.compute(
+           (batch_size, M, features, N),
+           lambda yb, m, yf, x: tvm.sum(A[yb, yf, m, k] * B[yb, yf, k, x], axis = k),
+           name = 'C')
+</code></pre></div></div>
+<h3 id="fusion-kernel-performance">Fusion Kernel Performance</h3>
+
+<p>The shape [batch=64, heads=8, M=1, N=17, K=128] is chosen to illustrate the performance of the generated code. 17 is chosen as the sequence length since it is the average input length in our production scenarios.</p>
+
+<ul>
+  <li>tf-r1.4 <code class="highlighter-rouge">BatchMatmul</code>: 513.9 us</li>
+  <li>tf-r1.4 <code class="highlighter-rouge">BatchMatmul</code> + <code class="highlighter-rouge">Transpose</code> (separate): 541.9 us</li>
+  <li>TVM <code class="highlighter-rouge">BatchMatmul</code>: 37.62 us</li>
+  <li>TVM <code class="highlighter-rouge">BatchMatmul</code> + <code class="highlighter-rouge">Transpose</code> (fused): 38.39 us</li>
+</ul>
+
+<p>The kernel fusion optimization brings a further <b><em>1.7X</em></b> speed-up.</p>
+
+<h2 id="integration-with-tensorflow">Integration with Tensorflow</h2>
+
+<p>The input shapes of batch matmul in our workload are finite and can easily be enumerated in advance. With those pre-defined shapes, we can generate highly optimized CUDA kernels ahead of time (fixed-shape computation offers the best optimization potential). Meanwhile, a general batch matmul kernel suitable for most of the shapes is also generated to provide a fall-back mechanism for the shapes which do not have a corresponding ahead-of-time generated kernel.</p>
+
+<p>The generated efficient kernels for specific shapes and the fall-back one are integrated into the TensorFlow framework. We develop fused ops, such as BatchMatMulTranspose or BatchMatMulAdd, to launch the specific generated kernel using TVM’s runtime API for a certain input shape, or to invoke the fall-back kernel. A graph optimization pass is conducted to automatically replace the original batch <em>matmul + add/transpose</em> pattern with the fused ops. Meanwhile, by combining a more aggress [...]
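+
+<p>A minimal sketch of the shape-based dispatch (the kernel names and the
+lookup table are hypothetical, not the actual TensorFlow op implementation):</p>
+
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># ahead-of-time compiled kernels, keyed by (batch, M, N, K)
+specialized = {
+    (512, 1, 17, 128): "batch_matmul_512x1x17x128",
+    (512, 17, 17, 128): "batch_matmul_512x17x17x128",
+}
+
+def select_kernel(batch, M, N, K):
+    # fall back to the general kernel for shapes not compiled ahead of time
+    return specialized.get((batch, M, N, K), "batch_matmul_generic")
+
+print(select_kernel(512, 1, 17, 128))   # specialized kernel
+print(select_kernel(512, 30, 30, 128))  # generic fall-back kernel
+</code></pre></div></div>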
+
+<h2 id="summary">Summary</h2>
+<p>Inside Alibaba, we found that TVM is a very productive tool for developing high-performance GPU kernels to meet our in-house requirements. In this blog, the NMT Transformer model is taken as an example to illustrate our optimization strategy with TVM. First, we locate the hot-spot of the Transformer model through first-principles analysis. Then we use TVM to generate highly optimized CUDA kernels to replace the cuBLAS version (a <b><em>13X</em></b> speed-up is observed). Next, we leverage TVM’s kernel  [...]
+
+<h2 id="resources">Resources</h2>
+<ul>
+  <li><a href="https://github.com/Orion34C/tvm-batch-matmul-example/blob/master/tvm_batch_matmul_transpose_m1_kX.py">TVM implementation of fused batch matmul + transpose computation</a></li>
+</ul>
+
+<h2 id="references">References</h2>
+<p>[1] <a href="https://arxiv.org/pdf/1706.03762.pdf">Attention is All You Need</a></p>
+
+<p>[2] <a href="https://devblogs.nvidia.com/cuda-pro-tip-nvprof-your-handy-universal-gpu-profiler/">nvprof is Your Handy Universal GPU Profiler</a></p>
+
+<p>[3] <a href="https://github.com/tensorflow/tensorflow/pull/16306">Add Loop Invariant Node Motion Optimization in GraphOptimizer</a></p>
+
+    </div>
+  </div>
+</div>
+</div>
+
+
+    
+
+
+
+  </body>
+</html>
+
diff --git a/2018/07/12/vta-release-announcement.html b/2018/07/12/vta-release-announcement.html
new file mode 100644
index 0000000..02974c5
--- /dev/null
+++ b/2018/07/12/vta-release-announcement.html
@@ -0,0 +1,294 @@
+
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <title>VTA: An Open, Customizable Deep Learning Acceleration Stack </title>
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
+</head>
+
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            
+            
+            
+
+
+
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/community">Community</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/about">About</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/vta">VTA</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      
+      	
+      	<li><a href="https://tvm.ai/blog">Blog</a></li>
+      	
+      
+    
+  
+
+
+
+
+            <li> <a href="https://sampl.cs.washington.edu/tvmconf">TVM Conference</a></li>
+            <li> <a href="https://docs.tvm.ai/tutorials/">Tutorials</a></li>
+            <li> <a href="https://docs.tvm.ai">Docs</a></li>
+            <li> <a href="https://github.com/dmlc/tvm/">Github</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
+<div class="container">
+<div class="content">
+  <div class="row">
+    <div class="span14">
+      <h1>VTA: An Open, Customizable Deep Learning Acceleration Stack  </h1>
+      <p class="post-meta">
+        <time datetime="2018-07-12T00:00:00-07:00" itemprop="datePublished">
+          Jul 12, 2018
+        </time>
+        
+      </p>
+      <p class="post-meta">
+        </p>
+    </br>
+    <p style="text-align: center">Thierry Moreau(VTA architect), Tianqi Chen(TVM stack), Ziheng Jiang†(graph compilation), Luis Vega(cloud deployment)</p>
+<p style="text-align: center">Advisors: Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy</p>
+<p style="text-align: center">Paul G. Allen School of Computer Science &amp; Engineering, University of Washington</p>
+
+<p>Hardware acceleration is an enabler for ubiquitous and efficient deep learning. With hardware accelerators appearing in the datacenter and edge devices, hardware specialization has taken on a prominent role in the deep learning system stack.</p>
+
+<p>We are excited to announce the launch of the Versatile Tensor Accelerator (VTA, pronounced <em>vita</em>), an open, generic, and customizable deep learning accelerator design. VTA is a programmable accelerator that exposes a RISC-like programming abstraction to describe tensor-level operations. We designed VTA to expose the most salient and common characteristics of mainstream deep learning accelerators, such as tensor operations, DMA load/stores, and explicit compute/memory arbitration.</p>
+
+<p>VTA is more than a standalone accelerator design: it’s an end-to-end solution that includes drivers, a JIT runtime, and an optimizing compiler stack based on TVM. The current release includes a behavioral hardware simulator, as well as the infrastructure to deploy VTA on low-cost FPGA hardware for fast prototyping. By extending the TVM stack with a customizable, and open source deep learning hardware accelerator design, we are exposing a transparent end-to-end deep learning stack from [...]
+
+<p style="text-align: center"><img src="http://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_stack.png" alt="image" width="50%" /></p>
+
+<p>The VTA and TVM stack together constitute a blueprint for an end-to-end, accelerator-centric deep learning system that can:</p>
+
+<ul>
+  <li>Provide an open deep learning system stack for hardware, compilers, and systems researchers alike to incorporate optimizations and co-design techniques.</li>
+  <li>Lower the barrier of entry for machine learning practitioners to experiment with novel network architectures, operators and data representations that require specialized hardware support.</li>
+</ul>
+
+<h2 id="use-case-scenarios-for-researchers">Use-Case Scenarios for Researchers</h2>
+
+<p>We highlight ways in which the VTA design together with a complete TVM software stack can enable novel opportunities across hardware, compilers, and deep learning research.</p>
+
+<h3 id="hardware-designers-and-computer-architects">Hardware Designers and Computer Architects</h3>
+
+<p>With new ASIC designs being regularly announced, providing a complete and usable software stack on top of novel hardware is essential to gaining a competitive edge, both in research circles and commercially.
+Our VTA release provides a reference TVM software stack built for hardware accelerators.
+We hope to empower hardware designers to quickly build and deploy optimized deep learning libraries ready to be utilized by high-level frameworks of the likes of TensorFlow or PyTorch.
+Software support is essential for performing full-system evaluation to understand the limits and performance bottlenecks in hardware-accelerated systems.
+With the use of FPGAs as hardware deployment backends, we provide a complete solution for rapid and iterative hardware design prototyping.
+Finally, our vision is to see VTA grow into a collection of hardware designs, eventually leading to an open ecosystem of custom hardware accelerators.</p>
+
+<p style="text-align: center"><img src="https://www.acm.org/binaries/content/gallery/acm/ctas/publications/artifact-badges.jpg/artifact-badges.jpg/acm%3Adesktopcta" alt="image" width="20%" /></p>
+
+<p>In addition, VTA is one of the first hardware-software reproducible <a href="http://ctuning.org/ae/">ACM artifacts</a>, which can serve as a template for reproducible deep learning architecture research.
+The VTA artifact, deployable using <a href="http://cknowledge.org/">CK</a>, was presented at ReQuEST 2018, co-located with <a href="http://cknowledge.org/request-cfp-asplos2018.html">ASPLOS</a>.</p>
+
+<h3 id="optimizing-compilers-researchers">Optimizing Compilers Researchers</h3>
+
+<p>Novel intermediate representations and optimizing compilers of the likes of TVM have been proposed to better take advantage of deep learning workloads characteristics.
+VTA complements TVM to provide accelerator-centric optimization passes, and low-level code generation. Our open-source deep learning compiler stack also aims to emulate the success of LLVM, by allowing the community to improve accelerator-centric compiler support over time, particularly as more hardware variants of VTA emerge.
+The extendability of the compiler stack, combined with the ability to modify the architecture and the programming interface of the hardware back-end should lead to exciting opportunities in hardware-software co-design for deep learning.</p>
+
+<h3 id="deep-learning-researchers">Deep Learning Researchers</h3>
+
+<p>A transparent and customizable software and hardware stack empowers deep learning researchers to come up with novel neural network operators, and data representations, all the while enabling the complete evaluation of those optimizations on end-to-end systems. Techniques like binarization are currently limited to CPU and GPU evaluations, unless significant engineering resources are dedicated to produce an FPGA or ASIC design that can evaluate the technique’s full energy savings potent [...]
+
+<h2 id="technical-details">Technical Details</h2>
+
+<h3 id="stack-overview">Stack Overview</h3>
+
+<p>The VTA deep learning accelerator and TVM stack can bridge the gap between productivity-oriented deep learning frameworks, and performance-focused hardware substrates, such as FPGAs.</p>
+<ul>
+  <li>NNVM, the graph-level optimizer, provides a graph-level Intermediate Representation (IR) used as a common language between different deep learning frameworks to take advantage of graph-level optimizations, such as operator fusion. The NNVM IR is also used to specify data layout and data format constraints: e.g. tiling for tensorization, and bit-packing for ultra-low precision computing.</li>
+  <li>TVM, the tensor-level optimizer, builds upon the Halide DSL and schedule primitives to provide an optimizing compiler capable of bringing performance portability for deep learning across hardware back-ends. TVM brings novel scheduling primitives that target specialized hardware accelerators, such as tensorization, which lowers computation onto specialized tensor-tensor hardware instructions. In addition, it provides schedule primitives and lowering rules that allow for explicit mem [...]
+  <li>The VTA runtime performs JIT compilation of VTA binaries (instruction streams and micro-kernel code), manages shared memory, and performs synchronization to hand off execution to VTA. The VTA runtime presents an API that looks generic to TVM, to hide complexities of platform-specific bookkeeping tasks. It exposes a C++ API that a TVM module can call into - this simplifies the future inclusion of other hardware accelerator designs, without having to drastically modify the upper TVM  [...]
+  <li>VTA’s two-level ISA provides both (1) a high-level CISC ISA that describes variable latency operations such as DMA loads, or deep learning operators and (2) a low-level, and fixed latency RISC ISA that describe low-level matrix-matrix operations. This two-level ISA allows both code compactness, and expressiveness.</li>
+  <li>Finally, VTA’s micro-architecture provides a flexible deep learning hardware design specification, that can be conveniently compiled onto other FPGA platforms, and eventually in the long term down to ASICs.</li>
+</ul>
+
+<h3 id="vta-hardware-design-overview">VTA Hardware Design Overview</h3>
+
+<p>The Versatile Tensor Accelerator (VTA) is a generic deep learning accelerator built around a GEMM core, which performs dense matrix multiplication at high computational throughput.
+The design is inspired by mainstream deep learning accelerators, of the likes of Google’s TPU accelerator. The design adopts decoupled access-execute to hide memory access latency and maximize utilization of compute resources. To a broader extent, VTA can serve as a template deep learning accelerator design, exposing a clean tensor computation abstraction to the compiler stack.</p>
+
+<p style="text-align: center"><img src="http://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_overview.png" alt="image" width="60%" /></p>
+
+<p>The figure above presents a high-level overview of the VTA hardware organization. VTA is composed of four modules that communicate with each other via FIFO queues and single-writer/single-reader SRAM memory blocks, to allow for task-level pipeline parallelism.
+The compute module performs both dense linear algebra computation with its GEMM core, and general computation with its tensor ALU.
+It operates on a register file which, instead of storing scalar values, stores tensors of rank 1 or 2.
+The micro-op cache stores low-level code that dictates a sequence of operations to mutate the register file.</p>
+
+<p>The VTA hardware design template offers modularity to the user, with the option to modify hardware datatypes, memory architecture, the GEMM core dimensions, hardware operators, and pipelining stages.
+Exposing multiple variants of VTA to the compiler stack facilitates the development of compilers, since we can test TVM’s ability to target a multitude of hardware accelerators, rather than a single design.</p>
+
+<h3 id="vta-prototyping-with-vta-simulator-and-pynq-fpga-board">VTA Prototyping with VTA Simulator and Pynq FPGA Board</h3>
+
+<p>The VTA release allows users to experiment with hardware acceleration, and accelerator-centric compiler optimizations in two ways.
+The first approach, which doesn’t require special hardware, is to run deep learning workloads on a behavioral simulator of the VTA design.
+This simulator back-end is readily available for developers to experiment with.
+The second approach relies on an off-the-shelf and low-cost FPGA development board – the <a href="http://www.pynq.io/">Pynq board</a>, which exposes a reconfigurable FPGA fabric and an ARM SoC.</p>
+
+<p style="text-align: center"><img src="http://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_system.png" alt="image" width="70%" /></p>
+
+<p>The VTA release offers a simple compilation and deployment flow of the VTA hardware design and TVM workloads on the Pynq platform, with the help of an RPC server interface.
+The RPC server handles FPGA reconfiguration tasks and TVM module invocation offloading onto the VTA runtime.
+The VTA runtime system runs on the ARM CPU of the Pynq embedded system, and generates VTA binaries on the fly to offload to the FPGA hardware.
+This complete solution allows for out-of-the-box prototyping on low-cost FPGAs, with an interactive and familiar Python environment, hiding much of the complexity and headaches of FPGA design away from the user.</p>
+
+<p>For programmers familiar with hardware and FPGAs, we expose the VTA design expressed in HLS C, and provide scripts built on top of the Xilinx toolchains to compile the design into an FPGA bitstream.
+We are currently building a repository of VTA variants, so that users can explore different design variants for their deep learning workloads without having to go through the time-consuming FPGA compilation process.</p>
+
+<h2 id="performance-assessment">Performance Assessment</h2>
+
+<p><em>VTA is at its early stages of development and we expect more performance improvements and optimizations to come.
+As of now we offer end-to-end performance evaluations on the low-cost Pynq board which incorporates a dated 28nm FPGA fabric.
+While this platform is meant for prototyping (the 2012 FPGA cannot compete with modern ASICs), we are porting VTA to newer high-performance FPGA platforms that will offer more competitive performance.</em></p>
+
+<p><em>We are working on more experiments and will release new results as they are obtained.</em></p>
+
+<h3 id="resource-utilization-on-resnet-18">Resource Utilization on ResNet-18</h3>
+
+<p>A popular method used to assess the efficient use of hardware is the roofline diagram: given a hardware design, how efficiently do different workloads utilize the hardware compute and memory resources? The roofline plot below shows the throughput achieved on different convolution layers of the ResNet-18 inference benchmark. Each layer has a different arithmetic intensity, i.e. compute to data movement ratio.
+In the left half, convolution layers are bandwidth limited, whereas in the right half, they are compute limited.</p>
+
+<p style="text-align: center"><img src="http://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_roofline.png" alt="image" width="60%" /></p>
+
+<p>The goal behind designing a hardware architecture, and a compiler stack is to bring each workload as close as possible to the roofline of the target hardware.
+The roofline plot shows the effects of having the hardware and compiler work together to maximize utilization of the available hardware resources.
+The technique showcased is latency hiding, which requires explicit dependence tracking at the hardware level, compiler support to partition work, and explicit dependence insertion in the instruction stream during code generation.
+The result is an overall higher utilization of the available compute and memory resources.</p>
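+
+<p>For readers unfamiliar with rooflines: attainable throughput is bounded by
+the minimum of peak compute and arithmetic intensity times memory bandwidth.
+A tiny sketch with illustrative numbers (not the Pynq/VTA specs):</p>
+
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def attainable_gops(intensity_ops_per_byte, peak_gops, bandwidth_gb_s):
+    # roofline model: a workload is limited by compute or by bandwidth
+    return min(peak_gops, intensity_ops_per_byte * bandwidth_gb_s)
+
+print(attainable_gops(2, 102.4, 4.3))   # 8.6   -> bandwidth limited
+print(attainable_gops(64, 102.4, 4.3))  # 102.4 -> compute limited
+</code></pre></div></div>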
+
+<h3 id="end-to-end-resnet-18-evaluation">End to end ResNet-18 evaluation</h3>
+
+<p style="text-align: center"><img src="http://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_e2e.png" alt="image" width="60%" /></p>
+
+<p>A benefit of having a complete compiler stack built for VTA is the ability to run end-to-end workloads. This is compelling in the context of hardware acceleration because we need to understand what performance bottlenecks and Amdahl limitations stand in the way of obtaining faster performance.
+The bar plot above shows inference performance with and without offloading the ResNet convolutional layers to the FPGA-based VTA design, on the Pynq board’s ARM Cortex A9 SoC.
+At a glance, it is clear that VTA accomplishes its goal, reducing the time it takes to perform convolutions on the CPU (dark blue).
+However, it becomes apparent that other operators need offloading, as they now constitute a new bottleneck.
+This kind of high-level visibility is essential to system designers who want to understand how systems affect end-to-end performance.</p>
+
+<h2 id="open-source-effort">Open Source Effort</h2>
+<p>VTA is a research effort at the Paul G. Allen School of Computer Science and Engineering at the University of Washington, and is now integrated into the TVM stack. The TVM project follows the Apache open-source model, to create a community-maintained project. You are more than welcome to join us and lead the effort.</p>
+
+<h2 id="acknowledgements">Acknowledgements</h2>
+<p>VTA is a research project that came out of the SAML group, which is generously supported by grants from DARPA and the National Science Foundation and gifts from Huawei, Oracle, Intel and anonymous donors.</p>
+
+<h2 id="get-started">Get Started!</h2>
+<ul>
+  <li>TVM and VTA Github page can be found here: <a href="https://github.com/dmlc/tvm">https://github.com/dmlc/tvm</a>.</li>
+  <li>You can get started with easy to follow <a href="https://docs.tvm.ai/vta/tutorials/index.html">tutorials on programming VTA with TVM</a>.</li>
+  <li>For more technical details on VTA, read our <a href="https://arxiv.org/abs/1807.04188">VTA technical report</a> on ArXiv.</li>
+</ul>
+
+    </div>
+  </div>
+</div>
+</div>
+
+
+    
+
+
+
+  </body>
+</html>
+
diff --git a/2018/08/10/DLPack-Bridge.html b/2018/08/10/DLPack-Bridge.html
new file mode 100644
index 0000000..999e84c
--- /dev/null
+++ b/2018/08/10/DLPack-Bridge.html
@@ -0,0 +1,295 @@
+
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <title>Building a Cross-Framework Deep Learning Compiler via DLPack</title>
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
+</head>
+
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            
+            
+            
+
+
+
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/community">Community</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/about">About</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/vta">VTA</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      
+      	
+      	<li><a href="https://tvm.ai/blog">Blog</a></li>
+      	
+      
+    
+  
+
+
+
+
+            <li> <a href="https://sampl.cs.washington.edu/tvmconf">TVM Conference</a></li>
+            <li> <a href="https://docs.tvm.ai/tutorials/">Tutorials</a></li>
+            <li> <a href="https://docs.tvm.ai">Docs</a></li>
+            <li> <a href="https://github.com/dmlc/tvm/">Github</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
+<div class="container">
+<div class="content">
+  <div class="row">
+    <div class="span14">
+      <h1>Building a Cross-Framework Deep Learning Compiler via DLPack </h1>
+      <p class="post-meta">
+        <time datetime="2018-08-10T00:00:00-07:00" itemprop="datePublished">
+          Aug 10, 2018
+        </time>
+        
+        • <span itemprop="author" itemscope itemtype="http://schema.org/Person">
+          <span itemprop="name">Eddie Yan</span>
+        </span>
+        
+      </p>
+      <p class="post-meta">
+        </p>
+    <br />
+    <p>Deep learning frameworks such as TensorFlow, PyTorch, and Apache MXNet provide a
+powerful toolbox for quickly prototyping and deploying deep learning models.
+Unfortunately, their ease of use has often come at the cost of fragmentation: it
+is only easy to use each framework in isolation. Vertical integration has made
+development streamlined for common use cases, but venturing off the beaten
+path can be tricky.</p>
+
+<p>One scenario that is poorly supported is passing tensors
+<em>directly</em> from one framework to another in memory, without any data duplication
+or copies. Supporting such a use case would enable users to efficiently string together
+pipelines where certain operators are better supported (or faster) in one framework
+than in another. A shared data representation between
+frameworks would also bridge this gap, and allow compiler stacks to target a
+single format when generating code for operators.</p>
+
+<p><a href="https://github.com/dmlc/dlpack">DLPack</a> is an intermediate in-memory
+representation standard for tensor data structures. With DLPack as a common
+representation, we can leverage TVM in scripts written for frameworks that
+traditionally could only rely on vendor-provided libraries. TVM packed functions
+can operate on DLPack tensors, providing wrappers bridging tensor data
+structures from frameworks such as PyTorch and MXNet <em>with zero data copies</em>.</p>
+
+<p>DLPack presents a simple, portable in-memory data structure:</p>
+<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/*!
+ * \brief Plain C Tensor object, does not manage memory.
+ */</span>
+<span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
+  <span class="cm">/*!
+   * \brief The opaque data pointer points to the allocated data.
+   *  This will be a CUDA device pointer or cl_mem handle in OpenCL.
+   *  This pointer is always aligned to 256 bytes, as in CUDA.
+   */</span>
+  <span class="kt">void</span><span class="o">*</span> <span class="n">data</span><span class="p">;</span>
+  <span class="cm">/*! \brief The device context of the tensor */</span>
+  <span class="n">DLContext</span> <span class="n">ctx</span><span class="p">;</span>
+  <span class="cm">/*! \brief Number of dimensions */</span>
+  <span class="kt">int</span> <span class="n">ndim</span><span class="p">;</span>
+  <span class="cm">/*! \brief The data type of the pointer*/</span>
+  <span class="n">DLDataType</span> <span class="n">dtype</span><span class="p">;</span>
+  <span class="cm">/*! \brief The shape of the tensor */</span>
+  <span class="kt">int64_t</span><span class="o">*</span> <span class="n">shape</span><span class="p">;</span>
+  <span class="cm">/*!
+   * \brief strides of the tensor,
+   *  can be NULL, indicating tensor is compact.
+   */</span>
+  <span class="kt">int64_t</span><span class="o">*</span> <span class="n">strides</span><span class="p">;</span>
+  <span class="cm">/*! \brief The offset in bytes to the beginning pointer to data */</span>
+  <span class="kt">uint64_t</span> <span class="n">byte_offset</span><span class="p">;</span>
+<span class="p">}</span> <span class="n">DLTensor</span><span class="p">;</span>
+</code></pre></div></div>
+
+<p>As an example, we declare and compile a matrix multiplication operator in TVM,
+and build a wrapper that uses the DLPack representation to allow this operator
+to support PyTorch tensors. We also repeat this demonstration with MXNet. This
+extension allows machine learning developers to quickly port research code to
+relatively unsupported hardware platforms without sacrificing performance.</p>
+
+<p>Illustration of how DLPack provides an intermediate wrapper that is shared
+between frameworks and TVM:</p>
+<p style="text-align: center"><img src="/images/pytorch-dlpack/dlpack.png" alt="image" width="65%" /><br />
+Figure 1</p>
+
+<p>First, we compute a reference output in PyTorch:</p>
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kn">import</span> <span class="nn">torch</span>
+    <span class="n">x</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">rand</span><span class="p">(</span><span class="mi">56</span><span class="p">,</span><span class="mi">56</span><span class="p">)</span>
+    <span class="n">y</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">rand</span><span class="p">(</span><span class="mi">56</span><span class="p">,</span><span class="mi">56</span><span class="p">)</span>
+    <span class="n">z</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">mm</span><span class="p">(</span><span class="n">y</span><span class="p">)</span>
+</code></pre></div></div>
+
+<p>We then define and build a TVM matrix multiplication operator, using the default
+schedule:</p>
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">n</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">convert</span><span class="p">(</span><span class="mi">56</span><span class="p">)</span>
+    <span class="n">X</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">placeholder</span><span class="p">((</span><span class="n">n</span><span class="p">,</span><span class="n">n</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'X'</span><span class="p">)</span>
+    <span class="n">Y</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">placeholder</span><span class="p">((</span><span class="n">n</span><span class="p">,</span><span class="n">n</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'Y'</span><span class="p">)</span>
+
+    <span class="n">k</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">n</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'k'</span><span class="p">)</span>
+    <span class="n">Z</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">compute</span><span class="p">((</span><span class="n">n</span><span class="p">,</span><span class="n">n</span><span class="p">),</span> <span class="k">lambda</span> <span class="n">i</span><span class="p">,</span><span class="n">j</span> <span class="p">:</span> <span class="n">tvm</span><span class="o">.</span><span class="nb">sum</span><span class="p">(</span><span  [...]
+    <span class="n">s</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">create_schedule</span><span class="p">(</span><span class="n">Z</span><span class="o">.</span><span class="n">op</span><span class="p">)</span>
+    <span class="n">fmm</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">build</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">[</span><span class="n">X</span><span class="p">,</span> <span class="n">Y</span><span class="p">,</span> <span class="n">Z</span><span class="p">],</span> <span class="n">target_host</span><span class="o">=</span><span class="s">'llvm'</span><span class="p">,</span> < [...]
+</code></pre></div></div>
+<p>For brevity, we do not cover TVM’s large collection of scheduling primitives
+that we can use to optimize matrix multiplication. If you wish to make a custom
+GEMM operator run <em>fast</em> on your hardware device, a detailed tutorial can be
+found <a href="https://docs.tvm.ai/tutorials/optimize/opt_gemm.html">here</a>.</p>
+
+<p>We then convert the TVM function into one that supports PyTorch tensors:</p>
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kn">from</span> <span class="nn">tvm.contrib.dlpack</span> <span class="kn">import</span> <span class="n">to_pytorch_func</span>
+    <span class="c1"># fmm is the previously built TVM function (Python function)
+</span>    <span class="c1"># fmm is the wrapped TVM function (Python function)
+</span>    <span class="n">fmm_pytorch</span> <span class="o">=</span> <span class="n">to_pytorch_func</span><span class="p">(</span><span class="n">fmm</span><span class="p">)</span>
+    <span class="n">z2</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">empty</span><span class="p">(</span><span class="mi">56</span><span class="p">,</span><span class="mi">56</span><span class="p">)</span>
+    <span class="n">fmm_pytorch</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">z2</span><span class="p">)</span>
+    <span class="n">np</span><span class="o">.</span><span class="n">testing</span><span class="o">.</span><span class="n">assert_allclose</span><span class="p">(</span><span class="n">z</span><span class="o">.</span><span class="n">numpy</span><span class="p">(),</span> <span class="n">z2</span><span class="o">.</span><span class="n">numpy</span><span class="p">())</span>
+</code></pre></div></div>
+<p>and verify that the results match.</p>
+
+<p>We can repeat the same example, but using MXNet instead:</p>
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kn">import</span> <span class="nn">mxnet</span>
+    <span class="kn">from</span> <span class="nn">tvm.contrib.mxnet</span> <span class="kn">import</span> <span class="n">to_mxnet_func</span>
+    <span class="n">ctx</span> <span class="o">=</span> <span class="n">mxnet</span><span class="o">.</span><span class="n">cpu</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
+    <span class="n">x</span> <span class="o">=</span> <span class="n">mxnet</span><span class="o">.</span><span class="n">nd</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">56</span><span class="p">,</span><span class="mi">56</span><span class="p">),</span> <span class="n">ctx</span><span class="o">=</span><span class="n">ctx</span><span class="p">)</span>
+    <span class="n">y</span> <span class="o">=</span> <span class="n">mxnet</span><span class="o">.</span><span class="n">nd</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">56</span><span class="p">,</span><span class="mi">56</span><span class="p">),</span> <span class="n">ctx</span><span class="o">=</span><span class="n">ctx</span><span class="p">)</span>
+    <span class="n">z</span> <span class="o">=</span> <span class="n">mxnet</span><span class="o">.</span><span class="n">nd</span><span class="o">.</span><span class="n">empty</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">56</span><span class="p">,</span><span class="mi">56</span><span class="p">),</span> <span class="n">ctx</span><span class="o">=</span><span class="n">ctx</span><span class="p">)</span>
+    <span class="n">f</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">build</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">[</span><span class="n">X</span><span class="p">,</span> <span class="n">Y</span><span class="p">,</span> <span class="n">Z</span><span class="p">],</span> <span class="n">target_host</span><span class="o">=</span><span class="s">'llvm'</span><span class="p">,</span> <sp [...]
+    <span class="n">f_mxnet</span> <span class="o">=</span> <span class="n">to_mxnet_func</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
+    <span class="n">f_mxnet</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">z</span><span class="p">)</span>
+    <span class="n">np</span><span class="o">.</span><span class="n">testing</span><span class="o">.</span><span class="n">assert_allclose</span><span class="p">(</span><span class="n">z</span><span class="o">.</span><span class="n">asnumpy</span><span class="p">(),</span> <span class="n">x</span><span class="o">.</span><span class="n">asnumpy</span><span class="p">()</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">y</span><span class="o">. [...]
+</code></pre></div></div>
+
+<h2 id="under-the-hood-of-the-pytorch-example">Under the hood of the PyTorch Example</h2>
+<p>TVM provides <a href="https://github.com/dmlc/tvm/blob/master/include/tvm/runtime/c_runtime_api.h#L455">functions</a> to convert DLPack tensors to TVM <code class="highlighter-rouge">NDArray</code>s and
+vice versa, so all that is needed is some syntactic sugar wrapping the functions.
+<code class="highlighter-rouge">convert_func</code> is a generic converter for frameworks whose tensors have DLPack
+support, and can be used to implement convenient converters such as
+<code class="highlighter-rouge">to_pytorch_func</code>.</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">convert_func</span><span class="p">(</span><span class="n">tvm_func</span><span class="p">,</span> <span class="n">tensor_type</span><span class="p">,</span> <span class="n">to_dlpack_func</span><span class="p">):</span>
+    <span class="k">assert</span> <span class="nb">callable</span><span class="p">(</span><span class="n">tvm_func</span><span class="p">)</span>
+
+    <span class="k">def</span> <span class="nf">_wrapper</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">):</span>
+        <span class="n">args</span> <span class="o">=</span> <span class="nb">tuple</span><span class="p">(</span><span class="n">ndarray</span><span class="o">.</span><span class="n">from_dlpack</span><span class="p">(</span><span class="n">to_dlpack_func</span><span class="p">(</span><span class="n">arg</span><span class="p">))</span>\
+            <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">arg</span><span class="p">,</span> <span class="n">tensor_type</span><span class="p">)</span> <span class="k">else</span> <span class="n">arg</span> <span class="k">for</span> <span class="n">arg</span> <span class="ow">in</span> <span class="n">args</span><span class="p">)</span>
+        <span class="k">return</span> <span class="n">tvm_func</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">)</span>
+
+    <span class="k">return</span> <span class="n">_wrapper</span>
+
+<span class="k">def</span> <span class="nf">to_pytorch_func</span><span class="p">(</span><span class="n">tvm_func</span><span class="p">):</span>
+    <span class="kn">import</span> <span class="nn">torch</span>
+    <span class="kn">import</span> <span class="nn">torch.utils.dlpack</span>
+    <span class="k">return</span> <span class="n">convert_func</span><span class="p">(</span><span class="n">tvm_func</span><span class="p">,</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">,</span> <span class="n">torch</span><span class="o">.</span><span class="n">utils</span><span class="o">.</span><span class="n">dlpack</span><span class="o">.</span><span class="n">to_dlpack</span><span class="p">)</span>
+</code></pre></div></div>
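+
+<p>For illustration, a hypothetical MXNet converter can be built from the same
+<code class="highlighter-rouge">convert_func</code> pattern. This is a sketch only: the actual
+<code class="highlighter-rouge">tvm.contrib.mxnet.to_mxnet_func</code> uses a different bridge, and
+<code class="highlighter-rouge">to_dlpack_for_write</code> is an assumed MXNet DLPack export API.</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def to_mxnet_func_sketch(tvm_func):
+    import mxnet
+    # tensor.to_dlpack_for_write() is the assumed MXNet export hook here
+    return convert_func(tvm_func, mxnet.nd.NDArray,
+                        lambda tensor: tensor.to_dlpack_for_write())
+</code></pre></div></div>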
+
+    </div>
+  </div>
+</div>
+</div>
+
+
+    
+
+
+
+  </body>
+</html>
+
diff --git a/2018/10/03/auto-opt-all.html b/2018/10/03/auto-opt-all.html
new file mode 100644
index 0000000..233f02c
--- /dev/null
+++ b/2018/10/03/auto-opt-all.html
@@ -0,0 +1,550 @@
+
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <title>Automatic Kernel Optimization for Deep Learning on All Hardware Platforms</title>
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
+</head>
+
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            <li><a href="https://tvm.ai/community">Community</a></li>
+            <li><a href="https://tvm.ai/about">About</a></li>
+            <li><a href="https://tvm.ai/vta">VTA</a></li>
+            <li><a href="https://tvm.ai/blog">Blog</a></li>
+            <li> <a href="https://sampl.cs.washington.edu/tvmconf">TVM Conference</a></li>
+            <li> <a href="https://docs.tvm.ai/tutorials/">Tutorials</a></li>
+            <li> <a href="https://docs.tvm.ai">Docs</a></li>
+            <li> <a href="https://github.com/dmlc/tvm/">Github</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
+<div class="container">
+<div class="content">
+  <div class="row">
+    <div class="span14">
+      <h1>Automatic Kernel Optimization for Deep Learning on All Hardware Platforms </h1>
+      <p class="post-meta">
+        <time datetime="2018-10-03T00:00:00-07:00" itemprop="datePublished">
+          Oct 3, 2018
+        </time>
+        
+        • <span itemprop="author" itemscope itemtype="http://schema.org/Person">
+          <span itemprop="name">Lianmin Zheng, Eddie Yan, Tianqi Chen</span>
+        </span>
+        
+      </p>
+      <p class="post-meta">
+        </p>
+    <br />
+    <p>Optimizing the performance of deep neural networks on a diverse range of hardware platforms is still a hard
+problem for AI developers. In terms of system support, we are facing a many-to-many problem here:
+deploying trained models from multiple frontends (e.g. TensorFlow, ONNX, MXNet) to multiple
+hardware platforms (e.g. CPUs, GPUs, accelerators). The most performance-critical part of
+this problem is obtaining high-performance kernel implementations for a growing set of model
+architectures and hardware platforms.</p>
+
+<p>To address this challenge, TVM takes a full-stack compiler approach.
+TVM combines code generation and automatic program optimization to generate kernels
+that are comparable to heavily hand-optimized libraries,
+obtaining state-of-the-art inference performance on hardware platforms including
+ARM CPUs, Intel CPUs, Mali GPUs, NVIDIA GPUs and AMD GPUs.</p>
+
+<p>In this blog post, we show the workflow of automatic kernel optimization in the TVM compiler stack and
+benchmark results on several hardware platforms.</p>
+
+<h1 id="system-overview">System Overview</h1>
+
+<p style="text-align: center"><img src="/images/autotune-all/overview.png" alt="image" width="35%" /></p>
+<center> Figure 1. System Overview </center>
+<p></p>
+
+<p>Kernel optimization in TVM is done in an iterative loop.
+As shown in Figure 1, automatic kernel optimization takes a neural network (typically in a computational graph representation)
+from a frontend framework as input, and generates kernels for all operators in this network.</p>
+
+<p>The inner loop uses a scalable RPC runtime, machine-learning-based tuners and a tensor compiler.
+In each round of the loop, the tuner picks a batch of promising candidate kernel implementations from a large search space
+and profiles them on real hardware. The profiling results are then used as training
+data to fit a prediction model. Guided by the fitted prediction model, the tuner picks the next batch of promising candidates,
+and the loop continues. This way, we search for fast kernels iteratively.</p>
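+
+<p>As a rough sketch of this inner loop (illustrative pseudocode only; the names
+<code class="highlighter-rouge">search_space</code>, <code class="highlighter-rouge">cost_model</code> and
+<code class="highlighter-rouge">measure_on_device</code> are placeholders, not the actual
+<code class="highlighter-rouge">tvm.autotvm</code> API):</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>trials = []
+for _ in range(num_rounds):
+    # pick a batch of promising candidates according to the model's predictions
+    batch = cost_model.pick_top_candidates(search_space, batch_size)
+    # profile them on real hardware through the scalable RPC runtime
+    results = [measure_on_device(candidate) for candidate in batch]
+    trials.extend(zip(batch, results))
+    # refit the prediction model on all profiling data collected so far
+    cost_model.fit(trials)
+# keep the fastest measured candidate
+best_config = max(trials, key=lambda t: t[1])[0]
+</code></pre></div></div>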
+
+<p>The figure below compares traditional auto-tuning and AutoTVM.
+The major differences are that AutoTVM is</p>
+<ul>
+  <li><strong>Scalable</strong> to heterogeneous clusters of devices</li>
+  <li><strong>Learning</strong> to optimize tensor programs with a transferable machine learning cost model</li>
+</ul>
+
+<p>You can refer to our paper [5] for more details.</p>
+
+<p style="text-align: center"><img src="/images/autotune-all/autotvm.png" alt="image" width="50%" /></p>
+<center> Figure 2. Comparison of Traditional Auto-tuning and AutoTVM </center>
+<p></p>
+
+<h2 id="begin-tuning">Begin Tuning</h2>
+<p>For demonstration, we run our optimization for resnet-18 on the RK3399, an ARM development board.
+Detailed instructions are omitted due to the space limits of a blog post;
+links to the tutorials for ARM CPU, Mali GPU, NVIDIA GPU, and AMD GPU are all available at the end of this post.</p>
+
+<p>First we get a pre-trained model from MXNet model zoo, and extract tuning tasks from it.</p>
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">mxnet.gluon.model_zoo.vision</span> <span class="kn">import</span> <span class="n">get_model</span>
+
+<span class="n">block</span> <span class="o">=</span> <span class="n">get_model</span><span class="p">(</span><span class="s">'resnet18_v1'</span><span class="p">,</span> <span class="n">pretrained</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
+<span class="n">net</span><span class="p">,</span> <span class="n">params</span> <span class="o">=</span> <span class="n">nnvm</span><span class="o">.</span><span class="n">frontend</span><span class="o">.</span><span class="n">from_mxnet</span><span class="p">(</span><span class="n">block</span><span class="p">)</span>
+
+<span class="n">tasks</span> <span class="o">=</span> <span class="n">autotvm</span><span class="o">.</span><span class="n">extract_from_graph</span><span class="p">(</span><span class="n">net</span><span class="p">)</span>
+<span class="n">tune_tasks</span><span class="p">(</span><span class="n">tasks</span><span class="p">,</span> <span class="o">**</span><span class="n">tuning_option</span><span class="p">)</span>
+</code></pre></div></div>
+<p>There are 12 different conv2d layers in resnet-18, so we launch 12 tuning tasks.
+For each of them, the tuner makes several hundred trials and picks the best one.
+After finishing all tuning tasks, we compile the whole network and generate a single deployable minimal library.
+One sample output is</p>
+
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Extract tasks...
+Tuning...
+[Task  1/12]  Current/Best:   22.37/  52.19 GFLOPS | Progress: (544/1000) | 406.59 s Done.
+[Task  2/12]  Current/Best:    6.51/  18.77 GFLOPS | Progress: (608/1000) | 325.05 s Done.
+[Task  3/12]  Current/Best:    4.67/  24.87 GFLOPS | Progress: (480/1000) | 372.31 s Done.
+[Task  4/12]  Current/Best:   11.35/  46.83 GFLOPS | Progress: (736/1000) | 602.39 s Done.
+[Task  5/12]  Current/Best:    1.01/  19.80 GFLOPS | Progress: (448/1000) | 262.16 s Done.
+[Task  6/12]  Current/Best:    2.47/  23.76 GFLOPS | Progress: (672/1000) | 563.85 s Done.
+[Task  7/12]  Current/Best:   14.57/  33.97 GFLOPS | Progress: (544/1000) | 465.15 s Done.
+[Task  8/12]  Current/Best:    1.13/  17.65 GFLOPS | Progress: (576/1000) | 365.08 s Done.
+[Task  9/12]  Current/Best:   14.45/  22.66 GFLOPS | Progress: (928/1000) | 724.25 s Done.
+[Task 10/12]  Current/Best:    3.22/  15.36 GFLOPS | Progress: (864/1000) | 564.27 s Done.
+[Task 11/12]  Current/Best:   11.03/  32.23 GFLOPS | Progress: (736/1000) | 635.15 s Done.
+[Task 12/12]  Current/Best:    8.00/  21.65 GFLOPS | Progress: (1000/1000) | 1111.81 s Done.
+Compile...
+Upload...
+Evaluate inference time cost...
+Mean inference time (std dev): 162.59 ms (0.06 ms)
+</code></pre></div></div>
+
+<p>The tuning is especially helpful and worth a try if your model has unusual shapes or
+your hardware is customized, as hand-optimized static libraries cannot cover all situations.</p>
+
+<h1 id="benchmark-results">Benchmark Results</h1>
+<p>We pre-tuned some popular networks on our device cluster and released the following benchmark.
+Instructions for reproduction are at the end of this blog.</p>
+
+<p>Comprehensively benchmarking TVM is easy, since we have a unified runtime interface.
+However, maintaining complete, up-to-date, and correct comparisons against all other platforms is not feasible
+without expert assistance from the developers of many other projects.
+So we put all our numbers in a table, and then provide an incomplete comparison with some other libraries.</p>
+
+<h2 id="comparison">Comparison</h2>
+<p>We validate the effectiveness of our automatic optimization stack by
+comparing it with heavily optimized traditional libraries on each platform.</p>
+
+<p>We tested popular image classification networks on the ImageNet dataset (3x224x224 inputs) with batch size = 1 and data type = float32.
+The reported numbers are the time costs per image in milliseconds.</p>
+
+<h3 id="arm-cpu">ARM CPU</h3>
+
+<p>We choose <a href="https://github.com/Tencent/ncnn">NCNN</a>, a widely used, hand-optimized kernel library, as the baseline.
+It makes extensive use of NEON assembly instructions. For example, the code base contains
+<a href="https://github.com/Tencent/ncnn/blob/master/src/layer/arm/convolution_3x3.h">13k lines of code</a> for 3x3 convolution layers alone.
+We reference the benchmark numbers in their project repository.
+As shown in the figure below, TVM outperforms it for all networks on the Raspberry Pi 3B.</p>
+
+<p><img src="/images/autotune-all/arm.png" alt="image" width="90%" /></p>
+
+<h3 id="mali-gpu">Mali GPU</h3>
+
+<p><a href="https://github.com/ARM-software/ComputeLibrary">ARM Compute Library</a> is a vendor provided library that supports Mali GPU (OpenCL) well.
+According to the results, TVM provides stronger performance in ResNet and MobileNet due to advantages in convolutional layers.
+TVM lags behind a bit on vgg-16 because vgg-16 is an old and huge network and has several large dense layers.</p>
+
+<p><img src="/images/autotune-all/mali.png" alt="image" width="90%" /></p>
+
+<h3 id="nvidia-gpu">NVIDIA GPU</h3>
+
+<p>On NVIDIA GPUs, <a href="https://developer.nvidia.com/cudnn">cuDNN</a> and <a href="https://developer.nvidia.com/tensorrt">TensorRT</a> are two vendor-provided libraries for training and inference respectively. Since we focus on inference,
+we run our benchmark in the unbatched setting. We also report the tensor compiler <a href="https://github.com/plaidml/plaidml">PlaidML</a> as a baseline,
+since a previous benchmark compared it against a pre-AutoTVM version of TVM.
+We reference its benchmark results from <a href="https://github.com/plaidml/plaidbench">PlaidBench</a>.
+According to the results below, TVM achieves parity with TensorRT performance.</p>
+
+<p><img src="/images/autotune-all/nvidia.png" alt="image" width="90%" /></p>
+
+<h3 id="amd-gpu">AMD GPU</h3>
+
+<p>We also take a quick look at an AMD GPU. TVM supports both OpenCL and <a href="https://rocm.github.io/">ROCm</a> backends. We found ROCm is better, since
+it is more specialized for AMD GPUs.
+<a href="https://github.com/ROCmSoftwarePlatform/MIOpen">MIOpen</a> is a vendor-provided
+kernel library. TVM’s graph runtime can call MIOpen’s kernel implementations directly, so we report
+the baseline performance by using this integration.</p>
+
+<p>We didn’t do any specific optimization for the AMD GPU; all computation definitions and schedule code for NVIDIA GPUs are directly reused.
+As a result, TVM is a little bit slower than MIOpen in most cases.
+We believe there is still room for improvement.</p>
+
+<p><img src="/images/autotune-all/amd.png" alt="image" width="90%" /></p>
+
+<h2 id="all-our-results">All Our Results</h2>
+<p>We tested the following networks on the ImageNet dataset (3x224x224 inputs) with batch size = 1 and data type = float32.
+The reported numbers are the time costs per image in milliseconds.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th> </th>
+      <th>densenet121</th>
+      <th>inception v3</th>
+      <th>mobilenet</th>
+      <th>mobilenet v2</th>
+      <th>resnet18</th>
+      <th>resnet50</th>
+      <th>squeezenet v1.0</th>
+      <th>squeezenet v1.1</th>
+      <th>vgg16</th>
+      <th>vgg19</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td><strong>ARM CPU</strong></td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+    </tr>
+    <tr>
+      <td>Huawei P20 Pro</td>
+      <td>181.4</td>
+      <td>439.9</td>
+      <td>41.1</td>
+      <td>34.5</td>
+      <td>76.5</td>
+      <td>208.2</td>
+      <td>51.8</td>
+      <td>25.7</td>
+      <td>480.6</td>
+      <td>627.0</td>
+    </tr>
+    <tr>
+      <td>Google Pixel2</td>
+      <td>162.2</td>
+      <td>433.5</td>
+      <td>39.5</td>
+      <td>30.1</td>
+      <td>61.1</td>
+      <td>181.3</td>
+      <td>47.3</td>
+      <td>23.2</td>
+      <td>391.1</td>
+      <td>487.7</td>
+    </tr>
+    <tr>
+      <td>Firefly RK3399</td>
+      <td>335.9</td>
+      <td>1285.9</td>
+      <td>78.6</td>
+      <td>66.7</td>
+      <td>161.2</td>
+      <td>403.8</td>
+      <td>94.6</td>
+      <td>48.5</td>
+      <td>902.9</td>
+      <td>1090.1</td>
+    </tr>
+    <tr>
+      <td>Raspberry Pi 3B</td>
+      <td>609.5</td>
+      <td>2070.4</td>
+      <td>122.2</td>
+      <td>103.7</td>
+      <td>322.5</td>
+      <td>725.8</td>
+      <td>185.1</td>
+      <td>94.1</td>
+      <td>1759.6</td>
+      <td>2118.6</td>
+    </tr>
+    <tr>
+      <td>Xilinx PYNQ</td>
+      <td>2888.3</td>
+      <td>9709.1</td>
+      <td>723.5</td>
+      <td>514.3</td>
+      <td>1234.6</td>
+      <td>3580.5</td>
+      <td>909.9</td>
+      <td>477.3</td>
+      <td><sup>-(Note 1)</sup></td>
+      <td>-</td>
+    </tr>
+    <tr>
+      <td><strong>Mali GPU</strong></td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+    </tr>
+    <tr>
+      <td>Mali-T860</td>
+      <td>410.9</td>
+      <td>783.1</td>
+      <td>75.4</td>
+      <td>70.8</td>
+      <td>128.6</td>
+      <td>352.9</td>
+      <td>106.2</td>
+      <td>58.0</td>
+      <td>679.5</td>
+      <td>805.3</td>
+    </tr>
+    <tr>
+      <td><strong>NVIDIA GPU</strong></td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+    </tr>
+    <tr>
+      <td>GTX 1080 Ti</td>
+      <td>3.6</td>
+      <td>5.8</td>
+      <td>0.6</td>
+      <td>- <sup>(Note 2) </sup></td>
+      <td>-</td>
+      <td>2.7</td>
+      <td>-</td>
+      <td>-</td>
+      <td>4.0</td>
+      <td>4.6</td>
+    </tr>
+    <tr>
+      <td>GTX TITAN X</td>
+      <td>5.8</td>
+      <td>9.7</td>
+      <td>1.0</td>
+      <td>-</td>
+      <td>-</td>
+      <td>4.3</td>
+      <td>-</td>
+      <td>-</td>
+      <td>6.4</td>
+      <td>7.5</td>
+    </tr>
+    <tr>
+      <td><strong>AMD GPU</strong></td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+      <td> </td>
+    </tr>
+    <tr>
+      <td>AMD Vega FE</td>
+      <td>5.7</td>
+      <td>8.8</td>
+      <td>1.0</td>
+      <td>-</td>
+      <td>-</td>
+      <td>4.5</td>
+      <td>-</td>
+      <td>-</td>
+      <td>5.9</td>
+      <td>7.0</td>
+    </tr>
+  </tbody>
+</table>
+
+<ul>
+  <li>Note 1: Out of memory on this board.</li>
+  <li>Note 2: We didn’t tune some small networks on the GPUs due to time constraints.
+When profiling data is not available, TVM falls back to default code generation,
+but competitive performance is not guaranteed in this scenario.</li>
+</ul>
+
+<h1 id="conclusion">Conclusion</h1>
+<p>With an expressive code generator and an efficient search algorithm, we are able to
+generate kernels that are comparable to heavily hand-optimized ones.
+Since programmer time is expensive and machine time is getting cheaper,
+we believe automatic optimization with real hardware and data in the loop will become the standard workflow
+for inference deployment. TVM provides exactly such a solution.</p>
+
+<h2 id="links">Links</h2>
+<p>[1] benchmark: <a href="https://github.com/dmlc/tvm/tree/master/apps/benchmark">https://github.com/dmlc/tvm/tree/master/apps/benchmark</a><br />
+[2] Tutorial on tuning for ARM CPU: <a href="https://docs.tvm.ai/tutorials/autotvm/tune_nnvm_arm.html">https://docs.tvm.ai/tutorials/autotvm/tune_nnvm_arm.html</a><br />
+[3] Tutorial on tuning for Mobile GPU: <a href="https://docs.tvm.ai/tutorials/autotvm/tune_nnvm_mobile_gpu.html">https://docs.tvm.ai/tutorials/autotvm/tune_nnvm_mobile_gpu.html</a><br />
+[4] Tutorial on tuning for NVIDIA/AMD GPU: <a href="https://docs.tvm.ai/tutorials/autotvm/tune_nnvm_cuda.html">https://docs.tvm.ai/tutorials/autotvm/tune_nnvm_cuda.html</a><br />
+[5] Paper about AutoTVM: <a href="https://arxiv.org/abs/1805.08166">Learning to Optimize Tensor Programs</a><br />
+[6] Paper about Intel CPU (by AWS contributors) :  <a href="https://arxiv.org/abs/1809.02697">Optimizing CNN Model Inference on CPUs</a></p>
+
+
+    </div>
+  </div>
+</div>
+</div>
+
+
+    
+
+
+
+  </body>
+</html>
+
diff --git a/2018/10/09/ml-in-tees.html b/2018/10/09/ml-in-tees.html
new file mode 100644
index 0000000..7c5dd7f
--- /dev/null
+++ b/2018/10/09/ml-in-tees.html
@@ -0,0 +1,272 @@
+
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <title>Efficient Privacy-Preserving ML Using TVM</title>
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
+</head>
+
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            <li><a href="https://tvm.ai/community">Community</a></li>
+            <li><a href="https://tvm.ai/about">About</a></li>
+            <li><a href="https://tvm.ai/vta">VTA</a></li>
+            <li><a href="https://tvm.ai/blog">Blog</a></li>
+            <li> <a href="https://sampl.cs.washington.edu/tvmconf">TVM Conference</a></li>
+            <li> <a href="https://docs.tvm.ai/tutorials/">Tutorials</a></li>
+            <li> <a href="https://docs.tvm.ai">Docs</a></li>
+            <li> <a href="https://github.com/dmlc/tvm/">Github</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
+<div class="container">
+<div class="content">
+  <div class="row">
+    <div class="span14">
+      <h1>Efficient Privacy-Preserving ML Using TVM </h1>
+      <p class="post-meta">
+        <time datetime="2018-10-09T00:00:00-07:00" itemprop="datePublished">
+          Oct 9, 2018
+        </time>
+        
+        • <span itemprop="author" itemscope itemtype="http://schema.org/Person">
+          <span itemprop="name">Nick Hynes</span>
+        </span>
+        
+      </p>
+      <p class="post-meta">
+        </p>
+    <br />
+    <p>This post describes Myelin, a framework for privacy-preserving machine learning in trusted hardware enclaves, and how TVM makes Myelin fast.
+The key idea is that TVM, unlike other popular ML frameworks, compiles models into lightweight, optimized, and dependency-free libraries which can fit into resource-constrained enclaves.</p>
+
+<p style="text-align: center"><img src="/images/sgx/tvmfits.png" alt="TVM fits in enclaves" style="width: 80vw; max-width: 600px" /></p>
+
+<p>Give privacy-preserving ML a try! Check out the <a href="https://github.com/dmlc/tvm/tree/master/apps/sgx">example code</a> available in the TVM repo.</p>
+
+<h1 id="motivation-privacy-preserving-ml">Motivation: Privacy-Preserving ML</h1>
+
+<p>Machine learning models benefit from large and diverse datasets.
+Unfortunately, using such datasets often requires trusting a centralized data aggregator or computation provider.
+For sensitive applications like healthcare and finance this is undesirable as it could compromise patient privacy or divulge trade secrets.
+Recent advances in secure and privacy-preserving computation, including <em>trusted execution environments</em> and <em>differential privacy</em>, offer a way for mutually distrusting parties to efficiently train a machine learning model without compromising the training data.
+We use TVM to make our privacy-preserving ML framework fast.</p>
+
+<p style="text-align: center"><img src="/images/sgx/sgx.png" alt="Myelin workflow" style="width: 80vw; max-width: 600px" /></p>
+
+<h2 id="use-cases">Use Cases</h2>
+
+<ul>
+  <li><strong>private MLaaS</strong>: a cloud provider runs their architecture on your data. You get the model outputs, your data stays private, and the cloud provider knows that you can’t steal the model.</li>
+  <li><strong>trustworthy ML competitions</strong>: you train a model on contest data. The contest organizer sends private test data to your model and gets a verifiable report of its accuracy. Your model stays safe until the organizer decides to purchase it. Other participants can’t cheat by training on test data.</li>
+  <li><strong>training on shared private data</strong>: you (a researcher) want to train a model on several hospitals’ data. Directly sharing is too complicated. Instead, have a “trusted third party” train a privacy-preserving model.</li>
+  <li><a href="http://www.vldb.org/pvldb/vol11/p2086-hynes.pdf"><strong>ML on the Blockchain</strong></a></li>
+</ul>
+
+<h1 id="background">Background</h1>
+
+<p style="text-align: center"><img src="/images/sgx/dpnn.png" alt="sketch of DP deep learning in a TEE" style="width: 80vw; max-width: 400px" /></p>
+
+<h2 id="trusted-execution-environments">Trusted Execution Environments</h2>
+
+<p>A <a href="https://en.wikipedia.org/wiki/Trusted_Computing#Remote_attestation">trusted execution environment</a> (TEE) essentially allows a remote user to provably run code on another person’s machine without revealing the computation to the hardware provider.</p>
+
+<p>More technically, the TEE provides a secure <em>enclave</em> of isolated/encrypted memory and CPU registers, as well as a trusted source of randomness.
+The TEE can also send a signed attestation of the code that’s loaded, so that the remote user can verify that the enclave has been correctly loaded.
+This process, known as remote attestation, can be used to establish a secure communication channel into the enclave.
+The remote user can then provision it with secrets like private keys, model parameters, and training data.</p>
+
+<p>Compared to pure crypto methods like <a href="https://en.wikipedia.org/wiki/Garbled_circuit">secure multi-party computation (MPC)</a> and <a href="https://en.wikipedia.org/wiki/Homomorphic_encryption#Fully_homomorphic_encryption">fully-homomorphic encryption (FHE)</a>, TEEs are several orders of magnitude faster and support general-purpose computation (i.e. not just arithmetic operations).
+Perhaps the only drawbacks are the additional trust assumptions in the hardware root of trust (a key burned into the processor) and the loaded software.</p>
+
+<p>Trust assumptions notwithstanding, TEE technology is becoming increasingly widespread and is playing a major role in practical privacy-preservation.
+In fact, general-purpose TEEs already exist in commodity hardware like <a href="https://software.intel.com/en-us/sgx">Intel SGX</a> and <a href="https://genode.org/documentation/articles/trustzone">ARM TrustZone</a>.
+Additionally, the fully-open source <a href="https://keystone-enclave.org">Keystone enclave</a> is on the way.</p>
+
+<h2 id="differential-privacy">Differential Privacy</h2>
+
+<p style="text-align: center"><img src="/images/sgx/dp.png" alt="DP as false positive/negative" style="width: 80vw; max-width: 500px" /></p>
+
+<p><a href="https://en.wikipedia.org/wiki/Differential_Privacy#Randomized_Response">Differential privacy (DP)</a> provides a formal guarantee that models trained on similar datasets are indistinguishable
+Informally, a user’s privacy is not compromised by choosing to contribute data to a model.</p>
+
+<p>In other words, given the output of an algorithm on two datasets which differ in only a single record, differential privacy upper bounds the probability that an adversary can determine which dataset.
+An algorithm may be made DP using a mechanism which adds noise to the algorithm’s output.
+The amount of noise is calibrated on how much the output depends on any particular inputs.
+If you’re familiar with hypothesis testing, if outcomes A and B each have probability 0.5, applying a DP mechanism is like convolving with a probability distribution: the privacy is in the false positive and false negative rates.
+Since deep learning models tend to generalize well, the amount of noise is often less than might be expected.</p>
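+
+<p>For intuition, a standard additive-noise mechanism (the Laplace mechanism, shown here as a generic sketch rather than anything Myelin-specific) looks like:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
+
+def laplace_mechanism(value, sensitivity, epsilon):
+    # Noise scale grows with how much one record can change the output
+    # (the sensitivity) and shrinks as the privacy budget epsilon grows.
+    return value + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
+</code></pre></div></div>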
+
+<p>Running a DP training algorithm in a TEE ensures that the DP mechanism is faithfully applied.</p>
+
+<h1 id="efficient-privacy-preserving-ml-using-tvm">Efficient Privacy-Preserving ML Using TVM</h1>
+
+<p>One of the primary challenges of working with a TEE is that the code running within does not have access to the untrusted OS.
+This means that the trusted software cannot create threads or perform I/O.
+Practically speaking, the result is that numerical libraries like OpenBLAS (much less frameworks like PyTorch and TensorFlow) cannot run directly in enclaves.</p>
+
+<p>TEEs actually have a similar programming model to resource-constrained hardware accelerators.
+This is exactly what TVM is made for!
+In the privacy workflow, a user first defines an entire training graph in the high-level graph specification language.
+TVM then compiles the model and outputs a static library containing optimized numerical kernels which can easily be loaded into a TEE.
+Since the kernels are automatically generated and have strict bounds checking, they expose a small attack surface.
+They are supported by a lightweight memory-safe Rust runtime which also may easily be reviewed for safety and correctness.</p>
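+
+<p>As a sketch (mirroring the TVM SGX example linked above; the input shape and target string are illustrative assumptions), compiling a graph into such a static library looks roughly like:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import nnvm.compiler
+
+# `net` and `params` come from a frontend importer such as nnvm.frontend.from_mxnet.
+# '--system-lib' bundles the generated kernels into a self-contained static
+# module that the enclave can link against, with no OS-level loading required.
+graph, lib, params = nnvm.compiler.build(
+    net, target='llvm --system-lib',
+    shape={'data': (1, 3, 224, 224)}, params=params)
+lib.save('model.o')  # object file to link into the enclave runtime
+</code></pre></div></div>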
+
+<p>Of course, safety is most useful when practically applicable.
+Fortunately, TVM modules in enclaves have comparable performance to native CPU-based training.
+By coordinating threads using the untrusted runtime, a single TVM enclave can fully utilize the resources of its host machine.
+Moreover, it’s not difficult to imagine a secure parameter server which orchestrates entire datacenters of enclave-enabled machines.</p>
+
+<p>TVM also provides opportunities for more subtle optimization of privacy-preserving algorithms.
+Indeed, its fine-grained scheduling features allow speedups when using differential privacy.
+For instance, the tightest DP bounds may be obtained by clipping the gradient of each training example and adding noise to each [1].
+In autograd frameworks, this requires forwarding the model for each example in the minibatch (though only one backward pass is needed) [2].
+Using TVM, however, per-example gradient clipping is straightforward: instead of scheduling each weight update as a single reduction over both batch and feature dimensions, the reduction is split into two.
+The reduction over features is followed by clipping and noising, and the result is then summed over examples to obtain the weight update.
+Thus, TVM allows applying differential privacy without introducing overhead greater than what is required by the technique.
+Also, if one really wants to get fancy, it’s possible to fuse the clipping and noising operations and apply them in-place to further trim down latency and memory usage.</p>
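+
+<p>In NumPy terms, the split reduction described above amounts to the following sketch (illustrative only, not Myelin’s actual kernels; the clip and noise parameters are hypothetical):</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
+
+def dp_weight_update(per_example_grads, clip_norm, noise_scale):
+    # per_example_grads: [batch, num_params], one gradient per training example.
+    # Stage 1 (per example): clip each gradient's norm, then add noise,
+    # following the order described in the text above.
+    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
+    clipped = per_example_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
+    noised = clipped + np.random.normal(0.0, noise_scale, clipped.shape)
+    # Stage 2: the remaining reduction over examples yields the weight update.
+    return noised.sum(axis=0)
+</code></pre></div></div>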
+
+<p>For benchmarks on realistic workloads, please refer to the tech report <a href="https://arxiv.org/abs/1807.06689"><em>Efficient Deep Learning on Multi-Source Private Data</em></a>.
+And, of course, feel free go give the framework a spin in the <a href="https://github.com/dmlc/tvm/tree/master/apps/sgx">TVM SGX example</a>.</p>
+
+<h1 id="conclusion">Conclusion</h1>
+
+<p>The next generation of learning systems will be ushered in by privacy.
+As TEE technology becomes better understood and more widely available, it makes sense to leverage it as a resource for privacy-preserving machine learning and analytics.
+TVM is well poised to facilitate development of this use case in both research and deployment.</p>
+
+<h1 id="bio--acknowledgement">Bio &amp; Acknowledgement</h1>
+
+<p><a href="https://github.com/nhynes">Nick</a> is a PhD student in Prof. Dawn Song’s lab at UC Berkeley.
+His research interest is in the general domain of ML on shared private data, but this is really just an excuse to mess with Rust, security monitors, hardware enclaves, and compilers like TVM.</p>
+
+<p>Thanks to Tianqi Chen for the code reviews!</p>
+
+<h1 id="references">References</h1>
+
+<p>[1] <a href="https://arxiv.org/abs/1607.00133">Deep Learning with Differential Privacy</a><br />
+[2] <a href="https://arxiv.org/pdf/1510.01799v2.pdf">Efficient Per-Example Gradient Computations</a></p>
+
+    </div>
+  </div>
+</div>
+</div>
+
+
+    
+
+
+
+  </body>
+</html>
+
diff --git a/2018/12/18/lowprecision-conv.html b/2018/12/18/lowprecision-conv.html
new file mode 100644
index 0000000..238d046
--- /dev/null
+++ b/2018/12/18/lowprecision-conv.html
@@ -0,0 +1,317 @@
+
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <title>Automating Generation of Low Precision Deep Learning Operators</title>
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
+</head>
+
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            <li><a href="https://tvm.ai/community">Community</a></li>
+            <li><a href="https://tvm.ai/about">About</a></li>
+            <li><a href="https://tvm.ai/vta">VTA</a></li>
+            <li><a href="https://tvm.ai/blog">Blog</a></li>
+            <li> <a href="https://sampl.cs.washington.edu/tvmconf">TVM Conference</a></li>
+            <li> <a href="https://docs.tvm.ai/tutorials/">Tutorials</a></li>
+            <li> <a href="https://docs.tvm.ai">Docs</a></li>
+            <li> <a href="https://github.com/dmlc/tvm/">Github</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
+<div class="container">
+<div class="content">
+  <div class="row">
+    <div class="span14">
+      <h1>Automating Generation of Low Precision Deep Learning Operators </h1>
+      <p class="post-meta">
+        <time datetime="2018-12-18T00:00:00-08:00" itemprop="datePublished">
+          Dec 18, 2018
+        </time>
+        
+        • <span itemprop="author" itemscope itemtype="http://schema.org/Person">
+          <span itemprop="name">Meghan Cowan</span>
+        </span>
+        
+      </p>
+      <p class="post-meta">
+        </p>
+    <br />
+    <p>As deep learning models grow larger and more complex, deploying them on low-powered phones and IoT
+devices becomes challenging because of their limited compute and energy budgets. A recent trend
+in deep learning is the use of extremely quantized models that operate on inputs and
+weights of only a few bits, with networks like XNOR-Net, DoReFa-Net, and HWGQ-Net making steady
+progress in improving accuracy.</p>
+
+<p>An example of a low precision graph snippet is below. The low precision convolution takes in
+quantized data and bitpacks it into the proper data layout for an efficient bitserial convolution.
+The output is in a higher precision, and traditional deep learning layers such as batch normalization and ReLU are applied to it, before it is re-quantized and sent through another low precision operator.</p>
+
+<p style="text-align: center"><img src="/images/low-precision/workflow.png" alt="image" width="50%" /></p>
+<center> Low precision convolution pipeline.</center>
+<p></p>
+
+<p>Theoretically, low precision operators use fewer operations than
+floating point operators, leading many to believe they can achieve tremendous speedups.
+However, deep learning frameworks leverage decades of engineering work through low-level
+BLAS and LAPACK libraries that are incredibly well optimized, and CPUs include intrinsic
+instructions to accelerate these tasks. In practice, it is not simple to develop low-level
+operators such as convolutions that are competitive with 8-bit quantized or even floating
+point operators.
+In this post we introduce our approach to automatically generating optimized
+low precision convolutions for CPUs. We declare our low precision operators so that they compute
+on efficiently stored low precision inputs, and describe a schedule that defines a search space
+of implementation parameters. We rely on AutoTVM to quickly search the space and find optimized
+parameters for the particular convolution, precision, and backend.</p>
+
+<h2 id="bitserial-computation-background">Bitserial Computation Background</h2>
+
+<p>The core of low precision models is the bitserial dot product that enables convolution and
+dense operators to be computed using only bitwise operations and popcount.
+Typically, a dot product is computed by element-wise multiplication of two vectors followed by
+summing all the elements, like the simple example below. If all the data is binary, the input
+vectors can be packed into single integers, and the dot product can be computed by bitwise-anding
+the packed inputs and counting the number of 1’s in the result using popcount.
+Note: depending on how the input data is quantized, bitwise-xnor may be used instead of bitwise-and.</p>
+
+<p style="text-align: center"><img src="/images/low-precision/binary-dotproduct.png" alt="image" width="50%" /></p>
+<center> Binary dot product.</center>
+<p></p>
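+
+<p>For instance, a plain-Python sketch of the packed binary dot product (illustrative only, not the TVM operator) is:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def binary_dot(a_packed, b_packed):
+    # a_packed and b_packed each hold 32 binary elements in one integer:
+    # AND matches the elements, popcount counts the matches.
+    return bin(a_packed &amp; b_packed).count('1')
+</code></pre></div></div>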
+
+<p>Arbitrary precision dot products can be computed in this fashion by first separating the input data
+into bitplanes. Once in this representation, we can compute the dot product by summing weighted binary
+dot products between the bitplanes of A and B. The number of binary dot products grows with the
+product of A and B’s precision, so this method is only practical for very low precision data.</p>
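+
+<p>Extending the same illustrative sketch, a 2-bit activation, 1-bit weight dot product decomposes into two binary dot products, each weighted by the bit position of its bitplane:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Illustrative sketch: 2-bit activations (bitplanes a0, a1) against 1-bit weights b.
+a0, a1 = 0b0110, 0b1010   # bitplane 0 (weight 1) and bitplane 1 (weight 2)
+b = 0b1110
+
+def popcount(x):
+    return bin(x).count('1')
+
+# Sum the binary dot products, shifting each by its combined bitplane weight.
+dot = (popcount(a0 &amp; b) &lt;&lt; 0) + (popcount(a1 &amp; b) &lt;&lt; 1)
+print(dot)  # 6, the full precision dot product of the decoded vectors
+</code></pre></div></div>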
+
+<p style="text-align: center"><img src="/images/low-precision/bitserial-dotproduct.png" alt="image" width="50%" /></p>
+<center> Bitserial dot product.</center>
+<p></p>
+
+<h2 id="defining-operators-in-tvm">Defining Operators in TVM</h2>
+<p>Before the computation, the input data needs to be bitpacked so that the bitplanes of the input data
+can be accessed and are packed into a supported datatype such as a uint8 or uint32. We provide
+a flexible bitpacking operator that takes arbitrary size input tensors and returns a bitpacked
+tensor where the user specifies which axis the bitplanes should reside on.</p>
+
+<p style="text-align: center"><img src="/images/low-precision/bitpack.png" alt="image" width="50%" /></p>
+<center> Different bitpacked layouts.</center>
+<p></p>
+
+<p>Once in this bitpacked format, the low precision convolution can be computed bitserially.
+For this demo, the data is packed along the input channel, the bitplanes are added to the
+innermost axis, and the data is packed into 32-bit integers. The bitserial convolution is computed
+similarly to a normal convolution, but bitwise-and (&amp;) replaces multiplication, and we use
+popcount to accumulate values in the packed data. The bitplane axes become additional reduction axes
+that compute the binary dot products between different bitplanes of the input and kernel.
+Finally, the output is computed in an unpacked format and in higher precision.</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Input_bitpacked</span> <span class="o">=</span> <span class="n">bitpack</span><span class="p">(</span><span class="n">Input</span><span class="p">,</span> <span class="n">activation_bits</span><span class="p">,</span> <span class="n">pack_axis</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">bit_axis</span><span class="o">=</spa [...]
+<span class="n">Weights_bitpacked</span> <span class="o">=</span> <span class="n">bitpack</span><span class="p">(</span><span class="n">Filter</span><span class="p">,</span> <span class="n">weight_bits</span><span class="p">,</span> <span class="n">pack_axis</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">bit_axis</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">pack_type</span><span class="o"> [...]
+<span class="n">batch</span><span class="p">,</span> <span class="n">in_height</span><span class="p">,</span> <span class="n">in_width</span><span class="p">,</span> <span class="n">in_channel_q</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">Input_bitpacked</span><span class="o">.</span><span class="n">shape</span>
+<span class="n">kernel_h</span><span class="p">,</span> <span class="n">kernel_w</span><span class="p">,</span> <span class="n">_</span><span class="p">,</span> <span class="n">num_filter</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">Filter_bitpakced</span><span class="o">.</span><span class="n">shape</span>
+
+<span class="n">stride_h</span><span class="p">,</span> <span class="n">stride_w</span> <span class="o">=</span> <span class="n">stride</span>
+<span class="n">pad_top</span><span class="p">,</span> <span class="n">pad_left</span><span class="p">,</span> <span class="n">pad_down</span><span class="p">,</span> <span class="n">pad_right</span> <span class="o">=</span> <span class="n">get_pad_tuple</span><span class="p">(</span><span class="n">padding</span><span class="p">,</span> <span class="p">(</span><span class="n">kernel_h</span><span class="p">,</span> <span class="n">kernel_w</span><span class="p">))</span>
+
+<span class="c1"># Computing the output shape
+</span><span class="n">out_channel</span> <span class="o">=</span> <span class="n">num_filter</span>
+<span class="n">out_height</span> <span class="o">=</span> <span class="n">simplify</span><span class="p">((</span><span class="n">in_height</span> <span class="o">-</span> <span class="n">kernel_h</span> <span class="o">+</span> <span class="n">pad_top</span> <span class="o">+</span> <span class="n">pad_down</span><span class="p">)</span> <span class="o">//</span> <span class="n">stride_h</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
+<span class="n">out_width</span> <span class="o">=</span> <span class="n">simplify</span><span class="p">((</span><span class="n">in_width</span> <span class="o">-</span> <span class="n">kernel_w</span> <span class="o">+</span> <span class="n">pad_left</span> <span class="o">+</span> <span class="n">pad_right</span><span class="p">)</span> <span class="o">//</span> <span class="n">stride_w</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
+<span class="n">pad_before</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="n">pad_top</span><span class="p">,</span> <span class="n">pad_left</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span>
+<span class="n">pad_after</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="n">pad_down</span><span class="p">,</span> <span class="n">pad_right</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span>
+<span class="n">Input_padded</span> <span class="o">=</span> <span class="n">pad</span><span class="p">(</span><span class="n">Input_bitpacked</span><span class="p">,</span> <span class="n">pad_before</span><span class="p">,</span> <span class="n">pad_after</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"PaddedInput"</span><span class="p">)</span>
+
+<span class="c1"># Treat the bitplane axes like additional reduction axes
+</span><span class="n">rc</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">in_channel_q</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'rc'</span><span class="p">)</span>
+<span class="n">ry</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">kernel_h</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'ry'</span><span class="p">)</span>
+<span class="n">rx</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">kernel_w</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'rx'</span><span class="p">)</span>
+<span class="n">ib</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">input_bits</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'ib'</span><span class="p">)</span>
+<span class="n">wb</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">weight_bits</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'wb'</span><span class="p">)</span>
+
+
+<span class="n">tvm</span><span class="o">.</span><span class="n">compute</span><span class="p">((</span><span class="n">batch</span><span class="p">,</span> <span class="n">out_height</span><span class="p">,</span> <span class="n">out_width</span><span class="p">,</span> <span class="n">out_channel</span><span class="p">),</span> <span class="k">lambda</span> <span class="n">nn</span><span class="p">,</span> <span class="n">yy</span><span class="p">,</span> <span class="n">xx</span><spa [...]
+             <span class="n">tvm</span><span class="o">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">tvm</span><span class="o">.</span><span class="n">popcount</span><span class="p">(</span>
+               <span class="n">Input_padded</span><span class="p">[</span><span class="n">nn</span><span class="p">,</span> <span class="n">yy</span> <span class="o">*</span> <span class="n">stride_h</span> <span class="o">+</span> <span class="n">ry</span><span class="p">,</span> <span class="n">xx</span> <span class="o">*</span> <span class="n">stride_w</span> <span class="o">+</span> <span class="n">rx</span><span class="p">,</span> <span class="n">rc</span><span class="p">,</span> <s [...]
+               <span class="n">Weights_bitpacked</span><span class="p">[</span><span class="n">ry</span><span class="p">,</span> <span class="n">rx</span><span class="p">,</span> <span class="n">rc</span><span class="p">,</span> <span class="n">ff</span><span class="p">,</span> <span class="n">wb</span><span class="p">]))</span> <span class="o">&lt;&lt;</span> <span class="p">(</span><span class="n">ib</span><span class="o">+</span><span class="n">wb</span><span class="p">)))</span><span [...]
+               <span class="n">axis</span><span class="o">=</span><span class="p">[</span><span class="n">rc</span><span class="p">,</span> <span class="n">ry</span><span class="p">,</span> <span class="n">rx</span><span class="p">,</span> <span class="n">wb</span><span class="p">,</span> <span class="n">ib</span><span class="p">]))</span>
+
+</code></pre></div></div>
+
+<p>In our schedule we apply common optimizations like vectorization and memory tiling to provide better
+memory locality and take advantage of SIMD units. Some of these optimizations, such as tiling,
+require parameters that need to be tuned for the specific microarchitecture. We expose these
+parameters as knobs to TVM and use AutoTVM to automatically tune all the parameters simultaneously, as sketched below.</p>
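+
+<p>As a rough illustration of what exposing such knobs looks like, here is a minimal, self-contained AutoTVM template for a toy copy kernel (it is not the actual bitserial convolution schedule):</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># A minimal sketch of exposing tiling factors as AutoTVM knobs,
+# using a toy copy kernel rather than the real bitserial schedule.
+import tvm
+from tvm import autotvm
+
+@autotvm.template
+def tiled_copy(n):
+    A = tvm.placeholder((n, n), name='A')
+    B = tvm.compute((n, n), lambda i, j: A[i, j], name='B')
+    s = tvm.create_schedule(B.op)
+    i, j = s[B].op.axis
+
+    cfg = autotvm.get_config()
+    cfg.define_split("tile_i", i, num_outputs=2)  # AutoTVM searches the factors
+    cfg.define_split("tile_j", j, num_outputs=2)
+    io, ii = cfg["tile_i"].apply(s, B, i)
+    jo, ji = cfg["tile_j"].apply(s, B, j)
+    s[B].reorder(io, jo, ii, ji)
+    return s, [A, B]
+</code></pre></div></div>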
+
+<p>Finally, we can craft small microkernels to replace the innermost loop(s) of computation and schedule
+them using TVM’s tensorize primitive. Since compilers often produce suboptimal code, people can
+often write short assembly sequences that are more efficient. These microkernels often take advantage
+of new intrinsics that are being introduced to help accelerate deep learning workloads, and use
+them in clever ways to improve memory accesses or reduce the number of instructions required.</p>
+
+<h2 id="results">Results</h2>
+
+<h3 id="raspberry-pi">Raspberry Pi</h3>
+<p>Convolution speedups on a Raspberry Pi 3B compared to a 16-bit integer TVM implementation.
+Workloads are convolution layers from ResNet18.</p>
+
+<p style="text-align: center"><img src="/images/low-precision/rasp-conv.png" alt="image" width="50%" /></p>
+<center> Speedup of low precision convolutions on a Raspberry Pi compared to 16-bit TVM implementation.</center>
+<p></p>
+
+<p>2-bit activation, 1-bit weight convolution speedups on a Raspberry Pi 3B compared to the hand optimized implementation from <a href="https://arxiv.org/pdf/1712.02427.pdf">High performance ultra-low-precision convolutions
+on mobile devices</a>.
+Workloads are convolution layers from ResNet18.</p>
+
+<p style="text-align: center"><img src="/images/low-precision/rasp-conv-2.png" alt="image" width="50%" /></p>
+<center> Speedup of 2-bit activation, 1-bit weight Raspberry Pi convolutions against a hand optimized implementation.</center>
+<p></p>
+
+<h3 id="x86">x86</h3>
+
+<p>Convolution speedups on x86 compared to a 32-bit floating point TVM implementation.
+Note: the x86 target we benchmark does not support a vectorized popcount instruction, so speedups are lower.</p>
+<p style="text-align: center"><img src="/images/low-precision/x86-conv.png" alt="image" width="50%" /></p>
+<center> Speedup of x86 low precision convolutions compared to a 32-bit floating point TVM implementation.</center>
+<p></p>
+
+<h2 id="show-me-the-code">Show me the code</h2>
+
+<ul>
+  <li><a href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/nn/bitserial_conv2d.py">TOPI bitserial convolution</a></li>
+  <li><a href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/arm_cpu/bitserial_conv2d.py">TOPI ARM cpu bitserial convolution</a></li>
+</ul>
+
+<h2 id="references">References</h2>
+
+<ul>
+  <li>[1] <a href="https://arxiv.org/abs/1810.11066">Automating Generation of Low Precision Deep Learning Operators</a></li>
+  <li>[2] <a href="https://arxiv.org/abs/1603.05279">XNOR-Net</a></li>
+  <li>[3] <a href="https://arxiv.org/abs/1702.00953">HWGQ</a></li>
+  <li>[4] <a href="https://arxiv.org/abs/1606.06160">DoReFa</a></li>
+</ul>
+
+
+    </div>
+  </div>
+</div>
+</div>
+
+
+    
+
+
+
+  </body>
+</html>
+
diff --git a/2019/01/19/Golang.html b/2019/01/19/Golang.html
new file mode 100644
index 0000000..bfc4a9c
--- /dev/null
+++ b/2019/01/19/Golang.html
@@ -0,0 +1,326 @@
+
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <title>TVM Golang Runtime for Deep Learning Deployment</title>
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
+</head>
+
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            
+            
+            
+
+
+
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/community">Community</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/about">About</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/vta">VTA</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      
+      	
+      	<li><a href="https://tvm.ai/blog">Blog</a></li>
+      	
+      
+    
+  
+
+
+
+
+            <li> <a href="https://sampl.cs.washington.edu/tvmconf">TVM Conference</a></li>
+            <li> <a href="https://docs.tvm.ai/tutorials/">Tutorials</a></li>
+            <li> <a href="https://docs.tvm.ai">Docs</a></li>
+            <li> <a href="https://github.com/dmlc/tvm/">Github</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
+<div class="container">
+<div class="content">
+  <div class="row">
+    <div class="span14">
+      <h1>TVM Golang Runtime for Deep Learning Deployment </h1>
+      <p class="post-meta">
+        <time datetime="2019-01-19T00:00:00-08:00" itemprop="datePublished">
+          Jan 19, 2019
+        </time>
+        
+        • <span itemprop="author" itemscope itemtype="http://schema.org/Person">
+          <span itemprop="name">Siva</span>
+        </span>
+        
+      </p>
+      <p class="post-meta">
+        </p>
+    </br>
+    <h2 id="introduction">Introduction</h2>
+
+<p>TVM is an open deep learning compiler stack to compile various deep learning models from different
+frameworks to CPU, GPU or specialized accelerators. TVM supports model compilation from a wide range
+of frontends like TensorFlow, ONNX, Keras, MXNet, Darknet, CoreML and Caffe2. TVM compiled modules
+can be deployed on backends like LLVM (JavaScript or WASM, AMD GPU, ARM or X86), NVidia GPU (CUDA),
+OpenCL and Metal.</p>
+
+<p>TVM supports runtime bindings for programming languages like JavaScript, Java, Python, C++… and now Golang.
+With a wide range of frontend, backend and runtime bindings, TVM enables developers to integrate and
+deploy deep learning models from a variety of frameworks to a choice of hardware via many programming languages.</p>
+
+<p>The TVM import and compilation process generates a graph JSON, a compiled module and a params blob. Any application that
+integrates the TVM runtime can load these artifacts and perform inference. A detailed tutorial of module
+import and compilation using TVM can be found in the <a href="https://docs.tvm.ai/tutorials/">tutorials</a>.</p>
+
+<p>TVM now supports deploying compiled modules through Golang. Golang applications can make use of this
+to deploy deep learning models through TVM. This blog introduces the <code class="highlighter-rouge">gotvm</code> package,
+its build process, and a sample application that uses <code class="highlighter-rouge">gotvm</code> to load a compiled module and perform inference.</p>
+
+<h2 id="package">Package</h2>
+
+<p>The Golang package <code class="highlighter-rouge">gotvm</code> is built on top of TVM’s C runtime interface. The API in this package
+abstracts the native C types and provides Golang compatible types. The package source can be found
+at <a href="https://github.com/dmlc/tvm/tree/master/golang">gotvm</a>.</p>
+
+<p>This package leverages Golang’s interfaces, slices and function closures, and implicitly handles the
+necessary conversions across API calls.</p>
+
+<p style="text-align: center"><img src="/images/golang/TVM-Golang-Blog.png" alt="image" width="60%" /></p>
+<center> Golang Interface over TVM Runtime </center>
+<p></p>
+
+<h2 id="how-to">How to</h2>
+
+<p>As shown in the diagram below, <code class="highlighter-rouge">gotvm</code> enables Golang applications to integrate deep learning models
+from various frameworks without the hassle of understanding each framework’s interface API.
+Developers can make use of TVM to import and compile deep learning models and generate TVM artifacts.
+The <code class="highlighter-rouge">gotvm</code> package provides a Golang-friendly API to load, configure, feed input and get output.</p>
+
+<p style="text-align: center"><img src="/images/golang/TVM-Golang-Flow.png" alt="image" width="100%" /></p>
+<center> Import, Compile, Integrate and Deploy</center>
+<p></p>
+
+<p>TVM <a href="https://docs.tvm.ai/tutorials/#compile-deep-learning-models">Compile Deep Learning Models</a> tutorials
+are available to compile models from all frameworks supported by the TVM frontend. This compilation process
+generates the artifacts required to integrate and deploy the model on a target.</p>
+
+<h2 id="api">API</h2>
+
+<p><code class="highlighter-rouge">gotvm</code> package provides a handful of datatypes and API functions to initialize, load and infer
+from a golang application. Like any other golang package we just need to import <code class="highlighter-rouge">gotvm</code> package here.</p>
+
+<ul>
+  <li>Module : The Module API can be used to load a TVM compiled module into TVM runtime and access any functions.</li>
+  <li>Value : The Value API provides helper functions to set arguments or get return values in golang types like basic types or slices.</li>
+  <li>Function : The Function API is useful for getting handles to functions and invoking them.</li>
+  <li>Array : The Array API is useful for setting and getting Tensor data via golang slice.</li>
+  <li>Context : The Context API contains helper functions to build backend context handles.</li>
+</ul>
+
+<h2 id="example">Example</h2>
+
+<p>A simple example with inline documentation of loading a compiled module and performing inference is shown below.
+For simplicity, error handling is omitted here, but it is important in real applications.</p>
+
+<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
+<span class="n">package</span> <span class="n">main</span>
+
+<span class="c1">// Import compiled gotvm package.</span>
+<span class="n">import</span> <span class="p">(</span>
+    <span class="s">"./gotvm"</span>
+<span class="p">)</span>
+
+<span class="c1">// Some constants for TVM compiled model paths.</span>
+<span class="c1">// modLib : Is the compiled library exported out of compilation.</span>
+<span class="c1">// modJson : TVM graph JSON.</span>
+<span class="c1">// modParams : Exported params out of TVM compilation process.</span>
+<span class="k">const</span> <span class="p">(</span>
+    <span class="n">modLib</span>    <span class="o">=</span> <span class="s">"./libdeploy.so"</span>
+    <span class="n">modJSON</span>   <span class="o">=</span> <span class="s">"./deploy.json"</span>
+    <span class="n">modParams</span> <span class="o">=</span> <span class="s">"./deploy.params"</span>
+<span class="p">)</span>
+
+<span class="c1">// main</span>
+<span class="n">func</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
+    <span class="c1">// Some util API to query underlying TVM and DLPack version information.</span>
+    <span class="n">fmt</span><span class="p">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"TVM Version   : v%v</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">gotvm</span><span class="p">.</span><span class="n">TVMVersion</span><span class="p">)</span>
+    <span class="n">fmt</span><span class="p">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"DLPACK Version: v%v</span><span class="se">\n\n</span><span class="s">"</span><span class="p">,</span> <span class="n">gotvm</span><span class="p">.</span><span class="n">DLPackVersion</span><span class="p">)</span>
+
+    <span class="c1">// Import tvm module (so).</span>
+    <span class="n">modp</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">gotvm</span><span class="p">.</span><span class="n">LoadModuleFromFile</span><span class="p">(</span><span class="n">modLib</span><span class="p">)</span>
+
+    <span class="c1">// Load module on tvm runtime - call tvm.graph_runtime.create</span>
+    <span class="c1">// with module and graph JSON.</span>
+    <span class="n">bytes</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">ioutil</span><span class="p">.</span><span class="n">ReadFile</span><span class="p">(</span><span class="n">modJSON</span><span class="p">)</span>
+    <span class="n">jsonStr</span> <span class="o">:=</span> <span class="n">string</span><span class="p">(</span><span class="n">bytes</span><span class="p">)</span>
+    <span class="n">funp</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">gotvm</span><span class="p">.</span><span class="n">GetGlobalFunction</span><span class="p">(</span><span class="s">"tvm.graph_runtime.create"</span><span class="p">)</span>
+    <span class="n">graphrt</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">funp</span><span class="p">.</span><span class="n">Invoke</span><span class="p">(</span><span class="n">jsonStr</span><span class="p">,</span> <span class="n">modp</span><span class="p">,</span> <span class="p">(</span><span class="n">int64</span><span class="p">)(</span><span class="n">gotvm</span><span class="p">.</span><span class="n">KDLCPU</span><span class=" [...]
+    <span class="n">graphmod</span> <span class="o">:=</span> <span class="n">graphrt</span><span class="p">.</span><span class="n">AsModule</span><span class="p">()</span>
+
+
+    <span class="c1">// Allocate input &amp; output arrays and fill some data for input.</span>
+    <span class="n">tshapeIn</span>  <span class="o">:=</span> <span class="p">[]</span><span class="n">int64</span><span class="p">{</span><span class="mi">1</span><span class="p">,</span> <span class="mi">224</span><span class="p">,</span> <span class="mi">224</span><span class="p">,</span> <span class="mi">3</span><span class="p">}</span>
+    <span class="n">tshapeOut</span> <span class="o">:=</span> <span class="p">[]</span><span class="n">int64</span><span class="p">{</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1001</span><span class="p">}</span>
+    <span class="n">inX</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">gotvm</span><span class="p">.</span><span class="n">Empty</span><span class="p">(</span><span class="n">tshapeIn</span><span class="p">,</span> <span class="s">"float32"</span><span class="p">,</span> <span class="n">gotvm</span><span class="p">.</span><span class="n">CPU</span><span class="p">(</span><span class="mi">0</span><span class="p">))</span>
+    <span class="n">out</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">gotvm</span><span class="p">.</span><span class="n">Empty</span><span class="p">(</span><span class="n">tshapeOut</span><span class="p">)</span>
+    <span class="n">inSlice</span> <span class="o">:=</span> <span class="n">make</span><span class="p">([]</span><span class="n">float32</span><span class="p">,</span> <span class="p">(</span><span class="mi">244</span> <span class="o">*</span> <span class="mi">244</span> <span class="o">*</span> <span class="mi">3</span><span class="p">))</span>
+    <span class="n">rand</span><span class="p">.</span><span class="n">Seed</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
+    <span class="n">rand</span><span class="p">.</span><span class="n">Shuffle</span><span class="p">(</span><span class="n">len</span><span class="p">(</span><span class="n">inSlice</span><span class="p">),</span> <span class="n">func</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span> <span class="kt">int</span><span class="p">)</span> <span class="p">{</span><span class="n">inSlice</span><span class="p">[</span><span class="n">i</spa [...]
+                                               <span class="n">inSlice</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">rand</span><span class="p">.</span><span class="n">Float32</span><span class="p">(),</span>
+                                               <span class="n">rand</span><span class="p">.</span><span class="n">Float32</span><span class="p">()</span> <span class="p">})</span>
+    <span class="n">inX</span><span class="p">.</span><span class="n">CopyFrom</span><span class="p">(</span><span class="n">inSlice</span><span class="p">)</span>
+
+    <span class="c1">// Load params</span>
+    <span class="n">bytes</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">ioutil</span><span class="p">.</span><span class="n">ReadFile</span><span class="p">(</span><span class="n">modParams</span><span class="p">)</span>
+    <span class="n">funp</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">graphmod</span><span class="p">.</span><span class="n">GetFunction</span><span class="p">(</span><span class="s">"load_params"</span><span class="p">)</span>
+    <span class="n">funp</span><span class="p">.</span><span class="n">Invoke</span><span class="p">(</span><span class="n">bytes</span><span class="p">)</span>
+
+
+    <span class="c1">// Set module input</span>
+    <span class="n">funp</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">graphmod</span><span class="p">.</span><span class="n">GetFunction</span><span class="p">(</span><span class="s">"set_input"</span><span class="p">)</span>
+    <span class="n">funp</span><span class="p">.</span><span class="n">Invoke</span><span class="p">(</span><span class="s">"input"</span><span class="p">,</span> <span class="n">inX</span><span class="p">)</span>
+
+    <span class="c1">// Run or Execute the graph</span>
+    <span class="n">funp</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">graphmod</span><span class="p">.</span><span class="n">GetFunction</span><span class="p">(</span><span class="s">"run"</span><span class="p">)</span>
+    <span class="n">funp</span><span class="p">.</span><span class="n">Invoke</span><span class="p">()</span>
+
+    <span class="c1">// Get output from runtime.</span>
+    <span class="n">funp</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">graphmod</span><span class="p">.</span><span class="n">GetFunction</span><span class="p">(</span><span class="s">"get_output"</span><span class="p">)</span>
+    <span class="n">funp</span><span class="p">.</span><span class="n">Invoke</span><span class="p">(</span><span class="n">int64</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span> <span class="n">out</span><span class="p">)</span>
+
+    <span class="c1">// Access output tensor data.</span>
+    <span class="n">outIntf</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">out</span><span class="p">.</span><span class="n">AsSlice</span><span class="p">()</span>
+    <span class="n">outSlice</span> <span class="o">:=</span> <span class="n">outIntf</span><span class="p">.([]</span><span class="n">float32</span><span class="p">)</span>
+
+    <span class="c1">// outSlice here holds flattened output data as a golang slice.</span>
+<span class="p">}</span>
+</code></pre></div></div>
+
+<p><code class="highlighter-rouge">gotvm</code> extends the TVM packed function system to support golang function closures as packed functions.
+<a href="https://github.com/dmlc/tvm/blob/master/golang/sample">Examples</a> available to register golang
+closure as TVM packed function and invoke the same across programming language barriers.</p>
+
+<h2 id="show-me-the-code">Show me the code</h2>
+
+<ul>
+  <li><a href="https://github.com/dmlc/tvm/blob/master/golang/src">Package Source</a></li>
+  <li><a href="https://github.com/dmlc/tvm/blob/master/golang/sample">Examples</a></li>
+</ul>
+
+<h2 id="references">References</h2>
+
+<ul>
+  <li>[1] <a href="https://golang.org">Go Programming Lang</a></li>
+  <li>[2] <a href="https://blog.golang.org/godoc-documenting-go-code">Go Documentation Guide Lines</a></li>
+  <li>[3] <a href="https://golang.org/pkg/testing">Go Testcase Framework</a></li>
+  <li>[4] <a href="https://golang.org/cmd/cgo">Go CFFI</a></li>
+  <li>[5] <a href="https://blog.learngoprogramming.com/golang-variadic-funcs-how-to-patterns-369408f19085">Go Variadic Functions</a></li>
+  <li>[6] <a href="https://github.com/jdeng/gomxnet">CFFI Ref</a></li>
+  <li>[7] <a href="https://golang.org/pkg/runtime/#SetFinalizer">Go Finalizers</a></li>
+</ul>
+
+    </div>
+  </div>
+</div>
+</div>
+
+
+    
+
+
+
+  </body>
+</html>
+
diff --git a/2019/03/18/tvm-apache-announcement.html b/2019/03/18/tvm-apache-announcement.html
new file mode 100644
index 0000000..fc8c8b0
--- /dev/null
+++ b/2019/03/18/tvm-apache-announcement.html
@@ -0,0 +1,179 @@
+
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <title>TVM Deep Learning Compiler Joins Apache Software Foundation</title>
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
+</head>
+
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            
+            
+            
+
+
+
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/community">Community</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/about">About</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/vta">VTA</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      
+      	
+      	<li><a href="https://tvm.ai/blog">Blog</a></li>
+      	
+      
+    
+  
+
+
+
+
+            <li> <a href="https://sampl.cs.washington.edu/tvmconf">TVM Conference</a></li>
+            <li> <a href="https://docs.tvm.ai/tutorials/">Tutorials</a></li>
+            <li> <a href="https://docs.tvm.ai">Docs</a></li>
+            <li> <a href="https://github.com/dmlc/tvm/">Github</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
+<div class="container">
+<div class="content">
+  <div class="row">
+    <div class="span14">
+      <h1>TVM Deep Learning Compiler Joins Apache Software Foundation </h1>
+      <p class="post-meta">
+        <time datetime="2019-03-18T00:00:00-07:00" itemprop="datePublished">
+          Mar 18, 2019
+        </time>
+        
+        • <span itemprop="author" itemscope itemtype="http://schema.org/Person">
+          <span itemprop="name">TVM Community</span>
+        </span>
+        
+      </p>
+      <p class="post-meta">
+        </p>
+    </br>
+    <p>There is an increasing need to bring machine learning to a wide diversity of hardware devices. Current frameworks rely on vendor-specific operator libraries and optimize for a narrow range of server-class GPUs. Deploying workloads to new platforms – such as mobile phones, embedded devices, and accelerators (e.g., FPGAs, ASICs) – requires significant manual effort.</p>
+
+<p>TVM is an open source deep learning compiler stack that closes the gap between productivity-focused deep learning frameworks and performance- or efficiency-oriented hardware backends. Today, we are glad to announce that the TVM community has decided to move to the Apache Incubator, and TVM becomes an Apache (incubating) project.</p>
+
+<p style="text-align: center"><img src="/images/main/tvm-stack.png" alt="image" width="70%" /></p>
+
+<p>The TVM stack began as a research project at the <a href="https://sampl.cs.washington.edu/">SAMPL group</a> of the Paul G. Allen School of Computer Science &amp; Engineering, University of Washington. The project uses the loop-level IR and several optimizations from the <a href="http://halide-lang.org/">Halide project</a>, in addition to <a href="https://tvm.ai/about">a full deep learning compiler stack</a> to support machine learning workloads for diverse hardware backends.</p>
+
+<p>Since its introduction, the project has been driven by an open source community involving multiple industry and academic institutions. Currently, the TVM stack includes a high-level differentiable programming IR for high-level optimization, a machine learning driven program optimizer and VTA – a fully open sourced deep learning accelerator. The community brings innovations from machine learning, compiler systems, programming languages, and computer architecture to build a full-stack open source deep learning system.</p>
+
+<p>Besides the technical innovations, the community adopts an open, welcoming and neutral policy. The project is run by committers who are elected purely based on the merit of their contributions to the project. Besides the contributors from UW SAMPL, the community now has nearly 200 contributors from Amazon Web Services (AWS), Qualcomm, Facebook, Google, Huawei, AMD, Microsoft, Cornell University, University of California, Berkeley, and more. The community successfully [...]
+
+<p>We would like to take this chance to thank the Allen School for supporting the SAMPL team that gave birth to the TVM project. We would also like to thank the Halide project which provided the basis for TVM’s loop-level IR and initial code generation. We would like to thank our Apache incubator mentors for introducing the project to Apache and providing useful guidance. Finally, we would like to thank the TVM community and all of the organizations, as listed above, that supported the development of the project.</p>
+
+<p>See also the <a href="https://news.cs.washington.edu/2019/03/18/allen-schools-tvm-deep-learning-compiler-framework-transitions-to-apache/">Allen School news about the transition here</a>, <a href="https://sampl.cs.washington.edu/tvmconf/#about-tvmconf">TVM conference program slides and recordings</a>, and <a href="https://docs.tvm.ai/contribute/community.html">our community guideline here</a>. Follow us on Twitter: <a href="https://twitter.com/ApacheTVM">@ApacheTVM</a>.</p>
+
+    </div>
+  </div>
+</div>
+</div>
+
+
+    
+
+
+
+  </body>
+</html>
+
diff --git a/2019/04/29/opt-cuda-quantized.html b/2019/04/29/opt-cuda-quantized.html
new file mode 100644
index 0000000..aa12995
--- /dev/null
+++ b/2019/04/29/opt-cuda-quantized.html
@@ -0,0 +1,300 @@
+
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <title>Automating Optimization of Quantized Deep Learning Models on CUDA</title>
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
+</head>
+
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            
+            
+            
+
+
+
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/community">Community</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/about">About</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/vta">VTA</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      
+      	
+      	<li><a href="https://tvm.ai/blog">Blog</a></li>
+      	
+      
+    
+  
+
+
+
+
+            <li> <a href="https://sampl.cs.washington.edu/tvmconf">TVM Conference</a></li>
+            <li> <a href="https://docs.tvm.ai/tutorials/">Tutorials</a></li>
+            <li> <a href="https://docs.tvm.ai">Docs</a></li>
+            <li> <a href="https://github.com/dmlc/tvm/">Github</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
+<div class="container">
+<div class="content">
+  <div class="row">
+    <div class="span14">
+      <h1>Automating Optimization of Quantized Deep Learning Models on CUDA </h1>
+      <p class="post-meta">
+        <time datetime="2019-04-29T09:00:00-07:00" itemprop="datePublished">
+          Apr 29, 2019
+        </time>
+        
+        • <span itemprop="author" itemscope itemtype="http://schema.org/Person">
+          <span itemprop="name">Wuwei Lin</span>
+        </span>
+        
+      </p>
+      <p class="post-meta">
+        </p>
+    </br>
+    <p>Deep learning has been successfully applied to a variety of tasks.
+In real-time scenarios such as inference on autonomous vehicles, the inference speed of the model is critical.
+Network quantization is an effective approach to accelerating deep learning models.
+In quantized models, both data and model parameters are represented with low precision data types such as <code class="highlighter-rouge">int8</code> and <code class="highlighter-rouge">float16</code>.
+The lowered data bandwidth reduces the inference time and memory/storage requirements, as well as the power consumption.
+Meanwhile, under proper quantization schemes, we can minimize the accuracy drop of the quantized models.
+Therefore, quantized models are of particular interest to researchers and developers, as quantization makes large models suitable for deployment on diverse devices such as GPUs, CPUs and mobile devices. A simple sketch of what quantization does to the data is shown below.</p>
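+
+<p>For intuition, here is a minimal NumPy sketch of symmetric <code class="highlighter-rouge">int8</code> quantization (illustrative only; it is not the exact scheme TVM uses):</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
+
+# Illustrative symmetric int8 quantization: map float32 values onto the
+# int8 range with a single scale factor, then dequantize to compare.
+x = np.random.randn(64).astype('float32')
+scale = np.abs(x).max() / 127.0
+q = np.clip(np.round(x / scale), -128, 127).astype('int8')
+x_hat = q.astype('float32') * scale   # low precision approximation of x
+print(np.max(np.abs(x - x_hat)))      # error is on the order of scale/2
+</code></pre></div></div>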
+
+<p>Previously, quantized operators were usually optimized with handcrafted microkernels for different workloads, or relied on blackbox proprietary solutions such as cuDNN and TensorRT.
+Writing a high-performance microkernel in assembly can be very challenging and usually requires heavy engineering effort.
+Besides, it is difficult to adapt these ad-hoc microkernels to emerging workloads and new devices.</p>
+
+<p style="text-align: center"><img src="/images/cuda-quantized/benchmark.svg" alt="image" width="100%" /></p>
+<center> Figure 1. Inference time of different models on TVM, TensorRT, and MXNet </center>
+<p></p>
+
+<p>TVM solves this challenge with a full stack compiler and a machine-learning-based optimizer to automatically generate computing kernels.
+TVM can generate efficient kernels via automatic search in a human-designed search space.
+In standard workloads such as VGG and ResNet, TVM achieves competitive performance compared with other state-of-the-art frameworks. 
+In emerging models such as ResNeXt and Deformable ConvNets, the automatic optimization makes it easy for TVM to adapt to these new workloads and achieve a significant performance boost.</p>
+
+<p>In this post, we show how to use TVM to automatically optimize quantized deep learning models on CUDA.</p>
+
+<h1 id="expressing-quantized-cuda-kernels-in-tvm">Expressing Quantized CUDA Kernels in TVM</h1>
+<h2 id="leveraging-tensor-intrinsics-via-tensorization">Leveraging Tensor Intrinsics via Tensorization</h2>
+<p>Many platforms provide architecture-specific instructions for special computation patterns, for example, the SIMD instructions on x86, and the <code class="highlighter-rouge">dp4a</code> and <code class="highlighter-rouge">hfma</code> instructions on CUDA.
+These intrinsic instructions are highly optimized for specific devices.
+By leveraging hardware intrinsics, we can achieve a significant performance boost for quantized operators.</p>
+
+<p>Currently, <a href="https://devblogs.nvidia.com/mixed-precision-programming-cuda-8/">dp4a</a> has been extensively used in TVM int8 operators on CUDA.
+<code class="highlighter-rouge">dp4a</code> is a CUDA intrinsic on Compute Capability 6.1 devices.
+It is a mixed-precision instruction that provides the efficient computation of the dot product between two 4-element 8-bit integer vectors and accumulates the result in 32-bit format.
+Using <code class="highlighter-rouge">dp4a</code>, we can implement a dot product between 8-bit integer vectors with number of elements evenly divisible by four.
+With an efficient dot product operator, we can implement high-level operators such as 2d convolution and dense layers as these operators are commonly backed by dot products.</p>
+
+<p>To illustrate, in 2d convolution we accumulate along the channel, the width, and the height axis of the kernel.
+This is a typical use case of <code class="highlighter-rouge">dp4a</code>.
+TVM uses tensorization to support calling external intrinsics.
+We do not need to modify the original computation declaration; we use the schedule primitive <code class="highlighter-rouge">tensorize</code> to replace the accumulation with the <code class="highlighter-rouge">dp4a</code> tensor intrinsic, as sketched below.
+More details of tensorization can be found in the <a href="https://docs.tvm.ai/tutorials/language/tensorize.html">tutorial</a>.</p>
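+
+<p>Below is a runnable sketch of the 4-element int8 dot-product pattern that <code class="highlighter-rouge">dp4a</code> accelerates, written against the TVM API of this era; the actual intrinsic declaration lives in the tensor intrinsics file linked at the end of this post, and <code class="highlighter-rouge">dp4a_intrin</code> in the comment is a hypothetical handle to it.</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import tvm
+
+# Declare the int8 dot-product pattern: multiply 4-element int8 vectors
+# and accumulate into int32, which is exactly what dp4a computes.
+n = tvm.var('n')
+k = tvm.reduce_axis((0, 4), name='k')
+x = tvm.placeholder((n, 4), dtype='int8', name='x')
+y = tvm.placeholder((n, 4), dtype='int8', name='y')
+z = tvm.compute((n,), lambda i: tvm.sum(
+        x[i, k].astype('int32') * y[i, k].astype('int32'), axis=k), name='z')
+
+s = tvm.create_schedule(z.op)
+# With a declared dp4a tensor intrinsic (hypothetical handle dp4a_intrin),
+# the reduction is swapped for the hardware instruction via:
+#   s[z].tensorize(k, dp4a_intrin)
+print(tvm.lower(s, [x, y, z], simple_mode=True))
+</code></pre></div></div>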
+
+<h2 id="data-layout-rearrangement">Data Layout Rearrangement</h2>
+<p>One of the challenges in tensorization is that we may need to design special computation logic to adapt to the requirement of tensor intrinsics.
+Although it is natural to accumulate along the inner axis of the tensor in the dense operator, <code class="highlighter-rouge">conv2d</code> can be more challenging.
+In <code class="highlighter-rouge">conv2d</code> we expect to take a slice in the channel dimension as the input of <code class="highlighter-rouge">dp4a</code> because the number of channels is typically multiple of 4 (otherwise we fall back to original <code class="highlighter-rouge">conv2d</code> in NCHW layout).
+Meanwhile, to achieve memory locality, we would like to reduce along the innermost axis first.
+Taking these factors into account, we use a custom data layout to address this challenge.</p>
+
+<p>In CUDA int8 2d convolution, we empirically choose <code class="highlighter-rouge">NCHW4c</code> as data layout and <code class="highlighter-rouge">OIHW4o4i</code> as weight layout.
+The templates can also be easily generalized to <code class="highlighter-rouge">NCHW[x]c</code> and <code class="highlighter-rouge">OIHW[x]o[x]i</code>, where x is an arbitrary positive integer divisible by four.
+In the data layout we choose, slices of channels are in the packed innermost dimension.
+Likewise, we pack slices in both the input and output channel dimensions of the weight so that the output has a consistent data layout with the input, which prevents redundant layout transformations between layers.</p>
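+
+<p>To make the blocked layout concrete, here is an illustrative NumPy transformation from NCHW to NCHW4c (shapes are examples only):</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
+
+# Pack channels in groups of 4: (N, C, H, W) -> (N, C//4, H, W, 4).
+n, c, h, w = 1, 64, 56, 56
+x_nchw = np.random.randn(n, c, h, w).astype('float32')
+x_nchw4c = x_nchw.reshape(n, c // 4, 4, h, w).transpose(0, 1, 3, 4, 2)
+print(x_nchw4c.shape)  # (1, 16, 56, 56, 4): channel slices are innermost
+</code></pre></div></div>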
+
+<p>We show the computation of one element of the output of the 2d convolution in Figure 2.
+The element in each position of the super dimension (the outer dimension of the blocked layout which contains packed elements) NCHW and OIHW is the packed input and kernel, respectively.
+Each column of the packed kernel comes from a different filter.
+We calculate the dot product between the packed input and each row in the packed kernel using <code class="highlighter-rouge">dp4a</code>, and accumulate the result to the output tensor.</p>
+
+<p style="text-align: center"><img src="/images/cuda-quantized/conv2d.png" alt="image" width="60%" /></p>
+<div>
+Figure 2. 2D convolution with data layout in NCHW4c and weight layout in OIHW4o4i.
+<b>Left</b>: The input tensor in NCHW4c layout. One moving filter of the kernel is colored in blue. One element of the input and kernel is colored in grey. 
+<b>Mid</b>: The packed input and kernel in the grey block.
+<b>Right</b>: The output in NCHW4c layout. Inside the one element depicted, there are four packed elements in channel sub-dimension.
+</div>
+<p></p>
+
+<p>After we have specified the layout of convolution layers, other operators such as <code class="highlighter-rouge">add</code> and activations can automatically adapt to the chosen layout during the <a href="https://github.com/dmlc/tvm/blob/master/src/relay/pass/alter_op_layout.cc">AlterOpLayout</a> pass in Relay.
+The layout transformation of the weight can be precomputed offline. Therefore, we can run the whole model in the same layout without extra overhead.</p>
+
+<h2 id="designing-search-space-for-automatic-optimization">Designing Search Space for Automatic Optimization</h2>
+<p>The key to achieving good performance in our quantized operators is to integrate with machine-learning-based automatic optimization. One question is how to design an effective schedule search space.
+An effective schedule template means that we can obtain good performance in a reasonable number of iterations in automatic tuning.
+Generally speaking, we strive to define a flexible template to cover different configurations in the search space.
+On the other hand, we also take advantage of the prior knowledge in performance optimization.
+For example, as caching data in the shared memory is a common practice in CUDA programming, we utilize shared memory, but we use machine learning to choose the best tile size.
+We also do some manual tiling such as splitting axes by 4 or 16 to facilitate vectorized memory access.</p>
+
+<p>In quantized 2d convolution, we design a search space that includes a set of tunable options, such as the tile size, the axes to fuse, configurations of loop unrolling and double buffering.
+The templates of quantized <code class="highlighter-rouge">conv2d</code> and <code class="highlighter-rouge">dense</code> on CUDA are registered under template key <code class="highlighter-rouge">int8</code>.
+During automatic tuning, we can create tuning tasks for these quantized operators by setting the <code class="highlighter-rouge">template_key</code> argument, and then launch tuning as sketched below.
+Details of how to launch automatic optimization can be found in the <a href="https://docs.tvm.ai/tutorials/autotvm/tune_relay_cuda.html">AutoTVM tutorial</a>.</p>
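+
+<p>Schematically, tuning one such task looks like the following (a minimal sketch after the AutoTVM tutorials; <code class="highlighter-rouge">task</code> and the builder/runner settings are placeholders for a real environment):</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from tvm import autotvm
+
+# A minimal tuning loop sketch: `task` is an extracted int8 conv2d/dense
+# tuning task; the measurement settings below are placeholders.
+measure_option = autotvm.measure_option(
+    builder=autotvm.LocalBuilder(),
+    runner=autotvm.LocalRunner(number=10))
+
+tuner = autotvm.tuner.XGBTuner(task)   # ML-based cost model guides the search
+tuner.tune(n_trial=1000,
+           measure_option=measure_option,
+           callbacks=[autotvm.callback.log_to_file('int8_conv2d.log')])
+</code></pre></div></div>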
+
+<h1 id="general-workflow">General Workflow</h1>
+
+<p style="text-align: center"><img src="/images/cuda-quantized/workflow.png" alt="image" width="60%" /></p>
+<center> Figure 3. Workflow of running quantized models </center>
+<p></p>
+
+<p>TVM provides an easy workflow to quantize trained models from other frameworks, automatically optimize operators (with AutoTVM), and deploy to different devices.</p>
+
+<p>First, we use the Relay frontend to import existing models. Here we use an MXNet model with <code class="highlighter-rouge">(1, 3, 224, 224)</code> input shape as an example.</p>
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sym</span><span class="p">,</span> <span class="n">arg_params</span><span class="p">,</span> <span class="n">aux_params</span> <span class="o">=</span> <span class="n">mxnet</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="n">load_checkpoint</span><span class="p">(</span><span class="n">model_path</span><span class="p">,</span> < [...]
+<span class="n">net</span><span class="p">,</span> <span class="n">params</span> <span class="o">=</span> <span class="n">relay</span><span class="o">.</span><span class="n">from_mxnet</span><span class="p">(</span><span class="n">sym</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">{</span><span class="s">'data'</span><span class="p">:</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</sp [...]
+</code></pre></div></div>
+
+<p>Next, we use the relay quantization API to convert it to a quantized model.</p>
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">net</span> <span class="o">=</span> <span class="n">relay</span><span class="o">.</span><span class="n">quantize</span><span class="o">.</span><span class="n">quantize</span><span class="p">(</span><span class="n">net</span><span class="p">,</span> <span class="n">params</span><span class="o">=</span><span class="n">params</span><span class="p">)</span>
+</code></pre></div></div>
+
+<p>Then, we use AutoTVM to extract tuning tasks for the operators in the model and perform automatic optimization. The <a href="https://docs.tvm.ai/tutorials/autotvm/tune_relay_cuda.html">AutoTVM tutorial</a> provides an example for this.</p>
+
+<p>Finally, we build the model and run inference in the quantized mode.</p>
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">with</span> <span class="n">relay</span><span class="o">.</span><span class="n">build_config</span><span class="p">(</span><span class="n">opt_level</span><span class="o">=</span><span class="mi">3</span><span class="p">):</span>
+    <span class="n">graph</span><span class="p">,</span> <span class="n">lib</span><span class="p">,</span> <span class="n">params</span> <span class="o">=</span> <span class="n">relay</span><span class="o">.</span><span class="n">build</span><span class="p">(</span><span class="n">net</span><span class="p">,</span> <span class="n">target</span><span class="p">)</span>
+</code></pre></div></div>
+<p>The result of <code class="highlighter-rouge">relay.build</code> is a deployable library.
+We can either run inference <a href="https://docs.tvm.ai/tutorials/frontend/from_mxnet.html#execute-the-portable-graph-on-tvm">on the GPU</a> directly, as sketched below, or deploy <a href="https://docs.tvm.ai/tutorials/frontend/deploy_model_on_rasp.html#deploy-the-model-remotely-by-rpc">on the remote devices</a> via RPC.</p>
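+
+<p>For completeness, a minimal local-inference sketch with the built artifacts might look like this (<code class="highlighter-rouge">input_data</code> is a placeholder matching the model input shape):</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
+import tvm
+from tvm.contrib import graph_runtime
+
+# Create a runtime module from the build artifacts and run one inference.
+module = graph_runtime.create(graph, lib, tvm.gpu(0))
+module.set_input(**params)
+input_data = np.random.randn(1, 3, 224, 224).astype('float32')  # placeholder
+module.set_input('data', input_data)
+module.run()
+out = module.get_output(0).asnumpy()
+</code></pre></div></div>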
+
+<h1 id="benchmark">Benchmark</h1>
+<p>To verify the performance of the quantized operators in TVM, we benchmark the performance of several popular network models including VGG-19, ResNet-50 and Inception V3.
+We also benchmark DRN-C-26, ResNeXt-50, and DCN-ResNet-101 from <a href="https://github.com/msracver/Deformable-ConvNets">Deformable ConvNets</a> to show the performance of emerging models, which contain less conventional operators such as dilated convolutions, group convolutions and deformable convolutions.
+We choose NVIDIA TensorRT as our baseline.
+The result of MXNet 1.4 + cuDNN 7.3 in float32 mode is reported to show the speedup of quantization.
+The experiments are conducted on an NVIDIA GTX 1080.
+We report the inference time per image when running with batch size = 1 and 16.</p>
+
+<p>As shown in Figure 1, TVM achieves up to 8x speedup using quantization.
+In standard CNN models such as VGG and ResNet, TVM achieves parity with the state-of-the-art results from TensorRT.</p>
+
+<p>When benchmarking emerging models, TVM achieves impressive results.
+We obtain significant performance gains on ResNeXt and DCN-ResNet-101.
+TensorRT results for DCN-ResNet-101 are not available because there is no official implementation of the deformable convolution.
+We show that automatic optimization in TVM makes it easy and flexible to support and optimize emerging workloads.</p>
+
+<h1 id="show-me-the-code">Show Me the Code</h1>
+<ul>
+  <li><a href="https://github.com/vinx13/tvm-cuda-int8-benchmark">Benchmark</a></li>
+  <li><a href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/conv2d_int8.py">CUDA int8 conv2d</a></li>
+  <li><a href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/group_conv2d_nchw.py">CUDA int8 group conv2d</a></li>
+  <li><a href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/dense.py">CUDA int8 dense</a></li>
+  <li><a href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/tensor_intrin.py">Tensor intrinsics declaration</a></li>
+</ul>
+
+<h1 id="bio--acknowledgement">Bio &amp; Acknowledgement</h1>
+<p><a href="https://wuwei.io/">Wuwei Lin</a> is an undergraduate student at SJTU. He is currently an intern at TuSimple. The author thanks <a href="https://homes.cs.washington.edu/~tqchen/">Tianqi Chen</a> and <a href="https://homes.cs.washington.edu/~eqy/">Eddie Yan</a> for their reviews.</p>
+
+    </div>
+  </div>
+</div>
+</div>
+
+
+    
+
+
+
+  </body>
+</html>
+
diff --git a/2019/05/30/pytorch-frontend.html b/2019/05/30/pytorch-frontend.html
new file mode 100644
index 0000000..767eeb3
--- /dev/null
+++ b/2019/05/30/pytorch-frontend.html
@@ -0,0 +1,258 @@
+
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <title>Integrating TVM into PyTorch</title>
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="https://tvm.ai/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
+</head>
+
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            
+            
+            
+
+
+
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/community">Community</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/about">About</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      	
+      	<li><a href="https://tvm.ai/vta">VTA</a></li>
+      	
+      
+      
+    
+  
+    
+      
+      
+      	
+      	<li><a href="https://tvm.ai/blog">Blog</a></li>
+      	
+      
+    
+  
+
+
+
+
+            <li> <a href="https://sampl.cs.washington.edu/tvmconf">TVM Conference</a></li>
+            <li> <a href="https://docs.tvm.ai/tutorials/">Tutorials</a></li>
+            <li> <a href="https://docs.tvm.ai">Docs</a></li>
+            <li> <a href="https://github.com/dmlc/tvm/">Github</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
+<div class="container">
+<div class="content">
+  <div class="row">
+    <div class="span14">
+      <h1>Integrating TVM into PyTorch </h1>
+      <p class="post-meta">
+        <time datetime="2019-05-30T00:00:00-07:00" itemprop="datePublished">
+          May 30, 2019
+        </time>
+        
+        • <span itemprop="author" itemscope itemtype="http://schema.org/Person">
+          <span itemprop="name">Bram Wasti</span>
+        </span>
+        
+      </p>
+      <p class="post-meta">
+        </p>
+    </br>
+    <p>As TVM continuously demonstrates improvements to the efficiency of deep learning execution,
+it has become clear that PyTorch stands to benefit from directly leveraging the compiler stack.
+A major tenet of PyTorch is providing seamless and robust integrations that don’t get in the user’s way.
+To that end, PyTorch now has an official TVM-based backend, <a href="https://github.com/pytorch/tvm">torch_tvm</a>.</p>
+
+<p>Usage is simple:</p>
+
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import torch_tvm
+torch_tvm.enable()
+</code></pre></div></div>
+
+<p>That’s it!  PyTorch will then attempt to convert all operators it can to known Relay operators during its JIT compilation process.</p>
+
+<h3 id="background">Background</h3>
+
+<p>Unlike many other ML frameworks, PyTorch exposes an eager-execution programming interface.  This style of programming avoids graph-based meta-programming and focuses on the direct manipulation of n-dimensional arrays (tensors) in a Pythonic way.  As such, the framework was initially well suited for the experimentation and development of models, but not for automatic performance optimization or deployment.  To leverage optimizing compiler techniques, some large changes were recently introduced to PyTorch to address this.</p>
+
+<p><img src="https://i.imgur.com/4XVHbJE.png" alt="TVM Integration" /></p>
+
+<p>PyTorch 1.0 introduced PyTorch IR, a PyTorch-specific intermediate representation for models similar to Relay.  PyTorch programs can be converted into the IR via model tracing, which records the execution of a model, or via TorchScript, a subset of Python.  The new TVM backend lowers PyTorch IR to Relay, and is able to transparently improve PyTorch performance with little user involvement.</p>
+
+<h3 id="integration-and-results">Integration and Results</h3>
+
+<p>To support Relay, two features were added to the PyTorch JIT: custom transformation passes and custom subgraph interpreters.</p>
+
+<p>When <code class="highlighter-rouge">torch_tvm</code> is enabled, subgraphs of PyTorch IR that can be converted to Relay <code class="highlighter-rouge">Expr</code>s will be marked as Relay-compatible.  Since PyTorch IR does not always contain shape information, none of the subgraphs can be compiled in a useful way before invocation.</p>
+
+<p>During user invocation, the PyTorch JIT runtime will determine input shape information and compile the previously marked subgraphs with the new Relay C++ <a href="https://github.com/pytorch/tvm/blob/master/torch_tvm/compiler.cpp#L226-L246">build system</a>.  The compilation is cached based on input shapes for subsequent runs.  More details can be found in the <a href="https://github.com/pytorch/tvm/blob/master/README.md">README</a>.</p>
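+
+<p>Illustratively, the shape-specialized caching behaves like the hypothetical sketch below (the function and input shapes are made up for illustration):</p>
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import torch
+import torch_tvm
+
+torch_tvm.enable()
+
+@torch.jit.script
+def add_relu(a, b):
+    return (a + b).relu()
+
+# First call at this shape: Relay-compatible subgraphs are compiled and cached.
+x = torch.rand(8, 8)
+y = add_relu(x, x)
+
+# A new input shape triggers a separate compilation, which is also cached.
+z = add_relu(torch.rand(4, 4), torch.rand(4, 4))
+</code></pre></div></div>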
+
+<p><code class="highlighter-rouge">torch_tvm</code> has a continuous benchmark system set up, which monitors the performance of ResNet18 on CPU.
+Out of the box, TVM provides over two times the performance of the default PyTorch JIT backend for various ResNet models.
+Below is a graph that details the iterations per second achieved with 16 threads on an AWS c5n.4xlarge instance (larger is better):</p>
+
+<p style="text-align: center"><img src="https://i.imgur.com/KfJ7oas.png" alt="bench" width="90%" /></p>
+
+<p>These results are quite encouraging, and the project will continue to focus on improving CPU inference speed across more models.</p>
+
+<h3 id="future-work">Future work</h3>
+
+<p>Right now the PyTorch JIT does a lot of work to find pure functional subsets of its IR to feed to Relay.  This avoids the need to map aliasing and control flow information to Relay, but it is not strictly necessary.  Mapping more of the PyTorch IR to Relay may yield performance wins and is a goal of the project.  PyTorch IR is rapidly changing as it is being developed, so this must be done carefully.</p>
+
+<p>More work will be done to ensure the handoff between PyTorch and TVM code is efficient.  This includes unifying the threading model and allocators, and reducing the overhead associated with copying inputs into TVM.</p>
+
+<h3 id="tutorial">Tutorial</h3>
+
+<p>If you have an already-written PyTorch model, the easiest way to get started is to use <code class="highlighter-rouge">torch.jit.trace</code> as follows:</p>
+
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import torch_tvm
+from your_model import model, inputs
+
+torch_tvm.enable(opt_level=3)
+
+iters = 100
+warmup = 10
+
+# Ensure your model is in eval mode and also turn off gradients.
+with torch.no_grad():
+  # Use tuned parameters for better performance.
+  with autotvm.apply_history_best("test/autotvm_tuning.log"):
+    # This is where all the compilation happens.
+    trace_tvm = torch.jit.trace(model, inputs)
+    
+    # Warmup
+    for _ in range(warmup):
+      _ = trace_tvm(*inputs)
+
+    # Benchmark
+    start = time.time()
+    for _ in range(iters):
+      _ = trace_tvm(*inputs)
+    tvm_time = time.time() - start
+    
+    print("Took {}s to run {} iters".format(tvm_time, iters))
+</code></pre></div></div>
+
+<p>Much of this code comes from <a href="https://github.com/pytorch/tvm/blob/master/test/benchmarks.py">benchmarks.py</a>.  Note that tuned parameters for AVX2 LLVM compilation are in the <code class="highlighter-rouge">test/</code> folder of the repo.</p>
+
+<p>If you are more comfortable using Relay directly, it is possible to extract a Relay expression directly from a
+PyTorch function, either via (implicit) tracing or via TorchScript:</p>
+
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def add(a, b, c):
+    return a + b + c
+
+# via tracing
+relay_graph = torch_tvm.to_relay(add, inputs)
+
+@torch.jit.script
+def mul(a, b, c):
+    return a * b * c
+
+# via script
+relay_graph = torch_tvm.to_relay(mul, inputs)
+</code></pre></div></div>
+
+
+    </div>
+  </div>
+</div>
+</div>
+
+
+    
+
+
+
+  </body>
+</html>
+
diff --git a/404.html b/404.html
deleted file mode 100644
index 6904bcd..0000000
--- a/404.html
+++ /dev/null
@@ -1 +0,0 @@
-Sorry this page does not exist =(
diff --git a/CNAME b/CNAME
new file mode 100644
index 0000000..4247f3d
--- /dev/null
+++ b/CNAME
@@ -0,0 +1 @@
+tvm.ai
diff --git a/README.md b/README.md
index 7747deb..226efab 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,12 @@
-# Course Homepage
+# TVM Project Homepage
+
+## Serve Locally
+
+```bash
+./serve_local.sh
+```
+
+## Deployment
+
+We use the script [scripts/deploy_to_asf_site.sh](scripts/deploy_to_asf_site.sh)
+to generate and deploy content to the asf-site branch.
diff --git a/Rakefile b/Rakefile
deleted file mode 100644
index 183ca1e..0000000
--- a/Rakefile
+++ /dev/null
@@ -1,306 +0,0 @@
-require "rubygems"
-require 'rake'
-require 'yaml'
-require 'time'
-
-SOURCE = "."
-CONFIG = {
-  'version' => "0.3.0",
-  'themes' => File.join(SOURCE, "_includes", "themes"),
-  'layouts' => File.join(SOURCE, "_layouts"),
-  'posts' => File.join(SOURCE, "_posts"),
-  'post_ext' => "md",
-  'theme_package_version' => "0.1.0"
-}
-
-# Path configuration helper
-module JB
-  class Path
-    SOURCE = "."
-    Paths = {
-      :layouts => "_layouts",
-      :themes => "_includes/themes",
-      :theme_assets => "assets/themes",
-      :theme_packages => "_theme_packages",
-      :posts => "_posts"
-    }
-    
-    def self.base
-      SOURCE
-    end
-
-    # build a path relative to configured path settings.
-    def self.build(path, opts = {})
-      opts[:root] ||= SOURCE
-      path = "#{opts[:root]}/#{Paths[path.to_sym]}/#{opts[:node]}".split("/")
-      path.compact!
-      File.__send__ :join, path
-    end
-  
-  end #Path
-end #JB
-
-# Usage: rake post title="A Title" [date="2012-02-09"] [tags=[tag1,tag2]] [category="category"]
-desc "Begin a new post in #{CONFIG['posts']}"
-task :post do
-  abort("rake aborted: '#{CONFIG['posts']}' directory not found.") unless FileTest.directory?(CONFIG['posts'])
-  title = ENV["title"] || "new-post"
-  tags = ENV["tags"] || "[]"
-  category = ENV["category"] || ""
-  category = "\"#{category.gsub(/-/,' ')}\"" if !category.empty?
-  slug = title.downcase.strip.gsub(' ', '-').gsub(/[^\w-]/, '')
-  begin
-    date = (ENV['date'] ? Time.parse(ENV['date']) : Time.now).strftime('%Y-%m-%d')
-  rescue => e
-    puts "Error - date format must be YYYY-MM-DD, please check you typed it correctly!"
-    exit -1
-  end
-  filename = File.join(CONFIG['posts'], "#{date}-#{slug}.#{CONFIG['post_ext']}")
-  if File.exist?(filename)
-    abort("rake aborted!") if ask("#{filename} already exists. Do you want to overwrite?", ['y', 'n']) == 'n'
-  end
-  
-  puts "Creating new post: #{filename}"
-  open(filename, 'w') do |post|
-    post.puts "---"
-    post.puts "layout: post"
-    post.puts "title: \"#{title.gsub(/-/,' ')}\""
-    post.puts 'description: ""'
-    post.puts "category: #{category}"
-    post.puts "tags: #{tags}"
-    post.puts "---"
-    post.puts "{% include JB/setup %}"
-  end
-end # task :post
-
-# Usage: rake page name="about.html"
-# You can also specify a sub-directory path.
-# If you don't specify a file extention we create an index.html at the path specified
-desc "Create a new page."
-task :page do
-  name = ENV["name"] || "new-page.md"
-  filename = File.join(SOURCE, "#{name}")
-  filename = File.join(filename, "index.html") if File.extname(filename) == ""
-  title = File.basename(filename, File.extname(filename)).gsub(/[\W\_]/, " ").gsub(/\b\w/){$&.upcase}
-  if File.exist?(filename)
-    abort("rake aborted!") if ask("#{filename} already exists. Do you want to overwrite?", ['y', 'n']) == 'n'
-  end
-  
-  mkdir_p File.dirname(filename)
-  puts "Creating new page: #{filename}"
-  open(filename, 'w') do |post|
-    post.puts "---"
-    post.puts "layout: page"
-    post.puts "title: \"#{title}\""
-    post.puts 'description: ""'
-    post.puts "---"
-    post.puts "{% include JB/setup %}"
-  end
-end # task :page
-
-desc "Launch preview environment"
-task :preview do
-  system "jekyll serve -w"
-end # task :preview
-
-# Public: Alias - Maintains backwards compatability for theme switching.
-task :switch_theme => "theme:switch"
-
-namespace :theme do
-  
-  # Public: Switch from one theme to another for your blog.
-  #
-  # name - String, Required. name of the theme you want to switch to.
-  #        The theme must be installed into your JB framework.
-  #
-  # Examples
-  #
-  #   rake theme:switch name="the-program"
-  #
-  # Returns Success/failure messages.
-  desc "Switch between Jekyll-bootstrap themes."
-  task :switch do
-    theme_name = ENV["name"].to_s
-    theme_path = File.join(CONFIG['themes'], theme_name)
-    settings_file = File.join(theme_path, "settings.yml")
-    non_layout_files = ["settings.yml"]
-
-    abort("rake aborted: name cannot be blank") if theme_name.empty?
-    abort("rake aborted: '#{theme_path}' directory not found.") unless FileTest.directory?(theme_path)
-    abort("rake aborted: '#{CONFIG['layouts']}' directory not found.") unless FileTest.directory?(CONFIG['layouts'])
-
-    Dir.glob("#{theme_path}/*") do |filename|
-      next if non_layout_files.include?(File.basename(filename).downcase)
-      puts "Generating '#{theme_name}' layout: #{File.basename(filename)}"
-
-      open(File.join(CONFIG['layouts'], File.basename(filename)), 'w') do |page|
-        page.puts "---"
-        page.puts File.read(settings_file) if File.exist?(settings_file)
-        page.puts "layout: default" unless File.basename(filename, ".html").downcase == "default"
-        page.puts "---"
-        page.puts "{% include JB/setup %}"
-        page.puts "{% include themes/#{theme_name}/#{File.basename(filename)} %}" 
-      end
-    end
-    
-    puts "=> Theme successfully switched!"
-    puts "=> Reload your web-page to check it out =)"
-  end # task :switch
-  
-  # Public: Install a theme using the theme packager.
-  # Version 0.1.0 simple 1:1 file matching.
-  #
-  # git  - String, Optional path to the git repository of the theme to be installed.
-  # name - String, Optional name of the theme you want to install.
-  #        Passing name requires that the theme package already exist.
-  #
-  # Examples
-  #
-  #   rake theme:install git="https://github.com/jekyllbootstrap/theme-twitter.git"
-  #   rake theme:install name="cool-theme"
-  #
-  # Returns Success/failure messages.
-  desc "Install theme"
-  task :install do
-    if ENV["git"]
-      manifest = theme_from_git_url(ENV["git"])
-      name = manifest["name"]
-    else
-      name = ENV["name"].to_s.downcase
-    end
-
-    packaged_theme_path = JB::Path.build(:theme_packages, :node => name)
-    
-    abort("rake aborted!
-      => ERROR: 'name' cannot be blank") if name.empty?
-    abort("rake aborted! 
-      => ERROR: '#{packaged_theme_path}' directory not found.
-      => Installable themes can be added via git. You can find some here: http://github.com/jekyllbootstrap
-      => To download+install run: `rake theme:install git='[PUBLIC-CLONE-URL]'`
-      => example : rake theme:install git='git@github.com:jekyllbootstrap/theme-the-program.git'
-    ") unless FileTest.directory?(packaged_theme_path)
-    
-    manifest = verify_manifest(packaged_theme_path)
-    
-    # Get relative paths to packaged theme files
-    # Exclude directories as they'll be recursively created. Exclude meta-data files.
-    packaged_theme_files = []
-    FileUtils.cd(packaged_theme_path) {
-      Dir.glob("**/*.*") { |f| 
-        next if ( FileTest.directory?(f) || f =~ /^(manifest|readme|packager)/i )
-        packaged_theme_files << f 
-      }
-    }
-    
-    # Mirror each file into the framework making sure to prompt if already exists.
-    packaged_theme_files.each do |filename|
-      file_install_path = File.join(JB::Path.base, filename)
-      if File.exist? file_install_path and ask("#{file_install_path} already exists. Do you want to overwrite?", ['y', 'n']) == 'n'
-        next
-      else
-        mkdir_p File.dirname(file_install_path)
-        cp_r File.join(packaged_theme_path, filename), file_install_path
-      end
-    end
-    
-    puts "=> #{name} theme has been installed!"
-    puts "=> ---"
-    if ask("=> Want to switch themes now?", ['y', 'n']) == 'y'
-      system("rake switch_theme name='#{name}'")
-    end
-  end
-
-  # Public: Package a theme using the theme packager.
-  # The theme must be structured using valid JB API.
-  # In other words packaging is essentially the reverse of installing.
-  #
-  # name - String, Required name of the theme you want to package.
-  #        
-  # Examples
-  #
-  #   rake theme:package name="twitter"
-  #
-  # Returns Success/failure messages.
-  desc "Package theme"
-  task :package do
-    name = ENV["name"].to_s.downcase
-    theme_path = JB::Path.build(:themes, :node => name)
-    asset_path = JB::Path.build(:theme_assets, :node => name)
-
-    abort("rake aborted: name cannot be blank") if name.empty?
-    abort("rake aborted: '#{theme_path}' directory not found.") unless FileTest.directory?(theme_path)
-    abort("rake aborted: '#{asset_path}' directory not found.") unless FileTest.directory?(asset_path)
-    
-    ## Mirror theme's template directory (_includes)
-    packaged_theme_path = JB::Path.build(:themes, :root => JB::Path.build(:theme_packages, :node => name))
-    mkdir_p packaged_theme_path
-    cp_r theme_path, packaged_theme_path
-    
-    ## Mirror theme's asset directory
-    packaged_theme_assets_path = JB::Path.build(:theme_assets, :root => JB::Path.build(:theme_packages, :node => name))
-    mkdir_p packaged_theme_assets_path
-    cp_r asset_path, packaged_theme_assets_path
-
-    ## Log packager version
-    packager = {"packager" => {"version" => CONFIG["theme_package_version"].to_s } }
-    open(JB::Path.build(:theme_packages, :node => "#{name}/packager.yml"), "w") do |page|
-      page.puts packager.to_yaml
-    end
-    
-    puts "=> '#{name}' theme is packaged and available at: #{JB::Path.build(:theme_packages, :node => name)}"
-  end
-  
-end # end namespace :theme
-
-# Internal: Download and process a theme from a git url.
-# Notice we don't know the name of the theme until we look it up in the manifest.
-# So we'll have to change the folder name once we get the name.
-#
-# url - String, Required url to git repository.
-#        
-# Returns theme manifest hash
-def theme_from_git_url(url)
-  tmp_path = JB::Path.build(:theme_packages, :node => "_tmp")
-  abort("rake aborted: system call to git clone failed") if !system("git clone #{url} #{tmp_path}")
-  manifest = verify_manifest(tmp_path)
-  new_path = JB::Path.build(:theme_packages, :node => manifest["name"])
-  if File.exist?(new_path) && ask("=> #{new_path} theme package already exists. Override?", ['y', 'n']) == 'n'
-    remove_dir(tmp_path)
-    abort("rake aborted: '#{manifest["name"]}' already exists as theme package.")
-  end
-
-  remove_dir(new_path) if File.exist?(new_path)
-  mv(tmp_path, new_path)
-  manifest
-end
-
-# Internal: Process theme package manifest file.
-#
-# theme_path - String, Required. File path to theme package.
-#        
-# Returns theme manifest hash
-def verify_manifest(theme_path)
-  manifest_path = File.join(theme_path, "manifest.yml")
-  manifest_file = File.open( manifest_path )
-  abort("rake aborted: repo must contain valid manifest.yml") unless File.exist? manifest_file
-  manifest = YAML.load( manifest_file )
-  manifest_file.close
-  manifest
-end
-
-def ask(message, valid_options)
-  if valid_options
-    answer = get_stdin("#{message} #{valid_options.to_s.gsub(/"/, '').gsub(/, /,'/')} ") while !valid_options.include?(answer)
-  else
-    answer = get_stdin(message)
-  end
-  answer
-end
-
-def get_stdin(message)
-  print message
-  STDIN.gets.chomp
-end
-
-#Load custom rake scripts
-Dir['_rake/*.rake'].each { |r| load r }
diff --git a/_config.yml b/_config.yml
deleted file mode 100644
index bce9fe3..0000000
--- a/_config.yml
+++ /dev/null
@@ -1,138 +0,0 @@
-# This is the default format.
-# For more see: http://jekyllrb.com/docs/permalinks/
-permalink: /:categories/:year/:month/:day/:title
-
-exclude: [".rvmrc", ".rbenv-version", "README.md", "Rakefile", "changelog.md"]
-
-markdown: redcarpet
-
-
-redcarpet:
-  extensions: ["fenced_code_blocks", "no_intra_emphasis", "tables", "autolink", "strikethrough", "with_toc_data"]
-
-# Themes are encouraged to use these universal variables
-# so be sure to set them if your theme uses them.
-#
-title :  CSE599 Deep Learning System
-tagline:
-author :
-  name : cse599
-
-
-
-# The production_url is only used when full-domain names are needed
-# such as sitemap.txt
-# Most places will/should use BASE_PATH to make the urls
-#
-# If you have set a CNAME (pages.github.com) set your custom domain here.
-# Else if you are pushing to username.github.io, replace with your username.
-# Finally if you are pushing to a GitHub project page, include the project name at the end.
-#
-production_url : dlsys-course.github.io
-
-
-# All Jekyll-Bootstrap specific configurations are namespaced into this hash
-#
-JB :
-  version : 0.3.0
-
-  # All links will be namespaced by BASE_PATH if defined.
-  # Links in your website should always be prefixed with {{BASE_PATH}}
-  # however this value will be dynamically changed depending on your deployment situation.
-  #
-  # CNAME (http://yourcustomdomain.com)
-  #   DO NOT SET BASE_PATH
-  #   (urls will be prefixed with "/" and work relatively)
-  #
-  # GitHub Pages (http://username.github.io)
-  #   DO NOT SET BASE_PATH
-  #   (urls will be prefixed with "/" and work relatively)
-  #
-  # GitHub Project Pages (http://username.github.io/project-name)
-  #
-  #   A GitHub Project site exists in the `gh-pages` branch of one of your repositories.
-  #  REQUIRED! Set BASE_PATH to: http://username.github.io/project-name
-  #
-  # CAUTION:
-  #   - When in Localhost, your site will run from root "/" regardless of BASE_PATH
-  #   - Only the following values are falsy: ["", null, false]
-  #   - When setting BASE_PATH it must be a valid url.
-  #     This means always setting the protocol (http|https) or prefixing with "/"
-  BASE_PATH : http://dlsys-course.github.io
-
-  BLOG_PATH : http://dlsys-course.github.io
-
-  # By default, the asset_path is automatically defined relative to BASE_PATH plus the enabled theme.
-  # ex: [BASE_PATH]/assets/themes/[THEME-NAME]
-  #
-  # Override this by defining an absolute path to assets here.
-  # ex:
-  #   http://s3.amazonaws.com/yoursite/themes/watermelon
-  #   /assets
-  #
-
-  # These paths are to the main pages Jekyll-Bootstrap ships with.
-  # Some JB helpers refer to these paths; change them here if needed.
-  #
-  archive_path: /archive.html
-  categories_path : /categories.html
-  tags_path : /tags.html
-  atom_path : /atom.xml
-  rss_path : /rss.xml
-
-  # Settings for comments helper
-  # Set 'provider' to the comment provider you want to use.
-  # Set 'provider' to false to turn commenting off globally.
-  #
-  comments :
-    provider : disqus
-    disqus :
-      short_name : jekyllbootstrap
-    livefyre :
-      site_id : 123
-    intensedebate :
-      account : 123abc
-    facebook :
-      appid : 123
-      num_posts: 5
-      width: 580
-      colorscheme: light
-    duoshuo :
-      short_name : jekyllbootstrap
-
-  # Settings for analytics helper
-  # Set 'provider' to the analytics provider you want to use.
-  # Set 'provider' to false to turn analytics off globally.
-  #
-  analytics :
-    provider : google
-    gauges :
-        site_id : 'SITE ID'
-    google :
-        tracking_id : 'UA-123-12'
-    getclicky :
-      site_id :
-    mixpanel :
-        token : '_MIXPANEL_TOKEN_'
-    piwik :
-        baseURL : 'myserver.tld/piwik' # Piwik installation address (without protocol)
-        idsite : '1'                   # the id of the site on Piwik
-
-  # Settings for sharing helper.
-  # Sharing is for things like tweet, plusone, like, reddit buttons etc.
-  # Set 'provider' to the sharing provider you want to use.
-  # Set 'provider' to false to turn sharing off globally.
-  #
-  sharing :
-    provider : false
-
-  # Settings for all other include helpers can be defined by creating
-  # a hash with key named for the given helper. ex:
-  #
-  # pages_list :
-  #     provider : "custom"
-  #
-  # Setting any helper's provider to 'custom' will bypass the helper code
-  # and include your custom code. Your custom file must be defined at:
-  #   ./_includes/custom/[HELPER]
-  # where [HELPER] is the name of the helper you are overriding.
diff --git a/_includes/JB/analytics b/_includes/JB/analytics
deleted file mode 100644
index 2bb4c80..0000000
--- a/_includes/JB/analytics
+++ /dev/null
@@ -1,20 +0,0 @@
-{% include JB/is_production %}
-
-{% if is_production and site.JB.analytics.provider and page.JB.analytics != false %}
-
-{% case site.JB.analytics.provider %}
-{% when "gauges" %}
-  {% include JB/analytics-providers/gauges %}
-{% when "google" %}
-  {% include JB/analytics-providers/google %}
-{% when "getclicky" %}
-  {% include JB/analytics-providers/getclicky %}
-{% when "mixpanel" %}
-  {% include JB/analytics-providers/mixpanel %}
-{% when "piwik" %}
-  {% include JB/analytics-providers/piwik %}
-{% when "custom" %}
-  {% include custom/analytics %}
-{% endcase %}
-
-{% endif %}
diff --git a/_includes/JB/analytics-providers/gauges b/_includes/JB/analytics-providers/gauges
deleted file mode 100644
index b793ff1..0000000
--- a/_includes/JB/analytics-providers/gauges
+++ /dev/null
@@ -1,13 +0,0 @@
-<script type="text/javascript">
-  var _gauges = _gauges || [];
-  (function() {
-    var t   = document.createElement('script');
-    t.type  = 'text/javascript';
-    t.async = true;
-    t.id    = 'gauges-tracker';
-    t.setAttribute('data-site-id', '{{ site.JB.analytics.gauges.site_id }}');
-    t.src = '//secure.gaug.es/track.js';
-    var s = document.getElementsByTagName('script')[0];
-    s.parentNode.insertBefore(t, s);
-  })();
-</script>
diff --git a/_includes/JB/analytics-providers/getclicky b/_includes/JB/analytics-providers/getclicky
deleted file mode 100644
index e9462f4..0000000
--- a/_includes/JB/analytics-providers/getclicky
+++ /dev/null
@@ -1,12 +0,0 @@
-<script type="text/javascript">
-var clicky_site_ids = clicky_site_ids || [];
-clicky_site_ids.push({{ site.JB.analytics.getclicky.site_id }});
-(function() {
-  var s = document.createElement('script');
-  s.type = 'text/javascript';
-  s.async = true;
-  s.src = '//static.getclicky.com/js';
-  ( document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0] ).appendChild( s );
-})();
-</script>
-<noscript><p><img alt="Clicky" width="1" height="1" src="//in.getclicky.com/{{ site.JB.analytics.getclicky.site_id }}ns.gif" /></p></noscript>
diff --git a/_includes/JB/analytics-providers/google b/_includes/JB/analytics-providers/google
deleted file mode 100644
index eca6d9c..0000000
--- a/_includes/JB/analytics-providers/google
+++ /dev/null
@@ -1,13 +0,0 @@
-<script type="text/javascript">
-  var _gaq = _gaq || [];
-  _gaq.push(['_setAccount', '{{ site.JB.analytics.google.tracking_id }}']);
-  _gaq.push(['_trackPageview']);
-
-  (function() {
-    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
-    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
-    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
-  })();
-</script>
-
-
diff --git a/_includes/JB/analytics-providers/google-universal b/_includes/JB/analytics-providers/google-universal
deleted file mode 100644
index 834f2ee..0000000
--- a/_includes/JB/analytics-providers/google-universal
+++ /dev/null
@@ -1,9 +0,0 @@
-<script>
-  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
-  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
-  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-
-  ga('create', {{ site.JB.analytics.googleUA.tracking_id }}', {% if site.JB.analytics.googleUA.property_name %}{{ site.JB.analytics.googleUA.property_name }}{% else %}'auto'{% endif %});
-  ga('send', 'pageview');
-</script>
\ No newline at end of file
diff --git a/_includes/JB/analytics-providers/mixpanel b/_includes/JB/analytics-providers/mixpanel
deleted file mode 100644
index 4406eb0..0000000
--- a/_includes/JB/analytics-providers/mixpanel
+++ /dev/null
@@ -1,11 +0,0 @@
-<script type="text/javascript">
-    var mpq = [];
-    mpq.push(["init", "{{ site.JB.analytics.mixpanel.token}}"]);
-    (function(){var b,a,e,d,c;b=document.createElement("script");b.type="text/javascript";
-    b.async=true;b.src=(document.location.protocol==="https:"?"https:":"http:")+
-    "//api.mixpanel.com/site_media/js/api/mixpanel.js";a=document.getElementsByTagName("script")[0];
-    a.parentNode.insertBefore(b,a);e=function(f){return function(){mpq.push(
-    [f].concat(Array.prototype.slice.call(arguments,0)))}};d=["init","track","track_links",
-    "track_forms","register","register_once","identify","name_tag","set_config"];for(c=0;c<
-    d.length;c++){mpq[d[c]]=e(d[c])}})();
-</script>
\ No newline at end of file
diff --git a/_includes/JB/analytics-providers/piwik b/_includes/JB/analytics-providers/piwik
deleted file mode 100755
index 077a373..0000000
--- a/_includes/JB/analytics-providers/piwik
+++ /dev/null
@@ -1,10 +0,0 @@
-<script type="text/javascript">
-  var pkBaseURL = (("https:" == document.location.protocol) ? "https://{{ site.JB.analytics.piwik.baseURL }}/" : "http://{{ site.JB.analytics.piwik.baseURL }}/");
-  document.write(unescape("%3Cscript src='" + pkBaseURL + "piwik.js' type='text/javascript'%3E%3C/script%3E"));
-</script><script type="text/javascript">
-  try {
-    var piwikTracker = Piwik.getTracker(pkBaseURL + "piwik.php", {{ site.JB.analytics.piwik.idsite }});
-    piwikTracker.trackPageView();
-    piwikTracker.enableLinkTracking();
-  } catch( err ) {}
-</script><noscript><p><img src="http://{{ site.JB.analytics.piwik.baseURL }}/piwik.php?idsite={{ site.JB.analytics.piwik.idsite }}" style="border:0" alt="" /></p></noscript>
\ No newline at end of file
diff --git a/_includes/JB/categories_list b/_includes/JB/categories_list
deleted file mode 100644
index 83be2e2..0000000
--- a/_includes/JB/categories_list
+++ /dev/null
@@ -1,37 +0,0 @@
-{% comment %}<!--
-The categories_list include is a listing helper for categories.
-Usage:
-  1) assign the 'categories_list' variable to a valid array of tags.
-  2) include JB/categories_list
-  example:
-    <ul>
-  	  {% assign categories_list = site.categories %}  
-  	  {% include JB/categories_list %}
-  	</ul>
-  
-  Notes: 
-    Categories can be either a Hash of Category objects (hashes) or an Array of category-names (strings).
-    The encapsulating 'if' statement checks whether categories_list is a Hash or Array.
-    site.categories is a Hash while page.categories is an array.
-    
-  This helper can be seen in use at: ../_layouts/default.html
--->{% endcomment %}
-
-{% if site.JB.categories_list.provider == "custom" %}
-  {% include custom/categories_list %}
-{% else %}
-  {% if categories_list.first[0] == null %}
-    {% for category in categories_list %} 
-    	<li><a href="{{ BASE_PATH }}{{ site.JB.categories_path }}#{{ category }}-ref">
-    		{{ category | join: "/" }} <span>{{ site.categories[category].size }}</span>
-    	</a></li>
-    {% endfor %}
-  {% else %}
-    {% for category in categories_list %} 
-    	<li><a href="{{ BASE_PATH }}{{ site.JB.categories_path }}#{{ category[0] }}-ref">
-    		{{ category[0] | join: "/" }} <span>{{ category[1].size }}</span>
-    	</a></li>
-    {% endfor %}
-  {% endif %}
-{% endif %}
-{% assign categories_list = nil %}
\ No newline at end of file
diff --git a/_includes/JB/comments b/_includes/JB/comments
deleted file mode 100644
index eec2e1e..0000000
--- a/_includes/JB/comments
+++ /dev/null
@@ -1,18 +0,0 @@
-{% if site.JB.comments.provider and page.comments != false %}
-
-{% case site.JB.comments.provider %}
-{% when "disqus" %}
-  {% include JB/comments-providers/disqus %}
-{% when "livefyre" %}
-  {% include JB/comments-providers/livefyre %}
-{% when "intensedebate" %}
-  {% include JB/comments-providers/intensedebate %}
-{% when "facebook" %}
-  {% include JB/comments-providers/facebook %}
-{% when "duoshuo" %}
-  {% include JB/comments-providers/duoshuo %}
-{% when "custom" %}
-  {% include custom/comments %}
-{% endcase %}
-
-{% endif %}
\ No newline at end of file
diff --git a/_includes/JB/comments-providers/disqus b/_includes/JB/comments-providers/disqus
deleted file mode 100644
index 6343100..0000000
--- a/_includes/JB/comments-providers/disqus
+++ /dev/null
@@ -1,15 +0,0 @@
-<div id="disqus_thread"></div>
-<script type="text/javascript">
-    {% include JB/is_production %}
-    {% if is_production == false %}var disqus_developer = 1;{% endif %}
-    var disqus_shortname = '{{ site.JB.comments.disqus.short_name }}'; // required: replace example with your forum shortname
-    {% if page.wordpress_id %}var disqus_identifier = '{{page.wordpress_id}} {{site.production_url}}/?p={{page.wordpress_id}}';{% endif %}
-    /* * * DON'T EDIT BELOW THIS LINE * * */
-    (function() {
-        var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
-        dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
-        (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
-    })();
-</script>
-<noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript>
-<a href="http://disqus.com" class="dsq-brlink">blog comments powered by <span class="logo-disqus">Disqus</span></a>
diff --git a/_includes/JB/comments-providers/duoshuo b/_includes/JB/comments-providers/duoshuo
deleted file mode 100644
index 90865a0..0000000
--- a/_includes/JB/comments-providers/duoshuo
+++ /dev/null
@@ -1,14 +0,0 @@
-<!-- Duoshuo Comment BEGIN -->
-  <div class="ds-thread"{% if page.wordpress_id %} data-thread-key="{{page.wordpress_id}}"{% endif %}></div>
-<script type="text/javascript">
-var duoshuoQuery = {short_name:'{{ site.JB.comments.duoshuo.short_name }}'};
-  (function() {
-    var ds = document.createElement('script');
-    ds.type = 'text/javascript';ds.async = true;
-    ds.src = 'http://static.duoshuo.com/embed.js';
-    ds.charset = 'UTF-8';
-    (document.getElementsByTagName('head')[0] 
-    || document.getElementsByTagName('body')[0]).appendChild(ds);
-  })();
-  </script>
-<!-- Duoshuo Comment END -->
diff --git a/_includes/JB/comments-providers/facebook b/_includes/JB/comments-providers/facebook
deleted file mode 100644
index e1d3deb..0000000
--- a/_includes/JB/comments-providers/facebook
+++ /dev/null
@@ -1,9 +0,0 @@
-<div id="fb-root"></div>
-<script>(function(d, s, id) {
-  var js, fjs = d.getElementsByTagName(s)[0];
-  if (d.getElementById(id)) return;
-  js = d.createElement(s); js.id = id;
-  js.src = "//connect.facebook.net/en_US/all.js#xfbml=1&appId={{ site.JB.comments.facebook.appid }}";
-  fjs.parentNode.insertBefore(js, fjs);
-}(document, 'script', 'facebook-jssdk'));</script>
-<div class="fb-comments" data-href="{{ site.production_url }}{{ page.url }}" data-num-posts="{{ site.JB.comments.facebook.num_posts }}" data-width="{{ site.JB.comments.facebook.width }}" data-colorscheme="{{ site.JB.comments.facebook.colorscheme }}"></div>
\ No newline at end of file
diff --git a/_includes/JB/comments-providers/intensedebate b/_includes/JB/comments-providers/intensedebate
deleted file mode 100644
index 233ce34..0000000
--- a/_includes/JB/comments-providers/intensedebate
+++ /dev/null
@@ -1,6 +0,0 @@
-<script>
-var idcomments_acct = '{{ site.JB.comments.intensedebate.account }}';
-var idcomments_post_id;
-var idcomments_post_url;
-</script>
-<script type="text/javascript" src="http://www.intensedebate.com/js/genericCommentWrapperV2.js"></script>
diff --git a/_includes/JB/comments-providers/livefyre b/_includes/JB/comments-providers/livefyre
deleted file mode 100644
index 704b803..0000000
--- a/_includes/JB/comments-providers/livefyre
+++ /dev/null
@@ -1,6 +0,0 @@
-<script type='text/javascript' src='http://zor.livefyre.com/wjs/v1.0/javascripts/livefyre_init.js'></script>
-<script type='text/javascript'>
-    var fyre = LF({
-        site_id: {{ site.JB.comments.livefyre.site_id }}
-    });
-</script>
\ No newline at end of file
diff --git a/_includes/JB/feedburner b/_includes/JB/feedburner
deleted file mode 100644
index 6dba603..0000000
--- a/_includes/JB/feedburner
+++ /dev/null
@@ -1,3 +0,0 @@
-{% if site.author.feedburner != null %}
-<link href="http://feeds.feedburner.com/{{ site.author.feedburner }}" rel="alternate" title="{{ site.title }}" type="application/atom+xml" />
-{% endif %}
diff --git a/_includes/JB/file_exists b/_includes/JB/file_exists
deleted file mode 100644
index f40080f..0000000
--- a/_includes/JB/file_exists
+++ /dev/null
@@ -1,26 +0,0 @@
-{% comment %}<!--
-  param:  file = "/example/file.png"
-  return: file_exists_result = true
-  
-  examples:
-    {% include JB/file_exists file="/404.html" %}
-    {% if file_exists_result %}Found "/404.html"!{% else %}Did not find "/404.html".{% endif %}
-
-    {% assign filename = "/405.html" %}
-    {% include JB/file_exists file=filename %}
-    {% if file_exists_result %}Found "{{ filename }}"!{% else %}Did not find "{{ filename }}".{% endif %}
-
-  NOTE: the BREAK statement in the FOR loop assumes Liquid >= 2.5.0
-  
--->{% endcomment %}
-
-{% assign file_exists_result = false %}
-
-{% if include.file %}
-	{% for static_file in site.static_files %}
-		{% if static_file.path == include.file %}
-			{% assign file_exists_result = true %}
-			{% break %}
-		{% endif %}
-	{% endfor %}
-{% endif %}
diff --git a/_includes/JB/gist b/_includes/JB/gist
deleted file mode 100644
index 38a5b1c..0000000
--- a/_includes/JB/gist
+++ /dev/null
@@ -1,19 +0,0 @@
-{% comment %}<!--
-The gist include allows you to embed GitHub Gist snippets in your content.
-Usage:
-  1) include JB/gist
-  2) specify the gist_id parameter (REQUIRED)
-  3) specify the gist_file parameter (OPTIONAL)
-  example:
-    <ul>
-  	  {% include JB/gist gist_id="fdcfeaba4f33c172828d" %}
-  	  {% include JB/gist gist_id="fdcfeaba4f33c172828d" gist_file="jekyll-bootstrap.js" %}
-  	</ul>
--->{% endcomment %}
-
-<div id="gist">
-<script src="https://gist.github.com/{{ include.gist_id }}.js{% if include.gist_file %}?file={{ include.gist_file }}{% endif %}"></script>
-<noscript>
-<pre>https://gist.github.com/{{include.gist_id}}.js{% if include.gist_file %}?file={{include.gist_file}}{% endif %}</pre>
-</noscript>
-</div>
diff --git a/_includes/JB/is_production b/_includes/JB/is_production
deleted file mode 100644
index eb5f916..0000000
--- a/_includes/JB/is_production
+++ /dev/null
@@ -1,43 +0,0 @@
-{% capture jbcache %}{% comment %}
-
-  Determine whether or not the site is being built in a production environment.
-
-  Parameters:
-    None.
-
-  Returns:
-    is_production: [true|false]
-    jb_prod_env: [development|github|other]
-
-  Examples:
-
-    {% include JB/is_production %}
-
-    {% if is_production != true %}
-      <h3>This is Private</h3>
-      <p>I love to watch television in my undies. Don't tell anyone!</p>
-    {% endif %}
-
-    <h3>This is Public</h3>
-    <p>I have no unusual quirks.</p>
-
-{% endcomment %}
-
-{% assign is_production = false %}
-{% assign jb_prod_env = "development" %}
-
-{% if jekyll.environment != "development" %}
-  {% assign is_production = true %}
-  {% assign jb_prod_env = jekyll.environment %}
-{% endif %}
-
-{% if site.github %}
-  {% assign is_production = true %}
-  {% assign jb_prod_env = "github" %}
-{% endif %}
-
-{% if site.safe %}
-  {% assign is_production = true %}
-{% endif %}
-
-{% endcapture %}{% assign jbcache = nil %}
\ No newline at end of file
diff --git a/_includes/JB/liquid_raw b/_includes/JB/liquid_raw
deleted file mode 100644
index da2d359..0000000
--- a/_includes/JB/liquid_raw
+++ /dev/null
@@ -1,32 +0,0 @@
-{% comment%}<!--
-The liquid_raw helper is a way to display raw liquid code, as opposed to parsing it.
-Normally you'd use Liquid's built in 'raw' tag. 
-The problem is GitHub Jekyll does not support the current Liquid release.
-GitHub Jekyll supports the deprecated 'literal' tag.
-Using one will break the other if you plan to deploy to GitHub pages.
-  see: https://github.com/mojombo/jekyll/issues/425
-
-Since I don't want to mess with Liquid versions, I'll just rewrite the way I 
-intend to give liquid examples. It's not an elegant solution by any means:
-
-Usage: 
-  1) Define a 'text' variable with the block of liquid code you intend to display.
-  2) Pass the text variable to include JB/liquid_raw
-
-  example:
-  {% capture text %}|.% for tag in tags_list %.|
-    <li><a href="|.{ site.var.tags_path }.||.{ tag[0] }.|-ref">|.{ tag[0] }.| <span>|.{tag[1].size}.|</span></a></li>
-  |.% endfor %.|
-
-  |.% assign tags_list = null %.|{% endcapture %}    
-  {% include JB/liquid_raw %}
-  
-  As seen here, you must use "|." and ".|" as opening and closing brackets.
--->{% endcomment%}
-
-{% if site.JB.liquid_raw.provider == "custom" %}
-  {% include custom/liquid_raw %}
-{% else %}
-  <pre><code>{{text | replace:"|.", "&#123;" | replace:".|", "&#125;" | replace:">", "&gt;" | replace:"<", "&lt;" }}</code></pre>
-{% endif %}
-{% assign text = nil %}
\ No newline at end of file
diff --git a/_includes/JB/pages_list b/_includes/JB/pages_list
deleted file mode 100644
index b3d2247..0000000
--- a/_includes/JB/pages_list
+++ /dev/null
@@ -1,47 +0,0 @@
-{% comment %}<!--
-The pages_list include is a listing helper.
-Usage:
-  1) assign the 'pages_list' variable to a valid array of pages or posts.
-  2) include JB/pages_list
-  example:
-    <ul>
-  	  {% assign pages_list = site.pages %}
-  	  {% include JB/pages_list %}
-  	</ul>
-
-  Grouping: (optional):
-  	assign the 'group' variable to constrain the list to only pages/posts
-  	in the given group. Note you must define the group manually in the page/post
-  	meta-data to use this feature.
-  	Grouping is mainly helpful for non-post pages.
-  	If you want to group posts, it's easier/better to tag them, then pass the tagged posts array.
-  	i.e. site.tags.cool_tag (this returns an array of posts tagged: cool_tag)
-
-  This helper can be seen in use at: ../_layouts/default.html
--->{% endcomment %}
-
-{% assign pages_list = site.pages | sort:"order" %}
-{% if site.JB.pages_list.provider == "custom" %}
-  {% include custom/pages_list %}
-{% else %}
-  {% for node in pages_list %}
-    {% if node.title != null %}
-      {% if group == null or group == node.group %}
-      	{% if page.url == node.url %}
-      	<li class="active"><a href="{{ HOME_PATH }}{{node.url}}" class="active">{{node.title}}</a></li>
-      	{% else %}
-      	<li><a href="{{ HOME_PATH }}{{node.url}}">{{node.title}}</a></li>
-      	{% endif %}
-      {% endif %}
-      {% if node.group == 'blog' %}
-      	{% if page.url == node.url %}
-      	<li class="active"><a href="{{ BLOG_PATH }}{{node.url}}" class="active">{{node.title}}</a></li>
-      	{% else %}
-      	<li><a href="{{ BLOG_PATH }}{{node.url}}">{{node.title}}</a></li>
-      	{% endif %}
-      {% endif %}
-    {% endif %}
-  {% endfor %}
-{% endif %}
-{% assign pages_list = nil %}
-{% assign group = nil %}
diff --git a/_includes/JB/posts_collate b/_includes/JB/posts_collate
deleted file mode 100644
index 35b9a78..0000000
--- a/_includes/JB/posts_collate
+++ /dev/null
@@ -1,55 +0,0 @@
-{% comment %}<!--
-Collate_posts helper. Collated posts by year and month.
-Usage:
-  1) assign the 'posts_collate' variable to a valid array of posts.
-  2) include JB/posts_collate
-  example:
-    {% assign posts_collate = site.posts %}
-    {% include JB/posts_collate %}
-
-  Ordering:
-    Posts are displayed in reverse chronological order.
-    For normal chronological order:
-      1) Change the for loop to this:
-        => 'for post in site.posts reversed'
-      2) Next make sure to change 'post.previous.date' to:
-        => 'post.next.date'
-
--->{% endcomment %}
-
-{% if site.JB.posts_collate.provider == "custom" %}
-  {% include custom/posts_collate %}
-{% else %}
-  {% for post in posts_collate  %}
-    {% capture this_year %}{{ post.date | date: "%Y" }}{% endcapture %}
-    {% capture this_month %}{{ post.date | date: "%B" }}{% endcapture %}
-    {% capture next_year %}{{ post.previous.date | date: "%Y" }}{% endcapture %}
-    {% capture next_month %}{{ post.previous.date | date: "%B" }}{% endcapture %}
-
-    {% if forloop.first %}
-      <h2>{{this_year}}</h2>
-      <h3>{{this_month}}</h3>
-      <ul>
-    {% endif %}
-
-    <li><span>{{ post.date | date: "%B %e, %Y" }}</span> &raquo; <a href="{{ BASE_PATH }}{{ post.url }}.html">{{ post.title }}</a></li>
-
-    {% if forloop.last %}
-      </ul>
-    {% else %}
-      {% if this_year != next_year %}
-        </ul>
-        <h2>{{next_year}}</h2>
-        <h3>{{next_month}}</h3>
-        <ul>
-      {% else %}
-        {% if this_month != next_month %}
-          </ul>
-          <h3>{{next_month}}</h3>
-          <ul>
-        {% endif %}
-      {% endif %}
-    {% endif %}
-  {% endfor %}
-{% endif %}
-{% assign posts_collate = nil %}
\ No newline at end of file
diff --git a/_includes/JB/setup b/_includes/JB/setup
deleted file mode 100644
index 5de497a..0000000
--- a/_includes/JB/setup
+++ /dev/null
@@ -1,31 +0,0 @@
-{% capture jbcache %}
-  <!--
-  - Dynamically set liquid variables for working with URLs/paths
-  -->
-  {% include JB/is_production %}
-  {% if site.JB.setup.provider == "custom" %}
-    {% include custom/setup %}
-  {% else %}
-
-    {% if is_production and site.JB.BASE_PATH and site.JB.BASE_PATH != '' %}
-      {% assign BASE_PATH = site.JB.BASE_PATH %}
-      {% assign HOME_PATH = site.JB.BASE_PATH %}
-      {% assign BLOG_PATH = site.JB.BLOG_PATH %}
-    {% else %}
-       {% assign BASE_PATH = nil %}
-       {% assign HOME_PATH = '' %}
-       {% assign BLOG_PATH = site.JB.BLOG_PATH %}
-    {% endif %}
-
-    {% if site.github %}
-       {% assign BASE_PATH = nil %}
-       {% assign BLOG_PATH = nil %}
-    {% endif %}
-
-    {% if site.JB.ASSET_PATH %}
-      {% assign ASSET_PATH = site.JB.ASSET_PATH %}
-    {% else %}
-      {% capture ASSET_PATH %}{{ BASE_PATH }}/assets/themes/{{ layout.theme.name }}{% endcapture %}
-    {% endif %}
-  {% endif %}
-{% endcapture %}{% assign jbcache = nil %}
\ No newline at end of file
diff --git a/_includes/JB/sharing b/_includes/JB/sharing
deleted file mode 100644
index 175a001..0000000
--- a/_includes/JB/sharing
+++ /dev/null
@@ -1,9 +0,0 @@
-{% include JB/is_production %}
-{% if is_production and site.JB.sharing.provider and page.JB.sharing != false %}
-
-{% case site.JB.sharing.provider %}
-{% when "custom" %}
-  {% include custom/sharing %}
-{% endcase %}
-
-{% endif %}
\ No newline at end of file
diff --git a/_includes/JB/sort_collection b/_includes/JB/sort_collection
deleted file mode 100644
index 1e32015..0000000
--- a/_includes/JB/sort_collection
+++ /dev/null
@@ -1,81 +0,0 @@
-{% capture jbcache %}{% comment %}
-
-  Sort the given array or map.
-  
-  Parameters:
-    collection: the array or map to sort [REQUIRED]
-    sort_by: the property to sort by [OPTIONAL]
-    sort_descending: reverse the collection [OPTIONAL]
-
-  Returns:
-    sort_result: the sorted collection
-
-  Examples:
-    <h3>Pages</h3>
-    <ol>
-      {% include JB/sort_collection collection=site.pages sort_by="title" %}
-      {% assign pages_list = sort_result %}
-      {% include JB/pages_list %}
-    </ol>
-
-    <h3>Pages [Reversed]</h3>
-    <ol>
-      {% include JB/sort_collection collection=site.pages sort_by="title" sort_descending=true %}
-      {% assign pages_list = sort_result %}
-      {% include JB/pages_list %}
-    </ol>
-
-    <h3>Array</h3>
-    <ol>
-      {% assign test_array = "one,two,three,four" | split: "," %}
-      {% include JB/sort_collection collection=test_array %}
-      {% for test in sort_result %}
-        <li>{{test}}</li>
-      {% endfor %}
-    </ol>
-
-    <h3>Array [Reversed]</h3>
-    <ol>
-      {% assign test_array = "one,two,three,four" | split: "," %}
-      {% include JB/sort_collection collection=test_array sort_descending=true %}
-      {% for test in sort_result %}
-        <li>{{test}}</li>
-      {% endfor %}
-    </ol>
-
-{% endcomment %}
-
-{% assign is_array = true %}
-{% assign sort_result = "," | split: "," %}
-{% assign collection = include.collection %}
-{% if include.sort_by %}
-  {% assign sort_by = include.sort_by %}
-{% else %}
-  {% assign sort_by = "title" %}
-{% endif %}
-
-{% if collection and collection.size > 0 %}
-  {% for x in collection.first %}
-    {% if x[1].size > 0 %}
-      {% assign is_array = false %}
-    {% endif %}
-    {% break %}
-  {% endfor %}
-
-  {% if is_array == false %}
-    {% assign sort_result = collection | sort: sort_by %}
-  {% else %}
-    {% assign sort_result = collection | sort %}
-  {% endif %}
-  
-  {% if include.sort_descending %}
-    {% assign reversed = "," | split: "," %}
-    {% for index in (1..sort_result.size) %}
-      {% assign i = sort_result.size | minus: index %}
-      {% assign reversed = reversed | push: sort_result[i] %}
-    {% endfor %}
-    {% assign sort_result = reversed %}
-    {% assign reversed = nil %}
-  {% endif %}
-
-{% endif %}{% endcapture %}{% assign jbcache = nil %}
\ No newline at end of file
diff --git a/_includes/JB/tags_list b/_includes/JB/tags_list
deleted file mode 100644
index 8eb62a7..0000000
--- a/_includes/JB/tags_list
+++ /dev/null
@@ -1,33 +0,0 @@
-{% comment %}<!--
-The tags_list include is a listing helper for tags.
-Usage:
-  1) assign the 'tags_list' variable to a valid array of tags.
-  2) include JB/tags_list
-  example:
-    <ul>
-  	  {% assign tags_list = site.tags %}  
-  	  {% include JB/tags_list %}
-  	</ul>
-  
-  Notes: 
-    Tags can be either a Hash of tag objects (hashes) or an Array of tag-names (strings).
-    The encapsulating 'if' statement checks whether tags_list is a Hash or Array.
-    site.tags is a Hash while page.tags is an array.
-    
-  This helper can be seen in use at: ../_layouts/default.html
--->{% endcomment %}
-
-{% if site.JB.tags_list.provider == "custom" %}
-  {% include custom/tags_list %}
-{% else %}
-  {% if tags_list.first[0] == null %}
-    {% for tag in tags_list %} 
-    	<li><a href="{{ BASE_PATH }}{{ site.JB.tags_path }}#{{ tag }}-ref">{{ tag }} <span>{{ site.tags[tag].size }}</span></a></li>
-    {% endfor %}
-  {% else %}
-    {% for tag in tags_list %} 
-    	<li><a href="{{ BASE_PATH }}{{ site.JB.tags_path }}#{{ tag[0] }}-ref">{{ tag[0] }} <span>{{ tag[1].size }}</span></a></li>
-    {% endfor %}
-  {% endif %}
-{% endif %}
-{% assign tags_list = nil %}
diff --git a/_includes/custom/page_list b/_includes/custom/page_list
deleted file mode 100644
index fc49daf..0000000
--- a/_includes/custom/page_list
+++ /dev/null
@@ -1,4 +0,0 @@
- <ul>
-  	  {% assign pages_list = site.pages %}
-  	  {% include JB/pages_list %}
-  	</ul>
\ No newline at end of file
diff --git a/_includes/themes/custom-twitter/default.html b/_includes/themes/custom-twitter/default.html
deleted file mode 100644
index 8c12710..0000000
--- a/_includes/themes/custom-twitter/default.html
+++ /dev/null
@@ -1,64 +0,0 @@
-<!DOCTYPE html>
-<html lang="en">
-  <head>
-    <meta charset="utf-8">
-    <title>{{ page.title }}</title>
-    {% if page.description %}<meta name="description" content="{{ page.description }}">{% endif %}
-    <meta name="author" content="{{ site.author.name }}">
-
-    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
-    <!--[if lt IE 9]>
-      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
-    <![endif]-->
-
-    <!-- Le styles -->
-    <link href="{{ ASSET_PATH }}/css/1.4.0/bootstrap.css" rel="stylesheet">
-    <link href="{{ ASSET_PATH }}/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
-
-    <!-- Le fav and touch icons -->
-  <!-- Update these with your own images
-    <link rel="shortcut icon" href="images/favicon.ico">
-    <link rel="apple-touch-icon" href="images/apple-touch-icon.png">
-    <link rel="apple-touch-icon" sizes="72x72" href="images/apple-touch-icon-72x72.png">
-    <link rel="apple-touch-icon" sizes="114x114" href="images/apple-touch-icon-114x114.png">
-  -->
-  <script>
-    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
-    (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
-    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-    })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-
-    ga('create', 'UA-75982049-1', 'auto');
-    ga('send', 'pageview');
-  </script>
-  </head>
-
-  <body>
-
-    <div class="topbar">
-      <div class="fill">
-        <div class="container">
-          <a class="brand" href="{{ HOME_PATH }}">{{ site.title }}</a>
-          <ul class="nav">
-            {% assign pages_list = site.pages %}
-            {% assign group = 'navigation' %}
-            {% include JB/pages_list %}
-          </ul>
-        </div>
-      </div>
-    </div>
-
-    <div class="container">
-
-      <div class="content">
-        {{ content }}
-      </div>
-
-      <footer>
-      </footer>
-
-    </div> <!-- /container -->
-
-    {% include JB/analytics %}
-  </body>
-</html>
diff --git a/_includes/themes/custom-twitter/index.html b/_includes/themes/custom-twitter/index.html
deleted file mode 100644
index de87e4e..0000000
--- a/_includes/themes/custom-twitter/index.html
+++ /dev/null
@@ -1,6 +0,0 @@
-
-<div class="row">
-  <div class="span14">
-    {{ content }}
-  </div>
-</div>
diff --git a/_includes/themes/custom-twitter/page.html b/_includes/themes/custom-twitter/page.html
deleted file mode 100644
index 0ac5ee0..0000000
--- a/_includes/themes/custom-twitter/page.html
+++ /dev/null
@@ -1,9 +0,0 @@
-<div class="page-header">
-  <h1>{{ page.title }} </h1>
-</div>
-
-<div class="row">
-  <div class="span14">
-    {{ content }}
-  </div>
-</div>
diff --git a/_includes/themes/custom-twitter/post.html b/_includes/themes/custom-twitter/post.html
deleted file mode 100644
index 416af10..0000000
--- a/_includes/themes/custom-twitter/post.html
+++ /dev/null
@@ -1,40 +0,0 @@
-<div class="page-header">
-  <h1>{{ page.title }} <small>Supporting tagline</small></h1>
-</div>
-
-<div class="row">
-  <div class="span10">
-    {{ content }}
-    <hr>
-    <div class="pagination">
-      <ul>
-      {% if page.previous %}
-        <li class="prev"><a href="{{ BASE_PATH }}{{ page.previous.url }}" title="{{ page.previous.title }}">&larr; Previous</a></li>
-      {% else %}
-        <li class="prev disabled"><a>&larr; Previous</a></li>
-      {% endif %}
-        <li><a href="{{ BASE_PATH }}{{ site.JB.archive_path }}">Archive</a></li>
-      {% if page.next %}
-        <li class="next"><a href="{{ BASE_PATH }}{{ page.next.url }}" title="{{ page.next.title }}">Next &rarr;</a></li>
-      {% else %}
-        <li class="next disabled"><a>Next &rarr;</a>
-      {% endif %}
-      </ul>
-    </div>
-    <hr>
-    {% include JB/comments %}
-  </div>
-  
-  <div class="span4">
-    <h4>Published</h4>
-    <div class="date"><span>{{ page.date | date_to_long_string }}</span></div>
-
-  {% unless page.tags == empty %}
-    <h4>Tags</h4>
-    <ul class="tag_box">
-    {% assign tags_list = page.tags %}
-    {% include JB/tags_list %}
-    </ul>
-  {% endunless %}  
-  </div>
-</div>
diff --git a/_includes/themes/custom-twitter/settings.yml b/_includes/themes/custom-twitter/settings.yml
deleted file mode 100644
index 4b51704..0000000
--- a/_includes/themes/custom-twitter/settings.yml
+++ /dev/null
@@ -1,2 +0,0 @@
-theme :
-  name : custom-twitter
diff --git a/_layouts/default.html b/_layouts/default.html
deleted file mode 100644
index 9b01ea5..0000000
--- a/_layouts/default.html
+++ /dev/null
@@ -1,6 +0,0 @@
----
-theme :
-  name : custom-twitter
----
-{% include JB/setup %}
-{% include themes/custom-twitter/default.html %}
diff --git a/_layouts/index.html b/_layouts/index.html
deleted file mode 100644
index df36817..0000000
--- a/_layouts/index.html
+++ /dev/null
@@ -1,7 +0,0 @@
----
-theme :
-  name : custom-twitter
-layout: default
----
-{% include JB/setup %}
-{% include themes/custom-twitter/index.html %}
diff --git a/_layouts/page.html b/_layouts/page.html
deleted file mode 100644
index ef2a5cd..0000000
--- a/_layouts/page.html
+++ /dev/null
@@ -1,7 +0,0 @@
----
-theme :
-  name : custom-twitter
-layout: default
----
-{% include JB/setup %}
-{% include themes/custom-twitter/page.html %}
diff --git a/_layouts/post.html b/_layouts/post.html
deleted file mode 100644
index 7dcf6bc..0000000
--- a/_layouts/post.html
+++ /dev/null
@@ -1,7 +0,0 @@
----
-theme :
-  name : custom-twitter
-layout: default
----
-{% include JB/setup %}
-{% include themes/custom-twitter/post.html %}
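The four deleted layouts are all thin dispatchers: the front matter pins the theme name, JB/setup resolves path variables such as BASE_PATH and ASSET_PATH, and the layout body simply includes the matching partial under _includes/themes/. A minimal sketch of one more layout in the same pattern (the file name and partial here are hypothetical, not part of this commit):

    ---
    theme :
      name : custom-twitter
    layout: default
    ---
    {% include JB/setup %}
    {% include themes/custom-twitter/archive.html %}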
diff --git a/_plugins/debug.rb b/_plugins/debug.rb
deleted file mode 100644
index e1dde39..0000000
--- a/_plugins/debug.rb
+++ /dev/null
@@ -1,38 +0,0 @@
-# A simple way to inspect liquid template variables.
-# Usage:
-#  Can be used anywhere Liquid syntax is parsed (templates, includes, posts/pages)
-#  {{ site | debug }}
-#  {{ site.posts | debug }}
-#
-require 'pp'
-module Jekyll
-  # Need to overwrite the inspect method here because the original
-# uses < > to encapsulate the pseudo post/page objects, in which case
-  # the output is taken for HTML tags and hidden from view.
-  #
-  class Post
-    def inspect
-      "#Jekyll:Post @id=#{self.id.inspect}"
-    end
-  end
-  
-  class Page
-    def inspect
-      "#Jekyll:Page @name=#{self.name.inspect}"
-    end
-  end
-  
-end # Jekyll
-  
-module Jekyll
-  module DebugFilter
-    
-    def debug(obj, stdout=false)
-      puts obj.pretty_inspect if stdout
-      "<pre>#{obj.class}\n#{obj.pretty_inspect}</pre>"
-    end
-
-  end # DebugFilter
-end # Jekyll
-
-Liquid::Template.register_filter(Jekyll::DebugFilter)
\ No newline at end of file
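The filter registered above takes an optional second argument that its header comment does not mention: passing true also pretty-prints the object to the build console via puts, in addition to returning the <pre> block rendered into the page. A minimal sketch of both forms in a template, assuming the plugin is loaded:

    {{ page | debug }}
    {{ site.posts | debug: true }}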
diff --git a/_site/404.html b/_site/404.html
deleted file mode 100644
index 6904bcd..0000000
--- a/_site/404.html
+++ /dev/null
@@ -1 +0,0 @@
-Sorry this page does not exist =(
diff --git a/_site/archive.html b/_site/archive.html
deleted file mode 100644
index fff7b26..0000000
--- a/_site/archive.html
+++ /dev/null
@@ -1,149 +0,0 @@
-
-<!DOCTYPE html>
-<html lang="en">
-  <head>
-    <meta charset="utf-8">
-    <title>Blog</title>
-    
-    <meta name="author" content="cse599">
-
-    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
-    <!--[if lt IE 9]>
-      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
-    <![endif]-->
-
-    <!-- Le styles -->
-    <link href="/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
-    <link href="/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
-
-    <!-- Le fav and touch icons -->
-  <!-- Update these with your own images
-    <link rel="shortcut icon" href="images/favicon.ico">
-    <link rel="apple-touch-icon" href="images/apple-touch-icon.png">
-    <link rel="apple-touch-icon" sizes="72x72" href="images/apple-touch-icon-72x72.png">
-    <link rel="apple-touch-icon" sizes="114x114" href="images/apple-touch-icon-114x114.png">
-  -->
-  <script>
-    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
-    (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
-    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-    })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-
-    ga('create', 'UA-75982049-1', 'auto');
-    ga('send', 'pageview');
-  </script>
-  </head>
-
-  <body>
-
-    <div class="topbar">
-      <div class="fill">
-        <div class="container">
-          <a class="brand" href="">CSE599 Deep Learning System</a>
-          <ul class="nav">
-            
-            
-            
-
-
-
-  
-    
-      
-      
-    
-  
-    
-      
-      
-    
-  
-    
-      
-      
-    
-  
-    
-      
-      
-    
-  
-    
-      
-      
-    
-  
-    
-      
-      
-    
-  
-    
-      
-      
-    
-  
-    
-      
-      	
-      	<li><a href="/assignments">Assignments</a></li>
-      	
-      
-      
-    
-  
-    
-      
-      	
-      	<li><a href="/schedule">Schedule</a></li>
-      	
-      
-      
-    
-  
-
-
-
-
-          </ul>
-        </div>
-      </div>
-    </div>
-
-    <div class="container">
-
-      <div class="content">
-        
-<div class="page-header">
-  <h1>Blog </h1>
-</div>
-
-<div class="row">
-  <div class="span14">
-    
-
-
-
-
-  
-
-
-
-  </div>
-</div>
-
-
-      </div>
-
-      <footer>
-      </footer>
-
-    </div> <!-- /container -->
-
-    
-
-
-
-  </body>
-</html>
-
diff --git a/_site/assets/themes/custom-twitter/bootstrap/css/bootstrap.2.2.2.min.css b/_site/assets/themes/custom-twitter/bootstrap/css/bootstrap.2.2.2.min.css
deleted file mode 100644
index 9f19ba6..0000000
--- a/_site/assets/themes/custom-twitter/bootstrap/css/bootstrap.2.2.2.min.css
+++ /dev/null
@@ -1,782 +0,0 @@
-/*!
- * Bootstrap v2.2.2
- *
- * Copyright 2012 Twitter, Inc
- * Licensed under the Apache License v2.0
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Designed and built with all the love in the world @twitter by @mdo and @fat.
- */
-.clearfix{*zoom:1;}.clearfix:before,.clearfix:after{display:table;content:"";line-height:0;}
-.clearfix:after{clear:both;}
-.hide-text{font:0/0 a;color:transparent;text-shadow:none;background-color:transparent;border:0;}
-.input-block-level{display:block;width:100%;min-height:30px;-webkit-box-sizing:border-box;-moz-box-sizing:border-box;box-sizing:border-box;}
-article,aside,details,figcaption,figure,footer,header,hgroup,nav,section{display:block;}
-audio,canvas,video{display:inline-block;*display:inline;*zoom:1;}
-audio:not([controls]){display:none;}
-html{font-size:100%;-webkit-text-size-adjust:100%;-ms-text-size-adjust:100%;}
-a:focus{outline:thin dotted #333;outline:5px auto -webkit-focus-ring-color;outline-offset:-2px;}
-a:hover,a:active{outline:0;}
-sub,sup{position:relative;font-size:75%;line-height:0;vertical-align:baseline;}
-sup{top:-0.5em;}
-sub{bottom:-0.25em;}
-img{max-width:100%;width:auto\9;height:auto;vertical-align:middle;border:0;-ms-interpolation-mode:bicubic;}
-#map_canvas img,.google-maps img{max-width:none;}
-button,input,select,textarea{margin:0;font-size:100%;vertical-align:middle;}
-button,input{*overflow:visible;line-height:normal;}
-button::-moz-focus-inner,input::-moz-focus-inner{padding:0;border:0;}
-button,html input[type="button"],input[type="reset"],input[type="submit"]{-webkit-appearance:button;cursor:pointer;}
-label,select,button,input[type="button"],input[type="reset"],input[type="submit"],input[type="radio"],input[type="checkbox"]{cursor:pointer;}
-input[type="search"]{-webkit-box-sizing:content-box;-moz-box-sizing:content-box;box-sizing:content-box;-webkit-appearance:textfield;}
-input[type="search"]::-webkit-search-decoration,input[type="search"]::-webkit-search-cancel-button{-webkit-appearance:none;}
-textarea{overflow:auto;vertical-align:top;}
-@media print{*{text-shadow:none !important;color:#000 !important;background:transparent !important;box-shadow:none !important;} a,a:visited{text-decoration:underline;} a[href]:after{content:" (" attr(href) ")";} abbr[title]:after{content:" (" attr(title) ")";} .ir a:after,a[href^="javascript:"]:after,a[href^="#"]:after{content:"";} pre,blockquote{border:1px solid #999;page-break-inside:avoid;} thead{display:table-header-group;} tr,img{page-break-inside:avoid;} img{max-width:100% !importa [...]
-a{color:#0088cc;text-decoration:none;}
-a:hover{color:#005580;text-decoration:underline;}
-.img-rounded{-webkit-border-radius:6px;-moz-border-radius:6px;border-radius:6px;}
-.img-polaroid{padding:4px;background-color:#fff;border:1px solid #ccc;border:1px solid rgba(0, 0, 0, 0.2);-webkit-box-shadow:0 1px 3px rgba(0, 0, 0, 0.1);-moz-box-shadow:0 1px 3px rgba(0, 0, 0, 0.1);box-shadow:0 1px 3px rgba(0, 0, 0, 0.1);}
-.img-circle{-webkit-border-radius:500px;-moz-border-radius:500px;border-radius:500px;}
-.row{margin-left:-20px;*zoom:1;}.row:before,.row:after{display:table;content:"";line-height:0;}
-.row:after{clear:both;}
-[class*="span"]{float:left;min-height:1px;margin-left:20px;}
-.container,.navbar-static-top .container,.navbar-fixed-top .container,.navbar-fixed-bottom .container{width:940px;}
-.span12{width:940px;}
-.span11{width:860px;}
-.span10{width:780px;}
-.span9{width:700px;}
-.span8{width:620px;}
-.span7{width:540px;}
-.span6{width:460px;}
-.span5{width:380px;}
-.span4{width:300px;}
-.span3{width:220px;}
-.span2{width:140px;}
-.span1{width:60px;}
-.offset12{margin-left:980px;}
-.offset11{margin-left:900px;}
-.offset10{margin-left:820px;}
-.offset9{margin-left:740px;}
-.offset8{margin-left:660px;}
-.offset7{margin-left:580px;}
-.offset6{margin-left:500px;}
-.offset5{margin-left:420px;}
-.offset4{margin-left:340px;}
-.offset3{margin-left:260px;}
-.offset2{margin-left:180px;}
-.offset1{margin-left:100px;}
-.row-fluid{width:100%;*zoom:1;}.row-fluid:before,.row-fluid:after{display:table;content:"";line-height:0;}
-.row-fluid:after{clear:both;}
-.row-fluid [class*="span"]{display:block;width:100%;min-height:30px;-webkit-box-sizing:border-box;-moz-box-sizing:border-box;box-sizing:border-box;float:left;margin-left:2.127659574468085%;*margin-left:2.074468085106383%;}
-.row-fluid [class*="span"]:first-child{margin-left:0;}
-.row-fluid .controls-row [class*="span"]+[class*="span"]{margin-left:2.127659574468085%;}
-.row-fluid .span12{width:100%;*width:99.94680851063829%;}
-.row-fluid .span11{width:91.48936170212765%;*width:91.43617021276594%;}
-.row-fluid .span10{width:82.97872340425532%;*width:82.92553191489361%;}
-.row-fluid .span9{width:74.46808510638297%;*width:74.41489361702126%;}
-.row-fluid .span8{width:65.95744680851064%;*width:65.90425531914893%;}
-.row-fluid .span7{width:57.44680851063829%;*width:57.39361702127659%;}
-.row-fluid .span6{width:48.93617021276595%;*width:48.88297872340425%;}
-.row-fluid .span5{width:40.42553191489362%;*width:40.37234042553192%;}
-.row-fluid .span4{width:31.914893617021278%;*width:31.861702127659576%;}
-.row-fluid .span3{width:23.404255319148934%;*width:23.351063829787233%;}
-.row-fluid .span2{width:14.893617021276595%;*width:14.840425531914894%;}
-.row-fluid .span1{width:6.382978723404255%;*width:6.329787234042553%;}
-.row-fluid .offset12{margin-left:104.25531914893617%;*margin-left:104.14893617021275%;}
-.row-fluid .offset12:first-child{margin-left:102.12765957446808%;*margin-left:102.02127659574467%;}
-.row-fluid .offset11{margin-left:95.74468085106382%;*margin-left:95.6382978723404%;}
-.row-fluid .offset11:first-child{margin-left:93.61702127659574%;*margin-left:93.51063829787232%;}
-.row-fluid .offset10{margin-left:87.23404255319149%;*margin-left:87.12765957446807%;}
-.row-fluid .offset10:first-child{margin-left:85.1063829787234%;*margin-left:84.99999999999999%;}
-.row-fluid .offset9{margin-left:78.72340425531914%;*margin-left:78.61702127659572%;}
-.row-fluid .offset9:first-child{margin-left:76.59574468085106%;*margin-left:76.48936170212764%;}
-.row-fluid .offset8{margin-left:70.2127659574468%;*margin-left:70.10638297872339%;}
-.row-fluid .offset8:first-child{margin-left:68.08510638297872%;*margin-left:67.9787234042553%;}
-.row-fluid .offset7{margin-left:61.70212765957446%;*margin-left:61.59574468085106%;}
-.row-fluid .offset7:first-child{margin-left:59.574468085106375%;*margin-left:59.46808510638297%;}
-.row-fluid .offset6{margin-left:53.191489361702125%;*margin-left:53.085106382978715%;}
-.row-fluid .offset6:first-child{margin-left:51.063829787234035%;*margin-left:50.95744680851063%;}
-.row-fluid .offset5{margin-left:44.68085106382979%;*margin-left:44.57446808510638%;}
-.row-fluid .offset5:first-child{margin-left:42.5531914893617%;*margin-left:42.4468085106383%;}
-.row-fluid .offset4{margin-left:36.170212765957444%;*margin-left:36.06382978723405%;}
-.row-fluid .offset4:first-child{margin-left:34.04255319148936%;*margin-left:33.93617021276596%;}
-.row-fluid .offset3{margin-left:27.659574468085104%;*margin-left:27.5531914893617%;}
-.row-fluid .offset3:first-child{margin-left:25.53191489361702%;*margin-left:25.425531914893618%;}
-.row-fluid .offset2{margin-left:19.148936170212764%;*margin-left:19.04255319148936%;}
-.row-fluid .offset2:first-child{margin-left:17.02127659574468%;*margin-left:16.914893617021278%;}
-.row-fluid .offset1{margin-left:10.638297872340425%;*margin-left:10.53191489361702%;}
-.row-fluid .offset1:first-child{margin-left:8.51063829787234%;*margin-left:8.404255319148938%;}
-[class*="span"].hide,.row-fluid [class*="span"].hide{display:none;}
-[class*="span"].pull-right,.row-fluid [class*="span"].pull-right{float:right;}
-.container{margin-right:auto;margin-left:auto;*zoom:1;}.container:before,.container:after{display:table;content:"";line-height:0;}
-.container:after{clear:both;}
-.container-fluid{padding-right:20px;padding-left:20px;*zoom:1;}.container-fluid:before,.container-fluid:after{display:table;content:"";line-height:0;}
-.container-fluid:after{clear:both;}
-p{margin:0 0 10px;}
-.lead{margin-bottom:20px;font-size:21px;font-weight:200;line-height:30px;}
-small{font-size:85%;}
-strong{font-weight:bold;}
-em{font-style:italic;}
-cite{font-style:normal;}
-.muted{color:#999999;}
-a.muted:hover{color:#808080;}
-.text-warning{color:#c09853;}
-a.text-warning:hover{color:#a47e3c;}
-.text-error{color:#b94a48;}
-a.text-error:hover{color:#953b39;}
-.text-info{color:#3a87ad;}
-a.text-info:hover{color:#2d6987;}
-.text-success{color:#468847;}
-a.text-success:hover{color:#356635;}
-h1,h2,h3,h4,h5,h6{margin:10px 0;font-family:inherit;font-weight:bold;line-height:20px;color:inherit;text-rendering:optimizelegibility;}h1 small,h2 small,h3 small,h4 small,h5 small,h6 small{font-weight:normal;line-height:1;color:#999999;}
-h1,h2,h3{line-height:40px;}
-h1{font-size:38.5px;}
-h2{font-size:31.5px;}
-h3{font-size:24.5px;}
-h4{font-size:17.5px;}
-h5{font-size:14px;}
-h6{font-size:11.9px;}
-h1 small{font-size:24.5px;}
-h2 small{font-size:17.5px;}
-h3 small{font-size:14px;}
-h4 small{font-size:14px;}
-.page-header{padding-bottom:9px;margin:20px 0 30px;border-bottom:1px solid #eeeeee;}
-ul,ol{padding:0;margin:0 0 10px 25px;}
-ul ul,ul ol,ol ol,ol ul{margin-bottom:0;}
-li{line-height:20px;}
-ul.unstyled,ol.unstyled{margin-left:0;list-style:none;}
-ul.inline,ol.inline{margin-left:0;list-style:none;}ul.inline >li,ol.inline >li{display:inline-block;padding-left:5px;padding-right:5px;}
-dl{margin-bottom:20px;}
-dt,dd{line-height:20px;}
-dt{font-weight:bold;}
-dd{margin-left:10px;}
-.dl-horizontal{*zoom:1;}.dl-horizontal:before,.dl-horizontal:after{display:table;content:"";line-height:0;}
-.dl-horizontal:after{clear:both;}
-.dl-horizontal dt{float:left;width:160px;clear:left;text-align:right;overflow:hidden;text-overflow:ellipsis;white-space:nowrap;}
-.dl-horizontal dd{margin-left:180px;}
-hr{margin:20px 0;border:0;border-top:1px solid #eeeeee;border-bottom:1px solid #ffffff;}
-abbr[title],abbr[data-original-title]{cursor:help;border-bottom:1px dotted #999999;}
-abbr.initialism{font-size:90%;text-transform:uppercase;}
-blockquote{padding:0 0 0 15px;margin:0 0 20px;border-left:5px solid #eeeeee;}blockquote p{margin-bottom:0;font-size:16px;font-weight:300;line-height:25px;}
-blockquote small{display:block;line-height:20px;color:#999999;}blockquote small:before{content:'\2014 \00A0';}
-blockquote.pull-right{float:right;padding-right:15px;padding-left:0;border-right:5px solid #eeeeee;border-left:0;}blockquote.pull-right p,blockquote.pull-right small{text-align:right;}
-blockquote.pull-right small:before{content:'';}
-blockquote.pull-right small:after{content:'\00A0 \2014';}
-q:before,q:after,blockquote:before,blockquote:after{content:"";}
-address{display:block;margin-bottom:20px;font-style:normal;line-height:20px;}
-code,pre{padding:0 3px 2px;font-family:Monaco,Menlo,Consolas,"Courier New",monospace;font-size:12px;color:#333333;-webkit-border-radius:3px;-moz-border-radius:3px;border-radius:3px;}
-code{padding:2px 4px;color:#d14;background-color:#f7f7f9;border:1px solid #e1e1e8;white-space:nowrap;}
-pre{display:block;padding:9.5px;margin:0 0 10px;font-size:13px;line-height:20px;word-break:break-all;word-wrap:break-word;white-space:pre;white-space:pre-wrap;background-color:#f5f5f5;border:1px solid #ccc;border:1px solid rgba(0, 0, 0, 0.15);-webkit-border-radius:4px;-moz-border-radius:4px;border-radius:4px;}pre.prettyprint{margin-bottom:20px;}
-pre code{padding:0;color:inherit;white-space:pre;white-space:pre-wrap;background-color:transparent;border:0;}
-.pre-scrollable{max-height:340px;overflow-y:scroll;}
-.label,.badge{display:inline-block;padding:2px 4px;font-size:11.844px;font-weight:bold;line-height:14px;color:#ffffff;vertical-align:baseline;white-space:nowrap;text-shadow:0 -1px 0 rgba(0, 0, 0, 0.25);background-color:#999999;}
-.label{-webkit-border-radius:3px;-moz-border-radius:3px;border-radius:3px;}
-.badge{padding-left:9px;padding-right:9px;-webkit-border-radius:9px;-moz-border-radius:9px;border-radius:9px;}
-.label:empty,.badge:empty{display:none;}
-a.label:hover,a.badge:hover{color:#ffffff;text-decoration:none;cursor:pointer;}
-.label-important,.badge-important{background-color:#b94a48;}
-.label-important[href],.badge-important[href]{background-color:#953b39;}
-.label-warning,.badge-warning{background-color:#f89406;}
-.label-warning[href],.badge-warning[href]{background-color:#c67605;}
-.label-success,.badge-success{background-color:#468847;}
-.label-success[href],.badge-success[href]{background-color:#356635;}
-.label-info,.badge-info{background-color:#3a87ad;}
-.label-info[href],.badge-info[href]{background-color:#2d6987;}
-.label-inverse,.badge-inverse{background-color:#333333;}
-.label-inverse[href],.badge-inverse[href]{background-color:#1a1a1a;}
-.btn .label,.btn .badge{position:relative;top:-1px;}
-.btn-mini .label,.btn-mini .badge{top:0;}
-table{max-width:100%;background-color:transparent;border-collapse:collapse;border-spacing:0;}
-.table{width:100%;margin-bottom:20px;}.table th,.table td{padding:8px;line-height:20px;text-align:left;vertical-align:top;border-top:1px solid #dddddd;}
-.table th{font-weight:bold;}
-.table thead th{vertical-align:bottom;}
-.table caption+thead tr:first-child th,.table caption+thead tr:first-child td,.table colgroup+thead tr:first-child th,.table colgroup+thead tr:first-child td,.table thead:first-child tr:first-child th,.table thead:first-child tr:first-child td{border-top:0;}
-.table tbody+tbody{border-top:2px solid #dddddd;}
-.table .table{background-color:#ffffff;}
-.table-condensed th,.table-condensed td{padding:4px 5px;}
-.table-bordered{border:1px solid #dddddd;border-collapse:separate;*border-collapse:collapse;border-left:0;-webkit-border-radius:4px;-moz-border-radius:4px;border-radius:4px;}.table-bordered th,.table-bordered td{border-left:1px solid #dddddd;}
-.table-bordered caption+thead tr:first-child th,.table-bordered caption+tbody tr:first-child th,.table-bordered caption+tbody tr:first-child td,.table-bordered colgroup+thead tr:first-child th,.table-bordered colgroup+tbody tr:first-child th,.table-bordered colgroup+tbody tr:first-child td,.table-bordered thead:first-child tr:first-child th,.table-bordered tbody:first-child tr:first-child th,.table-bordered tbody:first-child tr:first-child td{border-top:0;}
-.table-bordered thead:first-child tr:first-child>th:first-child,.table-bordered tbody:first-child tr:first-child>td:first-child{-webkit-border-top-left-radius:4px;-moz-border-radius-topleft:4px;border-top-left-radius:4px;}
-.table-bordered thead:first-child tr:first-child>th:last-child,.table-bordered tbody:first-child tr:first-child>td:last-child{-webkit-border-top-right-radius:4px;-moz-border-radius-topright:4px;border-top-right-radius:4px;}
-.table-bordered thead:last-child tr:last-child>th:first-child,.table-bordered tbody:last-child tr:last-child>td:first-child,.table-bordered tfoot:last-child tr:last-child>td:first-child{-webkit-border-bottom-left-radius:4px;-moz-border-radius-bottomleft:4px;border-bottom-left-radius:4px;}
-.table-bordered thead:last-child tr:last-child>th:last-child,.table-bordered tbody:last-child tr:last-child>td:last-child,.table-bordered tfoot:last-child tr:last-child>td:last-child{-webkit-border-bottom-right-radius:4px;-moz-border-radius-bottomright:4px;border-bottom-right-radius:4px;}
-.table-bordered tfoot+tbody:last-child tr:last-child td:first-child{-webkit-border-bottom-left-radius:0;-moz-border-radius-bottomleft:0;border-bottom-left-radius:0;}
-.table-bordered tfoot+tbody:last-child tr:last-child td:last-child{-webkit-border-bottom-right-radius:0;-moz-border-radius-bottomright:0;border-bottom-right-radius:0;}
-.table-bordered caption+thead tr:first-child th:first-child,.table-bordered caption+tbody tr:first-child td:first-child,.table-bordered colgroup+thead tr:first-child th:first-child,.table-bordered colgroup+tbody tr:first-child td:first-child{-webkit-border-top-left-radius:4px;-moz-border-radius-topleft:4px;border-top-left-radius:4px;}
-.table-bordered caption+thead tr:first-child th:last-child,.table-bordered caption+tbody tr:first-child td:last-child,.table-bordered colgroup+thead tr:first-child th:last-child,.table-bordered colgroup+tbody tr:first-child td:last-child{-webkit-border-top-right-radius:4px;-moz-border-radius-topright:4px;border-top-right-radius:4px;}
-.table-striped tbody>tr:nth-child(odd)>td,.table-striped tbody>tr:nth-child(odd)>th{background-color:#f9f9f9;}
-.table-hover tbody tr:hover td,.table-hover tbody tr:hover th{background-color:#f5f5f5;}
-table td[class*="span"],table th[class*="span"],.row-fluid table td[class*="span"],.row-fluid table th[class*="span"]{display:table-cell;float:none;margin-left:0;}
-.table td.span1,.table th.span1{float:none;width:44px;margin-left:0;}
-.table td.span2,.table th.span2{float:none;width:124px;margin-left:0;}
-.table td.span3,.table th.span3{float:none;width:204px;margin-left:0;}
-.table td.span4,.table th.span4{float:none;width:284px;margin-left:0;}
-.table td.span5,.table th.span5{float:none;width:364px;margin-left:0;}
-.table td.span6,.table th.span6{float:none;width:444px;margin-left:0;}
-.table td.span7,.table th.span7{float:none;width:524px;margin-left:0;}
-.table td.span8,.table th.span8{float:none;width:604px;margin-left:0;}
-.table td.span9,.table th.span9{float:none;width:684px;margin-left:0;}
-.table td.span10,.table th.span10{float:none;width:764px;margin-left:0;}
-.table td.span11,.table th.span11{float:none;width:844px;margin-left:0;}
-.table td.span12,.table th.span12{float:none;width:924px;margin-left:0;}
-.table tbody tr.success td{background-color:#dff0d8;}
-.table tbody tr.error td{background-color:#f2dede;}
-.table tbody tr.warning td{background-color:#fcf8e3;}
-.table tbody tr.info td{background-color:#d9edf7;}
-.table-hover tbody tr.success:hover td{background-color:#d0e9c6;}
-.table-hover tbody tr.error:hover td{background-color:#ebcccc;}
-.table-hover tbody tr.warning:hover td{background-color:#faf2cc;}
-.table-hover tbody tr.info:hover td{background-color:#c4e3f3;}
-form{margin:0 0 20px;}
-fieldset{padding:0;margin:0;border:0;}
-legend{display:block;width:100%;padding:0;margin-bottom:20px;font-size:21px;line-height:40px;color:#333333;border:0;border-bottom:1px solid #e5e5e5;}legend small{font-size:15px;color:#999999;}
-label,input,button,select,textarea{font-size:14px;font-weight:normal;line-height:20px;}
-input,button,select,textarea{font-family:"Helvetica Neue",Helvetica,Arial,sans-serif;}
-label{display:block;margin-bottom:5px;}
-select,textarea,input[type="text"],input[type="password"],input[type="datetime"],input[type="datetime-local"],input[type="date"],input[type="month"],input[type="time"],input[type="week"],input[type="number"],input[type="email"],input[type="url"],input[type="search"],input[type="tel"],input[type="color"],.uneditable-input{display:inline-block;height:20px;padding:4px 6px;margin-bottom:10px;font-size:14px;line-height:20px;color:#555555;-webkit-border-radius:4px;-moz-border-radius:4px;border [...]
-input,textarea,.uneditable-input{width:206px;}
-textarea{height:auto;}
-textarea,input[type="text"],input[type="password"],input[type="datetime"],input[type="datetime-local"],input[type="date"],input[type="month"],input[type="time"],input[type="week"],input[type="number"],input[type="email"],input[type="url"],input[type="search"],input[type="tel"],input[type="color"],.uneditable-input{background-color:#ffffff;border:1px solid #cccccc;-webkit-box-shadow:inset 0 1px 1px rgba(0, 0, 0, 0.075);-moz-box-shadow:inset 0 1px 1px rgba(0, 0, 0, 0.075);box-shadow:inset  [...]
-input[type="radio"],input[type="checkbox"]{margin:4px 0 0;*margin-top:0;margin-top:1px \9;line-height:normal;}
-input[type="file"],input[type="image"],input[type="submit"],input[type="reset"],input[type="button"],input[type="radio"],input[type="checkbox"]{width:auto;}
-select,input[type="file"]{height:30px;*margin-top:4px;line-height:30px;}
-select{width:220px;border:1px solid #cccccc;background-color:#ffffff;}
-select[multiple],select[size]{height:auto;}
-select:focus,input[type="file"]:focus,input[type="radio"]:focus,input[type="checkbox"]:focus{outline:thin dotted #333;outline:5px auto -webkit-focus-ring-color;outline-offset:-2px;}
-.uneditable-input,.uneditable-textarea{color:#999999;background-color:#fcfcfc;border-color:#cccccc;-webkit-box-shadow:inset 0 1px 2px rgba(0, 0, 0, 0.025);-moz-box-shadow:inset 0 1px 2px rgba(0, 0, 0, 0.025);box-shadow:inset 0 1px 2px rgba(0, 0, 0, 0.025);cursor:not-allowed;}
-.uneditable-input{overflow:hidden;white-space:nowrap;}
-.uneditable-textarea{width:auto;height:auto;}
-input:-moz-placeholder,textarea:-moz-placeholder{color:#999999;}
-input:-ms-input-placeholder,textarea:-ms-input-placeholder{color:#999999;}
-input::-webkit-input-placeholder,textarea::-webkit-input-placeholder{color:#999999;}
-.radio,.checkbox{min-height:20px;padding-left:20px;}
-.radio input[type="radio"],.checkbox input[type="checkbox"]{float:left;margin-left:-20px;}
-.controls>.radio:first-child,.controls>.checkbox:first-child{padding-top:5px;}
-.radio.inline,.checkbox.inline{display:inline-block;padding-top:5px;margin-bottom:0;vertical-align:middle;}
-.radio.inline+.radio.inline,.checkbox.inline+.checkbox.inline{margin-left:10px;}
-.input-mini{width:60px;}
-.input-small{width:90px;}
-.input-medium{width:150px;}
-.input-large{width:210px;}
-.input-xlarge{width:270px;}
-.input-xxlarge{width:530px;}
-input[class*="span"],select[class*="span"],textarea[class*="span"],.uneditable-input[class*="span"],.row-fluid input[class*="span"],.row-fluid select[class*="span"],.row-fluid textarea[class*="span"],.row-fluid .uneditable-input[class*="span"]{float:none;margin-left:0;}
-.input-append input[class*="span"],.input-append .uneditable-input[class*="span"],.input-prepend input[class*="span"],.input-prepend .uneditable-input[class*="span"],.row-fluid input[class*="span"],.row-fluid select[class*="span"],.row-fluid textarea[class*="span"],.row-fluid .uneditable-input[class*="span"],.row-fluid .input-prepend [class*="span"],.row-fluid .input-append [class*="span"]{display:inline-block;}
-input,textarea,.uneditable-input{margin-left:0;}
-.controls-row [class*="span"]+[class*="span"]{margin-left:20px;}
-input.span12, textarea.span12, .uneditable-input.span12{width:926px;}
-input.span11, textarea.span11, .uneditable-input.span11{width:846px;}
-input.span10, textarea.span10, .uneditable-input.span10{width:766px;}
-input.span9, textarea.span9, .uneditable-input.span9{width:686px;}
-input.span8, textarea.span8, .uneditable-input.span8{width:606px;}
-input.span7, textarea.span7, .uneditable-input.span7{width:526px;}
-input.span6, textarea.span6, .uneditable-input.span6{width:446px;}
-input.span5, textarea.span5, .uneditable-input.span5{width:366px;}
-input.span4, textarea.span4, .uneditable-input.span4{width:286px;}
-input.span3, textarea.span3, .uneditable-input.span3{width:206px;}
-input.span2, textarea.span2, .uneditable-input.span2{width:126px;}
-input.span1, textarea.span1, .uneditable-input.span1{width:46px;}
-.controls-row{*zoom:1;}.controls-row:before,.controls-row:after{display:table;content:"";line-height:0;}
-.controls-row:after{clear:both;}
-.controls-row [class*="span"],.row-fluid .controls-row [class*="span"]{float:left;}
-.controls-row .checkbox[class*="span"],.controls-row .radio[class*="span"]{padding-top:5px;}
-input[disabled],select[disabled],textarea[disabled],input[readonly],select[readonly],textarea[readonly]{cursor:not-allowed;background-color:#eeeeee;}
-input[type="radio"][disabled],input[type="checkbox"][disabled],input[type="radio"][readonly],input[type="checkbox"][readonly]{background-color:transparent;}
-.control-group.warning .control-label,.control-group.warning .help-block,.control-group.warning .help-inline{color:#c09853;}
-.control-group.warning .checkbox,.control-group.warning .radio,.control-group.warning input,.control-group.warning select,.control-group.warning textarea{color:#c09853;}
-.control-group.warning input,.control-group.warning select,.control-group.warning textarea{border-color:#c09853;-webkit-box-shadow:inset 0 1px 1px rgba(0, 0, 0, 0.075);-moz-box-shadow:inset 0 1px 1px rgba(0, 0, 0, 0.075);box-shadow:inset 0 1px 1px rgba(0, 0, 0, 0.075);}.control-group.warning input:focus,.control-group.warning select:focus,.control-group.warning textarea:focus{border-color:#a47e3c;-webkit-box-shadow:inset 0 1px 1px rgba(0, 0, 0, 0.075),0 0 6px #dbc59e;-moz-box-shadow:inse [...]
-.control-group.warning .input-prepend .add-on,.control-group.warning .input-append .add-on{color:#c09853;background-color:#fcf8e3;border-color:#c09853;}
-.control-group.error .control-label,.control-group.error .help-block,.control-group.error .help-inline{color:#b94a48;}
-.control-group.error .checkbox,.control-group.error .radio,.control-group.error input,.control-group.error select,.control-group.error textarea{color:#b94a48;}
-.control-group.error input,.control-group.error select,.control-group.error textarea{border-color:#b94a48;-webkit-box-shadow:inset 0 1px 1px rgba(0, 0, 0, 0.075);-moz-box-shadow:inset 0 1px 1px rgba(0, 0, 0, 0.075);box-shadow:inset 0 1px 1px rgba(0, 0, 0, 0.075);}.control-group.error input:focus,.control-group.error select:focus,.control-group.error textarea:focus{border-color:#953b39;-webkit-box-shadow:inset 0 1px 1px rgba(0, 0, 0, 0.075),0 0 6px #d59392;-moz-box-shadow:inset 0 1px 1px  [...]
-.control-group.error .input-prepend .add-on,.control-group.error .input-append .add-on{color:#b94a48;background-color:#f2dede;border-color:#b94a48;}
-.control-group.success .control-label,.control-group.success .help-block,.control-group.success .help-inline{color:#468847;}
-.control-group.success .checkbox,.control-group.success .radio,.control-group.success input,.control-group.success select,.control-group.success textarea{color:#468847;}
-.control-group.success input,.control-group.success select,.control-group.success textarea{border-color:#468847;-webkit-box-shadow:inset 0 1px 1px rgba(0, 0, 0, 0.075);-moz-box-shadow:inset 0 1px 1px rgba(0, 0, 0, 0.075);box-shadow:inset 0 1px 1px rgba(0, 0, 0, 0.075);}.control-group.success input:focus,.control-group.success select:focus,.control-group.success textarea:focus{border-color:#356635;-webkit-box-shadow:inset 0 1px 1px rgba(0, 0, 0, 0.075),0 0 6px #7aba7b;-moz-box-shadow:inse [...]
-.control-group.success .input-prepend .add-on,.control-group.success .input-append .add-on{color:#468847;background-color:#dff0d8;border-color:#468847;}
-.control-group.info .control-label,.control-group.info .help-block,.control-group.info .help-inline{color:#3a87ad;}
-.control-group.info .checkbox,.control-group.info .radio,.control-group.info input,.control-group.info select,.control-group.info textarea{color:#3a87ad;}
-.control-group.info input,.control-group.info select,.control-group.info textarea{border-color:#3a87ad;-webkit-box-shadow:inset 0 1px 1px rgba(0, 0, 0, 0.075);-moz-box-shadow:inset 0 1px 1px rgba(0, 0, 0, 0.075);box-shadow:inset 0 1px 1px rgba(0, 0, 0, 0.075);}.control-group.info input:focus,.control-group.info select:focus,.control-group.info textarea:focus{border-color:#2d6987;-webkit-box-shadow:inset 0 1px 1px rgba(0, 0, 0, 0.075),0 0 6px #7ab5d3;-moz-box-shadow:inset 0 1px 1px rgba(0 [...]
-.control-group.info .input-prepend .add-on,.control-group.info .input-append .add-on{color:#3a87ad;background-color:#d9edf7;border-color:#3a87ad;}
-input:focus:invalid,textarea:focus:invalid,select:focus:invalid{color:#b94a48;border-color:#ee5f5b;}input:focus:invalid:focus,textarea:focus:invalid:focus,select:focus:invalid:focus{border-color:#e9322d;-webkit-box-shadow:0 0 6px #f8b9b7;-moz-box-shadow:0 0 6px #f8b9b7;box-shadow:0 0 6px #f8b9b7;}
-.form-actions{padding:19px 20px 20px;margin-top:20px;margin-bottom:20px;background-color:#f5f5f5;border-top:1px solid #e5e5e5;*zoom:1;}.form-actions:before,.form-actions:after{display:table;content:"";line-height:0;}
-.form-actions:after{clear:both;}
-.help-block,.help-inline{color:#595959;}
-.help-block{display:block;margin-bottom:10px;}
-.help-inline{display:inline-block;*display:inline;*zoom:1;vertical-align:middle;padding-left:5px;}
-.input-append,.input-prepend{margin-bottom:5px;font-size:0;white-space:nowrap;}.input-append input,.input-prepend input,.input-append select,.input-prepend select,.input-append .uneditable-input,.input-prepend .uneditable-input,.input-append .dropdown-menu,.input-prepend .dropdown-menu{font-size:14px;}
-.input-append input,.input-prepend input,.input-append select,.input-prepend select,.input-append .uneditable-input,.input-prepend .uneditable-input{position:relative;margin-bottom:0;*margin-left:0;vertical-align:top;-webkit-border-radius:0 4px 4px 0;-moz-border-radius:0 4px 4px 0;border-radius:0 4px 4px 0;}.input-append input:focus,.input-prepend input:focus,.input-append select:focus,.input-prepend select:focus,.input-append .uneditable-input:focus,.input-prepend .uneditable-input:focu [...]
-.input-append .add-on,.input-prepend .add-on{display:inline-block;width:auto;height:20px;min-width:16px;padding:4px 5px;font-size:14px;font-weight:normal;line-height:20px;text-align:center;text-shadow:0 1px 0 #ffffff;background-color:#eeeeee;border:1px solid #ccc;}
-.input-append .add-on,.input-prepend .add-on,.input-append .btn,.input-prepend .btn,.input-append .btn-group>.dropdown-toggle,.input-prepend .btn-group>.dropdown-toggle{vertical-align:top;-webkit-border-radius:0;-moz-border-radius:0;border-radius:0;}
-.input-append .active,.input-prepend .active{background-color:#a9dba9;border-color:#46a546;}
-.input-prepend .add-on,.input-prepend .btn{margin-right:-1px;}
-.input-prepend .add-on:first-child,.input-prepend .btn:first-child{-webkit-border-radius:4px 0 0 4px;-moz-border-radius:4px 0 0 4px;border-radius:4px 0 0 4px;}
-.input-append input,.input-append select,.input-append .uneditable-input{-webkit-border-radius:4px 0 0 4px;-moz-border-radius:4px 0 0 4px;border-radius:4px 0 0 4px;}.input-append input+.btn-group .btn:last-child,.input-append select+.btn-group .btn:last-child,.input-append .uneditable-input+.btn-group .btn:last-child{-webkit-border-radius:0 4px 4px 0;-moz-border-radius:0 4px 4px 0;border-radius:0 4px 4px 0;}
-.input-append .add-on,.input-append .btn,.input-append .btn-group{margin-left:-1px;}
-.input-append .add-on:last-child,.input-append .btn:last-child,.input-append .btn-group:last-child>.dropdown-toggle{-webkit-border-radius:0 4px 4px 0;-moz-border-radius:0 4px 4px 0;border-radius:0 4px 4px 0;}
-.input-prepend.input-append input,.input-prepend.input-append select,.input-prepend.input-append .uneditable-input{-webkit-border-radius:0;-moz-border-radius:0;border-radius:0;}.input-prepend.input-append input+.btn-group .btn,.input-prepend.input-append select+.btn-group .btn,.input-prepend.input-append .uneditable-input+.btn-group .btn{-webkit-border-radius:0 4px 4px 0;-moz-border-radius:0 4px 4px 0;border-radius:0 4px 4px 0;}
-.input-prepend.input-append .add-on:first-child,.input-prepend.input-append .btn:first-child{margin-right:-1px;-webkit-border-radius:4px 0 0 4px;-moz-border-radius:4px 0 0 4px;border-radius:4px 0 0 4px;}
-.input-prepend.input-append .add-on:last-child,.input-prepend.input-append .btn:last-child{margin-left:-1px;-webkit-border-radius:0 4px 4px 0;-moz-border-radius:0 4px 4px 0;border-radius:0 4px 4px 0;}
-.input-prepend.input-append .btn-group:first-child{margin-left:0;}
-input.search-query{padding-right:14px;padding-right:4px \9;padding-left:14px;padding-left:4px \9;margin-bottom:0;-webkit-border-radius:15px;-moz-border-radius:15px;border-radius:15px;}
-.form-search .input-append .search-query,.form-search .input-prepend .search-query{-webkit-border-radius:0;-moz-border-radius:0;border-radius:0;}
-.form-search .input-append .search-query{-webkit-border-radius:14px 0 0 14px;-moz-border-radius:14px 0 0 14px;border-radius:14px 0 0 14px;}
-.form-search .input-append .btn{-webkit-border-radius:0 14px 14px 0;-moz-border-radius:0 14px 14px 0;border-radius:0 14px 14px 0;}
-.form-search .input-prepend .search-query{-webkit-border-radius:0 14px 14px 0;-moz-border-radius:0 14px 14px 0;border-radius:0 14px 14px 0;}
-.form-search .input-prepend .btn{-webkit-border-radius:14px 0 0 14px;-moz-border-radius:14px 0 0 14px;border-radius:14px 0 0 14px;}
-.form-search input,.form-inline input,.form-horizontal input,.form-search textarea,.form-inline textarea,.form-horizontal textarea,.form-search select,.form-inline select,.form-horizontal select,.form-search .help-inline,.form-inline .help-inline,.form-horizontal .help-inline,.form-search .uneditable-input,.form-inline .uneditable-input,.form-horizontal .uneditable-input,.form-search .input-prepend,.form-inline .input-prepend,.form-horizontal .input-prepend,.form-search .input-append,.fo [...]
-.form-search .hide,.form-inline .hide,.form-horizontal .hide{display:none;}
-.form-search label,.form-inline label,.form-search .btn-group,.form-inline .btn-group{display:inline-block;}
-.form-search .input-append,.form-inline .input-append,.form-search .input-prepend,.form-inline .input-prepend{margin-bottom:0;}
-.form-search .radio,.form-search .checkbox,.form-inline .radio,.form-inline .checkbox{padding-left:0;margin-bottom:0;vertical-align:middle;}
-.form-search .radio input[type="radio"],.form-search .checkbox input[type="checkbox"],.form-inline .radio input[type="radio"],.form-inline .checkbox input[type="checkbox"]{float:left;margin-right:3px;margin-left:0;}
-.control-group{margin-bottom:10px;}
-legend+.control-group{margin-top:20px;-webkit-margin-top-collapse:separate;}
-.form-horizontal .control-group{margin-bottom:20px;*zoom:1;}.form-horizontal .control-group:before,.form-horizontal .control-group:after{display:table;content:"";line-height:0;}
-.form-horizontal .control-group:after{clear:both;}
-.form-horizontal .control-label{float:left;width:160px;padding-top:5px;text-align:right;}
-.form-horizontal .controls{*display:inline-block;*padding-left:20px;margin-left:180px;*margin-left:0;}.form-horizontal .controls:first-child{*padding-left:180px;}
-.form-horizontal .help-block{margin-bottom:0;}
-.form-horizontal input+.help-block,.form-horizontal select+.help-block,.form-horizontal textarea+.help-block,.form-horizontal .uneditable-input+.help-block,.form-horizontal .input-prepend+.help-block,.form-horizontal .input-append+.help-block{margin-top:10px;}
-.form-horizontal .form-actions{padding-left:180px;}
-.btn{display:inline-block;*display:inline;*zoom:1;padding:4px 12px;margin-bottom:0;font-size:14px;line-height:20px;text-align:center;vertical-align:middle;cursor:pointer;color:#333333;text-shadow:0 1px 1px rgba(255, 255, 255, 0.75);background-color:#f5f5f5;background-image:-moz-linear-gradient(top, #ffffff, #e6e6e6);background-image:-webkit-gradient(linear, 0 0, 0 100%, from(#ffffff), to(#e6e6e6));background-image:-webkit-linear-gradient(top, #ffffff, #e6e6e6);background-image:-o-linear- [...]
-.btn:active,.btn.active{background-color:#cccccc \9;}
-.btn:first-child{*margin-left:0;}
-.btn:hover{color:#333333;text-decoration:none;background-position:0 -15px;-webkit-transition:background-position 0.1s linear;-moz-transition:background-position 0.1s linear;-o-transition:background-position 0.1s linear;transition:background-position 0.1s linear;}
-.btn:focus{outline:thin dotted #333;outline:5px auto -webkit-focus-ring-color;outline-offset:-2px;}
-.btn.active,.btn:active{background-image:none;outline:0;-webkit-box-shadow:inset 0 2px 4px rgba(0,0,0,.15), 0 1px 2px rgba(0,0,0,.05);-moz-box-shadow:inset 0 2px 4px rgba(0,0,0,.15), 0 1px 2px rgba(0,0,0,.05);box-shadow:inset 0 2px 4px rgba(0,0,0,.15), 0 1px 2px rgba(0,0,0,.05);}
-.btn.disabled,.btn[disabled]{cursor:default;background-image:none;opacity:0.65;filter:alpha(opacity=65);-webkit-box-shadow:none;-moz-box-shadow:none;box-shadow:none;}
-.btn-large{padding:11px 19px;font-size:17.5px;-webkit-border-radius:6px;-moz-border-radius:6px;border-radius:6px;}
-.btn-large [class^="icon-"],.btn-large [class*=" icon-"]{margin-top:4px;}
-.btn-small{padding:2px 10px;font-size:11.9px;-webkit-border-radius:3px;-moz-border-radius:3px;border-radius:3px;}
-.btn-small [class^="icon-"],.btn-small [class*=" icon-"]{margin-top:0;}
-.btn-mini [class^="icon-"],.btn-mini [class*=" icon-"]{margin-top:-1px;}
-.btn-mini{padding:0 6px;font-size:10.5px;-webkit-border-radius:3px;-moz-border-radius:3px;border-radius:3px;}
-.btn-block{display:block;width:100%;padding-left:0;padding-right:0;-webkit-box-sizing:border-box;-moz-box-sizing:border-box;box-sizing:border-box;}
-.btn-block+.btn-block{margin-top:5px;}
-input[type="submit"].btn-block,input[type="reset"].btn-block,input[type="button"].btn-block{width:100%;}
-.btn-primary.active,.btn-warning.active,.btn-danger.active,.btn-success.active,.btn-info.active,.btn-inverse.active{color:rgba(255, 255, 255, 0.75);}
-.btn{border-color:#c5c5c5;border-color:rgba(0, 0, 0, 0.15) rgba(0, 0, 0, 0.15) rgba(0, 0, 0, 0.25);}
-.btn-primary{color:#ffffff;text-shadow:0 -1px 0 rgba(0, 0, 0, 0.25);background-color:#006dcc;background-image:-moz-linear-gradient(top, #0088cc, #0044cc);background-image:-webkit-gradient(linear, 0 0, 0 100%, from(#0088cc), to(#0044cc));background-image:-webkit-linear-gradient(top, #0088cc, #0044cc);background-image:-o-linear-gradient(top, #0088cc, #0044cc);background-image:linear-gradient(to bottom, #0088cc, #0044cc);background-repeat:repeat-x;filter:progid:DXImageTransform.Microsoft.gr [...]
-.btn-primary:active,.btn-primary.active{background-color:#003399 \9;}
-.btn-warning{color:#ffffff;text-shadow:0 -1px 0 rgba(0, 0, 0, 0.25);background-color:#faa732;background-image:-moz-linear-gradient(top, #fbb450, #f89406);background-image:-webkit-gradient(linear, 0 0, 0 100%, from(#fbb450), to(#f89406));background-image:-webkit-linear-gradient(top, #fbb450, #f89406);background-image:-o-linear-gradient(top, #fbb450, #f89406);background-image:linear-gradient(to bottom, #fbb450, #f89406);background-repeat:repeat-x;filter:progid:DXImageTransform.Microsoft.gr [...]
-.btn-warning:active,.btn-warning.active{background-color:#c67605 \9;}
-.btn-danger{color:#ffffff;text-shadow:0 -1px 0 rgba(0, 0, 0, 0.25);background-color:#da4f49;background-image:-moz-linear-gradient(top, #ee5f5b, #bd362f);background-image:-webkit-gradient(linear, 0 0, 0 100%, from(#ee5f5b), to(#bd362f));background-image:-webkit-linear-gradient(top, #ee5f5b, #bd362f);background-image:-o-linear-gradient(top, #ee5f5b, #bd362f);background-image:linear-gradient(to bottom, #ee5f5b, #bd362f);background-repeat:repeat-x;filter:progid:DXImageTransform.Microsoft.gra [...]
-.btn-danger:active,.btn-danger.active{background-color:#942a25 \9;}
-.btn-success{color:#ffffff;text-shadow:0 -1px 0 rgba(0, 0, 0, 0.25);background-color:#5bb75b;background-image:-moz-linear-gradient(top, #62c462, #51a351);background-image:-webkit-gradient(linear, 0 0, 0 100%, from(#62c462), to(#51a351));background-image:-webkit-linear-gradient(top, #62c462, #51a351);background-image:-o-linear-gradient(top, #62c462, #51a351);background-image:linear-gradient(to bottom, #62c462, #51a351);background-repeat:repeat-x;filter:progid:DXImageTransform.Microsoft.gr [...]
-.btn-success:active,.btn-success.active{background-color:#408140 \9;}
-.btn-info{color:#ffffff;text-shadow:0 -1px 0 rgba(0, 0, 0, 0.25);background-color:#49afcd;background-image:-moz-linear-gradient(top, #5bc0de, #2f96b4);background-image:-webkit-gradient(linear, 0 0, 0 100%, from(#5bc0de), to(#2f96b4));background-image:-webkit-linear-gradient(top, #5bc0de, #2f96b4);background-image:-o-linear-gradient(top, #5bc0de, #2f96b4);background-image:linear-gradient(to bottom, #5bc0de, #2f96b4);background-repeat:repeat-x;filter:progid:DXImageTransform.Microsoft.gradi [...]
-.btn-info:active,.btn-info.active{background-color:#24748c \9;}
-.btn-inverse{color:#ffffff;text-shadow:0 -1px 0 rgba(0, 0, 0, 0.25);background-color:#363636;background-image:-moz-linear-gradient(top, #444444, #222222);background-image:-webkit-gradient(linear, 0 0, 0 100%, from(#444444), to(#222222));background-image:-webkit-linear-gradient(top, #444444, #222222);background-image:-o-linear-gradient(top, #444444, #222222);background-image:linear-gradient(to bottom, #444444, #222222);background-repeat:repeat-x;filter:progid:DXImageTransform.Microsoft.gr [...]
-.btn-inverse:active,.btn-inverse.active{background-color:#080808 \9;}
-button.btn,input[type="submit"].btn{*padding-top:3px;*padding-bottom:3px;}button.btn::-moz-focus-inner,input[type="submit"].btn::-moz-focus-inner{padding:0;border:0;}
-button.btn.btn-large,input[type="submit"].btn.btn-large{*padding-top:7px;*padding-bottom:7px;}
-button.btn.btn-small,input[type="submit"].btn.btn-small{*padding-top:3px;*padding-bottom:3px;}
-button.btn.btn-mini,input[type="submit"].btn.btn-mini{*padding-top:1px;*padding-bottom:1px;}
-.btn-link,.btn-link:active,.btn-link[disabled]{background-color:transparent;background-image:none;-webkit-box-shadow:none;-moz-box-shadow:none;box-shadow:none;}
-.btn-link{border-color:transparent;cursor:pointer;color:#0088cc;-webkit-border-radius:0;-moz-border-radius:0;border-radius:0;}
-.btn-link:hover{color:#005580;text-decoration:underline;background-color:transparent;}
-.btn-link[disabled]:hover{color:#333333;text-decoration:none;}
-[class^="icon-"],[class*=" icon-"]{display:inline-block;width:14px;height:14px;*margin-right:.3em;line-height:14px;vertical-align:text-top;background-image:url("../img/glyphicons-halflings.png");background-position:14px 14px;background-repeat:no-repeat;margin-top:1px;}
-.icon-white,.nav-pills>.active>a>[class^="icon-"],.nav-pills>.active>a>[class*=" icon-"],.nav-list>.active>a>[class^="icon-"],.nav-list>.active>a>[class*=" icon-"],.navbar-inverse .nav>.active>a>[class^="icon-"],.navbar-inverse .nav>.active>a>[class*=" icon-"],.dropdown-menu>li>a:hover>[class^="icon-"],.dropdown-menu>li>a:hover>[class*=" icon-"],.dropdown-menu>.active>a>[class^="icon-"],.dropdown-menu>.active>a>[class*=" icon-"],.dropdown-submenu:hover>a>[class^="icon-"],.dropdown-submen [...]
-.icon-glass{background-position:0 0;}
-.icon-music{background-position:-24px 0;}
-.icon-search{background-position:-48px 0;}
-.icon-envelope{background-position:-72px 0;}
-.icon-heart{background-position:-96px 0;}
-.icon-star{background-position:-120px 0;}
-.icon-star-empty{background-position:-144px 0;}
-.icon-user{background-position:-168px 0;}
-.icon-film{background-position:-192px 0;}
-.icon-th-large{background-position:-216px 0;}
-.icon-th{background-position:-240px 0;}
-.icon-th-list{background-position:-264px 0;}
-.icon-ok{background-position:-288px 0;}
-.icon-remove{background-position:-312px 0;}
-.icon-zoom-in{background-position:-336px 0;}
-.icon-zoom-out{background-position:-360px 0;}
-.icon-off{background-position:-384px 0;}
-.icon-signal{background-position:-408px 0;}
-.icon-cog{background-position:-432px 0;}
-.icon-trash{background-position:-456px 0;}
-.icon-home{background-position:0 -24px;}
-.icon-file{background-position:-24px -24px;}
-.icon-time{background-position:-48px -24px;}
-.icon-road{background-position:-72px -24px;}
-.icon-download-alt{background-position:-96px -24px;}
-.icon-download{background-position:-120px -24px;}
-.icon-upload{background-position:-144px -24px;}
-.icon-inbox{background-position:-168px -24px;}
-.icon-play-circle{background-position:-192px -24px;}
-.icon-repeat{background-position:-216px -24px;}
-.icon-refresh{background-position:-240px -24px;}
-.icon-list-alt{background-position:-264px -24px;}
-.icon-lock{background-position:-287px -24px;}
-.icon-flag{background-position:-312px -24px;}
-.icon-headphones{background-position:-336px -24px;}
-.icon-volume-off{background-position:-360px -24px;}
-.icon-volume-down{background-position:-384px -24px;}
-.icon-volume-up{background-position:-408px -24px;}
-.icon-qrcode{background-position:-432px -24px;}
-.icon-barcode{background-position:-456px -24px;}
-.icon-tag{background-position:0 -48px;}
-.icon-tags{background-position:-25px -48px;}
-.icon-book{background-position:-48px -48px;}
-.icon-bookmark{background-position:-72px -48px;}
-.icon-print{background-position:-96px -48px;}
-.icon-camera{background-position:-120px -48px;}
-.icon-font{background-position:-144px -48px;}
-.icon-bold{background-position:-167px -48px;}
-.icon-italic{background-position:-192px -48px;}
-.icon-text-height{background-position:-216px -48px;}
-.icon-text-width{background-position:-240px -48px;}
-.icon-align-left{background-position:-264px -48px;}
-.icon-align-center{background-position:-288px -48px;}
-.icon-align-right{background-position:-312px -48px;}
-.icon-align-justify{background-position:-336px -48px;}
-.icon-list{background-position:-360px -48px;}
-.icon-indent-left{background-position:-384px -48px;}
-.icon-indent-right{background-position:-408px -48px;}
-.icon-facetime-video{background-position:-432px -48px;}
-.icon-picture{background-position:-456px -48px;}
-.icon-pencil{background-position:0 -72px;}
-.icon-map-marker{background-position:-24px -72px;}
-.icon-adjust{background-position:-48px -72px;}
-.icon-tint{background-position:-72px -72px;}
-.icon-edit{background-position:-96px -72px;}
-.icon-share{background-position:-120px -72px;}
-.icon-check{background-position:-144px -72px;}
-.icon-move{background-position:-168px -72px;}
-.icon-step-backward{background-position:-192px -72px;}
-.icon-fast-backward{background-position:-216px -72px;}
-.icon-backward{background-position:-240px -72px;}
-.icon-play{background-position:-264px -72px;}
-.icon-pause{background-position:-288px -72px;}
-.icon-stop{background-position:-312px -72px;}
-.icon-forward{background-position:-336px -72px;}
-.icon-fast-forward{background-position:-360px -72px;}
-.icon-step-forward{background-position:-384px -72px;}
-.icon-eject{background-position:-408px -72px;}
-.icon-chevron-left{background-position:-432px -72px;}
-.icon-chevron-right{background-position:-456px -72px;}
-.icon-plus-sign{background-position:0 -96px;}
-.icon-minus-sign{background-position:-24px -96px;}
-.icon-remove-sign{background-position:-48px -96px;}
-.icon-ok-sign{background-position:-72px -96px;}
-.icon-question-sign{background-position:-96px -96px;}
-.icon-info-sign{background-position:-120px -96px;}
-.icon-screenshot{background-position:-144px -96px;}
-.icon-remove-circle{background-position:-168px -96px;}
-.icon-ok-circle{background-position:-192px -96px;}
-.icon-ban-circle{background-position:-216px -96px;}
-.icon-arrow-left{background-position:-240px -96px;}
-.icon-arrow-right{background-position:-264px -96px;}
-.icon-arrow-up{background-position:-289px -96px;}
-.icon-arrow-down{background-position:-312px -96px;}
-.icon-share-alt{background-position:-336px -96px;}
-.icon-resize-full{background-position:-360px -96px;}
-.icon-resize-small{background-position:-384px -96px;}
-.icon-plus{background-position:-408px -96px;}
-.icon-minus{background-position:-433px -96px;}
-.icon-asterisk{background-position:-456px -96px;}
-.icon-exclamation-sign{background-position:0 -120px;}
-.icon-gift{background-position:-24px -120px;}
-.icon-leaf{background-position:-48px -120px;}
-.icon-fire{background-position:-72px -120px;}
-.icon-eye-open{background-position:-96px -120px;}
-.icon-eye-close{background-position:-120px -120px;}
-.icon-warning-sign{background-position:-144px -120px;}
-.icon-plane{background-position:-168px -120px;}
-.icon-calendar{background-position:-192px -120px;}
-.icon-random{background-position:-216px -120px;width:16px;}
-.icon-comment{background-position:-240px -120px;}
-.icon-magnet{background-position:-264px -120px;}
-.icon-chevron-up{background-position:-288px -120px;}
-.icon-chevron-down{background-position:-313px -119px;}
-.icon-retweet{background-position:-336px -120px;}
-.icon-shopping-cart{background-position:-360px -120px;}
-.icon-folder-close{background-position:-384px -120px;}
-.icon-folder-open{background-position:-408px -120px;width:16px;}
-.icon-resize-vertical{background-position:-432px -119px;}
-.icon-resize-horizontal{background-position:-456px -118px;}
-.icon-hdd{background-position:0 -144px;}
-.icon-bullhorn{background-position:-24px -144px;}
-.icon-bell{background-position:-48px -144px;}
-.icon-certificate{background-position:-72px -144px;}
-.icon-thumbs-up{background-position:-96px -144px;}
-.icon-thumbs-down{background-position:-120px -144px;}
-.icon-hand-right{background-position:-144px -144px;}
-.icon-hand-left{background-position:-168px -144px;}
-.icon-hand-up{background-position:-192px -144px;}
-.icon-hand-down{background-position:-216px -144px;}
-.icon-circle-arrow-right{background-position:-240px -144px;}
-.icon-circle-arrow-left{background-position:-264px -144px;}
-.icon-circle-arrow-up{background-position:-288px -144px;}
-.icon-circle-arrow-down{background-position:-312px -144px;}
-.icon-globe{background-position:-336px -144px;}
-.icon-wrench{background-position:-360px -144px;}
-.icon-tasks{background-position:-384px -144px;}
-.icon-filter{background-position:-408px -144px;}
-.icon-briefcase{background-position:-432px -144px;}
-.icon-fullscreen{background-position:-456px -144px;}
-.btn-group{position:relative;display:inline-block;*display:inline;*zoom:1;font-size:0;vertical-align:middle;white-space:nowrap;*margin-left:.3em;}.btn-group:first-child{*margin-left:0;}
-.btn-group+.btn-group{margin-left:5px;}
-.btn-toolbar{font-size:0;margin-top:10px;margin-bottom:10px;}.btn-toolbar>.btn+.btn,.btn-toolbar>.btn-group+.btn,.btn-toolbar>.btn+.btn-group{margin-left:5px;}
-.btn-group>.btn{position:relative;-webkit-border-radius:0;-moz-border-radius:0;border-radius:0;}
-.btn-group>.btn+.btn{margin-left:-1px;}
-.btn-group>.btn,.btn-group>.dropdown-menu,.btn-group>.popover{font-size:14px;}
-.btn-group>.btn-mini{font-size:10.5px;}
-.btn-group>.btn-small{font-size:11.9px;}
-.btn-group>.btn-large{font-size:17.5px;}
-.btn-group>.btn:first-child{margin-left:0;-webkit-border-top-left-radius:4px;-moz-border-radius-topleft:4px;border-top-left-radius:4px;-webkit-border-bottom-left-radius:4px;-moz-border-radius-bottomleft:4px;border-bottom-left-radius:4px;}
-.btn-group>.btn:last-child,.btn-group>.dropdown-toggle{-webkit-border-top-right-radius:4px;-moz-border-radius-topright:4px;border-top-right-radius:4px;-webkit-border-bottom-right-radius:4px;-moz-border-radius-bottomright:4px;border-bottom-right-radius:4px;}
-.btn-group>.btn.large:first-child{margin-left:0;-webkit-border-top-left-radius:6px;-moz-border-radius-topleft:6px;border-top-left-radius:6px;-webkit-border-bottom-left-radius:6px;-moz-border-radius-bottomleft:6px;border-bottom-left-radius:6px;}
-.btn-group>.btn.large:last-child,.btn-group>.large.dropdown-toggle{-webkit-border-top-right-radius:6px;-moz-border-radius-topright:6px;border-top-right-radius:6px;-webkit-border-bottom-right-radius:6px;-moz-border-radius-bottomright:6px;border-bottom-right-radius:6px;}
-.btn-group>.btn:hover,.btn-group>.btn:focus,.btn-group>.btn:active,.btn-group>.btn.active{z-index:2;}
-.btn-group .dropdown-toggle:active,.btn-group.open .dropdown-toggle{outline:0;}
-.btn-group>.btn+.dropdown-toggle{padding-left:8px;padding-right:8px;-webkit-box-shadow:inset 1px 0 0 rgba(255,255,255,.125), inset 0 1px 0 rgba(255,255,255,.2), 0 1px 2px rgba(0,0,0,.05);-moz-box-shadow:inset 1px 0 0 rgba(255,255,255,.125), inset 0 1px 0 rgba(255,255,255,.2), 0 1px 2px rgba(0,0,0,.05);box-shadow:inset 1px 0 0 rgba(255,255,255,.125), inset 0 1px 0 rgba(255,255,255,.2), 0 1px 2px rgba(0,0,0,.05);*padding-top:5px;*padding-bottom:5px;}
-.btn-group>.btn-mini+.dropdown-toggle{padding-left:5px;padding-right:5px;*padding-top:2px;*padding-bottom:2px;}
-.btn-group>.btn-small+.dropdown-toggle{*padding-top:5px;*padding-bottom:4px;}
-.btn-group>.btn-large+.dropdown-toggle{padding-left:12px;padding-right:12px;*padding-top:7px;*padding-bottom:7px;}
-.btn-group.open .dropdown-toggle{background-image:none;-webkit-box-shadow:inset 0 2px 4px rgba(0,0,0,.15), 0 1px 2px rgba(0,0,0,.05);-moz-box-shadow:inset 0 2px 4px rgba(0,0,0,.15), 0 1px 2px rgba(0,0,0,.05);box-shadow:inset 0 2px 4px rgba(0,0,0,.15), 0 1px 2px rgba(0,0,0,.05);}
-.btn-group.open .btn.dropdown-toggle{background-color:#e6e6e6;}
-.btn-group.open .btn-primary.dropdown-toggle{background-color:#0044cc;}
-.btn-group.open .btn-warning.dropdown-toggle{background-color:#f89406;}
-.btn-group.open .btn-danger.dropdown-toggle{background-color:#bd362f;}
-.btn-group.open .btn-success.dropdown-toggle{background-color:#51a351;}
-.btn-group.open .btn-info.dropdown-toggle{background-color:#2f96b4;}
-.btn-group.open .btn-inverse.dropdown-toggle{background-color:#222222;}
-.btn .caret{margin-top:8px;margin-left:0;}
-.btn-mini .caret,.btn-small .caret,.btn-large .caret{margin-top:6px;}
-.btn-large .caret{border-left-width:5px;border-right-width:5px;border-top-width:5px;}
-.dropup .btn-large .caret{border-bottom-width:5px;}
-.btn-primary .caret,.btn-warning .caret,.btn-danger .caret,.btn-info .caret,.btn-success .caret,.btn-inverse .caret{border-top-color:#ffffff;border-bottom-color:#ffffff;}
-.btn-group-vertical{display:inline-block;*display:inline;*zoom:1;}
-.btn-group-vertical>.btn{display:block;float:none;max-width:100%;-webkit-border-radius:0;-moz-border-radius:0;border-radius:0;}
-.btn-group-vertical>.btn+.btn{margin-left:0;margin-top:-1px;}
-.btn-group-vertical>.btn:first-child{-webkit-border-radius:4px 4px 0 0;-moz-border-radius:4px 4px 0 0;border-radius:4px 4px 0 0;}
-.btn-group-vertical>.btn:last-child{-webkit-border-radius:0 0 4px 4px;-moz-border-radius:0 0 4px 4px;border-radius:0 0 4px 4px;}
-.btn-group-vertical>.btn-large:first-child{-webkit-border-radius:6px 6px 0 0;-moz-border-radius:6px 6px 0 0;border-radius:6px 6px 0 0;}
-.btn-group-vertical>.btn-large:last-child{-webkit-border-radius:0 0 6px 6px;-moz-border-radius:0 0 6px 6px;border-radius:0 0 6px 6px;}
-.nav{margin-left:0;margin-bottom:20px;list-style:none;}
-.nav>li>a{display:block;}
-.nav>li>a:hover{text-decoration:none;background-color:#eeeeee;}
-.nav>li>a>img{max-width:none;}
-.nav>.pull-right{float:right;}
-.nav-header{display:block;padding:3px 15px;font-size:11px;font-weight:bold;line-height:20px;color:#999999;text-shadow:0 1px 0 rgba(255, 255, 255, 0.5);text-transform:uppercase;}
-.nav li+.nav-header{margin-top:9px;}
-.nav-list{padding-left:15px;padding-right:15px;margin-bottom:0;}
-.nav-list>li>a,.nav-list .nav-header{margin-left:-15px;margin-right:-15px;text-shadow:0 1px 0 rgba(255, 255, 255, 0.5);}
-.nav-list>li>a{padding:3px 15px;}
-.nav-list>.active>a,.nav-list>.active>a:hover{color:#ffffff;text-shadow:0 -1px 0 rgba(0, 0, 0, 0.2);background-color:#0088cc;}
-.nav-list [class^="icon-"],.nav-list [class*=" icon-"]{margin-right:2px;}
-.nav-list .divider{*width:100%;height:1px;margin:9px 1px;*margin:-5px 0 5px;overflow:hidden;background-color:#e5e5e5;border-bottom:1px solid #ffffff;}
-.nav-tabs,.nav-pills{*zoom:1;}.nav-tabs:before,.nav-pills:before,.nav-tabs:after,.nav-pills:after{display:table;content:"";line-height:0;}
-.nav-tabs:after,.nav-pills:after{clear:both;}
-.nav-tabs>li,.nav-pills>li{float:left;}
-.nav-tabs>li>a,.nav-pills>li>a{padding-right:12px;padding-left:12px;margin-right:2px;line-height:14px;}
-.nav-tabs{border-bottom:1px solid #ddd;}
-.nav-tabs>li{margin-bottom:-1px;}
-.nav-tabs>li>a{padding-top:8px;padding-bottom:8px;line-height:20px;border:1px solid transparent;-webkit-border-radius:4px 4px 0 0;-moz-border-radius:4px 4px 0 0;border-radius:4px 4px 0 0;}.nav-tabs>li>a:hover{border-color:#eeeeee #eeeeee #dddddd;}
-.nav-tabs>.active>a,.nav-tabs>.active>a:hover{color:#555555;background-color:#ffffff;border:1px solid #ddd;border-bottom-color:transparent;cursor:default;}
-.nav-pills>li>a{padding-top:8px;padding-bottom:8px;margin-top:2px;margin-bottom:2px;-webkit-border-radius:5px;-moz-border-radius:5px;border-radius:5px;}
-.nav-pills>.active>a,.nav-pills>.active>a:hover{color:#ffffff;background-color:#0088cc;}
-.nav-stacked>li{float:none;}
-.nav-stacked>li>a{margin-right:0;}
-.nav-tabs.nav-stacked{border-bottom:0;}
-.nav-tabs.nav-stacked>li>a{border:1px solid #ddd;-webkit-border-radius:0;-moz-border-radius:0;border-radius:0;}
-.nav-tabs.nav-stacked>li:first-child>a{-webkit-border-top-right-radius:4px;-moz-border-radius-topright:4px;border-top-right-radius:4px;-webkit-border-top-left-radius:4px;-moz-border-radius-topleft:4px;border-top-left-radius:4px;}
-.nav-tabs.nav-stacked>li:last-child>a{-webkit-border-bottom-right-radius:4px;-moz-border-radius-bottomright:4px;border-bottom-right-radius:4px;-webkit-border-bottom-left-radius:4px;-moz-border-radius-bottomleft:4px;border-bottom-left-radius:4px;}
-.nav-tabs.nav-stacked>li>a:hover{border-color:#ddd;z-index:2;}
-.nav-pills.nav-stacked>li>a{margin-bottom:3px;}
-.nav-pills.nav-stacked>li:last-child>a{margin-bottom:1px;}
-.nav-tabs .dropdown-menu{-webkit-border-radius:0 0 6px 6px;-moz-border-radius:0 0 6px 6px;border-radius:0 0 6px 6px;}
-.nav-pills .dropdown-menu{-webkit-border-radius:6px;-moz-border-radius:6px;border-radius:6px;}
-.nav .dropdown-toggle .caret{border-top-color:#0088cc;border-bottom-color:#0088cc;margin-top:6px;}
-.nav .dropdown-toggle:hover .caret{border-top-color:#005580;border-bottom-color:#005580;}
-.nav-tabs .dropdown-toggle .caret{margin-top:8px;}
-.nav .active .dropdown-toggle .caret{border-top-color:#fff;border-bottom-color:#fff;}
-.nav-tabs .active .dropdown-toggle .caret{border-top-color:#555555;border-bottom-color:#555555;}
-.nav>.dropdown.active>a:hover{cursor:pointer;}
-.nav-tabs .open .dropdown-toggle,.nav-pills .open .dropdown-toggle,.nav>li.dropdown.open.active>a:hover{color:#ffffff;background-color:#999999;border-color:#999999;}
-.nav li.dropdown.open .caret,.nav li.dropdown.open.active .caret,.nav li.dropdown.open a:hover .caret{border-top-color:#ffffff;border-bottom-color:#ffffff;opacity:1;filter:alpha(opacity=100);}
-.tabs-stacked .open>a:hover{border-color:#999999;}
-.tabbable{*zoom:1;}.tabbable:before,.tabbable:after{display:table;content:"";line-height:0;}
-.tabbable:after{clear:both;}
-.tab-content{overflow:auto;}
-.tabs-below>.nav-tabs,.tabs-right>.nav-tabs,.tabs-left>.nav-tabs{border-bottom:0;}
-.tab-content>.tab-pane,.pill-content>.pill-pane{display:none;}
-.tab-content>.active,.pill-content>.active{display:block;}
-.tabs-below>.nav-tabs{border-top:1px solid #ddd;}
-.tabs-below>.nav-tabs>li{margin-top:-1px;margin-bottom:0;}
-.tabs-below>.nav-tabs>li>a{-webkit-border-radius:0 0 4px 4px;-moz-border-radius:0 0 4px 4px;border-radius:0 0 4px 4px;}.tabs-below>.nav-tabs>li>a:hover{border-bottom-color:transparent;border-top-color:#ddd;}
-.tabs-below>.nav-tabs>.active>a,.tabs-below>.nav-tabs>.active>a:hover{border-color:transparent #ddd #ddd #ddd;}
-.tabs-left>.nav-tabs>li,.tabs-right>.nav-tabs>li{float:none;}
-.tabs-left>.nav-tabs>li>a,.tabs-right>.nav-tabs>li>a{min-width:74px;margin-right:0;margin-bottom:3px;}
-.tabs-left>.nav-tabs{float:left;margin-right:19px;border-right:1px solid #ddd;}
-.tabs-left>.nav-tabs>li>a{margin-right:-1px;-webkit-border-radius:4px 0 0 4px;-moz-border-radius:4px 0 0 4px;border-radius:4px 0 0 4px;}
-.tabs-left>.nav-tabs>li>a:hover{border-color:#eeeeee #dddddd #eeeeee #eeeeee;}
-.tabs-left>.nav-tabs .active>a,.tabs-left>.nav-tabs .active>a:hover{border-color:#ddd transparent #ddd #ddd;*border-right-color:#ffffff;}
-.tabs-right>.nav-tabs{float:right;margin-left:19px;border-left:1px solid #ddd;}
-.tabs-right>.nav-tabs>li>a{margin-left:-1px;-webkit-border-radius:0 4px 4px 0;-moz-border-radius:0 4px 4px 0;border-radius:0 4px 4px 0;}
-.tabs-right>.nav-tabs>li>a:hover{border-color:#eeeeee #eeeeee #eeeeee #dddddd;}
-.tabs-right>.nav-tabs .active>a,.tabs-right>.nav-tabs .active>a:hover{border-color:#ddd #ddd #ddd transparent;*border-left-color:#ffffff;}
-.nav>.disabled>a{color:#999999;}
-.nav>.disabled>a:hover{text-decoration:none;background-color:transparent;cursor:default;}
-.navbar{overflow:visible;margin-bottom:20px;*position:relative;*z-index:2;}
-.navbar-inner{min-height:40px;padding-left:20px;padding-right:20px;background-color:#fafafa;background-image:-moz-linear-gradient(top, #ffffff, #f2f2f2);background-image:-webkit-gradient(linear, 0 0, 0 100%, from(#ffffff), to(#f2f2f2));background-image:-webkit-linear-gradient(top, #ffffff, #f2f2f2);background-image:-o-linear-gradient(top, #ffffff, #f2f2f2);background-image:linear-gradient(to bottom, #ffffff, #f2f2f2);background-repeat:repeat-x;filter:progid:DXImageTransform.Microsoft.gra [...]
-.navbar-inner:after{clear:both;}
-.navbar .container{width:auto;}
-.nav-collapse.collapse{height:auto;overflow:visible;}
-.navbar .brand{float:left;display:block;padding:10px 20px 10px;margin-left:-20px;font-size:20px;font-weight:200;color:#777777;text-shadow:0 1px 0 #ffffff;}.navbar .brand:hover{text-decoration:none;}
-.navbar-text{margin-bottom:0;line-height:40px;color:#777777;}
-.navbar-link{color:#777777;}.navbar-link:hover{color:#333333;}
-.navbar .divider-vertical{height:40px;margin:0 9px;border-left:1px solid #f2f2f2;border-right:1px solid #ffffff;}
-.navbar .btn,.navbar .btn-group{margin-top:5px;}
-.navbar .btn-group .btn,.navbar .input-prepend .btn,.navbar .input-append .btn{margin-top:0;}
-.navbar-form{margin-bottom:0;*zoom:1;}.navbar-form:before,.navbar-form:after{display:table;content:"";line-height:0;}
-.navbar-form:after{clear:both;}
-.navbar-form input,.navbar-form select,.navbar-form .radio,.navbar-form .checkbox{margin-top:5px;}
-.navbar-form input,.navbar-form select,.navbar-form .btn{display:inline-block;margin-bottom:0;}
-.navbar-form input[type="image"],.navbar-form input[type="checkbox"],.navbar-form input[type="radio"]{margin-top:3px;}
-.navbar-form .input-append,.navbar-form .input-prepend{margin-top:5px;white-space:nowrap;}.navbar-form .input-append input,.navbar-form .input-prepend input{margin-top:0;}
-.navbar-search{position:relative;float:left;margin-top:5px;margin-bottom:0;}.navbar-search .search-query{margin-bottom:0;padding:4px 14px;font-family:"Helvetica Neue",Helvetica,Arial,sans-serif;font-size:13px;font-weight:normal;line-height:1;-webkit-border-radius:15px;-moz-border-radius:15px;border-radius:15px;}
-.navbar-static-top{position:static;margin-bottom:0;}.navbar-static-top .navbar-inner{-webkit-border-radius:0;-moz-border-radius:0;border-radius:0;}
-.navbar-fixed-top,.navbar-fixed-bottom{position:fixed;right:0;left:0;z-index:1030;margin-bottom:0;}
-.navbar-fixed-top .navbar-inner,.navbar-static-top .navbar-inner{border-width:0 0 1px;}
-.navbar-fixed-bottom .navbar-inner{border-width:1px 0 0;}
-.navbar-fixed-top .navbar-inner,.navbar-fixed-bottom .navbar-inner{padding-left:0;padding-right:0;-webkit-border-radius:0;-moz-border-radius:0;border-radius:0;}
-.navbar-static-top .container,.navbar-fixed-top .container,.navbar-fixed-bottom .container{width:940px;}
-.navbar-fixed-top{top:0;}
-.navbar-fixed-top .navbar-inner,.navbar-static-top .navbar-inner{-webkit-box-shadow:0 1px 10px rgba(0,0,0,.1);-moz-box-shadow:0 1px 10px rgba(0,0,0,.1);box-shadow:0 1px 10px rgba(0,0,0,.1);}
-.navbar-fixed-bottom{bottom:0;}.navbar-fixed-bottom .navbar-inner{-webkit-box-shadow:0 -1px 10px rgba(0,0,0,.1);-moz-box-shadow:0 -1px 10px rgba(0,0,0,.1);box-shadow:0 -1px 10px rgba(0,0,0,.1);}
-.navbar .nav{position:relative;left:0;display:block;float:left;margin:0 10px 0 0;}
-.navbar .nav.pull-right{float:right;margin-right:0;}
-.navbar .nav>li{float:left;}
-.navbar .nav>li>a{float:none;padding:10px 15px 10px;color:#777777;text-decoration:none;text-shadow:0 1px 0 #ffffff;}
-.navbar .nav .dropdown-toggle .caret{margin-top:8px;}
-.navbar .nav>li>a:focus,.navbar .nav>li>a:hover{background-color:transparent;color:#333333;text-decoration:none;}
-.navbar .nav>.active>a,.navbar .nav>.active>a:hover,.navbar .nav>.active>a:focus{color:#555555;text-decoration:none;background-color:#e5e5e5;-webkit-box-shadow:inset 0 3px 8px rgba(0, 0, 0, 0.125);-moz-box-shadow:inset 0 3px 8px rgba(0, 0, 0, 0.125);box-shadow:inset 0 3px 8px rgba(0, 0, 0, 0.125);}
-.navbar .btn-navbar{display:none;float:right;padding:7px 10px;margin-left:5px;margin-right:5px;color:#ffffff;text-shadow:0 -1px 0 rgba(0, 0, 0, 0.25);background-color:#ededed;background-image:-moz-linear-gradient(top, #f2f2f2, #e5e5e5);background-image:-webkit-gradient(linear, 0 0, 0 100%, from(#f2f2f2), to(#e5e5e5));background-image:-webkit-linear-gradient(top, #f2f2f2, #e5e5e5);background-image:-o-linear-gradient(top, #f2f2f2, #e5e5e5);background-image:linear-gradient(to bottom, #f2f2f [...]
-.navbar .btn-navbar:active,.navbar .btn-navbar.active{background-color:#cccccc \9;}
-.navbar .btn-navbar .icon-bar{display:block;width:18px;height:2px;background-color:#f5f5f5;-webkit-border-radius:1px;-moz-border-radius:1px;border-radius:1px;-webkit-box-shadow:0 1px 0 rgba(0, 0, 0, 0.25);-moz-box-shadow:0 1px 0 rgba(0, 0, 0, 0.25);box-shadow:0 1px 0 rgba(0, 0, 0, 0.25);}
-.btn-navbar .icon-bar+.icon-bar{margin-top:3px;}
-.navbar .nav>li>.dropdown-menu:before{content:'';display:inline-block;border-left:7px solid transparent;border-right:7px solid transparent;border-bottom:7px solid #ccc;border-bottom-color:rgba(0, 0, 0, 0.2);position:absolute;top:-7px;left:9px;}
-.navbar .nav>li>.dropdown-menu:after{content:'';display:inline-block;border-left:6px solid transparent;border-right:6px solid transparent;border-bottom:6px solid #ffffff;position:absolute;top:-6px;left:10px;}
-.navbar-fixed-bottom .nav>li>.dropdown-menu:before{border-top:7px solid #ccc;border-top-color:rgba(0, 0, 0, 0.2);border-bottom:0;bottom:-7px;top:auto;}
-.navbar-fixed-bottom .nav>li>.dropdown-menu:after{border-top:6px solid #ffffff;border-bottom:0;bottom:-6px;top:auto;}
-.navbar .nav li.dropdown>a:hover .caret{border-top-color:#555555;border-bottom-color:#555555;}
-.navbar .nav li.dropdown.open>.dropdown-toggle,.navbar .nav li.dropdown.active>.dropdown-toggle,.navbar .nav li.dropdown.open.active>.dropdown-toggle{background-color:#e5e5e5;color:#555555;}
-.navbar .nav li.dropdown>.dropdown-toggle .caret{border-top-color:#777777;border-bottom-color:#777777;}
-.navbar .nav li.dropdown.open>.dropdown-toggle .caret,.navbar .nav li.dropdown.active>.dropdown-toggle .caret,.navbar .nav li.dropdown.open.active>.dropdown-toggle .caret{border-top-color:#555555;border-bottom-color:#555555;}
-.navbar .pull-right>li>.dropdown-menu,.navbar .nav>li>.dropdown-menu.pull-right{left:auto;right:0;}.navbar .pull-right>li>.dropdown-menu:before,.navbar .nav>li>.dropdown-menu.pull-right:before{left:auto;right:12px;}
-.navbar .pull-right>li>.dropdown-menu:after,.navbar .nav>li>.dropdown-menu.pull-right:after{left:auto;right:13px;}
-.navbar .pull-right>li>.dropdown-menu .dropdown-menu,.navbar .nav>li>.dropdown-menu.pull-right .dropdown-menu{left:auto;right:100%;margin-left:0;margin-right:-1px;-webkit-border-radius:6px 0 6px 6px;-moz-border-radius:6px 0 6px 6px;border-radius:6px 0 6px 6px;}
-.navbar-inverse .navbar-inner{background-color:#1b1b1b;background-image:-moz-linear-gradient(top, #222222, #111111);background-image:-webkit-gradient(linear, 0 0, 0 100%, from(#222222), to(#111111));background-image:-webkit-linear-gradient(top, #222222, #111111);background-image:-o-linear-gradient(top, #222222, #111111);background-image:linear-gradient(to bottom, #222222, #111111);background-repeat:repeat-x;filter:progid:DXImageTransform.Microsoft.gradient(startColorstr='#ff222222', endC [...]
-.navbar-inverse .brand,.navbar-inverse .nav>li>a{color:#999999;text-shadow:0 -1px 0 rgba(0, 0, 0, 0.25);}.navbar-inverse .brand:hover,.navbar-inverse .nav>li>a:hover{color:#ffffff;}
-.navbar-inverse .brand{color:#999999;}
-.navbar-inverse .navbar-text{color:#999999;}
-.navbar-inverse .nav>li>a:focus,.navbar-inverse .nav>li>a:hover{background-color:transparent;color:#ffffff;}
-.navbar-inverse .nav .active>a,.navbar-inverse .nav .active>a:hover,.navbar-inverse .nav .active>a:focus{color:#ffffff;background-color:#111111;}
-.navbar-inverse .navbar-link{color:#999999;}.navbar-inverse .navbar-link:hover{color:#ffffff;}
-.navbar-inverse .divider-vertical{border-left-color:#111111;border-right-color:#222222;}
-.navbar-inverse .nav li.dropdown.open>.dropdown-toggle,.navbar-inverse .nav li.dropdown.active>.dropdown-toggle,.navbar-inverse .nav li.dropdown.open.active>.dropdown-toggle{background-color:#111111;color:#ffffff;}
-.navbar-inverse .nav li.dropdown>a:hover .caret{border-top-color:#ffffff;border-bottom-color:#ffffff;}
-.navbar-inverse .nav li.dropdown>.dropdown-toggle .caret{border-top-color:#999999;border-bottom-color:#999999;}
-.navbar-inverse .nav li.dropdown.open>.dropdown-toggle .caret,.navbar-inverse .nav li.dropdown.active>.dropdown-toggle .caret,.navbar-inverse .nav li.dropdown.open.active>.dropdown-toggle .caret{border-top-color:#ffffff;border-bottom-color:#ffffff;}
-.navbar-inverse .navbar-search .search-query{color:#ffffff;background-color:#515151;border-color:#111111;-webkit-box-shadow:inset 0 1px 2px rgba(0,0,0,.1), 0 1px 0 rgba(255,255,255,.15);-moz-box-shadow:inset 0 1px 2px rgba(0,0,0,.1), 0 1px 0 rgba(255,255,255,.15);box-shadow:inset 0 1px 2px rgba(0,0,0,.1), 0 1px 0 rgba(255,255,255,.15);-webkit-transition:none;-moz-transition:none;-o-transition:none;transition:none;}.navbar-inverse .navbar-search .search-query:-moz-placeholder{color:#cccccc;}
-.navbar-inverse .navbar-search .search-query:-ms-input-placeholder{color:#cccccc;}
-.navbar-inverse .navbar-search .search-query::-webkit-input-placeholder{color:#cccccc;}
-.navbar-inverse .navbar-search .search-query:focus,.navbar-inverse .navbar-search .search-query.focused{padding:5px 15px;color:#333333;text-shadow:0 1px 0 #ffffff;background-color:#ffffff;border:0;-webkit-box-shadow:0 0 3px rgba(0, 0, 0, 0.15);-moz-box-shadow:0 0 3px rgba(0, 0, 0, 0.15);box-shadow:0 0 3px rgba(0, 0, 0, 0.15);outline:0;}
-.navbar-inverse .btn-navbar{color:#ffffff;text-shadow:0 -1px 0 rgba(0, 0, 0, 0.25);background-color:#0e0e0e;background-image:-moz-linear-gradient(top, #151515, #040404);background-image:-webkit-gradient(linear, 0 0, 0 100%, from(#151515), to(#040404));background-image:-webkit-linear-gradient(top, #151515, #040404);background-image:-o-linear-gradient(top, #151515, #040404);background-image:linear-gradient(to bottom, #151515, #040404);background-repeat:repeat-x;filter:progid:DXImageTransfo [...]
-.navbar-inverse .btn-navbar:active,.navbar-inverse .btn-navbar.active{background-color:#000000 \9;}
-.breadcrumb{padding:8px 15px;margin:0 0 20px;list-style:none;background-color:#f5f5f5;-webkit-border-radius:4px;-moz-border-radius:4px;border-radius:4px;}.breadcrumb>li{display:inline-block;*display:inline;*zoom:1;text-shadow:0 1px 0 #ffffff;}.breadcrumb>li>.divider{padding:0 5px;color:#ccc;}
-.breadcrumb>.active{color:#999999;}
-.pagination{margin:20px 0;}
-.pagination ul{display:inline-block;*display:inline;*zoom:1;margin-left:0;margin-bottom:0;-webkit-border-radius:4px;-moz-border-radius:4px;border-radius:4px;-webkit-box-shadow:0 1px 2px rgba(0, 0, 0, 0.05);-moz-box-shadow:0 1px 2px rgba(0, 0, 0, 0.05);box-shadow:0 1px 2px rgba(0, 0, 0, 0.05);}
-.pagination ul>li{display:inline;}
-.pagination ul>li>a,.pagination ul>li>span{float:left;padding:4px 12px;line-height:20px;text-decoration:none;background-color:#ffffff;border:1px solid #dddddd;border-left-width:0;}
-.pagination ul>li>a:hover,.pagination ul>.active>a,.pagination ul>.active>span{background-color:#f5f5f5;}
-.pagination ul>.active>a,.pagination ul>.active>span{color:#999999;cursor:default;}
-.pagination ul>.disabled>span,.pagination ul>.disabled>a,.pagination ul>.disabled>a:hover{color:#999999;background-color:transparent;cursor:default;}
-.pagination ul>li:first-child>a,.pagination ul>li:first-child>span{border-left-width:1px;-webkit-border-top-left-radius:4px;-moz-border-radius-topleft:4px;border-top-left-radius:4px;-webkit-border-bottom-left-radius:4px;-moz-border-radius-bottomleft:4px;border-bottom-left-radius:4px;}
-.pagination ul>li:last-child>a,.pagination ul>li:last-child>span{-webkit-border-top-right-radius:4px;-moz-border-radius-topright:4px;border-top-right-radius:4px;-webkit-border-bottom-right-radius:4px;-moz-border-radius-bottomright:4px;border-bottom-right-radius:4px;}
-.pagination-centered{text-align:center;}
-.pagination-right{text-align:right;}
-.pagination-large ul>li>a,.pagination-large ul>li>span{padding:11px 19px;font-size:17.5px;}
-.pagination-large ul>li:first-child>a,.pagination-large ul>li:first-child>span{-webkit-border-top-left-radius:6px;-moz-border-radius-topleft:6px;border-top-left-radius:6px;-webkit-border-bottom-left-radius:6px;-moz-border-radius-bottomleft:6px;border-bottom-left-radius:6px;}
-.pagination-large ul>li:last-child>a,.pagination-large ul>li:last-child>span{-webkit-border-top-right-radius:6px;-moz-border-radius-topright:6px;border-top-right-radius:6px;-webkit-border-bottom-right-radius:6px;-moz-border-radius-bottomright:6px;border-bottom-right-radius:6px;}
-.pagination-mini ul>li:first-child>a,.pagination-small ul>li:first-child>a,.pagination-mini ul>li:first-child>span,.pagination-small ul>li:first-child>span{-webkit-border-top-left-radius:3px;-moz-border-radius-topleft:3px;border-top-left-radius:3px;-webkit-border-bottom-left-radius:3px;-moz-border-radius-bottomleft:3px;border-bottom-left-radius:3px;}
-.pagination-mini ul>li:last-child>a,.pagination-small ul>li:last-child>a,.pagination-mini ul>li:last-child>span,.pagination-small ul>li:last-child>span{-webkit-border-top-right-radius:3px;-moz-border-radius-topright:3px;border-top-right-radius:3px;-webkit-border-bottom-right-radius:3px;-moz-border-radius-bottomright:3px;border-bottom-right-radius:3px;}
-.pagination-small ul>li>a,.pagination-small ul>li>span{padding:2px 10px;font-size:11.9px;}
-.pagination-mini ul>li>a,.pagination-mini ul>li>span{padding:0 6px;font-size:10.5px;}
-.pager{margin:20px 0;list-style:none;text-align:center;*zoom:1;}.pager:before,.pager:after{display:table;content:"";line-height:0;}
-.pager:after{clear:both;}
-.pager li{display:inline;}
-.pager li>a,.pager li>span{display:inline-block;padding:5px 14px;background-color:#fff;border:1px solid #ddd;-webkit-border-radius:15px;-moz-border-radius:15px;border-radius:15px;}
-.pager li>a:hover{text-decoration:none;background-color:#f5f5f5;}
-.pager .next>a,.pager .next>span{float:right;}
-.pager .previous>a,.pager .previous>span{float:left;}
-.pager .disabled>a,.pager .disabled>a:hover,.pager .disabled>span{color:#999999;background-color:#fff;cursor:default;}
-.thumbnails{margin-left:-20px;list-style:none;*zoom:1;}.thumbnails:before,.thumbnails:after{display:table;content:"";line-height:0;}
-.thumbnails:after{clear:both;}
-.row-fluid .thumbnails{margin-left:0;}
-.thumbnails>li{float:left;margin-bottom:20px;margin-left:20px;}
-.thumbnail{display:block;padding:4px;line-height:20px;border:1px solid #ddd;-webkit-border-radius:4px;-moz-border-radius:4px;border-radius:4px;-webkit-box-shadow:0 1px 3px rgba(0, 0, 0, 0.055);-moz-box-shadow:0 1px 3px rgba(0, 0, 0, 0.055);box-shadow:0 1px 3px rgba(0, 0, 0, 0.055);-webkit-transition:all 0.2s ease-in-out;-moz-transition:all 0.2s ease-in-out;-o-transition:all 0.2s ease-in-out;transition:all 0.2s ease-in-out;}
-a.thumbnail:hover{border-color:#0088cc;-webkit-box-shadow:0 1px 4px rgba(0, 105, 214, 0.25);-moz-box-shadow:0 1px 4px rgba(0, 105, 214, 0.25);box-shadow:0 1px 4px rgba(0, 105, 214, 0.25);}
-.thumbnail>img{display:block;max-width:100%;margin-left:auto;margin-right:auto;}
-.thumbnail .caption{padding:9px;color:#555555;}
-.alert{padding:8px 35px 8px 14px;margin-bottom:20px;text-shadow:0 1px 0 rgba(255, 255, 255, 0.5);background-color:#fcf8e3;border:1px solid #fbeed5;-webkit-border-radius:4px;-moz-border-radius:4px;border-radius:4px;}
-.alert,.alert h4{color:#c09853;}
-.alert h4{margin:0;}
-.alert .close{position:relative;top:-2px;right:-21px;line-height:20px;}
-.alert-success{background-color:#dff0d8;border-color:#d6e9c6;color:#468847;}
-.alert-success h4{color:#468847;}
-.alert-danger,.alert-error{background-color:#f2dede;border-color:#eed3d7;color:#b94a48;}
-.alert-danger h4,.alert-error h4{color:#b94a48;}
-.alert-info{background-color:#d9edf7;border-color:#bce8f1;color:#3a87ad;}
-.alert-info h4{color:#3a87ad;}
-.alert-block{padding-top:14px;padding-bottom:14px;}
-.alert-block>p,.alert-block>ul{margin-bottom:0;}
-.alert-block p+p{margin-top:5px;}
-@-webkit-keyframes progress-bar-stripes{from{background-position:40px 0;} to{background-position:0 0;}}@-moz-keyframes progress-bar-stripes{from{background-position:40px 0;} to{background-position:0 0;}}@-ms-keyframes progress-bar-stripes{from{background-position:40px 0;} to{background-position:0 0;}}@-o-keyframes progress-bar-stripes{from{background-position:0 0;} to{background-position:40px 0;}}@keyframes progress-bar-stripes{from{background-position:40px 0;} to{background-position:0 0 [...]
-.progress .bar{width:0%;height:100%;color:#ffffff;float:left;font-size:12px;text-align:center;text-shadow:0 -1px 0 rgba(0, 0, 0, 0.25);background-color:#0e90d2;background-image:-moz-linear-gradient(top, #149bdf, #0480be);background-image:-webkit-gradient(linear, 0 0, 0 100%, from(#149bdf), to(#0480be));background-image:-webkit-linear-gradient(top, #149bdf, #0480be);background-image:-o-linear-gradient(top, #149bdf, #0480be);background-image:linear-gradient(to bottom, #149bdf, #0480be);bac [...]
-.progress .bar+.bar{-webkit-box-shadow:inset 1px 0 0 rgba(0,0,0,.15), inset 0 -1px 0 rgba(0,0,0,.15);-moz-box-shadow:inset 1px 0 0 rgba(0,0,0,.15), inset 0 -1px 0 rgba(0,0,0,.15);box-shadow:inset 1px 0 0 rgba(0,0,0,.15), inset 0 -1px 0 rgba(0,0,0,.15);}
-.progress-striped .bar{background-color:#149bdf;background-image:-webkit-gradient(linear, 0 100%, 100% 0, color-stop(0.25, rgba(255, 255, 255, 0.15)), color-stop(0.25, transparent), color-stop(0.5, transparent), color-stop(0.5, rgba(255, 255, 255, 0.15)), color-stop(0.75, rgba(255, 255, 255, 0.15)), color-stop(0.75, transparent), to(transparent));background-image:-webkit-linear-gradient(45deg, rgba(255, 255, 255, 0.15) 25%, transparent 25%, transparent 50%, rgba(255, 255, 255, 0.15) 50%, [...]
-.progress.active .bar{-webkit-animation:progress-bar-stripes 2s linear infinite;-moz-animation:progress-bar-stripes 2s linear infinite;-ms-animation:progress-bar-stripes 2s linear infinite;-o-animation:progress-bar-stripes 2s linear infinite;animation:progress-bar-stripes 2s linear infinite;}
-.progress-danger .bar,.progress .bar-danger{background-color:#dd514c;background-image:-moz-linear-gradient(top, #ee5f5b, #c43c35);background-image:-webkit-gradient(linear, 0 0, 0 100%, from(#ee5f5b), to(#c43c35));background-image:-webkit-linear-gradient(top, #ee5f5b, #c43c35);background-image:-o-linear-gradient(top, #ee5f5b, #c43c35);background-image:linear-gradient(to bottom, #ee5f5b, #c43c35);background-repeat:repeat-x;filter:progid:DXImageTransform.Microsoft.gradient(startColorstr='#f [...]
-.progress-danger.progress-striped .bar,.progress-striped .bar-danger{background-color:#ee5f5b;background-image:-webkit-gradient(linear, 0 100%, 100% 0, color-stop(0.25, rgba(255, 255, 255, 0.15)), color-stop(0.25, transparent), color-stop(0.5, transparent), color-stop(0.5, rgba(255, 255, 255, 0.15)), color-stop(0.75, rgba(255, 255, 255, 0.15)), color-stop(0.75, transparent), to(transparent));background-image:-webkit-linear-gradient(45deg, rgba(255, 255, 255, 0.15) 25%, transparent 25%, t [...]
-.progress-success .bar,.progress .bar-success{background-color:#5eb95e;background-image:-moz-linear-gradient(top, #62c462, #57a957);background-image:-webkit-gradient(linear, 0 0, 0 100%, from(#62c462), to(#57a957));background-image:-webkit-linear-gradient(top, #62c462, #57a957);background-image:-o-linear-gradient(top, #62c462, #57a957);background-image:linear-gradient(to bottom, #62c462, #57a957);background-repeat:repeat-x;filter:progid:DXImageTransform.Microsoft.gradient(startColorstr=' [...]
-.progress-success.progress-striped .bar,.progress-striped .bar-success{background-color:#62c462;background-image:-webkit-gradient(linear, 0 100%, 100% 0, color-stop(0.25, rgba(255, 255, 255, 0.15)), color-stop(0.25, transparent), color-stop(0.5, transparent), color-stop(0.5, rgba(255, 255, 255, 0.15)), color-stop(0.75, rgba(255, 255, 255, 0.15)), color-stop(0.75, transparent), to(transparent));background-image:-webkit-linear-gradient(45deg, rgba(255, 255, 255, 0.15) 25%, transparent 25%, [...]
-.progress-info .bar,.progress .bar-info{background-color:#4bb1cf;background-image:-moz-linear-gradient(top, #5bc0de, #339bb9);background-image:-webkit-gradient(linear, 0 0, 0 100%, from(#5bc0de), to(#339bb9));background-image:-webkit-linear-gradient(top, #5bc0de, #339bb9);background-image:-o-linear-gradient(top, #5bc0de, #339bb9);background-image:linear-gradient(to bottom, #5bc0de, #339bb9);background-repeat:repeat-x;filter:progid:DXImageTransform.Microsoft.gradient(startColorstr='#ff5bc [...]
-.progress-info.progress-striped .bar,.progress-striped .bar-info{background-color:#5bc0de;background-image:-webkit-gradient(linear, 0 100%, 100% 0, color-stop(0.25, rgba(255, 255, 255, 0.15)), color-stop(0.25, transparent), color-stop(0.5, transparent), color-stop(0.5, rgba(255, 255, 255, 0.15)), color-stop(0.75, rgba(255, 255, 255, 0.15)), color-stop(0.75, transparent), to(transparent));background-image:-webkit-linear-gradient(45deg, rgba(255, 255, 255, 0.15) 25%, transparent 25%, trans [...]
-.progress-warning .bar,.progress .bar-warning{background-color:#faa732;background-image:-moz-linear-gradient(top, #fbb450, #f89406);background-image:-webkit-gradient(linear, 0 0, 0 100%, from(#fbb450), to(#f89406));background-image:-webkit-linear-gradient(top, #fbb450, #f89406);background-image:-o-linear-gradient(top, #fbb450, #f89406);background-image:linear-gradient(to bottom, #fbb450, #f89406);background-repeat:repeat-x;filter:progid:DXImageTransform.Microsoft.gradient(startColorstr=' [...]
-.progress-warning.progress-striped .bar,.progress-striped .bar-warning{background-color:#fbb450;background-image:-webkit-gradient(linear, 0 100%, 100% 0, color-stop(0.25, rgba(255, 255, 255, 0.15)), color-stop(0.25, transparent), color-stop(0.5, transparent), color-stop(0.5, rgba(255, 255, 255, 0.15)), color-stop(0.75, rgba(255, 255, 255, 0.15)), color-stop(0.75, transparent), to(transparent));background-image:-webkit-linear-gradient(45deg, rgba(255, 255, 255, 0.15) 25%, transparent 25%, [...]
-.hero-unit{padding:60px;margin-bottom:30px;font-size:18px;font-weight:200;line-height:30px;color:inherit;background-color:#eeeeee;-webkit-border-radius:6px;-moz-border-radius:6px;border-radius:6px;}.hero-unit h1{margin-bottom:0;font-size:60px;line-height:1;color:inherit;letter-spacing:-1px;}
-.hero-unit li{line-height:30px;}
-.media,.media-body{overflow:hidden;*overflow:visible;zoom:1;}
-.media,.media .media{margin-top:15px;}
-.media:first-child{margin-top:0;}
-.media-object{display:block;}
-.media-heading{margin:0 0 5px;}
-.media .pull-left{margin-right:10px;}
-.media .pull-right{margin-left:10px;}
-.media-list{margin-left:0;list-style:none;}
-.well{min-height:20px;padding:19px;margin-bottom:20px;background-color:#f5f5f5;border:1px solid #e3e3e3;-webkit-border-radius:4px;-moz-border-radius:4px;border-radius:4px;-webkit-box-shadow:inset 0 1px 1px rgba(0, 0, 0, 0.05);-moz-box-shadow:inset 0 1px 1px rgba(0, 0, 0, 0.05);box-shadow:inset 0 1px 1px rgba(0, 0, 0, 0.05);}.well blockquote{border-color:#ddd;border-color:rgba(0, 0, 0, 0.15);}
-.well-large{padding:24px;-webkit-border-radius:6px;-moz-border-radius:6px;border-radius:6px;}
-.well-small{padding:9px;-webkit-border-radius:3px;-moz-border-radius:3px;border-radius:3px;}
-.close{float:right;font-size:20px;font-weight:bold;line-height:20px;color:#000000;text-shadow:0 1px 0 #ffffff;opacity:0.2;filter:alpha(opacity=20);}.close:hover{color:#000000;text-decoration:none;cursor:pointer;opacity:0.4;filter:alpha(opacity=40);}
-button.close{padding:0;cursor:pointer;background:transparent;border:0;-webkit-appearance:none;}
-.pull-right{float:right;}
-.pull-left{float:left;}
-.hide{display:none;}
-.show{display:block;}
-.invisible{visibility:hidden;}
-.affix{position:fixed;}
-.fade{opacity:0;-webkit-transition:opacity 0.15s linear;-moz-transition:opacity 0.15s linear;-o-transition:opacity 0.15s linear;transition:opacity 0.15s linear;}.fade.in{opacity:1;}
-.collapse{position:relative;height:0;overflow:hidden;-webkit-transition:height 0.35s ease;-moz-transition:height 0.35s ease;-o-transition:height 0.35s ease;transition:height 0.35s ease;}.collapse.in{height:auto;}
-.hidden{display:none;visibility:hidden;}
-.visible-phone{display:none !important;}
-.visible-tablet{display:none !important;}
-.hidden-desktop{display:none !important;}
-.visible-desktop{display:inherit !important;}
-@media (min-width:768px) and (max-width:979px){.hidden-desktop{display:inherit !important;} .visible-desktop{display:none !important ;} .visible-tablet{display:inherit !important;} .hidden-tablet{display:none !important;}}@media (max-width:767px){.hidden-desktop{display:inherit !important;} .visible-desktop{display:none !important;} .visible-phone{display:inherit !important;} .hidden-phone{display:none !important;}}@media (max-width:767px){body{padding-left:20px;padding-right:20px;} .nav [...]
diff --git a/_site/assets/themes/custom-twitter/bootstrap/img/glyphicons-halflings-white.png b/_site/assets/themes/custom-twitter/bootstrap/img/glyphicons-halflings-white.png
deleted file mode 100644
index 3bf6484..0000000
Binary files a/_site/assets/themes/custom-twitter/bootstrap/img/glyphicons-halflings-white.png and /dev/null differ
diff --git a/_site/assets/themes/custom-twitter/bootstrap/img/glyphicons-halflings.png b/_site/assets/themes/custom-twitter/bootstrap/img/glyphicons-halflings.png
deleted file mode 100644
index a996999..0000000
Binary files a/_site/assets/themes/custom-twitter/bootstrap/img/glyphicons-halflings.png and /dev/null differ
diff --git a/_site/assets/themes/custom-twitter/css/1.4.0/bootstrap.css b/_site/assets/themes/custom-twitter/css/1.4.0/bootstrap.css
deleted file mode 100644
index 7f83ac5..0000000
--- a/_site/assets/themes/custom-twitter/css/1.4.0/bootstrap.css
+++ /dev/null
@@ -1,356 +0,0 @@
-html,body{margin:0;padding:0;}
-h1,h2,h3,h4,h5,h6,p,blockquote,pre,a,abbr,acronym,address,cite,code,del,dfn,em,img,q,s,samp,small,strike,strong,sub,sup,tt,var,dd,dl,dt,li,ol,ul,fieldset,form,label,legend,button,table,caption,tbody,tfoot,thead,tr,th,td{margin:0;padding:0;border:0;font-weight:normal;font-style:normal;font-size:100%;line-height:1;font-family:inherit;}
-table{border-collapse:collapse;border-spacing:0;}
-ol,ul{list-style:none;}
-q:before,q:after,blockquote:before,blockquote:after{content:"";}
-html{overflow-y:scroll;font-size:100%;-webkit-text-size-adjust:100%;-ms-text-size-adjust:100%;}
-a:focus{outline:thin dotted;}
-a:hover,a:active{outline:0;}
-article,aside,details,figcaption,figure,footer,header,hgroup,nav,section{display:block;}
-audio,canvas,video{display:inline-block;*display:inline;*zoom:1;}
-audio:not([controls]){display:none;}
-sub,sup{font-size:75%;line-height:0;position:relative;vertical-align:baseline;}
-sup{top:-0.5em;}
-sub{bottom:-0.25em;}
-img{border:0;-ms-interpolation-mode:bicubic;}
-button,input,select,textarea{font-size:100%;margin:0;vertical-align:baseline;*vertical-align:middle;}
-button,input{line-height:normal;*overflow:visible;}
-button::-moz-focus-inner,input::-moz-focus-inner{border:0;padding:0;}
-button,input[type="button"],input[type="reset"],input[type="submit"]{cursor:pointer;-webkit-appearance:button;}
-input[type="search"]{-webkit-appearance:textfield;-webkit-box-sizing:content-box;-moz-box-sizing:content-box;box-sizing:content-box;}
-input[type="search"]::-webkit-search-decoration{-webkit-appearance:none;}
-textarea{overflow:auto;vertical-align:top;}
-body{background-color:#ffffff;margin:0;font-family:"Helvetica Neue",Helvetica,Arial,sans-serif;font-size:13px;font-weight:normal;line-height:18px;color:#404040;}
-.container{width:940px;margin-left:auto;margin-right:auto;zoom:1;}.container:before,.container:after{display:table;content:"";zoom:1;}
-.container:after{clear:both;}
-.container-fluid{position:relative;min-width:940px;padding-left:20px;padding-right:20px;zoom:1;}.container-fluid:before,.container-fluid:after{display:table;content:"";zoom:1;}
-.container-fluid:after{clear:both;}
-.container-fluid>.sidebar{position:absolute;top:0;left:20px;width:220px;}
-.container-fluid>.content{margin-left:240px;}
-a{color:#0069d6;text-decoration:none;line-height:inherit;font-weight:inherit;}a:hover{color:#00438a;text-decoration:underline;}
-.pull-right{float:right;}
-.pull-left{float:left;}
-.hide{display:none;}
-.show{display:block;}
-.row{zoom:1;margin-left:-20px;}.row:before,.row:after{display:table;content:"";zoom:1;}
-.row:after{clear:both;}
-.row>[class*="span"]{display:inline;float:left;margin-left:20px;}
-.span1{width:40px;}
-.span2{width:100px;}
-.span3{width:160px;}
-.span4{width:220px;}
-.span5{width:280px;}
-.span6{width:340px;}
-.span7{width:400px;}
-.span8{width:460px;}
-.span9{width:520px;}
-.span10{width:580px;}
-.span11{width:640px;}
-.span12{width:700px;}
-.span13{width:760px;}
-.span14{width:820px;}
-.span15{width:880px;}
-.span16{width:940px;}
-.span17{width:1000px;}
-.span18{width:1060px;}
-.span19{width:1120px;}
-.span20{width:1180px;}
-.span21{width:1240px;}
-.span22{width:1300px;}
-.span23{width:1360px;}
-.span24{width:1420px;}
-.row>.offset1{margin-left:80px;}
-.row>.offset2{margin-left:140px;}
-.row>.offset3{margin-left:200px;}
-.row>.offset4{margin-left:260px;}
-.row>.offset5{margin-left:320px;}
-.row>.offset6{margin-left:380px;}
-.row>.offset7{margin-left:440px;}
-.row>.offset8{margin-left:500px;}
-.row>.offset9{margin-left:560px;}
-.row>.offset10{margin-left:620px;}
-.row>.offset11{margin-left:680px;}
-.row>.offset12{margin-left:740px;}
-.span-one-third{width:300px;}
-.span-two-thirds{width:620px;}
-.row>.offset-one-third{margin-left:340px;}
-.row>.offset-two-thirds{margin-left:660px;}
-p{font-size:13px;font-weight:normal;line-height:18px;margin-bottom:9px;}p small{font-size:11px;color:#bfbfbf;}
-h1,h2,h3,h4,h5,h6{font-weight:bold;color:#404040;}h1 small,h2 small,h3 small,h4 small,h5 small,h6 small{color:#bfbfbf;}
-h1{margin-bottom:18px;font-size:30px;line-height:36px;}h1 small{font-size:18px;}
-h2{font-size:24px;line-height:36px;}h2 small{font-size:14px;}
-h3,h4,h5,h6{line-height:36px;}
-h3{font-size:18px;}h3 small{font-size:14px;}
-h4{font-size:16px;}h4 small{font-size:12px;}
-h5{font-size:14px;}
-h6{font-size:13px;color:#bfbfbf;text-transform:uppercase;}
-ul,ol{margin:0 0 18px 25px;}
-ul ul,ul ol,ol ol,ol ul{margin-bottom:0;}
-ul{list-style:disc;}
-ol{list-style:decimal;}
-li{line-height:18px;}
-ul.unstyled{list-style:none;margin-left:0;}
-dl{margin-bottom:18px;}dl dt,dl dd{line-height:18px;}
-dl dt{font-weight:bold;}
-dl dd{margin-left:9px;}
-hr{margin:20px 0 19px;border:0;border-bottom:1px solid #eee;}
-strong{font-style:inherit;font-weight:bold;}
-em{font-style:italic;font-weight:inherit;line-height:inherit;}
-.muted{color:#bfbfbf;}
-blockquote{margin-bottom:18px;border-left:5px solid #eee;padding-left:15px;}blockquote p{font-size:14px;font-weight:300;line-height:18px;margin-bottom:0;}
-blockquote small{display:block;font-size:12px;font-weight:300;line-height:18px;color:#bfbfbf;}blockquote small:before{content:'\2014 \00A0';}
-address{display:block;line-height:18px;margin-bottom:18px;}
-code,pre{padding:0 3px 2px;font-family:Monaco, Andale Mono, Courier New, monospace;font-size:12px;-webkit-border-radius:3px;-moz-border-radius:3px;border-radius:3px;}
-code{background-color:#fee9cc;color:rgba(0, 0, 0, 0.75);padding:1px 3px;}
-pre{background-color:#f5f5f5;display:block;padding:8.5px;margin:0 0 18px;line-height:18px;font-size:12px;border:1px solid #ccc;border:1px solid rgba(0, 0, 0, 0.15);-webkit-border-radius:3px;-moz-border-radius:3px;border-radius:3px;white-space:pre;white-space:pre-wrap;word-wrap:break-word;}
-form{margin-bottom:18px;}
-fieldset{margin-bottom:18px;padding-top:18px;}fieldset legend{display:block;padding-left:150px;font-size:19.5px;line-height:1;color:#404040;*padding:0 0 5px 145px;*line-height:1.5;}
-form .clearfix{margin-bottom:18px;zoom:1;}form .clearfix:before,form .clearfix:after{display:table;content:"";zoom:1;}
-form .clearfix:after{clear:both;}
-label,input,select,textarea{font-family:"Helvetica Neue",Helvetica,Arial,sans-serif;font-size:13px;font-weight:normal;line-height:normal;}
-label{padding-top:6px;font-size:13px;line-height:18px;float:left;width:130px;text-align:right;color:#404040;}
-form .input{margin-left:150px;}
-input[type=checkbox],input[type=radio]{cursor:pointer;}
-input,textarea,select,.uneditable-input{display:inline-block;width:210px;height:18px;padding:4px;font-size:13px;line-height:18px;color:#808080;border:1px solid #ccc;-webkit-border-radius:3px;-moz-border-radius:3px;border-radius:3px;}
-select{padding:initial;}
-input[type=checkbox],input[type=radio]{width:auto;height:auto;padding:0;margin:3px 0;*margin-top:0;line-height:normal;border:none;}
-input[type=file]{background-color:#ffffff;padding:initial;border:initial;line-height:initial;-webkit-box-shadow:none;-moz-box-shadow:none;box-shadow:none;}
-input[type=button],input[type=reset],input[type=submit]{width:auto;height:auto;}
-select,input[type=file]{height:27px;*height:auto;line-height:27px;*margin-top:4px;}
-select[multiple]{height:inherit;background-color:#ffffff;}
-textarea{height:auto;}
-.uneditable-input{background-color:#ffffff;display:block;border-color:#eee;-webkit-box-shadow:inset 0 1px 2px rgba(0, 0, 0, 0.025);-moz-box-shadow:inset 0 1px 2px rgba(0, 0, 0, 0.025);box-shadow:inset 0 1px 2px rgba(0, 0, 0, 0.025);cursor:not-allowed;}
-:-moz-placeholder{color:#bfbfbf;}
-::-webkit-input-placeholder{color:#bfbfbf;}
-input,textarea{-webkit-transition:border linear 0.2s,box-shadow linear 0.2s;-moz-transition:border linear 0.2s,box-shadow linear 0.2s;-ms-transition:border linear 0.2s,box-shadow linear 0.2s;-o-transition:border linear 0.2s,box-shadow linear 0.2s;transition:border linear 0.2s,box-shadow linear 0.2s;-webkit-box-shadow:inset 0 1px 3px rgba(0, 0, 0, 0.1);-moz-box-shadow:inset 0 1px 3px rgba(0, 0, 0, 0.1);box-shadow:inset 0 1px 3px rgba(0, 0, 0, 0.1);}
-input:focus,textarea:focus{outline:0;border-color:rgba(82, 168, 236, 0.8);-webkit-box-shadow:inset 0 1px 3px rgba(0, 0, 0, 0.1),0 0 8px rgba(82, 168, 236, 0.6);-moz-box-shadow:inset 0 1px 3px rgba(0, 0, 0, 0.1),0 0 8px rgba(82, 168, 236, 0.6);box-shadow:inset 0 1px 3px rgba(0, 0, 0, 0.1),0 0 8px rgba(82, 168, 236, 0.6);}
-input[type=file]:focus,input[type=checkbox]:focus,select:focus{-webkit-box-shadow:none;-moz-box-shadow:none;box-shadow:none;outline:1px dotted #666;}
-form .clearfix.error>label,form .clearfix.error .help-block,form .clearfix.error .help-inline{color:#b94a48;}
-form .clearfix.error input,form .clearfix.error textarea{color:#b94a48;border-color:#ee5f5b;}form .clearfix.error input:focus,form .clearfix.error textarea:focus{border-color:#e9322d;-webkit-box-shadow:0 0 6px #f8b9b7;-moz-box-shadow:0 0 6px #f8b9b7;box-shadow:0 0 6px #f8b9b7;}
-form .clearfix.error .input-prepend .add-on,form .clearfix.error .input-append .add-on{color:#b94a48;background-color:#fce6e6;border-color:#b94a48;}
-form .clearfix.warning>label,form .clearfix.warning .help-block,form .clearfix.warning .help-inline{color:#c09853;}
-form .clearfix.warning input,form .clearfix.warning textarea{color:#c09853;border-color:#ccae64;}form .clearfix.warning input:focus,form .clearfix.warning textarea:focus{border-color:#be9a3f;-webkit-box-shadow:0 0 6px #e5d6b1;-moz-box-shadow:0 0 6px #e5d6b1;box-shadow:0 0 6px #e5d6b1;}
-form .clearfix.warning .input-prepend .add-on,form .clearfix.warning .input-append .add-on{color:#c09853;background-color:#d2b877;border-color:#c09853;}
-form .clearfix.success>label,form .clearfix.success .help-block,form .clearfix.success .help-inline{color:#468847;}
-form .clearfix.success input,form .clearfix.success textarea{color:#468847;border-color:#57a957;}form .clearfix.success input:focus,form .clearfix.success textarea:focus{border-color:#458845;-webkit-box-shadow:0 0 6px #9acc9a;-moz-box-shadow:0 0 6px #9acc9a;box-shadow:0 0 6px #9acc9a;}
-form .clearfix.success .input-prepend .add-on,form .clearfix.success .input-append .add-on{color:#468847;background-color:#bcddbc;border-color:#468847;}
-.input-mini,input.mini,textarea.mini,select.mini{width:60px;}
-.input-small,input.small,textarea.small,select.small{width:90px;}
-.input-medium,input.medium,textarea.medium,select.medium{width:150px;}
-.input-large,input.large,textarea.large,select.large{width:210px;}
-.input-xlarge,input.xlarge,textarea.xlarge,select.xlarge{width:270px;}
-.input-xxlarge,input.xxlarge,textarea.xxlarge,select.xxlarge{width:530px;}
-textarea.xxlarge{overflow-y:auto;}
-input.span1,textarea.span1{display:inline-block;float:none;width:30px;margin-left:0;}
-input.span2,textarea.span2{display:inline-block;float:none;width:90px;margin-left:0;}
-input.span3,textarea.span3{display:inline-block;float:none;width:150px;margin-left:0;}
-input.span4,textarea.span4{display:inline-block;float:none;width:210px;margin-left:0;}
-input.span5,textarea.span5{display:inline-block;float:none;width:270px;margin-left:0;}
-input.span6,textarea.span6{display:inline-block;float:none;width:330px;margin-left:0;}
-input.span7,textarea.span7{display:inline-block;float:none;width:390px;margin-left:0;}
-input.span8,textarea.span8{display:inline-block;float:none;width:450px;margin-left:0;}
-input.span9,textarea.span9{display:inline-block;float:none;width:510px;margin-left:0;}
-input.span10,textarea.span10{display:inline-block;float:none;width:570px;margin-left:0;}
-input.span11,textarea.span11{display:inline-block;float:none;width:630px;margin-left:0;}
-input.span12,textarea.span12{display:inline-block;float:none;width:690px;margin-left:0;}
-input.span13,textarea.span13{display:inline-block;float:none;width:750px;margin-left:0;}
-input.span14,textarea.span14{display:inline-block;float:none;width:810px;margin-left:0;}
-input.span15,textarea.span15{display:inline-block;float:none;width:870px;margin-left:0;}
-input.span16,textarea.span16{display:inline-block;float:none;width:930px;margin-left:0;}
-input[disabled],select[disabled],textarea[disabled],input[readonly],select[readonly],textarea[readonly]{background-color:#f5f5f5;border-color:#ddd;cursor:not-allowed;}
-.actions{background:#f5f5f5;margin-top:18px;margin-bottom:18px;padding:17px 20px 18px 150px;border-top:1px solid #ddd;-webkit-border-radius:0 0 3px 3px;-moz-border-radius:0 0 3px 3px;border-radius:0 0 3px 3px;}.actions .secondary-action{float:right;}.actions .secondary-action a{line-height:30px;}.actions .secondary-action a:hover{text-decoration:underline;}
-.help-inline,.help-block{font-size:13px;line-height:18px;color:#bfbfbf;}
-.help-inline{padding-left:5px;*position:relative;*top:-5px;}
-.help-block{display:block;max-width:600px;}
-.inline-inputs{color:#808080;}.inline-inputs span{padding:0 2px 0 1px;}
-.input-prepend input,.input-append input{-webkit-border-radius:0 3px 3px 0;-moz-border-radius:0 3px 3px 0;border-radius:0 3px 3px 0;}
-.input-prepend .add-on,.input-append .add-on{position:relative;background:#f5f5f5;border:1px solid #ccc;z-index:2;float:left;display:block;width:auto;min-width:16px;height:18px;padding:4px 4px 4px 5px;margin-right:-1px;font-weight:normal;line-height:18px;color:#bfbfbf;text-align:center;text-shadow:0 1px 0 #ffffff;-webkit-border-radius:3px 0 0 3px;-moz-border-radius:3px 0 0 3px;border-radius:3px 0 0 3px;}
-.input-prepend .active,.input-append .active{background:#a9dba9;border-color:#46a546;}
-.input-prepend .add-on{*margin-top:1px;}
-.input-append input{float:left;-webkit-border-radius:3px 0 0 3px;-moz-border-radius:3px 0 0 3px;border-radius:3px 0 0 3px;}
-.input-append .add-on{-webkit-border-radius:0 3px 3px 0;-moz-border-radius:0 3px 3px 0;border-radius:0 3px 3px 0;margin-right:0;margin-left:-1px;}
-.inputs-list{margin:0 0 5px;width:100%;}.inputs-list li{display:block;padding:0;width:100%;}
-.inputs-list label{display:block;float:none;width:auto;padding:0;margin-left:20px;line-height:18px;text-align:left;white-space:normal;}.inputs-list label strong{color:#808080;}
-.inputs-list label small{font-size:11px;font-weight:normal;}
-.inputs-list .inputs-list{margin-left:25px;margin-bottom:10px;padding-top:0;}
-.inputs-list:first-child{padding-top:6px;}
-.inputs-list li+li{padding-top:2px;}
-.inputs-list input[type=radio],.inputs-list input[type=checkbox]{margin-bottom:0;margin-left:-20px;float:left;}
-.form-stacked{padding-left:20px;}.form-stacked fieldset{padding-top:9px;}
-.form-stacked legend{padding-left:0;}
-.form-stacked label{display:block;float:none;width:auto;font-weight:bold;text-align:left;line-height:20px;padding-top:0;}
-.form-stacked .clearfix{margin-bottom:9px;}.form-stacked .clearfix div.input{margin-left:0;}
-.form-stacked .inputs-list{margin-bottom:0;}.form-stacked .inputs-list li{padding-top:0;}.form-stacked .inputs-list li label{font-weight:normal;padding-top:0;}
-.form-stacked div.clearfix.error{padding-top:10px;padding-bottom:10px;padding-left:10px;margin-top:0;margin-left:-10px;}
-.form-stacked .actions{margin-left:-20px;padding-left:20px;}
-table{width:100%;margin-bottom:18px;padding:0;font-size:13px;border-collapse:collapse;}table th,table td{padding:10px 10px 9px;line-height:18px;text-align:left;}
-table th{padding-top:9px;font-weight:bold;vertical-align:middle;}
-table td{vertical-align:top;border-top:1px solid #ddd;}
-table tbody th{border-top:1px solid #ddd;vertical-align:top;}
-.condensed-table th,.condensed-table td{padding:5px 5px 4px;}
-.bordered-table{border:1px solid #ddd;border-collapse:separate;*border-collapse:collapse;-webkit-border-radius:4px;-moz-border-radius:4px;border-radius:4px;}.bordered-table th+th,.bordered-table td+td,.bordered-table th+td{border-left:1px solid #ddd;}
-.bordered-table thead tr:first-child th:first-child,.bordered-table tbody tr:first-child td:first-child{-webkit-border-radius:4px 0 0 0;-moz-border-radius:4px 0 0 0;border-radius:4px 0 0 0;}
-.bordered-table thead tr:first-child th:last-child,.bordered-table tbody tr:first-child td:last-child{-webkit-border-radius:0 4px 0 0;-moz-border-radius:0 4px 0 0;border-radius:0 4px 0 0;}
-.bordered-table tbody tr:last-child td:first-child{-webkit-border-radius:0 0 0 4px;-moz-border-radius:0 0 0 4px;border-radius:0 0 0 4px;}
-.bordered-table tbody tr:last-child td:last-child{-webkit-border-radius:0 0 4px 0;-moz-border-radius:0 0 4px 0;border-radius:0 0 4px 0;}
-table .span1{width:20px;}
-table .span2{width:60px;}
-table .span3{width:100px;}
-table .span4{width:140px;}
-table .span5{width:180px;}
-table .span6{width:220px;}
-table .span7{width:260px;}
-table .span8{width:300px;}
-table .span9{width:340px;}
-table .span10{width:380px;}
-table .span11{width:420px;}
-table .span12{width:460px;}
-table .span13{width:500px;}
-table .span14{width:540px;}
-table .span15{width:580px;}
-table .span16{width:620px;}
-.zebra-striped tbody tr:nth-child(odd) td,.zebra-striped tbody tr:nth-child(odd) th{background-color:#f9f9f9;}
-.zebra-striped tbody tr:hover td,.zebra-striped tbody tr:hover th{background-color:#f5f5f5;}
-table .header{cursor:pointer;}table .header:after{content:"";float:right;margin-top:7px;border-width:0 4px 4px;border-style:solid;border-color:#000 transparent;visibility:hidden;}
-table .headerSortUp,table .headerSortDown{background-color:rgba(141, 192, 219, 0.25);text-shadow:0 1px 1px rgba(255, 255, 255, 0.75);}
-table .header:hover:after{visibility:visible;}
-table .headerSortDown:after,table .headerSortDown:hover:after{visibility:visible;filter:alpha(opacity=60);-khtml-opacity:0.6;-moz-opacity:0.6;opacity:0.6;}
-table .headerSortUp:after{border-bottom:none;border-left:4px solid transparent;border-right:4px solid transparent;border-top:4px solid #000;visibility:visible;-webkit-box-shadow:none;-moz-box-shadow:none;box-shadow:none;filter:alpha(opacity=60);-khtml-opacity:0.6;-moz-opacity:0.6;opacity:0.6;}
-table .blue{color:#049cdb;border-bottom-color:#049cdb;}
-table .headerSortUp.blue,table .headerSortDown.blue{background-color:#ade6fe;}
-table .green{color:#46a546;border-bottom-color:#46a546;}
-table .headerSortUp.green,table .headerSortDown.green{background-color:#cdeacd;}
-table .red{color:#9d261d;border-bottom-color:#9d261d;}
-table .headerSortUp.red,table .headerSortDown.red{background-color:#f4c8c5;}
-table .yellow{color:#ffc40d;border-bottom-color:#ffc40d;}
-table .headerSortUp.yellow,table .headerSortDown.yellow{background-color:#fff6d9;}
-table .orange{color:#f89406;border-bottom-color:#f89406;}
-table .headerSortUp.orange,table .headerSortDown.orange{background-color:#fee9cc;}
-table .purple{color:#7a43b6;border-bottom-color:#7a43b6;}
-table .headerSortUp.purple,table .headerSortDown.purple{background-color:#e2d5f0;}
-.topbar{height:40px;position:fixed;top:0;left:0;right:0;z-index:10000;overflow:visible;}.topbar a{color:#bfbfbf;text-shadow:0 -1px 0 rgba(0, 0, 0, 0.25);}
-.topbar h3 a:hover,.topbar .brand:hover,.topbar ul .active>a{background-color:#333;background-color:rgba(255, 255, 255, 0.05);color:#ffffff;text-decoration:none;}
-.topbar h3{position:relative;}
-.topbar h3 a,.topbar .brand{float:left;display:block;padding:8px 20px 12px;margin-left:-20px;color:#ffffff;font-size:20px;font-weight:200;line-height:1;}
-.topbar p{margin:0;line-height:40px;}.topbar p a:hover{background-color:transparent;color:#ffffff;}
-.topbar form{float:left;margin:5px 0 0 0;position:relative;filter:alpha(opacity=100);-khtml-opacity:1;-moz-opacity:1;opacity:1;}
-.topbar form.pull-right{float:right;}
-.topbar input{background-color:#444;background-color:rgba(255, 255, 255, 0.3);font-family:"Helvetica Neue",Helvetica,Arial,sans-serif;font-size:13px;font-weight:normal;line-height:1;padding:4px 9px;color:#ffffff;color:rgba(255, 255, 255, 0.75);border:1px solid #111;-webkit-border-radius:4px;-moz-border-radius:4px;border-radius:4px;-webkit-box-shadow:inset 0 1px 2px rgba(0, 0, 0, 0.1),0 1px 0px rgba(255, 255, 255, 0.25);-moz-box-shadow:inset 0 1px 2px rgba(0, 0, 0, 0.1),0 1px 0px rgba(255 [...]
-.topbar input::-webkit-input-placeholder{color:#e6e6e6;}
-.topbar input:hover{background-color:#bfbfbf;background-color:rgba(255, 255, 255, 0.5);color:#ffffff;}
-.topbar input:focus,.topbar input.focused{outline:0;background-color:#ffffff;color:#404040;text-shadow:0 1px 0 #ffffff;border:0;padding:5px 10px;-webkit-box-shadow:0 0 3px rgba(0, 0, 0, 0.15);-moz-box-shadow:0 0 3px rgba(0, 0, 0, 0.15);box-shadow:0 0 3px rgba(0, 0, 0, 0.15);}
-.topbar-inner,.topbar .fill{background-color:#222;background-color:#222222;background-repeat:repeat-x;background-image:-khtml-gradient(linear, left top, left bottom, from(#333333), to(#222222));background-image:-moz-linear-gradient(top, #333333, #222222);background-image:-ms-linear-gradient(top, #333333, #222222);background-image:-webkit-gradient(linear, left top, left bottom, color-stop(0%, #333333), color-stop(100%, #222222));background-image:-webkit-linear-gradient(top, #333333, #2222 [...]
-.topbar div>ul,.nav{display:block;float:left;margin:0 10px 0 0;position:relative;left:0;}.topbar div>ul>li,.nav>li{display:block;float:left;}
-.topbar div>ul a,.nav a{display:block;float:none;padding:10px 10px 11px;line-height:19px;text-decoration:none;}.topbar div>ul a:hover,.nav a:hover{color:#ffffff;text-decoration:none;}
-.topbar div>ul .active>a,.nav .active>a{background-color:#222;background-color:rgba(0, 0, 0, 0.5);}
-.topbar div>ul.secondary-nav,.nav.secondary-nav{float:right;margin-left:10px;margin-right:0;}.topbar div>ul.secondary-nav .menu-dropdown,.nav.secondary-nav .menu-dropdown,.topbar div>ul.secondary-nav .dropdown-menu,.nav.secondary-nav .dropdown-menu{right:0;border:0;}
-.topbar div>ul a.menu:hover,.nav a.menu:hover,.topbar div>ul li.open .menu,.nav li.open .menu,.topbar div>ul .dropdown-toggle:hover,.nav .dropdown-toggle:hover,.topbar div>ul .dropdown.open .dropdown-toggle,.nav .dropdown.open .dropdown-toggle{background:#444;background:rgba(255, 255, 255, 0.05);}
-.topbar div>ul .menu-dropdown,.nav .menu-dropdown,.topbar div>ul .dropdown-menu,.nav .dropdown-menu{background-color:#333;}.topbar div>ul .menu-dropdown a.menu,.nav .menu-dropdown a.menu,.topbar div>ul .dropdown-menu a.menu,.nav .dropdown-menu a.menu,.topbar div>ul .menu-dropdown .dropdown-toggle,.nav .menu-dropdown .dropdown-toggle,.topbar div>ul .dropdown-menu .dropdown-toggle,.nav .dropdown-menu .dropdown-toggle{color:#ffffff;}.topbar div>ul .menu-dropdown a.menu.open,.nav .menu-dropd [...]
-.topbar div>ul .menu-dropdown li a,.nav .menu-dropdown li a,.topbar div>ul .dropdown-menu li a,.nav .dropdown-menu li a{color:#999;text-shadow:0 1px 0 rgba(0, 0, 0, 0.5);}.topbar div>ul .menu-dropdown li a:hover,.nav .menu-dropdown li a:hover,.topbar div>ul .dropdown-menu li a:hover,.nav .dropdown-menu li a:hover{background-color:#191919;background-repeat:repeat-x;background-image:-khtml-gradient(linear, left top, left bottom, from(#292929), to(#191919));background-image:-moz-linear-grad [...]
-.topbar div>ul .menu-dropdown .active a,.nav .menu-dropdown .active a,.topbar div>ul .dropdown-menu .active a,.nav .dropdown-menu .active a{color:#ffffff;}
-.topbar div>ul .menu-dropdown .divider,.nav .menu-dropdown .divider,.topbar div>ul .dropdown-menu .divider,.nav .dropdown-menu .divider{background-color:#222;border-color:#444;}
-.topbar ul .menu-dropdown li a,.topbar ul .dropdown-menu li a{padding:4px 15px;}
-li.menu,.dropdown{position:relative;}
-a.menu:after,.dropdown-toggle:after{width:0;height:0;display:inline-block;content:"&darr;";text-indent:-99999px;vertical-align:top;margin-top:8px;margin-left:4px;border-left:4px solid transparent;border-right:4px solid transparent;border-top:4px solid #ffffff;filter:alpha(opacity=50);-khtml-opacity:0.5;-moz-opacity:0.5;opacity:0.5;}
-.menu-dropdown,.dropdown-menu{background-color:#ffffff;float:left;display:none;position:absolute;top:40px;z-index:900;min-width:160px;max-width:220px;_width:160px;margin-left:0;margin-right:0;padding:6px 0;zoom:1;border-color:#999;border-color:rgba(0, 0, 0, 0.2);border-style:solid;border-width:0 1px 1px;-webkit-border-radius:0 0 6px 6px;-moz-border-radius:0 0 6px 6px;border-radius:0 0 6px 6px;-webkit-box-shadow:0 2px 4px rgba(0, 0, 0, 0.2);-moz-box-shadow:0 2px 4px rgba(0, 0, 0, 0.2);box [...]
-.menu-dropdown .divider,.dropdown-menu .divider{height:1px;margin:5px 0;overflow:hidden;background-color:#eee;border-bottom:1px solid #ffffff;}
-.topbar .dropdown-menu a,.dropdown-menu a{display:block;padding:4px 15px;clear:both;font-weight:normal;line-height:18px;color:#808080;text-shadow:0 1px 0 #ffffff;}.topbar .dropdown-menu a:hover,.dropdown-menu a:hover,.topbar .dropdown-menu a.hover,.dropdown-menu a.hover{background-color:#dddddd;background-repeat:repeat-x;background-image:-khtml-gradient(linear, left top, left bottom, from(#eeeeee), to(#dddddd));background-image:-moz-linear-gradient(top, #eeeeee, #dddddd);background-image [...]
-.open .menu,.dropdown.open .menu,.open .dropdown-toggle,.dropdown.open .dropdown-toggle{color:#ffffff;background:#ccc;background:rgba(0, 0, 0, 0.3);}
-.open .menu-dropdown,.dropdown.open .menu-dropdown,.open .dropdown-menu,.dropdown.open .dropdown-menu{display:block;}
-.tabs,.pills{margin:0 0 18px;padding:0;list-style:none;zoom:1;}.tabs:before,.pills:before,.tabs:after,.pills:after{display:table;content:"";zoom:1;}
-.tabs:after,.pills:after{clear:both;}
-.tabs>li,.pills>li{float:left;}.tabs>li>a,.pills>li>a{display:block;}
-.tabs{border-color:#ddd;border-style:solid;border-width:0 0 1px;}.tabs>li{position:relative;margin-bottom:-1px;}.tabs>li>a{padding:0 15px;margin-right:2px;line-height:34px;border:1px solid transparent;-webkit-border-radius:4px 4px 0 0;-moz-border-radius:4px 4px 0 0;border-radius:4px 4px 0 0;}.tabs>li>a:hover{text-decoration:none;background-color:#eee;border-color:#eee #eee #ddd;}
-.tabs .active>a,.tabs .active>a:hover{color:#808080;background-color:#ffffff;border:1px solid #ddd;border-bottom-color:transparent;cursor:default;}
-.tabs .menu-dropdown,.tabs .dropdown-menu{top:35px;border-width:1px;-webkit-border-radius:0 6px 6px 6px;-moz-border-radius:0 6px 6px 6px;border-radius:0 6px 6px 6px;}
-.tabs a.menu:after,.tabs .dropdown-toggle:after{border-top-color:#999;margin-top:15px;margin-left:5px;}
-.tabs li.open.menu .menu,.tabs .open.dropdown .dropdown-toggle{border-color:#999;}
-.tabs li.open a.menu:after,.tabs .dropdown.open .dropdown-toggle:after{border-top-color:#555;}
-.pills a{margin:5px 3px 5px 0;padding:0 15px;line-height:30px;text-shadow:0 1px 1px #ffffff;-webkit-border-radius:15px;-moz-border-radius:15px;border-radius:15px;}.pills a:hover{color:#ffffff;text-decoration:none;text-shadow:0 1px 1px rgba(0, 0, 0, 0.25);background-color:#00438a;}
-.pills .active a{color:#ffffff;text-shadow:0 1px 1px rgba(0, 0, 0, 0.25);background-color:#0069d6;}
-.pills-vertical>li{float:none;}
-.tab-content>.tab-pane,.pill-content>.pill-pane,.tab-content>div,.pill-content>div{display:none;}
-.tab-content>.active,.pill-content>.active{display:block;}
-.breadcrumb{padding:7px 14px;margin:0 0 18px;background-color:#f5f5f5;background-repeat:repeat-x;background-image:-khtml-gradient(linear, left top, left bottom, from(#ffffff), to(#f5f5f5));background-image:-moz-linear-gradient(top, #ffffff, #f5f5f5);background-image:-ms-linear-gradient(top, #ffffff, #f5f5f5);background-image:-webkit-gradient(linear, left top, left bottom, color-stop(0%, #ffffff), color-stop(100%, #f5f5f5));background-image:-webkit-linear-gradient(top, #ffffff, #f5f5f5);b [...]
-.breadcrumb .divider{padding:0 5px;color:#bfbfbf;}
-.breadcrumb .active a{color:#404040;}
-.hero-unit{background-color:#f5f5f5;margin-bottom:30px;padding:60px;-webkit-border-radius:6px;-moz-border-radius:6px;border-radius:6px;}.hero-unit h1{margin-bottom:0;font-size:60px;line-height:1;letter-spacing:-1px;}
-.hero-unit p{font-size:18px;font-weight:200;line-height:27px;}
-footer{margin-top:17px;padding-top:17px;border-top:1px solid #eee;}
-.page-header{margin-bottom:17px;border-bottom:1px solid #ddd;-webkit-box-shadow:0 1px 0 rgba(255, 255, 255, 0.5);-moz-box-shadow:0 1px 0 rgba(255, 255, 255, 0.5);box-shadow:0 1px 0 rgba(255, 255, 255, 0.5);}.page-header h1{margin-bottom:8px;}
-.btn.danger,.alert-message.danger,.btn.danger:hover,.alert-message.danger:hover,.btn.error,.alert-message.error,.btn.error:hover,.alert-message.error:hover,.btn.success,.alert-message.success,.btn.success:hover,.alert-message.success:hover,.btn.info,.alert-message.info,.btn.info:hover,.alert-message.info:hover{color:#ffffff;}
-.btn .close,.alert-message .close{font-family:Arial,sans-serif;line-height:18px;}
-.btn.danger,.alert-message.danger,.btn.error,.alert-message.error{background-color:#c43c35;background-repeat:repeat-x;background-image:-khtml-gradient(linear, left top, left bottom, from(#ee5f5b), to(#c43c35));background-image:-moz-linear-gradient(top, #ee5f5b, #c43c35);background-image:-ms-linear-gradient(top, #ee5f5b, #c43c35);background-image:-webkit-gradient(linear, left top, left bottom, color-stop(0%, #ee5f5b), color-stop(100%, #c43c35));background-image:-webkit-linear-gradient(top [...]
-.btn.success,.alert-message.success{background-color:#57a957;background-repeat:repeat-x;background-image:-khtml-gradient(linear, left top, left bottom, from(#62c462), to(#57a957));background-image:-moz-linear-gradient(top, #62c462, #57a957);background-image:-ms-linear-gradient(top, #62c462, #57a957);background-image:-webkit-gradient(linear, left top, left bottom, color-stop(0%, #62c462), color-stop(100%, #57a957));background-image:-webkit-linear-gradient(top, #62c462, #57a957);background [...]
-.btn.info,.alert-message.info{background-color:#339bb9;background-repeat:repeat-x;background-image:-khtml-gradient(linear, left top, left bottom, from(#5bc0de), to(#339bb9));background-image:-moz-linear-gradient(top, #5bc0de, #339bb9);background-image:-ms-linear-gradient(top, #5bc0de, #339bb9);background-image:-webkit-gradient(linear, left top, left bottom, color-stop(0%, #5bc0de), color-stop(100%, #339bb9));background-image:-webkit-linear-gradient(top, #5bc0de, #339bb9);background-image [...]
-.btn{cursor:pointer;display:inline-block;background-color:#e6e6e6;background-repeat:no-repeat;background-image:-webkit-gradient(linear, 0 0, 0 100%, from(#ffffff), color-stop(25%, #ffffff), to(#e6e6e6));background-image:-webkit-linear-gradient(#ffffff, #ffffff 25%, #e6e6e6);background-image:-moz-linear-gradient(top, #ffffff, #ffffff 25%, #e6e6e6);background-image:-ms-linear-gradient(#ffffff, #ffffff 25%, #e6e6e6);background-image:-o-linear-gradient(#ffffff, #ffffff 25%, #e6e6e6);backgrou [...]
-.btn:focus{outline:1px dotted #666;}
-.btn.primary{color:#ffffff;background-color:#0064cd;background-repeat:repeat-x;background-image:-khtml-gradient(linear, left top, left bottom, from(#049cdb), to(#0064cd));background-image:-moz-linear-gradient(top, #049cdb, #0064cd);background-image:-ms-linear-gradient(top, #049cdb, #0064cd);background-image:-webkit-gradient(linear, left top, left bottom, color-stop(0%, #049cdb), color-stop(100%, #0064cd));background-image:-webkit-linear-gradient(top, #049cdb, #0064cd);background-image:-o [...]
-.btn.active,.btn:active{-webkit-box-shadow:inset 0 2px 4px rgba(0, 0, 0, 0.25),0 1px 2px rgba(0, 0, 0, 0.05);-moz-box-shadow:inset 0 2px 4px rgba(0, 0, 0, 0.25),0 1px 2px rgba(0, 0, 0, 0.05);box-shadow:inset 0 2px 4px rgba(0, 0, 0, 0.25),0 1px 2px rgba(0, 0, 0, 0.05);}
-.btn.disabled{cursor:default;background-image:none;filter:progid:DXImageTransform.Microsoft.gradient(enabled = false);filter:alpha(opacity=65);-khtml-opacity:0.65;-moz-opacity:0.65;opacity:0.65;-webkit-box-shadow:none;-moz-box-shadow:none;box-shadow:none;}
-.btn[disabled]{cursor:default;background-image:none;filter:progid:DXImageTransform.Microsoft.gradient(enabled = false);filter:alpha(opacity=65);-khtml-opacity:0.65;-moz-opacity:0.65;opacity:0.65;-webkit-box-shadow:none;-moz-box-shadow:none;box-shadow:none;}
-.btn.large{font-size:15px;line-height:normal;padding:9px 14px 9px;-webkit-border-radius:6px;-moz-border-radius:6px;border-radius:6px;}
-.btn.small{padding:7px 9px 7px;font-size:11px;}
-:root .alert-message,:root .btn{border-radius:0 \0;}
-button.btn::-moz-focus-inner,input[type=submit].btn::-moz-focus-inner{padding:0;border:0;}
-.close{float:right;color:#000000;font-size:20px;font-weight:bold;line-height:13.5px;text-shadow:0 1px 0 #ffffff;filter:alpha(opacity=25);-khtml-opacity:0.25;-moz-opacity:0.25;opacity:0.25;}.close:hover{color:#000000;text-decoration:none;filter:alpha(opacity=40);-khtml-opacity:0.4;-moz-opacity:0.4;opacity:0.4;}
-.alert-message{position:relative;padding:7px 15px;margin-bottom:18px;color:#404040;background-color:#eedc94;background-repeat:repeat-x;background-image:-khtml-gradient(linear, left top, left bottom, from(#fceec1), to(#eedc94));background-image:-moz-linear-gradient(top, #fceec1, #eedc94);background-image:-ms-linear-gradient(top, #fceec1, #eedc94);background-image:-webkit-gradient(linear, left top, left bottom, color-stop(0%, #fceec1), color-stop(100%, #eedc94));background-image:-webkit-li [...]
-.alert-message a{font-weight:bold;color:#404040;}
-.alert-message.danger p a,.alert-message.error p a,.alert-message.success p a,.alert-message.info p a{color:#ffffff;}
-.alert-message h5{line-height:18px;}
-.alert-message p{margin-bottom:0;}
-.alert-message div{margin-top:5px;margin-bottom:2px;line-height:28px;}
-.alert-message .btn{-webkit-box-shadow:0 1px 0 rgba(255, 255, 255, 0.25);-moz-box-shadow:0 1px 0 rgba(255, 255, 255, 0.25);box-shadow:0 1px 0 rgba(255, 255, 255, 0.25);}
-.alert-message.block-message{background-image:none;background-color:#fdf5d9;filter:progid:DXImageTransform.Microsoft.gradient(enabled = false);padding:14px;border-color:#fceec1;-webkit-box-shadow:none;-moz-box-shadow:none;box-shadow:none;}.alert-message.block-message ul,.alert-message.block-message p{margin-right:30px;}
-.alert-message.block-message ul{margin-bottom:0;}
-.alert-message.block-message li{color:#404040;}
-.alert-message.block-message .alert-actions{margin-top:5px;}
-.alert-message.block-message.error,.alert-message.block-message.success,.alert-message.block-message.info{color:#404040;text-shadow:0 1px 0 rgba(255, 255, 255, 0.5);}
-.alert-message.block-message.error{background-color:#fddfde;border-color:#fbc7c6;}
-.alert-message.block-message.success{background-color:#d1eed1;border-color:#bfe7bf;}
-.alert-message.block-message.info{background-color:#ddf4fb;border-color:#c6edf9;}
-.alert-message.block-message.danger p a,.alert-message.block-message.error p a,.alert-message.block-message.success p a,.alert-message.block-message.info p a{color:#404040;}
-.pagination{height:36px;margin:18px 0;}.pagination ul{float:left;margin:0;border:1px solid #ddd;border:1px solid rgba(0, 0, 0, 0.15);-webkit-border-radius:3px;-moz-border-radius:3px;border-radius:3px;-webkit-box-shadow:0 1px 2px rgba(0, 0, 0, 0.05);-moz-box-shadow:0 1px 2px rgba(0, 0, 0, 0.05);box-shadow:0 1px 2px rgba(0, 0, 0, 0.05);}
-.pagination li{display:inline;}
-.pagination a{float:left;padding:0 14px;line-height:34px;border-right:1px solid;border-right-color:#ddd;border-right-color:rgba(0, 0, 0, 0.15);*border-right-color:#ddd;text-decoration:none;}
-.pagination a:hover,.pagination .active a{background-color:#c7eefe;}
-.pagination .disabled a,.pagination .disabled a:hover{background-color:transparent;color:#bfbfbf;}
-.pagination .next a{border:0;}
-.well{background-color:#f5f5f5;margin-bottom:20px;padding:19px;min-height:20px;border:1px solid #eee;border:1px solid rgba(0, 0, 0, 0.05);-webkit-border-radius:4px;-moz-border-radius:4px;border-radius:4px;-webkit-box-shadow:inset 0 1px 1px rgba(0, 0, 0, 0.05);-moz-box-shadow:inset 0 1px 1px rgba(0, 0, 0, 0.05);box-shadow:inset 0 1px 1px rgba(0, 0, 0, 0.05);}.well blockquote{border-color:#ddd;border-color:rgba(0, 0, 0, 0.15);}
-.modal-backdrop{background-color:#000000;position:fixed;top:0;left:0;right:0;bottom:0;z-index:10000;}.modal-backdrop.fade{opacity:0;}
-.modal-backdrop,.modal-backdrop.fade.in{filter:alpha(opacity=80);-khtml-opacity:0.8;-moz-opacity:0.8;opacity:0.8;}
-.modal{position:fixed;top:50%;left:50%;z-index:11000;width:560px;margin:-250px 0 0 -280px;background-color:#ffffff;border:1px solid #999;border:1px solid rgba(0, 0, 0, 0.3);*border:1px solid #999;-webkit-border-radius:6px;-moz-border-radius:6px;border-radius:6px;-webkit-box-shadow:0 3px 7px rgba(0, 0, 0, 0.3);-moz-box-shadow:0 3px 7px rgba(0, 0, 0, 0.3);box-shadow:0 3px 7px rgba(0, 0, 0, 0.3);-webkit-background-clip:padding-box;-moz-background-clip:padding-box;background-clip:padding-box [...]
-.modal.fade{-webkit-transition:opacity .3s linear, top .3s ease-out;-moz-transition:opacity .3s linear, top .3s ease-out;-ms-transition:opacity .3s linear, top .3s ease-out;-o-transition:opacity .3s linear, top .3s ease-out;transition:opacity .3s linear, top .3s ease-out;top:-25%;}
-.modal.fade.in{top:50%;}
-.modal-header{border-bottom:1px solid #eee;padding:5px 15px;}
-.modal-body{padding:15px;}
-.modal-body form{margin-bottom:0;}
-.modal-footer{background-color:#f5f5f5;padding:14px 15px 15px;border-top:1px solid #ddd;-webkit-border-radius:0 0 6px 6px;-moz-border-radius:0 0 6px 6px;border-radius:0 0 6px 6px;-webkit-box-shadow:inset 0 1px 0 #ffffff;-moz-box-shadow:inset 0 1px 0 #ffffff;box-shadow:inset 0 1px 0 #ffffff;zoom:1;margin-bottom:0;}.modal-footer:before,.modal-footer:after{display:table;content:"";zoom:1;}
-.modal-footer:after{clear:both;}
-.modal-footer .btn{float:right;margin-left:5px;}
-.modal .popover,.modal .twipsy{z-index:12000;}
-.twipsy{display:block;position:absolute;visibility:visible;padding:5px;font-size:11px;z-index:1000;filter:alpha(opacity=80);-khtml-opacity:0.8;-moz-opacity:0.8;opacity:0.8;}.twipsy.fade.in{filter:alpha(opacity=80);-khtml-opacity:0.8;-moz-opacity:0.8;opacity:0.8;}
-.twipsy.above .twipsy-arrow{bottom:0;left:50%;margin-left:-5px;border-left:5px solid transparent;border-right:5px solid transparent;border-top:5px solid #000000;}
-.twipsy.left .twipsy-arrow{top:50%;right:0;margin-top:-5px;border-top:5px solid transparent;border-bottom:5px solid transparent;border-left:5px solid #000000;}
-.twipsy.below .twipsy-arrow{top:0;left:50%;margin-left:-5px;border-left:5px solid transparent;border-right:5px solid transparent;border-bottom:5px solid #000000;}
-.twipsy.right .twipsy-arrow{top:50%;left:0;margin-top:-5px;border-top:5px solid transparent;border-bottom:5px solid transparent;border-right:5px solid #000000;}
-.twipsy-inner{padding:3px 8px;background-color:#000000;color:white;text-align:center;max-width:200px;text-decoration:none;-webkit-border-radius:4px;-moz-border-radius:4px;border-radius:4px;}
-.twipsy-arrow{position:absolute;width:0;height:0;}
-.popover{position:absolute;top:0;left:0;z-index:1000;padding:5px;display:none;}.popover.above .arrow{bottom:0;left:50%;margin-left:-5px;border-left:5px solid transparent;border-right:5px solid transparent;border-top:5px solid #000000;}
-.popover.right .arrow{top:50%;left:0;margin-top:-5px;border-top:5px solid transparent;border-bottom:5px solid transparent;border-right:5px solid #000000;}
-.popover.below .arrow{top:0;left:50%;margin-left:-5px;border-left:5px solid transparent;border-right:5px solid transparent;border-bottom:5px solid #000000;}
-.popover.left .arrow{top:50%;right:0;margin-top:-5px;border-top:5px solid transparent;border-bottom:5px solid transparent;border-left:5px solid #000000;}
-.popover .arrow{position:absolute;width:0;height:0;}
-.popover .inner{background:#000000;background:rgba(0, 0, 0, 0.8);padding:3px;overflow:hidden;width:280px;-webkit-border-radius:6px;-moz-border-radius:6px;border-radius:6px;-webkit-box-shadow:0 3px 7px rgba(0, 0, 0, 0.3);-moz-box-shadow:0 3px 7px rgba(0, 0, 0, 0.3);box-shadow:0 3px 7px rgba(0, 0, 0, 0.3);}
-.popover .title{background-color:#f5f5f5;padding:9px 15px;line-height:1;-webkit-border-radius:3px 3px 0 0;-moz-border-radius:3px 3px 0 0;border-radius:3px 3px 0 0;border-bottom:1px solid #eee;}
-.popover .content{background-color:#ffffff;padding:14px;-webkit-border-radius:0 0 3px 3px;-moz-border-radius:0 0 3px 3px;border-radius:0 0 3px 3px;-webkit-background-clip:padding-box;-moz-background-clip:padding-box;background-clip:padding-box;}.popover .content p,.popover .content ul,.popover .content ol{margin-bottom:0;}
-.fade{-webkit-transition:opacity 0.15s linear;-moz-transition:opacity 0.15s linear;-ms-transition:opacity 0.15s linear;-o-transition:opacity 0.15s linear;transition:opacity 0.15s linear;opacity:0;}.fade.in{opacity:1;}
-.label{padding:1px 3px 2px;font-size:9.75px;font-weight:bold;color:#ffffff;text-transform:uppercase;white-space:nowrap;background-color:#bfbfbf;-webkit-border-radius:3px;-moz-border-radius:3px;border-radius:3px;}.label.important{background-color:#c43c35;}
-.label.warning{background-color:#f89406;}
-.label.success{background-color:#46a546;}
-.label.notice{background-color:#62cffc;}
-.media-grid{margin-left:-20px;margin-bottom:0;zoom:1;}.media-grid:before,.media-grid:after{display:table;content:"";zoom:1;}
-.media-grid:after{clear:both;}
-.media-grid li{display:inline;}
-.media-grid a{float:left;padding:4px;margin:0 0 18px 20px;border:1px solid #ddd;-webkit-border-radius:4px;-moz-border-radius:4px;border-radius:4px;-webkit-box-shadow:0 1px 1px rgba(0, 0, 0, 0.075);-moz-box-shadow:0 1px 1px rgba(0, 0, 0, 0.075);box-shadow:0 1px 1px rgba(0, 0, 0, 0.075);}.media-grid a img{display:block;}
-.media-grid a:hover{border-color:#0069d6;-webkit-box-shadow:0 1px 4px rgba(0, 105, 214, 0.25);-moz-box-shadow:0 1px 4px rgba(0, 105, 214, 0.25);box-shadow:0 1px 4px rgba(0, 105, 214, 0.25);}
\ No newline at end of file
diff --git a/_site/assets/themes/custom-twitter/css/style.css b/_site/assets/themes/custom-twitter/css/style.css
deleted file mode 100644
index 779b77f..0000000
--- a/_site/assets/themes/custom-twitter/css/style.css
+++ /dev/null
@@ -1,69 +0,0 @@
-/* Override some defaults */
-html, body {
-  background-color: #eee;
-}
-body {
-  padding-top: 40px; /* 40px to make the container go all the way to the bottom of the topbar */
-}
-.container > footer p {
-  text-align: center; /* center align it with the container */
-}
-.container {
-  width: 820px; /* downsize our container to make the content feel a bit tighter and more cohesive. NOTE: this removes two full columns from the grid, meaning you only go to 14 columns and not 16. */
-}
-
-/* The white background content wrapper */
-.content {
-  background-color: #fff;
-  padding: 20px;
-  margin: 0 -20px; /* negative indent the amount of the padding to maintain the grid system */
-  -webkit-border-radius: 0 0 6px 6px;
-     -moz-border-radius: 0 0 6px 6px;
-          border-radius: 0 0 6px 6px;
-  -webkit-box-shadow: 0 1px 2px rgba(0,0,0,.15);
-     -moz-box-shadow: 0 1px 2px rgba(0,0,0,.15);
-          box-shadow: 0 1px 2px rgba(0,0,0,.15);
-}
-
-/* Page header tweaks */
-.page-header {
-  background-color: #f5f5f5;
-  padding: 20px 20px 10px;
-  margin: -20px -20px 20px;
-}
-
-.topbar .btn {
-  border: 0;
-}
-
-/* tag_box ======================================================== */
-
-.tag_box {
-	list-style:none;
-	margin:0;
-	padding:5px 0;
-	overflow:hidden;
-}
-.tag_box li {
-	line-height:28px;
-}
-.tag_box.inline li {
-	float:left;
-}
-.tag_box a {
-	padding: 3px 6px;
-	margin: 2px;
-	background: #eee;
-	color:#005F6B;
-	border-radius: 3px;
-	text-decoration:none;
-}
-.tag_box a span{
-	vertical-align:super;
-	font-size:0.8em;
-}
-.tag_box a.active {
-	background:#57A957;
-	border:1px solid #4C964D;
-	color:#FFF;
-}
diff --git a/_site/assignments.html b/_site/assignments.html
deleted file mode 100644
index 0bc1ff5..0000000
--- a/_site/assignments.html
+++ /dev/null
@@ -1,142 +0,0 @@
-
-<!DOCTYPE html>
-<html lang="en">
-  <head>
-    <meta charset="utf-8">
-    <title>Assignments</title>
-    <meta name="description" content="">
-    <meta name="author" content="cse599">
-
-    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
-    <!--[if lt IE 9]>
-      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
-    <![endif]-->
-
-    <!-- Le styles -->
-    <link href="/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
-    <link href="/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
-
-    <!-- Le fav and touch icons -->
-  <!-- Update these with your own images
-    <link rel="shortcut icon" href="images/favicon.ico">
-    <link rel="apple-touch-icon" href="images/apple-touch-icon.png">
-    <link rel="apple-touch-icon" sizes="72x72" href="images/apple-touch-icon-72x72.png">
-    <link rel="apple-touch-icon" sizes="114x114" href="images/apple-touch-icon-114x114.png">
-  -->
-  <script>
-    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
-    (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
-    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-    })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-
-    ga('create', 'UA-75982049-1', 'auto');
-    ga('send', 'pageview');
-  </script>
-  </head>
-
-  <body>
-
-    <div class="topbar">
-      <div class="fill">
-        <div class="container">
-          <a class="brand" href="">CSE599 Deep Learning System</a>
-          <ul class="nav">
-            
-            
-            
-
-
-
-  
-    
-      
-      
-    
-  
-    
-      
-      
-    
-  
-    
-      
-      
-    
-  
-    
-      
-      
-    
-  
-    
-      
-      
-    
-  
-    
-      
-      
-    
-  
-    
-      
-      
-    
-  
-    
-      
-      	
-      	<li class="active"><a href="/assignments" class="active">Assignments</a></li>
-      	
-      
-      
-    
-  
-    
-      
-      	
-      	<li><a href="/schedule">Schedule</a></li>
-      	
-      
-      
-    
-  
-
-
-
-
-          </ul>
-        </div>
-      </div>
-    </div>
-
-    <div class="container">
-
-      <div class="content">
-        
-<div class="page-header">
-  <h1>Assignments </h1>
-</div>
-
-<div class="row">
-  <div class="span14">
-    <p>Page for assignments.</p>
-
-  </div>
-</div>
-
-
-      </div>
-
-      <footer>
-      </footer>
-
-    </div> <!-- /container -->
-
-    
-
-
-
-  </body>
-</html>
-
diff --git a/_site/atom.xml b/_site/atom.xml
deleted file mode 100644
index 2a3dd69..0000000
--- a/_site/atom.xml
+++ /dev/null
@@ -1,16 +0,0 @@
-<?xml version="1.0" encoding="utf-8"?>
-<feed xmlns="http://www.w3.org/2005/Atom">
- 
- <title>CSE599 Deep Learning System</title>
- <link href="https://dlsys-course.github.io/atom.xml" rel="self"/>
- <link href="https://dlsys-course.github.io"/>
- <updated>2017-03-08T11:27:04-08:00</updated>
- <id>https://dlsys-course.github.io</id>
- <author>
-   <name>cse599</name>
-   <email></email>
- </author>
-
- 
- 
-</feed>
diff --git a/_site/categories.html b/_site/categories.html
deleted file mode 100644
index d17623f..0000000
--- a/_site/categories.html
+++ /dev/null
@@ -1,157 +0,0 @@
-
-<!DOCTYPE html>
-<html lang="en">
-  <head>
-    <meta charset="utf-8">
-    <title>Categories</title>
-    
-    <meta name="author" content="cse599">
-
-    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
-    <!--[if lt IE 9]>
-      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
-    <![endif]-->
-
-    <!-- Le styles -->
-    <link href="/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
-    <link href="/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
-
-    <!-- Le fav and touch icons -->
-  <!-- Update these with your own images
-    <link rel="shortcut icon" href="images/favicon.ico">
-    <link rel="apple-touch-icon" href="images/apple-touch-icon.png">
-    <link rel="apple-touch-icon" sizes="72x72" href="images/apple-touch-icon-72x72.png">
-    <link rel="apple-touch-icon" sizes="114x114" href="images/apple-touch-icon-114x114.png">
-  -->
-  <script>
-    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
-    (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
-    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-    })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-
-    ga('create', 'UA-75982049-1', 'auto');
-    ga('send', 'pageview');
-  </script>
-  </head>
-
-  <body>
-
-    <div class="topbar">
-      <div class="fill">
-        <div class="container">
-          <a class="brand" href="">CSE599 Deep Learning System</a>
-          <ul class="nav">
-            
-            
-            
-
-
-
-  
-    
-      
-      
-    
-  
-    
-      
-      
-    
-  
-    
-      
-      
-    
-  
-    
-      
-      
-    
-  
-    
-      
-      
-    
-  
-    
-      
-      
-    
-  
-    
-      
-      
-    
-  
-    
-      
-      	
-      	<li><a href="/assignments">Assignments</a></li>
-      	
-      
-      
-    
-  
-    
-      
-      	
-      	<li><a href="/schedule">Schedule</a></li>
-      	
-      
-      
-    
-  
-
-
-
-
-          </ul>
-        </div>
-      </div>
-    </div>
-
-    <div class="container">
-
-      <div class="content">
-        
-<div class="page-header">
-  <h1>Categories </h1>
-</div>
-
-<div class="row">
-  <div class="span14">
-    
-
-<ul class="tag_box inline">
-  
-  
-
-
-  
-    
-  
-
-
-</ul>
-
-
-
-
-  </div>
-</div>
-
-
-      </div>
-
-      <footer>
-      </footer>
-
-    </div> <!-- /container -->
-
-    
-
-
-
-  </body>
-</html>
-
diff --git a/_site/index.html b/_site/index.html
deleted file mode 100644
index a1280d6..0000000
--- a/_site/index.html
+++ /dev/null
@@ -1,178 +0,0 @@
-
-<!DOCTYPE html>
-<html lang="en">
-  <head>
-    <meta charset="utf-8">
-    <title>CSE599</title>
-    <meta name="description" content="DLSys Course UW">
-    <meta name="author" content="cse599">
-
-    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
-    <!--[if lt IE 9]>
-      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
-    <![endif]-->
-
-    <!-- Le styles -->
-    <link href="/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
-    <link href="/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
-
-    <!-- Le fav and touch icons -->
-  <!-- Update these with your own images
-    <link rel="shortcut icon" href="images/favicon.ico">
-    <link rel="apple-touch-icon" href="images/apple-touch-icon.png">
-    <link rel="apple-touch-icon" sizes="72x72" href="images/apple-touch-icon-72x72.png">
-    <link rel="apple-touch-icon" sizes="114x114" href="images/apple-touch-icon-114x114.png">
-  -->
-  <script>
-    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
-    (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
-    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-    })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-
-    ga('create', 'UA-75982049-1', 'auto');
-    ga('send', 'pageview');
-  </script>
-  </head>
-
-  <body>
-
-    <div class="topbar">
-      <div class="fill">
-        <div class="container">
-          <a class="brand" href="">CSE599 Deep Learning System</a>
-          <ul class="nav">
-            
-            
-            
-
-
-
-  
-    
-      
-      
-    
-  
-    
-      
-      
-    
-  
-    
-      
-      
-    
-  
-    
-      
-      
-    
-  
-    
-      
-      
-    
-  
-    
-      
-      
-    
-  
-    
-      
-      
-    
-  
-    
-      
-      	
-      	<li><a href="/assignments">Assignments</a></li>
-      	
-      
-      
-    
-  
-    
-      
-      	
-      	<li><a href="/schedule">Schedule</a></li>
-      	
-      
-      
-    
-  
-
-
-
-
-          </ul>
-        </div>
-      </div>
-    </div>
-
-    <div class="container">
-
-      <div class="content">
-        
-
-<div class="row">
-  <div class="span14">
-    <h1 id="course-information">Course Information</h1>
-
-<p>Over the past few years, deep learning has become an important technique for successfully solving problems in many different fields, such as vision, NLP, and robotics. An important ingredient driving this success is the development of deep learning systems that efficiently support the learning and inference of complicated models using many devices, possibly with distributed resources. The study of how to build and optimize these deep learning systems is now an active area [...]
-
-<p>This course is designed to fill this gap. We will cover various aspects of deep learning systems, including: basics of deep learning, programming models for expressing machine learning models, automatic differentiation, memory optimization, scheduling, distributed learning, hardware acceleration, domain-specific languages, and model serving. Many of these topics intersect with existing research directions in databases, systems and networking, architecture and programming langua [...]
-
-<p>We will have two classes per week: one lecture and one lab/discussion session.
-Each lecture will study a specific aspect of deep learning systems. The lab/discussion session will offer tutorials on implementing that aspect
-and will include case studies of existing systems, such as TensorFlow, Caffe, MXNet, PyTorch, and others.</p>
-
-<h2 id="instructors">Instructors</h2>
-
-<ul>
-<li><a href="http://homes.cs.washington.edu/%7Etqchen/">Tianqi Chen</a></li>
-<li><a href="http://homes.cs.washington.edu/%7Ehaichen/">Haichen Shen</a></li>
-<li><a href="http://www.cs.washington.edu/people/faculty/arvind">Arvind Krishnamurthy</a></li>
-</ul>
-
-<h2 id="prerequisites">Prerequisites</h2>
-
-<ul>
-<li>Proficiency in Python, familiarity with C/C++
-
-<ul>
-<li>We will mainly be using Python for case studies of the existing systems,
-and C/C++ for some of the background hacking.</li>
-</ul></li>
-<li>A machine learning course (CSE546)</li>
-<li>Prior knowledge of systems (operating systems/databases) is useful but not required.</li>
-</ul>
-
-<h2 id="homeworks-and-grading">Homeworks and Grading</h2>
-
-<p>We will have two assignments and one final project.</p>
-
-<ul>
-<li>Course project: 60%</li>
-<li>Homeworks: 30%</li>
-<li>Discussion participation: 10%</li>
-</ul>
-
-  </div>
-</div>
-
-
-      </div>
-
-      <footer>
-      </footer>
-
-    </div> <!-- /container -->
-
-    
-
-
-
-  </body>
-</html>
-
diff --git a/_site/rss.xml b/_site/rss.xml
deleted file mode 100644
index f2c92bf..0000000
--- a/_site/rss.xml
+++ /dev/null
@@ -1,15 +0,0 @@
-<?xml version="1.0" encoding="UTF-8" ?>
-<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
-<channel>
-        <title>CSE599 Deep Learning System</title>
-        <description>CSE599 Deep Learning System - cse599</description>
-        <link>https://dlsys-course.github.io</link>
-        <atom:link href="https://dlsys-course.github.io/rss.xml" rel="self" type="application/rss+xml" />
-        <lastBuildDate>Wed, 08 Mar 2017 11:27:04 -0800</lastBuildDate>
-        <pubDate>Wed, 08 Mar 2017 11:27:04 -0800</pubDate>
-        <ttl>60</ttl>
-
-
-
-</channel>
-</rss>
diff --git a/_site/schedule.html b/_site/schedule.html
deleted file mode 100644
index 4ddfe75..0000000
--- a/_site/schedule.html
+++ /dev/null
@@ -1,235 +0,0 @@
-
-<!DOCTYPE html>
-<html lang="en">
-  <head>
-    <meta charset="utf-8">
-    <title>Schedule</title>
-    <meta name="description" content="">
-    <meta name="author" content="cse599">
-
-    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
-    <!--[if lt IE 9]>
-      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
-    <![endif]-->
-
-    <!-- Le styles -->
-    <link href="/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
-    <link href="/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
-
-    <!-- Le fav and touch icons -->
-  <!-- Update these with your own images
-    <link rel="shortcut icon" href="images/favicon.ico">
-    <link rel="apple-touch-icon" href="images/apple-touch-icon.png">
-    <link rel="apple-touch-icon" sizes="72x72" href="images/apple-touch-icon-72x72.png">
-    <link rel="apple-touch-icon" sizes="114x114" href="images/apple-touch-icon-114x114.png">
-  -->
-  <script>
-    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
-    (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
-    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-    })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-
-    ga('create', 'UA-75982049-1', 'auto');
-    ga('send', 'pageview');
-  </script>
-  </head>
-
-  <body>
-
-    <div class="topbar">
-      <div class="fill">
-        <div class="container">
-          <a class="brand" href="">CSE599 Deep Learning System</a>
-          <ul class="nav">
... 4707 lines suppressed ...