Posted to commits@tvm.apache.org by ho...@apache.org on 2023/09/05 21:12:04 UTC

[tvm-vta] branch disco-integration deleted (was 0a75b2c)

This is an automated email from the ASF dual-hosted git repository.

hongyij pushed a change to branch disco-integration
in repository https://gitbox.apache.org/repos/asf/tvm-vta.git


     was 0a75b2c  get vocab size from config

This change permanently discards the following revisions:

 discard 0a75b2c  get vocab size from config
 discard 21251ea  disco+mlc on single machine
 discard 2bd2539  Change printing in chat module to logging info (#818)
 discard dc36bdd  [Doc] Fix typo, --model not --local-id now (#822)
 discard 0858630  Update try_out.rst (changed --local-id param to --model and moved notes about drivers). (#817)
 discard 8649d4f  [Minor] Remove PyTest Dependencies by default (#814)
 discard f36f592  [Model Support] CodeLlama (#809)
 discard 5fe6344  [Quantization] AutoGPTQ refactor and matmul combination support (#694)
 discard 94bda91  Disable decoding for system prompts (#807)
 discard c32dbda  Update CLI to be more consistent with ChatModule (#789)
 discard 7c135b8  Optionally use max_sequence_length in config for split rotary fusion (#801)
 discard 3a10dbb  [CMAKE] Add check for rust installed (#799)
 discard 66550e0  [FIX] Fix error `max_seq_len == -1` (#797)
 discard e704c8b  [Doc] Minor update to `Build Android Package from Source` section (#785)
 discard d4ca67e  added CORS to FastAPI (#757)
 discard 4127782  Update Llama2 cached sin/cos to use max_sequence_length (#780)
 discard d8eadc1  Update gpu.rst to add sudo apt update before first install (#784)
 discard 796d3fd  [Doc] Update doc for prebuilt models (#767)
 discard fbce5a3  Improve code completion experience (#772)
 discard ab40434  Automatically set 'offset' parameter if 'messages' parameter is set (#754)
 discard 0735f6c  Update tokenizers-cpp to latest and fix rust build error (#762)
 discard abac1a3  [Utils] Skip generating benchmark scripts in cases (#759)
 discard e9579e4  [Android] Add libOpenCL-pixel for supporting Pixel phones. (#723)
 discard 976d9f3  [Doc] Add the Mali-OpenCL setup instruction (#749)
 discard a39e0d7  Refactor MLC Chat iOS App (#746)
 discard 3d46654  [Docs] Add doc for build api and BuildArgs (#610)
 discard 0637ef6  Added missing install command (#745)
 discard d59185a  Add conversation template support for Wizard models (#741)
 discard e39d773  Extend `gen_cmake_config.py` for TVM_HOME, CUTLASS, CUBLAS (#743)
 discard 94e0109  Add PrimFunc Benchmark Script Dump in Debug Dump Mode (#738)
 discard 3c7bac5  [Mali] Add CLI support for Mali device (#734)
 discard 282f87b  Add `q8f16_1` for benchmark (#736)
 discard 5ac98e6  Fix OPENCL variable in tvm.rst (#735)
 discard 47e9297  [Mali] Add Mali Device Tag (#733)
 discard 88e6d3e  [CLI] Device parsing in `"device:id"` style (#728)
 discard bfba99b  Auto updated submodule references
 discard 0fb027f  [RWKV] Improve RWKV (#615)
 discard 5d6d47f  [BugFix] Fix extra ChatConfig args crash (#722)
 discard 529bfcb  [Bug fix] Small fix on RestAPIArgs (#724)
 discard 5b3fa31  Add cublas offload (#695)
 discard 5f5ca60  Update wording (#717)
 discard ae76577  [Fix] Disable CUDA multiarch fatbin by default (#716)
 discard f177154  [Gradio] Update in accordance with new ChatModule (#711)
 discard c9ea80d  [Docs] Add StreamToStdout to Python API examples (#707)
 discard a4efbf9  Update REST API docs to reflect new args (#708)
 discard e744f23  [Examples] Use stdout stream for Python API examples (#705)
 discard 64fdc15  [Docs][Python] Update `benchmark_generate` docs (#704)
 discard 514d8ab  [CMAKE] Update gen_cmake_config.py to use ROCm (#703)
 discard 8c4fec0  [Docs] Link the Python API notebook to Python API try-out (#702)
 discard 054ddfd  [Fix][Python] Reset chat before benchmark warmup (#701)
 discard ddc844c  [Docs] Try-out instructions for Python API (#700)
 discard 79582ef  [Docs] Python API finer documentation (#687)
 discard 2cb5956  Update ChatModule usage in REST API (#698)
 discard e068f88  [Python API] high-level function update (#696)
 discard 750d4ea  [ChatModule] Raw text generation and benchmark (#697)
 discard 9628354  Update REST API to support new ChatModule API (#681)
 discard b015aff  [Docs] ROCm installation docs (#689)
 discard 2440594  [Python] Perf evaluation interface, and publicity update (#688)
 discard ad46cb8  [Model Support] ChatGLM2 & CodeGeeX2 (#624)
 discard bdca829  Support CUDA multiarch build for compatibility (#686)
 discard 26a8b41  [iOS] - Send out the message when the user presses command + return keys on a keyboard. (#677)
 discard 18b0bdd  [BugFix] Embed Step (#680)
 discard fb6e73f  Python API overhaul (#645)
 discard 7977802  [BugFix] Add take indices type check for FuseDecodeTake (#628)
 discard f4587df  [Android][docs][iOS] Add model lib check and update docs (#675)
 discard bdeab17  [Docs] Update with RedPajama `q4f16_1` quantization (#673)
 discard 10ac428  [Bug fix] Remove potential trailing backslash (#671)
 discard fe27cf1  [iOS] Update config to use `q4f16_1` for RedPajama (#672)
 discard 502f680  Deprecate MetaSchedule database (#670)
 discard 66135d6  Enable cuda graph for the decoder (#653)
 discard 2ac70d4  Auto updated submodule references
 discard f8adb1d  [Fix] Copy tokenizer files before dumping config (#668)
 discard 07491d4  [Model] QKV matmul combination for GPT-NeoX (#667)
 discard ac39027  FasterTransformer offloading integration (#661)
 discard 6a970d6  Fix hardcoded SM version in CUTLASS offload (#664)
 discard b0d5edb  [FIX] Split -> Rotary fusion not getting applied (#662)
 discard 8f5e4a2  Enable cutlass by default (#660)
 discard 113bf7c  Add pass to fuse split after QKV projection and rotary embedding (#654)
 discard 39fed1f  [Quantization] QuantSpecUpdator visitor (#657)
 discard 41172de  Enable offloading attention and layer / RMS norm to CUTLASS (#651)
 discard ac8fa45  [Backend] Add ROCm support (#652)
 discard 3c53eeb  Update README.md (#656)
 discard 401427f  More guidelines on tracking (#649)
 discard c333d40  Add link to related issues (#643)
 discard 9d3be50  [GITHUB] Add tracking issue template (#642)
 discard b9d7e18  [DOCS] Update docs to keep in sync with current state (#641)
 discard f0f5c74  [REST] Expose uvicorn host configuration to mlc chat command line args (#636)
 discard 487b028  Implement Llama2 in new concise nn.Module API (#631)
 discard bd5c644  Add support for safetensors (#627)
 discard 8913809  [bugfix] quick fix of convert_build_args_to_argparser (#630)
 discard 9c77d19  Suggest using `q4f16_1` rather than `q4f16_0` (#626)
 discard f70f746  [Bug fix] Fix build model api return value (#621)
 discard dccd1a7  Supports more general quantization func registration (#622)
 discard 698d6c2  Add nodejs access examples for REST APIs (#602)
 discard e563b43  [Fix] Rest API: Update system platform checks for Apple Silicon (#608) (#617)
 discard d2ccb92  [Docs] Small fix on supported architectures (#619)
 discard f985af3  [Perf] end-to-end benchmark, configurable prompt-len and gen-len (#612)
 discard 2137741  Lift weight conversion to an early stage (#607)
 discard 651706c  Update dlight rule application with GeMV rules (#599)
 discard 527721a  Auto updated submodule references
 discard d4c3a17  Add support for Pydantic v2 (#592)
 discard 0f95b35  [Fix][REST] Shared library suffix on Windows (#591)
 discard f8b1f8c  [PYTHON] Use abspath for dll module (#589)
 discard 8cdaf87  Add BuildArgs and Python API for Build (#582)
 discard f421ddf  [Fix] Correcting num-kv-heads in Llama kv-cache creation func (#576)
 discard da7fe57  Disable dispatching (#575)
 discard a4de4cb  [Doc] Clarification on the memory requirement of Llama2-70b-chat model (#574)
 discard 1ecff99  [Doc] Announce Llama 2 70B CLI support (#573)
 discard 428287a  Llama-2: Support Grouped Query Attention (#567)
 discard ab9cf2e  [Docs] Llama 2 13B CLI instructions (#572)
 discard ea9f2d8  iOS version bump and testflight link (#570)
 discard c2183b9  [Docs] Update prebuilt model lists (#565)
 discard 0a6fca3  [Docs] Update Llama 2 CLI gif in docs (#564)
 discard a960a4d  [Docs] CLI and compilation instructions for Llama-2 (#563)
 discard 75fb792  Stop-str for Llama-2, and `q4f32_1` for web (#562)
 discard 60249d7  [iOS] llama2 setup (#561)
 discard 5a4ba31  Introducing `q3f16_1` group quantization scheme (#560)
 discard c075710  Auto updated submodule references
 discard 69111ef  QKV and MLP matmul combination for LLaMA (#559)
 discard ce23bd6  [Model] LLaMA-2 support (#558)
 discard 75a44ed  [cleanup] remove utils.py (#555)
 discard 6cf8d4f  [iOS] support for multimodal (#524)
 discard 0358e5a  [Refactor] Make `mlc_llm` a package (#525)
 discard 2d4a17f  Add OpenCL+LLVM to compile targets (#552)
 discard 255dd53  [Doc] Add instructions on installation of AMD drivers (#553)
 discard 1d7f5e5  [iOS] Enable iOS Simulator (x86_64, arm64) (#549)
 discard 92093ef  Update URLs for Web-LLM and Web-Stable-Diffusion (#548)
 discard b6b971f  [HotFix] Fix RWKV modeling for import removal mistake
 discard e36a38d  [Refactor] Param loading integration into ParamManager (#542)
 discard 95e8139  [Param Manager] support auto-gptq checkpoints from param manager. (#506)
 discard 875a4ee  [BugFix] Add annotated decode type check for FuseDecodeTake (#541)
 discard 9c3f12c  [General] Correct processing of bfloat16 weights (#537)
 discard 6528f32  Auto updated submodule references
 discard c01e17e  [Testing] Add memory bandwidth to the evaluation script (#536)
 discard b1a086b  [Fix] FuseDecodeTranspose pass with PrimFunc deep copy (#535)
 discard 13a73c6  [CMAKE] Hide symbol by default (#534)
 discard 25df4b2  [BugFix] PlaceInPrompt behavior (#532)
 discard 4efb4e9  Outdated quantization flow cleanup (#529)
 discard abfa917  [Misc] Update of evaluation script (#530)
 discard 2b51661  Add completion endpoint to server and provide QA example (#520)
 discard 9f05a33  Adds model category from hf config file (#519)
 discard c164e2f  Supporting RWKV with ParamManager (#528)
 discard 8f638e0  Switch DefaultGPUSchedule pass to dlight fallback GPU schedule (#527)
 discard b71bd39  [Model] Add YuLan, and update documentation on supported model architectures (#518)
 discard a21c357  [Bugfix] Make GPTJ compatible with new codebase (#516)
 discard 0ad413d  Supporting GPTBigCode with ParamManager (#517)
 discard fb71da7  Add support for WizardLM (#489)
 discard 7b85168  Use ParamManager for MiniGPT (#515)
 discard 7b118c0  gradio polish and minigpt cli (#496)
 discard 8c9e000  [Docs] Enhances iOS app build scripts and docs (#511)
 discard d779718  Auto updated submodule references
 discard a00a05e  [Docs] Update WebGPU target compilation instructions (#510)
 discard d800c78  Auto updated submodule references
 discard 54c0e7c  Add support for Guanaco (#497)
 discard 76453c6  add santacoder as build option (#495)
 discard 0ac6078  Support GPT-NeoX with ParamManager and embedding separation (#490)
 discard 5a37e9a  Minor bug fix and reorganization on ChatModule embed separation (#488)
 discard d4c96e1  Clean up `op_pattern` attr in all places (#486)
 discard 8f45a74  Separate embed as a new function (#482)
 discard 3f29772  Initial `q4f16_1` quantization support with fusion (#484)
 discard ad8efb0  Add document on how to define new model architecture (#483)
 discard e8dec17  [Doc] Update docs for new Android build (#481)
 discard 72fd80a  Some fixes to support REST API in web-llm (#480)
 discard f732a44  Auto updated submodule references
 discard 1283cbf  ParamManager and new quantization framework (#477)
 discard 74b0c65  [Android] Decouple lib and app build (#478)
 discard 87d8b04  Clean up documentation (#475)
 discard 42e6423  Fix seg fault when trying to run REST server on macOS (#469)
 discard 7e10346  [Doc] Skip loading shared library when building docs (#474)
 discard c219355  Add support for q8f32 quantization (#472)
 discard a1458da  [Hotfix] Fix CPU wheels for Python API reference (#471)
 discard 3400549  [Doc] Link libmlc_llm.so for auto-documenting Python APIs (#470)
 discard 183c221  Typo fix for error logs in ChatState.swift (#467)
 discard 713d0f6  add mlc-chat-nightly as a requirement and fix duplicate labels (#465)
 discard f06f69d  [Hotfix] Add mlc-ai-nightly as requirements for documentation (#464)
 discard 99649af  Auto updated submodule references
 discard 14669ee  [Hotfix] Fix requirements.txt for documentation (#463)
 discard fc16bc3  [Doc] Documentation for Python API and gradio interface. (#462)
 discard 58b27c3  [Model Support] Add StarCoder & WizardCoder (#459)
 discard 2909069  Revert "[Multimodal Support] Add MiniGPT4 (#390)" (#461)
 discard 3500963  [Multimodal Support] Add MiniGPT4 (#390)
 discard 2647fbb  Auto updated submodule references
 discard b7b96ed  [DOCS] Add emcc docs (#456)
 discard 77de6a5  [DOCS] Add web build dep instructions (#455)
 discard 0561d80  Link to try-out documentation page in project page (#453)
 discard d6ea8b9  [Docs] Refine the model distribution page with commands (#449)
 discard 0ea92db  upd (#448)
 discard 89588aa  [Doc] Rename `mlc-chat-nightly` conda package to `mlc-chat-cli-nightly` (#444)
 discard 80cceb5  [Model] Update OpenLLaMA appearances (#445)
 discard 55b9562  Update README.md to link documentation and try-out page (#446)
 discard 6aea84f  [iOS][UX] UX Overhaul (#443)
 discard e00e07d  [DOCS] Add instructions to build from source for python (#442)
 discard a595a81  [DOCS] polish compile models (#441)
 discard 9f3faa1  [PYTHON] only return valid dll path (#440)
 discard 1412e74  [Docs] Document wording refine (#438)
 discard fb2cfee  Update javascript.rst (#437)
 discard e49353c  [Docs] Updated model distribution page with example (#436)
 discard 455fea2  [DOCS] minor cleanup (#435)
 discard 726ab71  Auto updated submodule references
 discard e2d301f  [DOCS] minor cleanup (#434)
 discard 931a0af  [Docs] Refine model compilation with CLI validation (#433)
 discard 33f90dd  [DOCS] update ios instruction (#432)
 discard 1ad453b  [Docs] Reorganize folder structure for compilation and prebuilt models (#431)
 discard e081f2e  [Docs] Refine "prebuilt models" page (#430)
 discard 5ec0801  [DOCS] improve the getting started (#429)
 discard 0dd9a12  [DOCS] high level reorganization (#428)
 discard 8e6ac81  [Docs] Restructure getting-started and proj overview (#427)
 discard b7c9a1a  [Docs] Compile and distribute models (#426)
 discard 4ba35fc  [Doc] Follow up of structure refactor (#425)
 discard b939ac6  [Doc] Refactor the structure (v1) (#424)
 discard c2a577c  Use 3rdparty/tvm only when TVM_HOME is not predefined. (#423)
 discard ecb29fe  [Build] Dump debug files only when debug-dump flag set (#422)
 discard 90bb031  Auto updated submodule references
 discard 53caceb  Setup github actions to update relax submodule automatically (#416)
 discard e366975  Allow users to customize ports in gradio and rest. (#418)
 discard 210d301  [iOS] Enable build from prebuilt (#417)
 discard 9c27e41  Update Package.swift
 discard 330162c  [iOS] Standalone Swift Package (#415)
 discard 1a66466  Add __init__.py for interface (#409)
 discard fb9c1e6  [Doc] Fix links in MLCChat config documentation (#410)
 discard 18c24d5  [Doc] Clarify TVM Build Instruction (#403)
 discard 75693b0  [Doc] Move MLCChat Config (#400)
 discard e012869  [Doc] Add Conda and GPU (#399)
 discard f663d97  [Doc] Add more clarity on what TVM Unity is (#395)
 discard 8aeb3df  Enhance error readability and non-CUDA friendliness (#394)
 discard ed59b01  [Doc] Runtime terminologies (the get-started tutorials for runtime module) (#393)
 discard 17081f9  [Refactor] Move text from README to documentation (#392)
 discard 2e643e6  [docs] Small tweaks to the Android getting-started docs, capturing each step as you would from a clean slate; update docs on using the 3rdparty relax tvm submodule (#377)
 discard ad176c2  More friendly error message when the input model is already compiled. (#391)
 discard 3cd905f  Detect macOS CPU Arch (#387)
 discard 93a124c  Langchain and OpenAI examples for REST API (#383)
 discard ee233a7  Fix incorrect import in REST API (#357)
 discard 5763a53  [Bugfix] Fix json config overriding conv template logic (#381)
 discard 77328e4  Tweak Docs (#378)
 discard 7e0bdeb  Update Doc (#373)
 discard 01fa991  [Android] Fix compile error (#310) and `ndk-build` cannot reference (#319)
 discard a9e58cc  [Android] Support model data clear and deletion (#372)
 discard fd2e357  [Hotfix] Replace ssh with https for relax submodule (#371)
 discard fabaca7  upd (#368)
 discard 0dcb596  Add missing mkdir to cpp/README.md (#358)
 discard 1d82af7  Update doc (#309)
 discard 4487d5b  [Doc] Update documentation (#351)
 discard abb3d07  [Doc] Tutorial on customizing conversation (#350)
 discard d2f10da  upd (#343)
 discard c900cc2  Customize `role_msg_sep` and `role_empty_sep` in Conversation Template. (#341)
 discard 8f1386f  Add support for Gorilla (#288)
 discard 395d41f  [iOS] Handle invalid input URL (#338)
 discard 85f80cc  [FIX] Fix the behavior when input is empty (#328)
 discard a985533  Remove legacy `tests/chat.py` (#322)
 discard 39b76af  [Android] Replace some functions in `AppViewModel` and `ChatModule` to make the code more in line with the usage habits of Android developers (#315)
 discard c13ac85  [FIX] Fix Crash when running Conversations without system prompts (#316)
 discard 0933e95  Update tokenizer-cpp to fix Windows build (#303)
 discard f6fa30c  Truncate Single Conversation (#300)
 discard 417f0ac  Update tokenizers dep (#301)
 discard 46b1be3  Introduce System Prompts Step (#298)
 discard 251c7a7  Minor tweak link order (#297)
 discard 0eef89b  [Bugfix] Load models with bfloat16 dtype (#296)
 discard d1c1a67  Fix broken build after TVM updates (#295)
 discard a27374b  [Doc] Slight fix on documentation (#292)
 discard c856439  [Doc] Update document for RWKV (#293)
 discard d3c6053  [Android] Vicuna 7B q4f16 (#291)
 discard 12fc84a  Make libinfo discovery more conda friendly (#290)
 discard e358635  Model loading on shard level - GPT-NeoX, RWKV (#289)
 discard 52c2e12  [Doc] Navigation page refactor (#278)
 discard 98bb750  [ANDROID] New UI and multi model support (#287)
 discard 3242472  LLaMA family loading model on shard level - reducing memory usage (#282)
 discard b68b87c  [CMAKE] Fix to align with latest cmake (#281)
 discard e2d0931  Remove picojson submodule (#277)
 discard 2579787  [Doc] Fix the TVM Unity installation (#275)
 discard 98db588  Refactor REST API (#276)
 discard ce76c34  refactor gradio interface (#270)
 discard 762e156  [CMAKE] Fix the lib_llm_module compilation (#269)
 discard 7066eeb  [Doc] Update doc (#267)
 discard 404910f  [Github] Fix the issue templates format issue (#268)
 discard c87bcf1  [Github] Add issue templates. (#265)
 discard 096c8a5  Refactor mlc_chat into a formal package (#266)
 discard c409ca0  Enable vulkan for RWKV and some misc updates (#264)
 discard 60e2176  Minor terminology updates (#262)
 discard 642669d  Cleanup after conv refactor (#261)
 discard 9aed143  RWKV rebase (#142)
 discard fe6f0a6  Load conversation from JSON (#252)
 discard ddb14d4  [Doc] Reorganize documentation and update contents. (#256)
 discard 2b0bb21  [Minor] fix typos (#254)
 discard 5d213c8  fix minor typo that causes lib not found (#253)
 discard a86b4fa  Update note
 discard de0d294  [REFACTOR] Refactor conv template (#251)
 discard e699b5a  [Build] Copy `tokenizer_config.json` (#250)
 discard d9bac01  [Doc] Update docs for uploading model and iOS deployment (#249)
 discard 2409f92  [Python] Support for gradio api (#247)
 discard 128410f  Update Documentations (#248)
 discard 127013c  Use tlcpack theme
 discard c971d45  GitHub actions for docs (#245)
 discard 86c593e  docs: Update git clone with `--recursive` option (#244)
 discard 68956a1  Update README.md
 discard 061b5fe  Update README.md
 discard 7135066  [Doc] The initial version of documentation (#242)
 discard ae00b0f  More clarity on model loading (#237)
 discard 9a7051e  Fix prep_deps.sh (#235)
 discard 2256cae  Implement Python chat module and REST API (#223)
 discard 3f74b1c  Update README.md (#228)
 discard 14330cf  open-llama-7b (#224)
 discard 892d5bd  improve android/README.md and ndk-build path in gradle to help build (#231)
 discard 5aef0dd  Isolate model fetching in a separate process (#227)
 discard eed5a28  Add auto detect support for vulkan (#225)
 discard f1dcc7f  Remove std::filesystem::canonicalize (#222)
 discard 7db313c  Python script to generate cmake config file for `mlc_chat_cli` (#221)
 discard 1f53191  Update `prep_deps` to include cargo installation script (#220)
 discard 110c6d3  [FIX] fix transform params without GPUs (#219)
 discard 68be032  Accelerating quantization computation by weight compression (#45)
 discard 8f78235  Replace std::string with std::filesystem::path (#213)
 discard 1bd6786  [Hotfix] C++ compilation issue (#210)
 discard 2b078ac  prefill (#209)
 discard 968d7e0  Fix Path finding for library with arch suffix (#207)
 discard 952367d  Update tir dispatch for vicuna fp32 (#208)
 discard 31bcf0e  Database in-memory merge (#206)
 discard b6610e4  Update README.md and Project Page (#205)
 discard 31939d9  Fix database regarding the newly introduced purity flag (#204)
 discard 0422ab9  decode (#201)
 discard b76d5d3  API Consistency (#199)
 discard ecea7b5  Search mlc-chat prefix for prebuilt (#198)
 discard 410be36  Update instruction (#197)
 discard fac3201  Use prefill for term consistency (#196)
 discard 305865c  Fix excessive thread usages in layer norm (#193)
 discard 4ce3022  Update lib search to right loc (#192)
 discard f3e1b39  Fix lifting TIR global buffer allocation pass (#191)
 discard 6de9506  Fix Windows build (#190)
 discard f65df32  Add format macro needed for picojson (#189)
 discard e8197e8  Use `local_id` in `mlc-chat-config.json` to find lib (#188)
 discard da63f2c  Add model weight variant in iOS (#187)
 discard 4ebfba9  [Transform] Lift global buffer allocation in TIR PrimFunc (#184)
 discard 5351693  iOS downloader integration (#183)
 discard 058cbbf  Free memory on reload using memory allocator clear func (#181)
 discard 944fc69  Allow users to specify kernel library reuse (#180)
 discard a181bd5  GPT-NeoX allocating full-length KV cache (#179)
 discard de7b5ab  Update iOS to use latest APIs in llm chat (#178)
 discard ff81bdb  [Decoding] Support repetition penalty (#177)
 discard 615020d  [Lint] Make `build.py` more compliant to pylint/mypy (#176)
 discard 5184eb8  Support RedPajama-INCITE-Chat-3B-v1 (#175)
 discard 6808bcf  Add tokenizer list in default chat config (#173)
 discard 471052b  Support model reload in chat module and CLI (#171)
 discard 47a7a11  Initialize chat module from config JSON string (#166)
 discard 0078c52  Introducing model metadata function (#165)
 discard 937b1fc  Remove max-gen-len from chat module due to duplication (#164)
 discard f3afed4  Unify model specification in build / Hardcode cached rotary emb size / Update config name (#163)
 discard 5faac09  Allow specifying model name in build (#161)
 discard 788242c  Revamp tokenizer to use tokenizers-cpp (#160)
 discard f523105  [hotfix] Fix discord link (#158)
 discard 26d341a  [Community] Add link to Discord server (#157)
 discard 60a409b  Further reorganize artifact structure (#156)
 discard 801f573  fix for --device_id argument casting (#154)
 discard 1a4ee89  Add support for specifying custom model path (#140)
 discard 09b8b6e  Export default mlc_llm_config in build (#146)
 discard 9e29ff0  add system lib for webgpu (#145)
 discard 9a3e3c7  Update build-from-source instructions for iOS and Android (#132)
 discard 6a5dec8  Introduce preset compilation modes and organize artifact name (#130)
 discard 0180b24  downloader (#131)
 discard 9129a4f  Update Android instructions for build from source (#124)
 discard 7a7e4b9  Add missing create_softmax_func to gpt_neox.py (#121)
 discard 328ebd9  Fix Android app stats collection (#114)
 discard 060d1e4  Support ByteLevelBPE Tokenizer (#92)
 discard 0c1f054  Project page linking to blog (#108)
 discard cfaa5ee  update pointer (#107)
 discard 460884f  MLC-LLM support for Android (#106)
 discard 79723ab  Fix typo in utils.py (#102)
 discard 909f267  Auto-tuning scripts to maximize GPU kernel performance (#62)
 discard b7be649  Reduce memory usage in model building process (#96)
 discard fcacd2e  [Hotfix] Fix the bug in `modules.py` (#93)
 discard dc106cd  docs: typo fix (#89)
 discard d1ea7cf  Update IOS installation instructions (#87)
 discard 9b76002  Update README to mention Intel GPU support (#72)
 discard 9d9a219  Corrected a typo (#64)
 discard e656855  docs: Update README.md / typos fixed (#63)
 discard 0b0428c  Fix the bug of MOSS compilation with fp16 (#44)
 discard b747641  Fix: dtype mismatch during quantization (#43)
 discard 3fc2197  Print size of model weight when compiling (#42)
 discard dbe2fef  [Fix] Backward search for effective utf8 (#35)
 discard 7f349a4  . (#32)
 discard 1a003c8  [New Model] Support moss-moon-003-sft (#3)
 discard 5bdcc86  [Fix] Fix the potential OOB of UTF8 utils (#30)
 discard a72cb96  Fix typo in llm_chat.cc (#29)
 discard 65e9770  Update README.md
 discard 2ef559e  Add Link to Web-Stable-Diffusion (#24)
 discard 486d5b0  Moving most device memory static (#21)
 discard d3e7f16  [CLI] Add stats command (#14)
 discard b8c421b  Update README.md
 discard b636b49  Update README.md (#10)
 discard 6e3d46d  Update README.md (#9)
 discard ae08d4c  Support for cross compiling dylib to x86 macos (#8)
 discard 630b061  Update README.md
 discard bf41ec3  Update link reference to MLC course
 discard cea09f2  Initial commit