Posted to commits@tvm.apache.org by ho...@apache.org on 2023/09/05 21:12:04 UTC
[tvm-vta] branch disco-integration deleted (was 0a75b2c)
This is an automated email from the ASF dual-hosted git repository.
hongyij pushed a change to branch disco-integration
in repository https://gitbox.apache.org/repos/asf/tvm-vta.git
was 0a75b2c get vocab size from config
This change permanently discards the following revisions:
discard 0a75b2c get vocab size from config
discard 21251ea disco+mlc on single machine
discard 2bd2539 Change printing in chat module to logging info (#818)
discard dc36bdd [Doc] Fix typo, --model not --local-id now (#822)
discard 0858630 Update try_out.rst (changed --local-id param to --model and moved notes about drivers. (#817)
discard 8649d4f [Minor] Remove PyTest Dependencies by default (#814)
discard f36f592 [Model Support] CodeLlama (#809)
discard 5fe6344 [Quantization] AutoGPTQ refactor and matmul combination support (#694)
discard 94bda91 Disable decoding for system prompts (#807)
discard c32dbda Update CLI to be more consistent with ChatModule (#789)
discard 7c135b8 Optionally use max_sequence_length in config for split rotary fusion (#801)
discard 3a10dbb [CMAKE] Add check for rust installed (#799)
discard 66550e0 [FIX] Fix error `max_seq_len == -1` (#797)
discard e704c8b [Doc] Minor update to `Build Android Package from Source` section (#785)
discard d4ca67e added cors to fast api (#757)
discard 4127782 Update Llama2 cached sin/cos to use max_sequence_length (#780)
discard d8eadc1 Update gpu.rst to add sudo apt update before first install (#784)
discard 796d3fd [Doc] Update doc for prebuilt models (#767)
discard fbce5a3 Improve code completion experience (#772)
discard ab40434 Automatically set 'offset' parameter if 'messages' parameter is set (#754)
discard 0735f6c Update tokenizers-cpp to latest and fix rust build error (#762)
discard abac1a3 [Utils] Skip generating benchmark scripts in cases (#759)
discard e9579e4 [Android] Add libOpenCL-pixel for supporting Pixel phones. (#723)
discard 976d9f3 [Doc] Add the Mali-OpenCL setup instruction (#749)
discard a39e0d7 Refactor MLC Chat iOS App (#746)
discard 3d46654 [Docs] Add doc for build api and BuildArgs (#610)
discard 0637ef6 Added missing install command (#745)
discard d59185a Add conversation template support for Wizard models (#741)
discard e39d773 Extend `gen_cmake_config.py` for TVM_HOME, CUTLASS, CUBLAS (#743)
discard 94e0109 Add PrimFunc Benchmark Script Dump in Debug Dump Mode (#738)
discard 3c7bac5 [Mali] Add CLI support for Mali device (#734)
discard 282f87b Add `q8f16_1` for benchmark (#736)
discard 5ac98e6 Fix OPENCL variable in tvm.rst (#735)
discard 47e9297 [Mali] Add Mali Device Tag (#733)
discard 88e6d3e [CLI] Device parsing in `"device:id"` style (#728)
discard bfba99b Auto updated submodule references
discard 0fb027f [RWKV] Improve RWKV (#615)
discard 5d6d47f [BugFix] Fix extra ChatConfig args crash (#722)
discard 529bfcb [Bug fix] Small fix on RestAPIArgs (#724)
discard 5b3fa31 Add cublas offload (#695)
discard 5f5ca60 Update wording (#717)
discard ae76577 [Fix] Disable CUDA multiarch fatbin by default (#716)
discard f177154 [Gradio] Update in accordance with new ChatModule (#711)
discard c9ea80d [Docs] Add StreamToStdout to Python API examples (#707)
discard a4efbf9 Update REST API docs to reflect new args (#708)
discard e744f23 [Examples] Use stdout stream for Python API examples (#705)
discard 64fdc15 [Docs][Python] Update `benchmark_generate` docs (#704)
discard 514d8ab [CMAKE] Update gen_cmake_config.py to use ROCm (#703)
discard 8c4fec0 [Docs] Link the Python API notebook to Python API try-out (#702)
discard 054ddfd [Fix][Python] Reset chat before benchmark warmup (#701)
discard ddc844c [Docs] Try-out instructions for Python API (#700)
discard 79582ef [Docs] Python API finer documentation (#687)
discard 2cb5956 Update ChatModule usage in REST API (#698)
discard e068f88 [Python API] high-level function update (#696)
discard 750d4ea [ChatModule] Raw text generation and benchmark (#697)
discard 9628354 Update REST API to support new ChatModule API (#681)
discard b015aff [Docs] ROCm installation docs (#689)
discard 2440594 [Python] Perf evaluation interface, and publicity update (#688)
discard ad46cb8 [Model Support] ChatGLM2 & CodeGeeX2 (#624)
discard bdca829 Support CUDA multiarch build for compatibility (#686)
discard 26a8b41 [iOS] - Send out the message when the user presses command + return keys on a keyboard. (#677)
discard 18b0bdd [BugFix] Embed Step (#680)
discard fb6e73f Python API overhaul (#645)
discard 7977802 [BugFix] Add take indices type check for FuseDecodeTake (#628)
discard f4587df [Android][docs][iOS] Add model lib check and update docs (#675)
discard bdeab17 [Docs] Update with RedPajama `q4f16_1` quantization (#673)
discard 10ac428 [Bug fix] Remove potential trailing backslash (#671)
discard fe27cf1 [iOS] Update config to use `q4f16_1` for RedPajama (#672)
discard 502f680 Deprecate MetaSchedule database (#670)
discard 66135d6 Enable cuda graph for the decoder (#653)
discard 2ac70d4 Auto updated submodule references
discard f8adb1d [Fix] Copy tokenizer files before dumping config (#668)
discard 07491d4 [Model] QKV matmul combination for GPT-NeoX (#667)
discard ac39027 FasterTransformer offloading integration (#661)
discard 6a970d6 Fix hardcoded SM version in CUTLASS offload (#664)
discard b0d5edb [FIX] Split -> Rotary fusion not getting applied (#662)
discard 8f5e4a2 Enable cutlass by default (#660)
discard 113bf7c Add pass to fuse split after QKV projection and rotary embedding (#654)
discard 39fed1f [Quantization] QuantSpecUpdator visitor (#657)
discard 41172de Enable offloading attention and layer / RMS norm to CUTLASS (#651)
discard ac8fa45 [Backend] Add ROCm support (#652)
discard 3c53eeb Update README.md (#656)
discard 401427f More guidelines on tracking (#649)
discard c333d40 Add link related issues (#643)
discard 9d3be50 [GITHUB] Add tracking issue template (#642)
discard b9d7e18 [DOCS] Update docs to keep in sync with current state (#641)
discard f0f5c74 [REST] Expose uvicorn host configuration to mlc chat command line args (#636)
discard 487b028 Implement Llama2 in new concise nn.Module API (#631)
discard bd5c644 Add support for safetensors (#627)
discard 8913809 [bugfix] quick fix of convert_build_args_to_argparser (#630)
discard 9c77d19 Suggest using `q4f16_1` than `q4f16_0` (#626)
discard f70f746 [Bug fix] Fix build model api return value (#621)
discard dccd1a7 Supports more general quantization func registration (#622)
discard 698d6c2 Add nodejs access examples for REST APIs (#602)
discard e563b43 [Fix] Rest API: Update system platform checks for Apple Silicon (#608) (#617)
discard d2ccb92 [Docs] Small fix on supported architectures (#619)
discard f985af3 [Perf] end-to-end benchmark, configurable prompt-len and gen-len (#612)
discard 2137741 Lift weight conversion to an early stage (#607)
discard 651706c Update dlight rule application with GeMV rules (#599)
discard 527721a Auto updated submodule references
discard d4c3a17 Add support for Pydantic v2 (#592)
discard 0f95b35 [Fix][REST] Shared library suffix on Windows (#591)
discard f8b1f8c [PYTHON] Use abspath for dll module (#589)
discard 8cdaf87 Add BuildArgs and Python API for Build (#582)
discard f421ddf [Fix] Correcting num-kv-heads in Llama kv-cache creation func (#576)
discard da7fe57 Disable dispatching (#575)
discard a4de4cb [Doc] Clarification on the memory requirement of Llama2-70b-chat model (#574)
discard 1ecff99 [Doc] Announce Llama 2 70B CLI support (#573)
discard 428287a Llama-2: Support Grouped Query Attention (#567)
discard ab9cf2e [Docs] Llama 2 13B CLI instructions (#572)
discard ea9f2d8 iOS version bump and testflight link (#570)
discard c2183b9 [Docs] Update prebuilt model lists (#565)
discard 0a6fca3 [Docs] Update Llama 2 CLI gif in docs (#564)
discard a960a4d [Docs] CLI and compilation instructions for Llama-2 (#563)
discard 75fb792 Stop-str for Llama-2, and `q4f32_1` for web (#562)
discard 60249d7 [iOS] llama2 setup (#561)
discard 5a4ba31 Introducing `q3f16_1` group quantization scheme (#560)
discard c075710 Auto updated submodule references
discard 69111ef QKV and MLP matmul combination for LLaMA (#559)
discard ce23bd6 [Model] LLaMA-2 support (#558)
discard 75a44ed [cleanup] remove utils.py (#555)
discard 6cf8d4f [iOS] support for multimodal (#524)
discard 0358e5a [Refactor] Make `mlc_llm` a package (#525)
discard 2d4a17f Add OpenCL+LLVM to compile targets (#552)
discard 255dd53 [Doc] Add instructions on installation of AMD drivers (#553)
discard 1d7f5e5 [iOS] Enable iOS Simulator (x86_64, arm64) (#549)
discard 92093ef Update URLs for Web-LLM and Web-Stable-Diffusion (#548)
discard b6b971f [HotFix] Fix RWKV modeling for import removal mistake
discard e36a38d [Refactor] Param loading integration into ParamManager (#542)
discard 95e8139 [Param Manager] support auto-gptq checkpoints from param manager. (#506)
discard 875a4ee [BugFix] Add annotated decode type check for FuseDecodeTake (#541)
discard 9c3f12c [General] Correct processing of bfloat16 weights (#537)
discard 6528f32 Auto updated submodule references
discard c01e17e [Testing] Add memory bandwidth to the evaluation script (#536)
discard b1a086b [Fix] FuseDecodeTranspose pass with PrimFunc deep copy (#535)
discard 13a73c6 [CMAKE] Hide symbol by default (#534)
discard 25df4b2 [BugFix] PlaceInPrompt behavior (#532)
discard 4efb4e9 Outdated quantization flow cleanup (#529)
discard abfa917 [Misc] Update of evaluation script (#530)
discard 2b51661 Add completion endpoint to server and provide QA example (#520)
discard 9f05a33 Adds model category from hf config file (#519)
discard c164e2f Supporting RWKV with ParamManager (#528)
discard 8f638e0 Switch DefaultGPUSchedule pass to dlight fallback GPU schedule (#527)
discard b71bd39 [Model] Add YuLan, and update documentation on supported model architectures (#518)
discard a21c357 [Bugfix] Make GPTJ compatible with new codebase (#516)
discard 0ad413d Supporting GPTBigCode with ParamManager (#517)
discard fb71da7 Add support for WizardLM (#489)
discard 7b85168 Use ParamManager for MiniGPT (#515)
discard 7b118c0 gradio polish and minigpt cli (#496)
discard 8c9e000 [Docs] Enhances iOS app build scripts and docs (#511)
discard d779718 Auto updated submodule references
discard a00a05e [Docs] Update WebGPU target compilation instructions (#510)
discard d800c78 Auto updated submodule references
discard 54c0e7c Add support for Guanaco (#497)
discard 76453c6 add santacoder as build option (#495)
discard 0ac6078 Support GPT-NeoX with ParamManager and embedding separation (#490)
discard 5a37e9a Minor bug fix and reorganization on ChatModule embed separation (#488)
discard d4c96e1 Clean up `op_pattern` attr in all places (#486)
discard 8f45a74 Separate embed as a new function (#482)
discard 3f29772 Initial `q4f16_1` quantization support with fusion (#484)
discard ad8efb0 Add document on how to define new model architecture (#483)
discard e8dec17 [Doc] Update docs for new Android build (#481)
discard 72fd80a Some fixes to support REST API in web-llm (#480)
discard f732a44 Auto updated submodule references
discard 1283cbf ParamManager and new quantization framework (#477)
discard 74b0c65 [Android] Decouple lib and app build (#478)
discard 87d8b04 Clean up documentation (#475)
discard 42e6423 Fix seg fault when trying to run REST server on macOS (#469)
discard 7e10346 [Doc] Skip loading shared library when building docs (#474)
discard c219355 Add support for q8f32 quantization (#472)
discard a1458da [Hotfix] Fix CPU wheels for Python API reference (#471)
discard 3400549 [Doc] Link libmlc_llm.so for auto documenting Python APIs (#470)
discard 183c221 Typo fix for error logs in ChatState.swift (#467)
discard 713d0f6 add mlc-chat-nightly as requirements and fix duplicate labels (#465)
discard f06f69d [Hotfix] Add mlc-ai-nightly as requirements for documentation (#464)
discard 99649af Auto updated submodule references
discard 14669ee [Hotfix] Fix requirements.txt for documentation (#463)
discard fc16bc3 [Doc] Documentation for Python API and gradio interface. (#462)
discard 58b27c3 [Model Support] Add StarCoder & WizardCoder (#459)
discard 2909069 Revert "[Multimodal Support] Add MiniGPT4 (#390)" (#461)
discard 3500963 [Multimodal Support] Add MiniGPT4 (#390)
discard 2647fbb Auto updated submodule references
discard b7b96ed [DOCS] Add emcc docs (#456)
discard 77de6a5 [DOCS] Add web build dep instructions (#455)
discard 0561d80 Link to try-out documentation page in project page (#453)
discard d6ea8b9 [Docs] Refine the model distribution page with commands (#449)
discard 0ea92db upd (#448)
discard 89588aa [Doc] Rename `mlc-chat-nightly` conda package to `mlc-chat-cli-nightly` (#444)
discard 80cceb5 [Model] Update OpenLLaMA appearances (#445)
discard 55b9562 Update README.md to link documentation and try-out page (#446)
discard 6aea84f [iOS][UX] UX Overhaul (#443)
discard e00e07d [DOCS] Add instructions to build from source for python (#442)
discard a595a81 [DOCS] polish compile models (#441)
discard 9f3faa1 [PYTHON] only return valid dll path (#440)
discard 1412e74 [Docs] Document wording refine (#438)
discard fb2cfee Update javascript.rst (#437)
discard e49353c [Docs] Updated model distribution page with example (#436)
discard 455fea2 [DOCS] minor cleanup (#435)
discard 726ab71 Auto updated submodule references
discard e2d301f [DOCS] minor cleanup (#434)
discard 931a0af [Docs] Refine model compilation with CLI validation (#433)
discard 33f90dd [DOCS] update ios instruction (#432)
discard 1ad453b [Docs] Reorganize folder structure for compilation and prebuilt models (#431)
discard e081f2e [Docs] Refine "predbuilt models" page (#430)
discard 5ec0801 [DOCS] improve the getting started (#429)
discard 0dd9a12 [DOCS] high level reorganization (#428)
discard 8e6ac81 [Docs] Restructure getting-started and proj overview (#427)
discard b7c9a1a [Docs] Compile and distribute models (#426)
discard 4ba35fc [Doc] Follow up of structure refactor (#425)
discard b939ac6 [Doc] Refactor the structure (v1) (#424)
discard c2a577c Use 3rdparty/tvm only when TVM_HOME is not predefined. (#423)
discard ecb29fe [Build] Dump debug files only when debug-dump flag set (#422)
discard 90bb031 Auto updated submodule references
discard 53caceb Setup github actions to update relax submodule automatically (#416)
discard e366975 Allow user to customized ports in gradio and rest. (#418)
discard 210d301 [iOS] Enable built from prebuilt (#417)
discard 9c27e41 Update Package.swift
discard 330162c [iOS] Standalone Swift Package (#415)
discard 1a66466 Add __init__.py for interface (#409)
discard fb9c1e6 [Doc] Fix links in MLCChat config documentation (#410)
discard 18c24d5 [Doc] Clarify TVM Build Instruction (#403)
discard 75693b0 [Doc] Move MLCChat Config (#400)
discard e012869 [Doc] Add Conda and GPU (#399)
discard f663d97 [Doc] Add more clarity on what TVM Unity is (#395)
discard 8aeb3df Enhance error readability and non-CUDA friendliness (#394)
discard ed59b01 [Doc] Runtime terminologies (the get-started tutorials for runtime module) (#393)
discard 17081f9 [Refactor] Move text from README to documentation (#392)
discard 2e643e6 [docs] small tweaks to the android getting started docs, capture step-by-step as you would start from a clean slate. Update docs on using 3rdparty relax tvm submodule (#377)
discard ad176c2 More friendly error message when the input model is already compiled. (#391)
discard 3cd905f Detect macOS CPU Arch (#387)
discard 93a124c Langchain and OpenAI examples for REST API (#383)
discard ee233a7 Fix incorrect import in REST API (#357)
discard 5763a53 [Bugfix] Fix json config overriding conv template logic (#381)
discard 77328e4 Tweak Docs (#378)
discard 7e0bdeb Update Doc (#373)
discard 01fa991 [Android] Fix compile error (#310) and `ndk-build` cannot reference (#319)
discard a9e58cc [Android] Support model data clear and deletion (#372)
discard fd2e357 [Hotfix] Replace ssh with https for relax submodule (#371)
discard fabaca7 upd (#368)
discard 0dcb596 Add missing mkdir to cpp/README.md (#358)
discard 1d82af7 Update doc (#309)
discard 4487d5b [Doc] Update documentation (#351)
discard abb3d07 [Doc] Tutorial on customizing conversation (#350)
discard d2f10da upd (#343)
discard c900cc2 Customize `role_msg_sep` and `role_empty_sep` in Conversation Template. (#341)
discard 8f1386f Add support for Gorilla (#288)
discard 395d41f [iOS] Handle invalid input URL (#338)
discard 85f80cc [FIX] Fix the behavior when input is empty (#328)
discard a985533 Remove legacy `tests/chat.py` (#322)
discard 39b76af [Android] Replace some functions in `AppViewModel` and `ChatModule` to make the code more in line with the usage habits of Android developers (#315)
discard c13ac85 [FIX] Fix Crash when running Conversations without system prompts (#316)
discard 0933e95 Update tokenizer-cpp to fix Windows build (#303)
discard f6fa30c Truncate Single Convocation (#300)
discard 417f0ac Update tokenizers dep (#301)
discard 46b1be3 Introduce System Prompts Step (#298)
discard 251c7a7 Minor tweak link order (#297)
discard 0eef89b [Bugfix] Load models with bfloat16 dtype (#296)
discard d1c1a67 Fix broken build after TVM updates (#295)
discard a27374b [Doc] Slight fix on documentation (#292)
discard c856439 [Doc] Update document for RWKV (#293)
discard d3c6053 [Android] Vicuna 7B q4f16 (#291)
discard 12fc84a Make libinfo discovery more conda friendly (#290)
discard e358635 Model loading on shard level - GPT-NeoX, RWKV (#289)
discard 52c2e12 [Doc] Navigation page refactor (#278)
discard 98bb750 [ANDROID] New UI and multi model support (#287)
discard 3242472 LLaMA family loading model on shard level - reducing memory usage (#282)
discard b68b87c [CMAKE] Fix to align with latest cmake (#281)
discard e2d0931 Remove picojson submodule (#277)
discard 2579787 [Doc] Fix the TVM Unity installation (#275)
discard 98db588 Refactor REST API (#276)
discard ce76c34 refactor gradio interface (#270)
discard 762e156 [CMAKE] Fix the lib_llm_module compilation (#269)
discard 7066eeb [Doc] Update doc (#267)
discard 404910f [Github] Fix the issue templates format issue (#268)
discard c87bcf1 [Github] Add issue templates. (#265)
discard 096c8a5 Refactor mlc_chat into a formal package (#266)
discard c409ca0 Enable vulkan for RWKV and some misc updates (#264)
discard 60e2176 Minor terminology updates (#262)
discard 642669d Cleanup after conv refactor (#261)
discard 9aed143 RWKV rebase (#142)
discard fe6f0a6 Load conversation from JSON (#252)
discard ddb14d4 [Doc] Reorganize documentation and update contents. (#256)
discard 2b0bb21 [Minor] fix typos (#254)
discard 5d213c8 fix minor typo that causes lib not found (#253)
discard a86b4fa Update note
discard de0d294 [REFACTOR] Refactor conv template (#251)
discard e699b5a [Build] Copy `tokenizer_config.json` (#250)
discard d9bac01 [Doc] Update docs for uploading model and iOS deployment (#249)
discard 2409f92 [Python] Support for gradio api (#247)
discard 128410f Update Documentations (#248)
discard 127013c Use tlcpack theme
discard c971d45 GitHub actions for docs (#245)
discard 86c593e docs: Update git clone with `--recursive` option (#244)
discard 68956a1 Update README.md
discard 061b5fe Update README.md
discard 7135066 [Doc] The initial version of documentation (#242)
discard ae00b0f More clarify on model loading (#237)
discard 9a7051e Fix prep_deps.sh (#235)
discard 2256cae Implement Python chat module and REST API (#223)
discard 3f74b1c Update README.md (#228)
discard 14330cf open-llama-7b (#224)
discard 892d5bd improve android/README.md and ndk-build path in gradle to help build (#231)
discard 5aef0dd Isolate model fetching in a separate process (#227)
discard eed5a28 Add auto detect support for vulkan (#225)
discard f1dcc7f Remove std::filesystem::canonicalize (#222)
discard 7db313c Python script to generate cmake config file for `mlc_chat_cli` (#221)
discard 1f53191 Update `prep_deps` to include cargo installation script (#220)
discard 110c6d3 [FIX] fix transform params without GPUs (#219)
discard 68be032 Accelerating quantization computation by weight compression (#45)
discard 8f78235 Replace std::string with std::filesystem::path (#213)
discard 1bd6786 [Hotfix] C++ compilation issue (#210)
discard 2b078ac prefill (#209)
discard 968d7e0 Fix Path finding for library with arch suffix (#207)
discard 952367d Update tir dispatch for vicuna fp32 (#208)
discard 31bcf0e Database in-memory merge (#206)
discard b6610e4 Update README.md and Project Page (#205)
discard 31939d9 Fix database regarding the newly introduced purity flag (#204)
discard 0422ab9 decode (#201)
discard b76d5d3 API Consistency (#199)
discard ecea7b5 Search mlc-chat prefix for prebuilt (#198)
discard 410be36 Update instruction (#197)
discard fac3201 Use prefill for term consistency (#196)
discard 305865c Fix excessive thread usages in layer norm (#193)
discard 4ce3022 Update lib search to right loc (#192)
discard f3e1b39 Fix lifting TIR global buffer allocation pass (#191)
discard 6de9506 Fix Windows build (#190)
discard f65df32 Add format macro needed for picojson (#189)
discard e8197e8 Use `local_id` in `mlc-chat-config.json` to find lib (#188)
discard da63f2c Add model weight variant in iOS (#187)
discard 4ebfba9 [Transform] Lift global buffer allocation in TIR PrimFunc (#184)
discard 5351693 iOS downloader integration (#183)
discard 058cbbf Free memory on reload using memory allocator clear func (#181)
discard 944fc69 Allow users to specify kernel library reuse (#180)
discard a181bd5 GPT-NeoX allocating full-length KV cache (#179)
discard de7b5ab Update iOS to use latest APIs in llm chat (#178)
discard ff81bdb [Decoding] Support repetition penalty (#177)
discard 615020d [Lint] Make `build.py` more compliant to pylint/mypy (#176)
discard 5184eb8 Support RedPajama-INCITE-Chat-3B-v1 (#175)
discard 6808bcf Add tokenizer list in default chat config (#173)
discard 471052b Support model reload in chat module and CLI (#171)
discard 47a7a11 Initialize chat module from config JSON string (#166)
discard 0078c52 Introducing model metadata function (#165)
discard 937b1fc Remove max-gen-len from chat module due to duplication (#164)
discard f3afed4 Unify model specification in build / Hardcode cached rotary emb size / Update config name (#163)
discard 5faac09 Allow specifying model name in build (#161)
discard 788242c Revamp tokenizer to use tokenizers-cpp (#160)
discard f523105 [hotfix] Fix discord link (#158)
discard 26d341a [Community] Add link to Discord server (#157)
discard 60a409b Further reorganize artifact structure (#156)
discard 801f573 fix for --device_id argument casting (#154)
discard 1a4ee89 Add support for specifying custom model path (#140)
discard 09b8b6e Export default mlc_llm_config in build (#146)
discard 9e29ff0 add system lib for webgpu (#145)
discard 9a3e3c7 Update build-from-source instructions for iOS and Android (#132)
discard 6a5dec8 Introduce preset compilation modes and organize artifact name (#130)
discard 0180b24 downloader (#131)
discard 9129a4f Update Android instructions for build from source (#124)
discard 7a7e4b9 Add missing create_softmax_func to gpt_neox.py (#121)
discard 328ebd9 Fix Android app stats collection (#114)
discard 060d1e4 Support ByteLevelBPE Tokenizer (#92)
discard 0c1f054 Project page linking to blog (#108)
discard cfaa5ee update pointer (#107)
discard 460884f MLC-LLM support for Android (#106)
discard 79723ab Fix typo in utils.py (#102)
discard 909f267 Auto-tuning scripts to maximize GPU kernel performance (#62)
discard b7be649 Reduce memory usage in model building process (#96)
discard fcacd2e [Hotfix] Fix the bug in `modules.py` (#93)
discard dc106cd docs: typo fix (#89)
discard d1ea7cf Update iOS installation instructions (#87)
discard 9b76002 Update README to mention Intel GPU support (#72)
discard 9d9a219 Corrected a typo (#64)
discard e656855 docs: Update README.md / typos fixed (#63)
discard 0b0428c Fix the bug of MOSS compilation with fp16 (#44)
discard b747641 Fix: dtype mismatch during quantization (#43)
discard 3fc2197 Print size of model weight when compiling (#42)
discard dbe2fef [Fix] Backward search for effective utf8 (#35)
discard 7f349a4 . (#32)
discard 1a003c8 [New Model] Support moss-moon-003-sft (#3)
discard 5bdcc86 [Fix] Fix the potential OOB of UTF8 utils (#30)
discard a72cb96 Fix typo in llm_chat.cc (#29)
discard 65e9770 Update README.md
discard 2ef559e Add Link to Web-Stable-Diffusion (#24)
discard 486d5b0 Moving most device memory static (#21)
discard d3e7f16 [CLI] Add stats command (#14)
discard b8c421b Update README.md
discard b636b49 Update README.md (#10)
discard 6e3d46d Update README.md (#9)
discard ae08d4c Support for cross compiling dylib to x86 macos (#8)
discard 630b061 Update README.md
discard bf41ec3 Update link reference to MLC course
discard cea09f2 Initial commit