You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@tvm.apache.org by "masahi (via GitHub)" <gi...@apache.org> on 2023/10/26 20:22:21 UTC

[PR] [Unity][Contrib] Add vLLM paged attention kernel [tvm]

masahi opened a new pull request, #15995:
URL: https://github.com/apache/tvm/pull/15995

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] [Unity][Contrib] Add vLLM paged attention kernel [tvm]

Posted by "cbalint13 (via GitHub)" <gi...@apache.org>.

cbalint13 commented on code in PR #15995:
URL: https://github.com/apache/tvm/pull/15995#discussion_r1375052987


##########
cmake/modules/CUDA.cmake:
##########
@@ -64,6 +64,7 @@ if(USE_CUDA)
     message(STATUS "Build with Thrust support")
     cmake_minimum_required(VERSION 3.13) # to compile CUDA code
     enable_language(CUDA)
+    set(CMAKE_CUDA_ARCHITECTURES "80;75")

Review Comment:
   @masahi 
   
   CMAKE_CUDA_ARCHITECTURES is designed to be eprovided by user at cmake config time.
   * There are also values like 'auto', 'all' see [CUDA_ARCHITECTURES](https://cmake.org/cmake/help/latest/prop_tgt/CUDA_ARCHITECTURES.html)
   * Better throw a warning like: 'Using 'all' for CUDAARCHS, use CMAKE_CUDA_ARCHITECTURES to override'.
   * Also it is initialized by env var [CUDAARCHS](https://cmake.org/cmake/help/latest/variable/CMAKE_CUDA_ARCHITECTURES.html) and picked up from there.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] [Unity][Contrib] Add vLLM paged attention kernel [tvm]

Posted by "masahi (via GitHub)" <gi...@apache.org>.

masahi commented on code in PR #15995:
URL: https://github.com/apache/tvm/pull/15995#discussion_r1377096574


##########
cmake/modules/contrib/vllm.cmake:
##########
@@ -0,0 +1,25 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+if(USE_VLLM)
+  message(STATUS "Build with vllm paged attention kernel.")
+  include_directories(src/runtime/contrib/vllm)
+  enable_language(CUDA)
+  set(CMAKE_CUDA_ARCHITECTURES "80;75")

Review Comment:
   Ok made it configurable, and use `80;75` by default. `auto` will make the compilation time for these kernels (especially thrust) extremely slow.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] [Unity][Contrib] Add vLLM paged attention kernel [tvm]

Posted by "cbalint13 (via GitHub)" <gi...@apache.org>.

cbalint13 commented on code in PR #15995:
URL: https://github.com/apache/tvm/pull/15995#discussion_r1375053386


##########
cmake/modules/contrib/vllm.cmake:
##########
@@ -0,0 +1,25 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+if(USE_VLLM)
+  message(STATUS "Build with vllm paged attention kernel.")
+  include_directories(src/runtime/contrib/vllm)
+  enable_language(CUDA)
+  set(CMAKE_CUDA_ARCHITECTURES "80;75")

Review Comment:
   @masahi
   
   The [CMAKE_CUDA_ARCHITECTURES](https://cmake.org/cmake/help/latest/variable/CMAKE_CUDA_ARCHITECTURES.html) is designed to be eprovided by user at cmake config time.
   * There are also values like 'auto', 'all' see [CUDA_ARCHITECTURES](https://cmake.org/cmake/help/latest/prop_tgt/CUDA_ARCHITECTURES.html)
   * Better throw a warning like: 'Using 'all' for CUDAARCHS, use CMAKE_CUDA_ARCHITECTURES to override'.
   * Also it is initialized by env var [CUDAARCHS](https://cmake.org/cmake/help/latest/variable/CMAKE_CUDA_ARCHITECTURES.html) and picked up from there.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] [Unity][Contrib] Add vLLM paged attention kernel [tvm]

Posted by "masahi (via GitHub)" <gi...@apache.org>.

masahi commented on code in PR #15995:
URL: https://github.com/apache/tvm/pull/15995#discussion_r1374967973


##########
cmake/modules/CUDA.cmake:
##########
@@ -64,6 +64,7 @@ if(USE_CUDA)
     message(STATUS "Build with Thrust support")
     cmake_minimum_required(VERSION 3.13) # to compile CUDA code
     enable_language(CUDA)
+    set(CMAKE_CUDA_ARCHITECTURES "80;75")

Review Comment:
   The is due to an odd behavior of cmake that I don't understand. If I remove this and set `USE_THRUST`, I get
   ```
   CMake Error in CMakeLists.txt:
     CUDA_ARCHITECTURES is empty for target "tvm_runtime_objs".
   ```
   . Previously I didn't encounter this problem when I was using `USE_THRUST` (a couple of years ago).
   
   This doesn't apply to `vllm.cmake` for some reason. But there, if I don't have `set(CMAKE_CUDA_ARCHITECTURES "80;75")`, I get 
   ```
   error   : Feature 'f16 arithemetic and compare instructions' requires .target sm_53 or higher
   ```
   during build. 
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] [Unity][Contrib] Add vLLM paged attention kernel [tvm]

Posted by "cbalint13 (via GitHub)" <gi...@apache.org>.

cbalint13 commented on code in PR #15995:
URL: https://github.com/apache/tvm/pull/15995#discussion_r1375053386


##########
cmake/modules/contrib/vllm.cmake:
##########
@@ -0,0 +1,25 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+if(USE_VLLM)
+  message(STATUS "Build with vllm paged attention kernel.")
+  include_directories(src/runtime/contrib/vllm)
+  enable_language(CUDA)
+  set(CMAKE_CUDA_ARCHITECTURES "80;75")

Review Comment:
   masahi 
   
   CMAKE_CUDA_ARCHITECTURES is designed to be eprovided by user at cmake config time.
   * There are also values like 'auto', 'all' see [CUDA_ARCHITECTURES](https://cmake.org/cmake/help/latest/prop_tgt/CUDA_ARCHITECTURES.html)
   * Better throw a warning like: 'Using 'all' for CUDAARCHS, use CMAKE_CUDA_ARCHITECTURES to override'.
   * Also it is initialized by env var [CUDAARCHS](https://cmake.org/cmake/help/latest/variable/CMAKE_CUDA_ARCHITECTURES.html) and picked up from there.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] [Unity][Contrib] Add vLLM paged attention kernel [tvm]

Posted by "yelite (via GitHub)" <gi...@apache.org>.

yelite commented on code in PR #15995:
URL: https://github.com/apache/tvm/pull/15995#discussion_r1374552066


##########
cmake/modules/CUDA.cmake:
##########
@@ -64,6 +64,7 @@ if(USE_CUDA)
     message(STATUS "Build with Thrust support")
     cmake_minimum_required(VERSION 3.13) # to compile CUDA code
     enable_language(CUDA)
+    set(CMAKE_CUDA_ARCHITECTURES "80;75")

Review Comment:
   Why do we want to explicitly set the cuda arch here?



##########
cmake/modules/contrib/vllm.cmake:
##########
@@ -0,0 +1,25 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+if(USE_VLLM)
+  message(STATUS "Build with vllm paged attention kernel.")
+  include_directories(src/runtime/contrib/vllm)
+  enable_language(CUDA)
+  set(CMAKE_CUDA_ARCHITECTURES "80;75")

Review Comment:
   Should it let users set the cuda arch they want in `config.cmake` instead?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] [Unity][Contrib] Add vLLM paged attention kernel [tvm]

Posted by "masahi (via GitHub)" <gi...@apache.org>.

masahi commented on code in PR #15995:
URL: https://github.com/apache/tvm/pull/15995#discussion_r1374972747


##########
cmake/modules/contrib/vllm.cmake:
##########
@@ -0,0 +1,25 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+if(USE_VLLM)
+  message(STATUS "Build with vllm paged attention kernel.")
+  include_directories(src/runtime/contrib/vllm)
+  enable_language(CUDA)
+  set(CMAKE_CUDA_ARCHITECTURES "80;75")

Review Comment:
   Not all users (including me) use `config.cmake`. I think building for sm 80 and 75 should be fine in practice (newer arch can use sm80 code). We can add 70 if we still want to support v100. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] [Unity][Contrib] Add vLLM paged attention kernel [tvm]

Posted by "cbalint13 (via GitHub)" <gi...@apache.org>.

cbalint13 commented on code in PR #15995:
URL: https://github.com/apache/tvm/pull/15995#discussion_r1375052987


##########
cmake/modules/CUDA.cmake:
##########
@@ -64,6 +64,7 @@ if(USE_CUDA)
     message(STATUS "Build with Thrust support")
     cmake_minimum_required(VERSION 3.13) # to compile CUDA code
     enable_language(CUDA)
+    set(CMAKE_CUDA_ARCHITECTURES "80;75")

Review Comment:
   @masahi 
   
   CMAKE_CUDA_ARCHITECTURES is designed to be eprovided by user at cmake config time.
   * There are also values like 'auto', 'all' see [CUDA_ARCHITECTURES](https://cmake.org/cmake/help/latest/prop_tgt/CUDA_ARCHITECTURES.html)
   * Better throw a warning like: 'Using 'all' for CUDAARCHS, use CMAKE_CUDA_ARCHITECTURES to override'.
   * Also it is initialized by env var [CUDAARCHS](https://cmake.org/cmake/help/latest/variable/CMAKE_CUDA_ARCHITECTURES.html) and picked up from there.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] [Unity][Contrib] Add vLLM paged attention kernel [tvm]

Posted by "masahi (via GitHub)" <gi...@apache.org>.

masahi closed pull request #15995: [Unity][Contrib] Add vLLM paged attention kernel
URL: https://github.com/apache/tvm/pull/15995


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] [Unity][Contrib] Add vLLM paged attention kernel [tvm]

Posted by "starrkk (via GitHub)" <gi...@apache.org>.

starrkk commented on PR #15995:
URL: https://github.com/apache/tvm/pull/15995#issuecomment-1831060640

   @masahi  Thank you very much for sharing the open source cause, your pr is very meaningful to me, can you continue to integrate your code
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] [Unity][Contrib] Add vLLM paged attention kernel [tvm]

Posted by "masahi (via GitHub)" <gi...@apache.org>.

masahi commented on code in PR #15995:
URL: https://github.com/apache/tvm/pull/15995#discussion_r1374972747


##########
cmake/modules/contrib/vllm.cmake:
##########
@@ -0,0 +1,25 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+if(USE_VLLM)
+  message(STATUS "Build with vllm paged attention kernel.")
+  include_directories(src/runtime/contrib/vllm)
+  enable_language(CUDA)
+  set(CMAKE_CUDA_ARCHITECTURES "80;75")

Review Comment:
   Not all users (including me) use `config.cmake`. I think building for sm 80 and 75 should be enough in practice (newer arch can use sm80 code). We can add 70 if we still want to support v100. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] [Unity][Contrib] Add vLLM paged attention kernel [tvm]

Posted by "cbalint13 (via GitHub)" <gi...@apache.org>.

cbalint13 commented on code in PR #15995:
URL: https://github.com/apache/tvm/pull/15995#discussion_r1375053386


##########
cmake/modules/contrib/vllm.cmake:
##########
@@ -0,0 +1,25 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+if(USE_VLLM)
+  message(STATUS "Build with vllm paged attention kernel.")
+  include_directories(src/runtime/contrib/vllm)
+  enable_language(CUDA)
+  set(CMAKE_CUDA_ARCHITECTURES "80;75")

Review Comment:
   @masahi
   
   The [CMAKE_CUDA_ARCHITECTURES](https://cmake.org/cmake/help/latest/variable/CMAKE_CUDA_ARCHITECTURES.html) is designed to be provided by user at cmake config time.
   * There are also values like 'auto', 'all' see [CUDA_ARCHITECTURES](https://cmake.org/cmake/help/latest/prop_tgt/CUDA_ARCHITECTURES.html)
   * Better throw a warning like: 'Using 'all' for CUDAARCHS, use CMAKE_CUDA_ARCHITECTURES to override'.
   * Also it is initialized by env var [CUDAARCHS](https://cmake.org/cmake/help/latest/variable/CMAKE_CUDA_ARCHITECTURES.html) and picked up from there.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] [Unity][Contrib] Add vLLM paged attention kernel [tvm]

Posted by "cbalint13 (via GitHub)" <gi...@apache.org>.

cbalint13 commented on code in PR #15995:
URL: https://github.com/apache/tvm/pull/15995#discussion_r1375053386


##########
cmake/modules/contrib/vllm.cmake:
##########
@@ -0,0 +1,25 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+if(USE_VLLM)
+  message(STATUS "Build with vllm paged attention kernel.")
+  include_directories(src/runtime/contrib/vllm)
+  enable_language(CUDA)
+  set(CMAKE_CUDA_ARCHITECTURES "80;75")

Review Comment:
   @masahi
   
   CMAKE_CUDA_ARCHITECTURES is designed to be eprovided by user at cmake config time.
   * There are also values like 'auto', 'all' see [CUDA_ARCHITECTURES](https://cmake.org/cmake/help/latest/prop_tgt/CUDA_ARCHITECTURES.html)
   * Better throw a warning like: 'Using 'all' for CUDAARCHS, use CMAKE_CUDA_ARCHITECTURES to override'.
   * Also it is initialized by env var [CUDAARCHS](https://cmake.org/cmake/help/latest/variable/CMAKE_CUDA_ARCHITECTURES.html) and picked up from there.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org