Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/11/18 10:15:15 UTC

[GitHub] [tvm] Alexey-Yazev opened a new pull request, #13428: [microNPU] Fix cascade scheduling stability

Alexey-Yazev opened a new pull request, #13428:
URL: https://github.com/apache/tvm/pull/13428

   The amount of allocated memory varied from launch to launch because, when the optimal proposals are determined, the collection can contain elements with identical cost metrics; the first of these elements is chosen as optimal and the rest are discarded. The problem is fixed by adding a secondary sort condition on the shapes from the StripeConfigs, applied when the metrics match.
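
The tie-breaking idea described above can be sketched in Python. This is only an illustration, not the code from the PR: the `Candidate` class, its field names, and `pick_optimal` are hypothetical stand-ins for the cascader's Plans/Proposals and their StripeConfig shapes.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a cascader Plan/Proposal."""

    memory_usage: int
    cycles: int
    # Assumed representation: one shape tuple per StripeConfig.
    stripe_shapes: Tuple[Tuple[int, ...], ...]


def sort_key(candidate: Candidate):
    # Primary metrics first; the shapes only matter when the metrics tie.
    return (candidate.memory_usage, candidate.cycles, candidate.stripe_shapes)


def pick_optimal(candidates: Iterable[Candidate]) -> Candidate:
    # With the shapes in the key, the result no longer depends on the
    # order in which equally ranked candidates appear in the collection.
    return min(candidates, key=sort_key)


a = Candidate(1024, 500, ((1, 8, 8, 16),))
b = Candidate(1024, 500, ((1, 4, 8, 16),))
# Same metrics, different shapes: both input orders give the same pick.
same_pick = pick_optimal([a, b]) == pick_optimal([b, a])
```

Without the third key element, `min` would return whichever of `a` and `b` happens to come first, which is the launch-to-launch variation described above.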
   
   cc @leandron @ekalda, @NicolaLancellotti


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] Alexey-Yazev commented on pull request #13428: [microNPU] Fix cascade scheduling stability

Posted by GitBox <gi...@apache.org>.

Alexey-Yazev commented on PR #13428:
URL: https://github.com/apache/tvm/pull/13428#issuecomment-1329002401

   Without the StorageRewrite pass (changes were merged in PR https://github.com/apache/tvm/pull/13365), the amount of allocated memory is the same from launch to launch, even though different proposals are applied.
   
   
   > I suppose there can be two kinds of instability there:
   > (1) Choosing a different Proposal from launch to launch. Even if the Proposals have the same memory and cycle counts according to the cascader, the more accurate memory planner can give differing results for Proposals with different topologies.
   > (2) We choose an identical Proposal every time, but the memory planner allocates a different amount of memory for the same Proposal. That sounds like a memory planner instability.
   
   It is the first kind, and it occurs when the StorageRewrite pass is run.
   



[GitHub] [tvm] Alexey-Yazev commented on pull request #13428: [microNPU] Fix cascade scheduling stability

Posted by GitBox <gi...@apache.org>.

Alexey-Yazev commented on PR #13428:
URL: https://github.com/apache/tvm/pull/13428#issuecomment-1325235697

   I think checking that `allocated_size` equals `workspace_size` in test_networks.py is incorrect. When the cascader is used with striping enabled, a proposal is selected under the condition `proposal.memory_usage < workspace_size`, and `allocated_size` and `proposal.memory_usage` are calculated differently: `allocated_size` comes from unified static memory planning, whereas `proposal.memory_usage` is the sum of all tensors, taking striping into account for intermediate tensors.



[GitHub] [tvm] Alexey-Yazev commented on pull request #13428: [microNPU] Fix cascade scheduling stability

Posted by GitBox <gi...@apache.org>.

Alexey-Yazev commented on PR #13428:
URL: https://github.com/apache/tvm/pull/13428#issuecomment-1334846024

   
   
   
   > Sorry for the delay on this - I don't think we should spend much time investigating the instability that results from using StorageRewrite since Ethos-U is intended to be run with the USMP, so debugging the internals of StorageRewrite seems a bit out of scope here.
   
   Thanks, I have kept only the additional sorting conditions.



[GitHub] [tvm] ekalda merged pull request #13428: [microNPU] Fix cascade scheduling stability

Posted by GitBox <gi...@apache.org>.

ekalda merged PR #13428:
URL: https://github.com/apache/tvm/pull/13428



[GitHub] [tvm] Alexey-Yazev commented on pull request #13428: [microNPU] Fix cascade scheduling stability

Posted by GitBox <gi...@apache.org>.

Alexey-Yazev commented on PR #13428:
URL: https://github.com/apache/tvm/pull/13428#issuecomment-1329064261

   For this pull request, will it be enough to add an additional parameter to sort Plans/Proposals, or do I need to investigate the problem with different memory allocations when running the StorageRewrite pass?



[GitHub] [tvm] ekalda commented on pull request #13428: [microNPU] Fix cascade scheduling stability

Posted by GitBox <gi...@apache.org>.

ekalda commented on PR #13428:
URL: https://github.com/apache/tvm/pull/13428#issuecomment-1337155347

   Thanks @Alexey-Yazev!



[GitHub] [tvm] tvm-bot commented on pull request #13428: [microNPU] Fix cascade scheduling stability

Posted by GitBox <gi...@apache.org>.

tvm-bot commented on PR #13428:
URL: https://github.com/apache/tvm/pull/13428#issuecomment-1319800501

   <!---bot-comment-->
   
   Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from [Reviewers](https://github.com/apache/incubator-tvm/blob/master/CONTRIBUTORS.md#reviewers) by @-ing them in a comment.
   
   <!--bot-comment-ccs-start-->
    * cc @Mousius, @lhutton1 <sub>See [#10317](https://github.com/apache/tvm/issues/10317) for details</sub><!--bot-comment-ccs-end-->
   
   <sub>Generated by [tvm-bot](https://github.com/apache/tvm/blob/main/ci/README.md#github-actions)</sub>



[GitHub] [tvm] ekalda commented on pull request #13428: [microNPU] Fix cascade scheduling stability

Posted by GitBox <gi...@apache.org>.

ekalda commented on PR #13428:
URL: https://github.com/apache/tvm/pull/13428#issuecomment-1326564893

   > > Thanks @Alexey-Yazev, looks good! :)
   > > What I gather is that the instability in the cascader comes from nondeterministic sorting when two Plans/Proposals have the same memory usage. It makes sense to me then to look at the cycle count as a differentiating metric. However, in the case where we have identical performance and memory use, I can't think of a reason why one of the Plans/Proposals should be advantageous over the other, so I wonder if this could be simplified by just removing one of the Plans/Proposals?
   > 
   > Thanks @ekalda! I agree that elements with the same metrics have no advantages over each other. It seems that the real problem is in the calculation of the metrics, since from launch to launch the resulting proposal has the same metrics, yet a different amount of memory is allocated. I'll try to figure it out.
   
   I suppose there can be two kinds of instability there:
   (1) Choosing a different Proposal from launch to launch. Even if the Proposals have the same memory and cycle counts according to the cascader, the more accurate memory planner can give differing results for Proposals with different topologies.
   (2) We choose an identical Proposal every time, but the memory planner allocates a different amount of memory for the same Proposal. That sounds like a memory planner instability.
   
   (A bit of a stab in the dark there)
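
One way to tell the two kinds of instability apart is to record, for several launches, which Proposal was chosen and how much memory was allocated, and then group the results. The sketch below assumes such records are available as `(proposal_id, allocated_size)` pairs; the function name and the record format are hypothetical and are not part of TVM.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Set, Tuple

# One record per launch: (identifier of the chosen Proposal, allocated bytes).
Run = Tuple[Hashable, int]


def classify_instability(runs: Iterable[Run]) -> Set[str]:
    """Return which kinds of instability the recorded launches show."""
    sizes_by_proposal = defaultdict(set)
    for proposal_id, allocated_size in runs:
        sizes_by_proposal[proposal_id].add(allocated_size)

    findings = set()
    # Kind (1): different Proposals were chosen across launches.
    if len(sizes_by_proposal) > 1:
        findings.add("selection")
    # Kind (2): one and the same Proposal got different allocations.
    if any(len(sizes) > 1 for sizes in sizes_by_proposal.values()):
        findings.add("planner")
    return findings


selection_only = classify_instability([("A", 100), ("B", 120)])
planner_only = classify_instability([("A", 100), ("A", 120)])
stable = classify_instability([("A", 100), ("A", 100)])
```

An empty result means the recorded launches agree on both the Proposal and the allocation.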



[GitHub] [tvm] ekalda commented on pull request #13428: [microNPU] Fix cascade scheduling stability

Posted by GitBox <gi...@apache.org>.

ekalda commented on PR #13428:
URL: https://github.com/apache/tvm/pull/13428#issuecomment-1333865776

   > For this pull request, will it be enough to add an additional parameter to sort Plans/Proposals or do I need to investigate problem with different memory allocations when running StorageRewrite pass?
   
   Sorry for the delay on this - I don't think we should spend much time investigating the instability that results from using StorageRewrite since Ethos-U is intended to be run with the USMP, so debugging the internals of StorageRewrite seems a bit out of scope here. 



[GitHub] [tvm] Alexey-Yazev commented on pull request #13428: [microNPU] Fix cascade scheduling stability

Posted by GitBox <gi...@apache.org>.

Alexey-Yazev commented on PR #13428:
URL: https://github.com/apache/tvm/pull/13428#issuecomment-1323811247

   > Thanks @Alexey-Yazev, looks good! :)
   > 
   > What I gather is that the instability in the cascader comes from nondeterministic sorting when two Plans/Proposals have the same memory usage. It makes sense to me then to look at the cycle count as a differentiating metric. However, in the case where we have identical performance and memory use, I can't think of a reason why one of the Plans/Proposals should be advantageous over the other, so I wonder if this could be simplified by just removing one of the Plans/Proposals?
   
   Thanks @ekalda! 
   I agree that elements with the same metrics have no advantages over each other. It seems that the real problem is in the calculation of the metrics, since from launch to launch the resulting proposal has the same metrics, yet a different amount of memory is allocated. I'll try to figure it out.



[GitHub] [tvm] ekalda commented on pull request #13428: [microNPU] Fix cascade scheduling stability

Posted by GitBox <gi...@apache.org>.

ekalda commented on PR #13428:
URL: https://github.com/apache/tvm/pull/13428#issuecomment-1326569004

   > I think checking that `allocated_size` equals `workspace_size` in test_networks.py is incorrect. When the cascader is used with striping enabled, a proposal is selected under the condition `proposal.memory_usage < workspace_size`, and `allocated_size` and `proposal.memory_usage` are calculated differently: `allocated_size` comes from unified static memory planning, whereas `proposal.memory_usage` is the sum of all tensors, taking striping into account for intermediate tensors.
   
   Yes, I think you are right. Thinking about it, we can't really check for the equality of `allocated_size` and `workspace_size`. I suppose that when we test for `allocated_size < workspace_size`, we are checking that the Proposal we chose (based on `workspace_size`) still fits into `workspace_size` once we have done memory planning on the resulting graph.
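
A toy model can show why an equality check is too strict while the `<` check still holds. The sketch below is not USMP and not the cascader's estimate: `summed_usage`, `peak_live_usage`, `fits_workspace`, and all the numbers are illustrative assumptions. It only shows that summing tensor sizes and planning with buffer reuse can give different totals that both fit the same budget.

```python
from typing import List, Tuple

# (size in bytes, first step used, last step used); illustrative values only.
Tensor = Tuple[int, int, int]


def summed_usage(tensors: List[Tensor]) -> int:
    # Estimate in the spirit of "the sum of all tensors".
    return sum(size for size, _, _ in tensors)


def peak_live_usage(tensors: List[Tensor]) -> int:
    # Toy planner: buffers whose lifetimes do not overlap can share memory,
    # so the requirement is the largest total that is live at any one step.
    if not tensors:
        return 0
    last_step = max(end for _, _, end in tensors)
    return max(
        sum(size for size, start, end in tensors if start <= step <= end)
        for step in range(last_step + 1)
    )


def fits_workspace(allocated_size: int, workspace_size: int) -> bool:
    # The check discussed above: strictly less than the budget, not equal to it.
    return allocated_size < workspace_size


tensors = [(400, 0, 1), (300, 1, 2), (200, 2, 3)]
workspace_size = 1000
estimate = summed_usage(tensors)      # 900
allocated = peak_live_usage(tensors)  # 700
```

Here both figures are below `workspace_size`, but neither equals it and they do not equal each other, so only the inequality is a meaningful assertion.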

