Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/03/19 22:58:24 UTC

[GitHub] [tvm] electriclilies opened a new pull request #7710: [DATA] DataLoader -- a universal interface for wrapping datasets from other machine learning frameworks

electriclilies opened a new pull request #7710:
URL: https://github.com/apache/tvm/pull/7710


   In this PR, I introduce the DataLoader class. The DataLoader is an abstract class intended to wrap datasets from other machine learning frameworks so that they can be used interchangeably within TVM for any data-aware tasks. 
   
   I also provide three implementations of the DataLoader class: TFDataLoader, NumpyDataLoader, and RandomDataLoader. 
   
   The TFDataLoader wraps TensorFlow datasets. 
   The NumpyDataLoader wraps NumPy arrays of data in NCHW form (where N is the total number of datapoints). Keras datasets provide data in this form, so the NumpyDataLoader is intended for use with Keras datasets, but it could also be used with any other dataset stored in a similar fashion. 
   The RandomDataLoader takes in a list of shapes and produces random outputs that correspond to those shapes. This class is useful for testing code, especially if you are not at a point where you want to go to the effort of downloading a real dataset.
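
   To make the description above concrete, here is a minimal sketch of what such an interface could look like. It is an illustration only, not the code in this PR: the method names (`__iter__`, `__next__`, `__len__`), the constructor arguments, the `(inputs, labels)` return convention, and the `count_samples` helper are all assumptions chosen for the example, and the TFDataLoader is omitted to keep the sketch free of a TensorFlow dependency.

```python
from abc import ABC, abstractmethod

import numpy as np


class DataLoader(ABC):
    """Hypothetical minimal interface; the actual PR's API may differ."""

    @abstractmethod
    def __iter__(self):
        """Reset to the first batch and return the iterator."""

    @abstractmethod
    def __next__(self):
        """Return (list_of_input_arrays, labels) or raise StopIteration."""

    @abstractmethod
    def __len__(self):
        """Return the number of batches."""


class NumpyDataLoader(DataLoader):
    """Wraps in-memory arrays whose first axis indexes datapoints."""

    def __init__(self, data, labels, batch_size):
        assert len(data) == len(labels)
        self.data, self.labels, self.batch_size = data, labels, batch_size
        self.idx = 0

    def __iter__(self):
        self.idx = 0
        return self

    def __next__(self):
        start = self.idx * self.batch_size
        end = start + self.batch_size
        if end > len(self.data):
            raise StopIteration  # the incomplete final batch is dropped
        self.idx += 1
        return [self.data[start:end]], self.labels[start:end]

    def __len__(self):
        return len(self.data) // self.batch_size


class RandomDataLoader(DataLoader):
    """Produces random float32 inputs for a list of shapes; no labels."""

    def __init__(self, shapes, num_batches, seed=0):
        self.shapes, self.num_batches = shapes, num_batches
        self.rng = np.random.default_rng(seed)
        self.idx = 0

    def __iter__(self):
        self.idx = 0
        return self

    def __next__(self):
        if self.idx >= self.num_batches:
            raise StopIteration
        self.idx += 1
        inputs = [self.rng.random(size=s, dtype=np.float32) for s in self.shapes]
        return inputs, None

    def __len__(self):
        return self.num_batches


def count_samples(loader):
    """Example of data-aware code that only depends on the shared interface."""
    return sum(inputs[0].shape[0] for inputs, _ in loader)
```

   Because `count_samples` touches only the shared interface, it would work unchanged with either loader, which is the interchangeability the DataLoader is meant to provide.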
   
   The DataLoader class was originally designed for data-aware quantization. I think it could also be useful for training, and for making accuracy testing scripts more robust and general.
   
   @mbrookhart @jwfromm @altanh Please take a look and let me know what you think!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] tqchen edited a comment on pull request #7710: [DATA] DataLoader -- a universal interface for wrapping datasets from other machine learning frameworks

Posted by GitBox <gi...@apache.org>.
tqchen edited a comment on pull request #7710:
URL: https://github.com/apache/tvm/pull/7710#issuecomment-803329092


   Thanks @electriclilies. Given that this introduces a new feature, it would be great to send an RFC. 
   
   There is quite a lot of prior practice here (e.g. PyTorch, MXNet and TF's dataset APIs). It would be great to compare against these and keep deviations to a minimum so that users have a consistent usage experience, per https://tvm.apache.org/docs/contribute/code_review.html#deliberate-on-api-and-data-structures





[GitHub] [tvm] tqchen edited a comment on pull request #7710: [DATA] DataLoader -- a universal interface for wrapping datasets from other machine learning frameworks

Posted by GitBox <gi...@apache.org>.
tqchen edited a comment on pull request #7710:
URL: https://github.com/apache/tvm/pull/7710#issuecomment-803329092


   Thanks @electriclilies!
   
   There is quite a lot of prior practice here (e.g. PyTorch, MXNet and TF's dataset APIs). It would be great to compare against these and keep deviations to a minimum so that users have a consistent usage experience, per https://tvm.apache.org/docs/contribute/code_review.html#deliberate-on-api-and-data-structures ("Be consistent with existing well-known package’s APIs if the features overlap.")
   
   Given that this introduces a new feature, it would be great to send an RFC for broader discussion as well.





[GitHub] [tvm] tqchen edited a comment on pull request #7710: [DATA] DataLoader -- a universal interface for wrapping datasets from other machine learning frameworks

Posted by GitBox <gi...@apache.org>.
tqchen edited a comment on pull request #7710:
URL: https://github.com/apache/tvm/pull/7710#issuecomment-803329092


   Thanks @electriclilies. Given that this introduces a new feature, it would be great to send an RFC. 
   
   There is quite a lot of prior practice here (e.g. PyTorch, MXNet and TF's dataset APIs). It would be great to compare against these and keep deviations to a minimum so that users have a consistent usage experience, per https://tvm.apache.org/docs/contribute/code_review.html#deliberate-on-api-and-data-structures ("Be consistent with existing well-known package’s APIs if the features overlap.")





[GitHub] [tvm] electriclilies edited a comment on pull request #7710: [DATA] DataLoader -- a universal interface for wrapping datasets from other machine learning frameworks

Posted by GitBox <gi...@apache.org>.
electriclilies edited a comment on pull request #7710:
URL: https://github.com/apache/tvm/pull/7710#issuecomment-805370118


   @tqchen @jroesch @mbrookhart @anijain2305 I put up an RFC, please take a look: 
   https://discuss.tvm.apache.org/t/dataloader-an-api-to-wrap-datasets-from-other-machine-learning-frameworks/9498





[GitHub] [tvm] electriclilies edited a comment on pull request #7710: [DATA] DataLoader -- a universal interface for wrapping datasets from other machine learning frameworks

Posted by GitBox <gi...@apache.org>.
electriclilies edited a comment on pull request #7710:
URL: https://github.com/apache/tvm/pull/7710#issuecomment-805370118


   @tqchen @jroesch @mbrookhart @anijain2305 I put up an RFC, please take a look: 
   https://discuss.tvm.apache.org/t/dataloader-an-api-to-wrap-datasets-from-other-machine-learning-frameworks/9498





[GitHub] [tvm] jroesch commented on pull request #7710: [DATA] DataLoader -- a universal interface for wrapping datasets from other machine learning frameworks

Posted by GitBox <gi...@apache.org>.
jroesch commented on pull request #7710:
URL: https://github.com/apache/tvm/pull/7710#issuecomment-803191303


   @altanh does this work for PT dataloading?





[GitHub] [tvm] electriclilies closed pull request #7710: [DATA] DataLoader -- a universal interface for wrapping datasets from other machine learning frameworks

Posted by GitBox <gi...@apache.org>.
electriclilies closed pull request #7710:
URL: https://github.com/apache/tvm/pull/7710


   





[GitHub] [tvm] electriclilies commented on pull request #7710: [DATA] DataLoader -- a universal interface for wrapping datasets from other machine learning frameworks

Posted by GitBox <gi...@apache.org>.
electriclilies commented on pull request #7710:
URL: https://github.com/apache/tvm/pull/7710#issuecomment-805370118


   @tqchen @jroesch @anijain2305 I put up an RFC, please take a look: 
   https://discuss.tvm.apache.org/t/dataloader-an-api-to-wrap-datasets-from-other-machine-learning-frameworks/9498





[GitHub] [tvm] electriclilies commented on pull request #7710: [DATA] DataLoader -- a universal interface for wrapping datasets from other machine learning frameworks

Posted by GitBox <gi...@apache.org>.
electriclilies commented on pull request #7710:
URL: https://github.com/apache/tvm/pull/7710#issuecomment-804282772


   @tqchen I'll get an RFC up soon, and @jroesch I can double check that this would work with PT data loaders.





[GitHub] [tvm] tqchen commented on pull request #7710: [DATA] DataLoader -- a universal interface for wrapping datasets from other machine learning frameworks

Posted by GitBox <gi...@apache.org>.
tqchen commented on pull request #7710:
URL: https://github.com/apache/tvm/pull/7710#issuecomment-803329092


   Thanks @electriclilies. Given that this introduces a new feature, it would be great to send an RFC. 
   
   There is quite a lot of prior practice here (e.g. PyTorch, MXNet and TF's dataset APIs). It would be great to compare against these and keep deviations to a minimum so that users have a consistent usage experience.

