You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/09/07 21:07:15 UTC

[GitHub] [arrow] elferherrera opened a new pull request #8129: ARROW-9934 [Rust] Shape and stride check in tensor

elferherrera opened a new pull request #8129:
URL: https://github.com/apache/arrow/pull/8129


   The provided shape and stride for the tensor should be checked before creating a tensor. The shape and the stride will be used to access the elements in the buffer when working with the tensor


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] elferherrera commented on pull request #8129: ARROW-9934 [Rust] Shape and stride check in tensor

Posted by GitBox <gi...@apache.org>.
elferherrera commented on pull request #8129:
URL: https://github.com/apache/arrow/pull/8129#issuecomment-689052590


   I think a way to connect the tensor to ndarry would be great. But I most confess I am more interested in connecting it to pytorch. It would be great to have a way to create tensors in rust and send them to pytorch. Although, I dont think pytorch has arrow implemented yet. 
   
   Let me ask in the mailing list to see what would be recommendation for tensor struct. I think a team in go have done a tensor implementation using [arrow](https://blog.gopheracademy.com/advent-2018/go-arrow/).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] paddyhoran commented on pull request #8129: ARROW-9934 [Rust] Shape and stride check in tensor

Posted by GitBox <gi...@apache.org>.
paddyhoran commented on pull request #8129:
URL: https://github.com/apache/arrow/pull/8129#issuecomment-689244517


   I would love to see integrations (zero copy conversion) with lots of other libs, I assume you are aware of [tch-rs](https://github.com/LaurentMazare/tch-rs).
   
   If you look at the [C++ impl](https://github.com/apache/arrow/tree/master/cpp/src/arrow/tensor), which is the most mature, all the tensor functionality is focused on integration with other frameworks (rather than tensor specific functionality in Arrow).  This might just be that no-one has decided to work on it yet though.
   
   > Is the team interested in more features for the tensor struct?
   
   What kind of features did you have in mind? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] elferherrera commented on a change in pull request #8129: ARROW-9934 [Rust] Shape and stride check in tensor

Posted by GitBox <gi...@apache.org>.
elferherrera commented on a change in pull request #8129:
URL: https://github.com/apache/arrow/pull/8129#discussion_r484591510



##########
File path: rust/arrow/src/tensor.rs
##########
@@ -112,13 +123,44 @@ impl<'a, T: ArrowPrimitiveType> Tensor<'a, T> {
                         )
                     })
                     .next();
+
+                let total_elements: usize = s.iter().product();
+                assert_eq!(
+                    total_elements, 
+                    buffer.len() / mem::size_of::<T::Native>(),
+                    "number of elements in buffer does not match dimensios"
+                );
             }
         };
+
+        // Checking that the tensor strides used for construction are correct
+        // otherwise a row major stride is calculated and used as value for the tensor
+        let tensor_strides = {
+            if let Some(val) = strides {
+                if let Some(ref s) = shape {
+                    if compute_row_major_strides::<T>(s) == val || compute_column_major_strides::<T>(s) == val {
+                        Some(val)
+
+                    } else {
+                        panic!("the input stride does not match the selected shape")

Review comment:
       Sure. That's a good idea. I could implement it like that. I was also wondering if it is worthy to implement other type of constructors, something similar to the ones you can find in pytorch. I think strides shouldn't be a parameter that the user has to input, but instead a parameter that is calculated and modified internally. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] elferherrera commented on pull request #8129: ARROW-9934 [Rust] Shape and stride check in tensor

Posted by GitBox <gi...@apache.org>.
elferherrera commented on pull request #8129:
URL: https://github.com/apache/arrow/pull/8129#issuecomment-688847933


   I was considering moving the tensor.rs file into its own folder and use this folder for all the things necessary for the tensor. I think we should define an error enum for the tensor because I dont think the ArrowError enum will be useful for the tensor struct. What do you think?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #8129: ARROW-9934 [Rust] Shape and stride check in tensor

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8129:
URL: https://github.com/apache/arrow/pull/8129#issuecomment-693126288


   @elferherrera may you please run `cargo +stable fmt` to fix the formatting issues, and then rebase against the latest branch currently called 'master'. Happy to help out, but I quickly tried rebasing and struggled a bit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #8129: ARROW-9934 [Rust] Shape and stride check in tensor

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8129:
URL: https://github.com/apache/arrow/pull/8129#issuecomment-694312499


   > Hi. I just corrected the format and did a rebase. I had to do it with cargo +nightly fmt. By the way, why is arrow using nightly and not stable?
   
   On the nightly question. we still use specialization, but coincidentally there's someone working on finding alternatives. The SIMD feature also requires nightly, but it's off by default IIRC.
   
   Did you rebase with `git rebase origin/master`? Your rebase seems to have pulled in other merged PRs, I'll try help you out with it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #8129: ARROW-9934 [Rust] Shape and stride check in tensor

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8129:
URL: https://github.com/apache/arrow/pull/8129#issuecomment-694367787


   `git remote add apache https://github.com/Apache/arrow`, then use `apache` instead of `origin` for the rebase


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] elferherrera commented on pull request #8129: ARROW-9934 [Rust] Shape and stride check in tensor

Posted by GitBox <gi...@apache.org>.
elferherrera commented on pull request #8129:
URL: https://github.com/apache/arrow/pull/8129#issuecomment-694374399


   Thanks.. let me set my environment like that. I appreciate the help. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] elferherrera commented on pull request #8129: ARROW-9934 [Rust] Shape and stride check in tensor

Posted by GitBox <gi...@apache.org>.
elferherrera commented on pull request #8129:
URL: https://github.com/apache/arrow/pull/8129#issuecomment-688550811


   Is the team interested in more features for the tensor struct? I have some ideas that could be implemented but I'm not sure if you are looking for more implementation on that class or should we have another crate that uses this struct and implements traits on it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] paddyhoran commented on pull request #8129: ARROW-9934 [Rust] Shape and stride check in tensor

Posted by GitBox <gi...@apache.org>.
paddyhoran commented on pull request #8129:
URL: https://github.com/apache/arrow/pull/8129#issuecomment-689047476


   The tensor type was originally introduced to aid in interop with existing ML frameworks on the Python side.  I was exploring interop with [ndarray](https://github.com/rust-ndarray/ndarray/issues/771) but I have not had time to explore further.
   
   I would be interested in the communities view of the tensor type.  i.e. do we plan to build tools to operate on tensors within Arrow or should it remain an "interop type".  Maybe ask this on the mailing list where others on the C++ side could give their opinion.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me closed pull request #8129: ARROW-9934 [Rust] Shape and stride check in tensor

Posted by GitBox <gi...@apache.org>.
nevi-me closed pull request #8129:
URL: https://github.com/apache/arrow/pull/8129


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me closed pull request #8129: ARROW-9934 [Rust] Shape and stride check in tensor

Posted by GitBox <gi...@apache.org>.
nevi-me closed pull request #8129:
URL: https://github.com/apache/arrow/pull/8129


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on a change in pull request #8129: ARROW-9934 [Rust] Shape and stride check in tensor

Posted by GitBox <gi...@apache.org>.
andygrove commented on a change in pull request #8129:
URL: https://github.com/apache/arrow/pull/8129#discussion_r484571618



##########
File path: rust/arrow/src/tensor.rs
##########
@@ -112,13 +123,44 @@ impl<'a, T: ArrowPrimitiveType> Tensor<'a, T> {
                         )
                     })
                     .next();
+
+                let total_elements: usize = s.iter().product();
+                assert_eq!(
+                    total_elements, 
+                    buffer.len() / mem::size_of::<T::Native>(),
+                    "number of elements in buffer does not match dimensios"
+                );
             }
         };
+
+        // Checking that the tensor strides used for construction are correct
+        // otherwise a row major stride is calculated and used as value for the tensor
+        let tensor_strides = {
+            if let Some(val) = strides {
+                if let Some(ref s) = shape {
+                    if compute_row_major_strides::<T>(s) == val || compute_column_major_strides::<T>(s) == val {
+                        Some(val)
+
+                    } else {
+                        panic!("the input stride does not match the selected shape")

Review comment:
       Perhaps we could rename `new` to `try_new` and have it return `Result<Self>` rather than panic?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] elferherrera commented on pull request #8129: ARROW-9934 [Rust] Shape and stride check in tensor

Posted by GitBox <gi...@apache.org>.
elferherrera commented on pull request #8129:
URL: https://github.com/apache/arrow/pull/8129#issuecomment-694316603


   @nevi-me yes, I did a rebase to master before pushing the branch. But I'm not sure if I did it right. I did on my master "git pull --ff-only upstream master" and then I rebase my tensor_checks branch to master. Is this correct?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] elferherrera commented on pull request #8129: ARROW-9934 [Rust] Shape and stride check in tensor

Posted by GitBox <gi...@apache.org>.
elferherrera commented on pull request #8129:
URL: https://github.com/apache/arrow/pull/8129#issuecomment-693698042


   Hi. I just corrected the format and did a rebase. I had to do it with cargo +nightly fmt. By the way, why is arrow using nightly and not stable?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #8129: ARROW-9934 [Rust] Shape and stride check in tensor

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8129:
URL: https://github.com/apache/arrow/pull/8129#issuecomment-694318976


   Hmm, the way that's been recommended to me for this repo is `git rebase origin master`, assuming `origin` is apache/arrow. Perhaps a fast-forward creates a merge instead of a rebase, but I'm not familiar with what your command would do in detail


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] elferherrera commented on pull request #8129: ARROW-9934 [Rust] Shape and stride check in tensor

Posted by GitBox <gi...@apache.org>.
elferherrera commented on pull request #8129:
URL: https://github.com/apache/arrow/pull/8129#issuecomment-694320093


   mmm... but how do you set up origin as apache/arrow when you forked the project?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8129: ARROW-9934 [Rust] Shape and stride check in tensor

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8129:
URL: https://github.com/apache/arrow/pull/8129#issuecomment-688514484


   https://issues.apache.org/jira/browse/ARROW-9934


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] elferherrera commented on pull request #8129: ARROW-9934 [Rust] Shape and stride check in tensor

Posted by GitBox <gi...@apache.org>.
elferherrera commented on pull request #8129:
URL: https://github.com/apache/arrow/pull/8129#issuecomment-689647404


   I wasn't aware of tch-rs. I will have a look at it and how it creates the torch tensors. Thanks for the info.
   
   Regarding the features, I was planning to implement the fmt::debug trait for the tensor in order to see the real values and not the buffer. I was also considering new constructors similar to the ones you can find in pytorch, e.g. zeros, ones, rand, etc. Also to define operations on the tensor. 
   
   However, after reading the C++ implementation the best approach would be to try to implement arrow to ndarray (zero copy). There is no point to reinvent the wheel. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org