You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/25 08:50:30 UTC

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8510: ARROW-1614: [C++] Add a Tensor logical value type with constant dimensions, implemented using ExtensionType

jorisvandenbossche commented on a change in pull request #8510:
URL: https://github.com/apache/arrow/pull/8510#discussion_r791482403



##########
File path: cpp/src/arrow/extension_type_test.cc
##########
@@ -333,4 +334,144 @@ TEST_F(TestExtensionType, ValidateExtensionArray) {
   ASSERT_OK(ext_arr4->ValidateFull());
 }
 
+class TensorArray : public ExtensionArray {
+ public:
+  using ExtensionArray::ExtensionArray;
+};
+
+class TensorArrayType : public ExtensionType {
+ public:
+  explicit TensorArrayType(const std::shared_ptr<DataType>& type,
+                           const std::vector<int64_t>& shape,
+                           const std::vector<int64_t>& strides)
+      : ExtensionType(type), type_(type), shape_(shape), strides_(strides) {}
+
+  std::shared_ptr<DataType> type() const { return type_; }
+  std::vector<int64_t> shape() const { return shape_; }
+  std::vector<int64_t> strides() const { return strides_; }
+
+  std::string extension_name() const override {
+    std::stringstream s;
+    s << "ext-array-tensor-type<type=" << *storage_type() << ", shape=(";
+    for (uint64_t i = 0; i < shape_.size(); i++) {
+      s << shape_[i];
+      if (i < shape_.size() - 1) {
+        s << ", ";
+      }
+    }
+    s << "), strides=(";
+    for (uint64_t i = 0; i < strides_.size(); i++) {
+      s << strides_[i];
+      if (i < strides_.size() - 1) {
+        s << ", ";
+      }
+    }
+    s << ")>";
+    return s.str();
+  }
+
+  bool ExtensionEquals(const ExtensionType& other) const override {
+    return this->shape() == static_cast<const TensorArrayType&>(other).shape();

Review comment:
       > I removed the `ndim` check and made equality comparison only check for type name (`ext-array-tensor-type`).
   
   I don't really understand the argument about not being strict in the type equality. 
   
   Currently, the example code here uses `FixedSizeListArray` as the underlying storage type, for _constant_ dimension (and I would say constant shape and strides, as well?) for the full array. 
   
   So if you have multiple files in the same dataset where the shape of the tensors differs between files, how can that ever be combined in a single schema / table? FixedSizeListArray with a different size are not compatible with each other. 
   
   If you want to be flexible regarding the shape of each tensor value, it seems that is rather what is covered in https://issues.apache.org/jira/browse/ARROW-8714, where the storage type could be a struct with one field the data as a variable sized list array and the other field storing the shape of each tensor value.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org