You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/18 09:54:12 UTC

[GitHub] [arrow] jorgecarleitao opened a new pull request #8485: ARROW-10002: [Rust] Remove trait specialization

jorgecarleitao opened a new pull request #8485:
URL: https://github.com/apache/arrow/pull/8485


   This PR removes trait specialization by leveraging the compiler to remove trivial `if` statements over generics.
   
   I verified that the assembly code was the same in a [simple example](https://rust.godbolt.org/z/qrcW8W). I do not know if this generalizes to our use-case, but I suspect so as LLVM is (hopefully) removing code such as `if a != a`.
   
   The change `get_data_type()` to `DATA_TYPE` is not necessary. I did it before realizing this. IMO it makes it more explicit that this is not a function, but a constant, but we can revert it.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jhorstmann commented on a change in pull request #8485: ARROW-10002: [Rust] Remove trait specialization from arrow crate

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #8485:
URL: https://github.com/apache/arrow/pull/8485#discussion_r507184621



##########
File path: rust/arrow/src/array/builder.rs
##########
@@ -253,8 +253,18 @@ pub trait BufferBuilderTrait<T: ArrowPrimitiveType> {
 }
 
 impl<T: ArrowPrimitiveType> BufferBuilderTrait<T> for BufferBuilder<T> {
-    default fn new(capacity: usize) -> Self {
-        let buffer = MutableBuffer::new(capacity * mem::size_of::<T::Native>());
+    #[inline]
+    fn new(capacity: usize) -> Self {
+        let buffer = if T::DATA_TYPE == DataType::Boolean {

Review comment:
       Nice! I used a similar trick for the simd sum aggregation that the compiler was also able to optimize away (https://github.com/apache/arrow/pull/8370/files#diff-53a8522c4a482c9566b9032e1bd2c7990bab29640857cb40411beb7158189e75R625)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8485: ARROW-10002: [Rust] Remove trait specialization

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8485:
URL: https://github.com/apache/arrow/pull/8485#discussion_r507072708



##########
File path: rust/arrow/src/compute/kernels/filter.rs
##########
@@ -424,7 +424,7 @@ impl FilterContext {
     where
         T: ArrowNumericType,
     {
-        let array_type = T::get_data_type();
+        let array_type = T::DATA_TYPE;

Review comment:
       this could actually be removed by making `filter_array_impl` a generic over `T`. I left for a future exercise.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #8485: ARROW-10002: [Rust] Remove trait specialization

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #8485:
URL: https://github.com/apache/arrow/pull/8485#issuecomment-711144061


   fyi @andygrove @alamb 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8485: ARROW-10002: [Rust] Remove trait specialization

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8485:
URL: https://github.com/apache/arrow/pull/8485#discussion_r507070902



##########
File path: rust/arrow/src/array/array.rs
##########
@@ -684,16 +670,6 @@ impl PrimitiveArray<BooleanType> {
     }
 }
 
-impl fmt::Debug for PrimitiveArray<BooleanType> {

Review comment:
       No longer needed, thus removed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8485: ARROW-10002: [Rust] Remove trait specialization

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8485:
URL: https://github.com/apache/arrow/pull/8485#discussion_r507072865



##########
File path: rust/arrow/src/lib.rs
##########
@@ -124,7 +124,6 @@
 //!
 //! The parquet implementation is on a [separate crate](https://crates.io/crates/parquet)
 
-#![feature(specialization)]

Review comment:
       🎉 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on a change in pull request #8485: ARROW-10002: [Rust] Remove trait specialization from arrow crate

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #8485:
URL: https://github.com/apache/arrow/pull/8485#discussion_r507199740



##########
File path: rust/arrow/src/array/array.rs
##########
@@ -562,105 +617,36 @@ where
     ///
     /// `Date32` and `Date64` return UTC midnight as they do not have time resolution
     pub fn value_as_time(&self, i: usize) -> Option<NaiveTime> {
-        match self.data_type() {
-            DataType::Time32(unit) => {
-                // safe to immediately cast to u32 as `self.value(i)` is positive i32
-                let v = i64::from(self.value(i)) as u32;
-                match unit {
-                    TimeUnit::Second => {
-                        Some(NaiveTime::from_num_seconds_from_midnight(v, 0))
-                    }
-                    TimeUnit::Millisecond => {
-                        Some(NaiveTime::from_num_seconds_from_midnight(
-                            // extract seconds from milliseconds
-                            v / MILLISECONDS as u32,
-                            // discard extracted seconds and convert milliseconds to
-                            // nanoseconds
-                            v % MILLISECONDS as u32 * MICROSECONDS as u32,
-                        ))
-                    }
-                    _ => None,
-                }
-            }
-            DataType::Time64(unit) => {
-                let v = i64::from(self.value(i));
-                match unit {
-                    TimeUnit::Microsecond => {
-                        Some(NaiveTime::from_num_seconds_from_midnight(
-                            // extract seconds from microseconds
-                            (v / MICROSECONDS) as u32,
-                            // discard extracted seconds and convert microseconds to
-                            // nanoseconds
-                            (v % MICROSECONDS * MILLISECONDS) as u32,
-                        ))
-                    }
-                    TimeUnit::Nanosecond => {
-                        Some(NaiveTime::from_num_seconds_from_midnight(
-                            // extract seconds from nanoseconds
-                            (v / NANOSECONDS) as u32,
-                            // discard extracted seconds
-                            (v % NANOSECONDS) as u32,
-                        ))
-                    }
-                    _ => None,
-                }
-            }
-            DataType::Timestamp(_, _) => {
-                self.value_as_datetime(i).map(|datetime| datetime.time())
-            }
-            DataType::Date32(_) | DataType::Date64(_) => {
-                Some(NaiveTime::from_hms(0, 0, 0))
-            }
-            DataType::Interval(_) => None,
-            _ => None,
-        }
+        as_time::<T>(i64::from(self.value(i)))
     }
 }
 
 impl<T: ArrowPrimitiveType> fmt::Debug for PrimitiveArray<T> {
-    default fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
-        write!(f, "PrimitiveArray<{:?}>\n[\n", T::get_data_type())?;
-        print_long_array(self, f, |array, index, f| {
-            fmt::Debug::fmt(&array.value(index), f)
-        })?;
-        write!(f, "]")
-    }
-}
-
-impl<T: ArrowNumericType> fmt::Debug for PrimitiveArray<T> {
-    default fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
-        write!(f, "PrimitiveArray<{:?}>\n[\n", T::get_data_type())?;
-        print_long_array(self, f, |array, index, f| {
-            fmt::Debug::fmt(&array.value(index), f)
-        })?;
-        write!(f, "]")
-    }
-}
-
-impl<T: ArrowNumericType + ArrowTemporalType> fmt::Debug for PrimitiveArray<T>
-where
-    i64: std::convert::From<T::Native>,
-{
     fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
-        write!(f, "PrimitiveArray<{:?}>\n[\n", T::get_data_type())?;
-        print_long_array(self, f, |array, index, f| match T::get_data_type() {
+        write!(f, "PrimitiveArray<{:?}>\n[\n", T::DATA_TYPE)?;
+        print_long_array(self, f, |array, index, f| match T::DATA_TYPE {
             DataType::Date32(_) | DataType::Date64(_) => {
-                match array.value_as_date(index) {

Review comment:
       The journey to usize via `num-traits`, then to i64, feels long, but I see the compiler optimises the function nicely in release mode.
   
   ```rust
   pub extern fn convert_u32_to_i64(value: u32) -> i64 {
       num::ToPrimitive::to_usize(&value).unwrap() as i64
   }
   ```
   
   ```asm
   playground::convert_u32_to_i64:
   	movl	%edi, %eax
   	retq
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #8485: ARROW-10002: [Rust] Remove trait specialization from arrow crate

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #8485:
URL: https://github.com/apache/arrow/pull/8485#discussion_r535140816



##########
File path: rust/arrow/src/array/builder.rs
##########
@@ -253,8 +253,18 @@ pub trait BufferBuilderTrait<T: ArrowPrimitiveType> {
 }
 
 impl<T: ArrowPrimitiveType> BufferBuilderTrait<T> for BufferBuilder<T> {
-    default fn new(capacity: usize) -> Self {
-        let buffer = MutableBuffer::new(capacity * mem::size_of::<T::Native>());
+    #[inline]
+    fn new(capacity: usize) -> Self {
+        let buffer = if T::DATA_TYPE == DataType::Boolean {

Review comment:
       @jorgecarleitao 
   
   I see this in profiling for the BufferBuilderTrait append function:
   
   ![image](https://user-images.githubusercontent.com/163737/101012554-4c1dfc00-3563-11eb-9b00-f92eac06b7ae.png)
   
   Which accounts for 9% of the instruction fetches according to callgrind when running the TPHC 12 benchmark.
   
   So it seems the `if T::DATA_TYPE == DataType::Boolean` is not being optimized out and goes into the partial eq implementation (or the profiler is wrong), maybe because the implementation there is not easy to optimize / specialize for this case. I will check if another implementation (pattern match?) solves this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8485: ARROW-10002: [Rust] Remove trait specialization

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8485:
URL: https://github.com/apache/arrow/pull/8485#issuecomment-711145304


   https://issues.apache.org/jira/browse/ARROW-10002


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me closed pull request #8485: ARROW-10002: [Rust] Remove trait specialization from arrow crate

Posted by GitBox <gi...@apache.org>.
nevi-me closed pull request #8485:
URL: https://github.com/apache/arrow/pull/8485


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8485: ARROW-10002: [Rust] Remove trait specialization

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8485:
URL: https://github.com/apache/arrow/pull/8485#discussion_r507071022



##########
File path: rust/arrow/src/array/array.rs
##########
@@ -898,7 +874,7 @@ impl From<Vec<Option<bool>>> for BooleanArray {
 
 /// Constructs a `PrimitiveArray` from an array data reference.
 impl<T: ArrowPrimitiveType> From<ArrayDataRef> for PrimitiveArray<T> {
-    default fn from(data: ArrayDataRef) -> Self {

Review comment:
       one of the core changes of this PR...




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #8485: ARROW-10002: [Rust] Remove trait specialization from arrow crate

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8485:
URL: https://github.com/apache/arrow/pull/8485#issuecomment-711159868


   > For the sake of creating less friction at the user side, can you make get_data_type const fn with the same name rather than const with capitals? Still it will generate exactly the same code at the call site.
   
   I think this won't be possible as functions in traits can't be const


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8485: ARROW-10002: [Rust] Remove trait specialization

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8485:
URL: https://github.com/apache/arrow/pull/8485#discussion_r507071721



##########
File path: rust/arrow/src/array/builder.rs
##########
@@ -253,8 +253,18 @@ pub trait BufferBuilderTrait<T: ArrowPrimitiveType> {
 }
 
 impl<T: ArrowPrimitiveType> BufferBuilderTrait<T> for BufferBuilder<T> {
-    default fn new(capacity: usize) -> Self {
-        let buffer = MutableBuffer::new(capacity * mem::size_of::<T::Native>());
+    #[inline]
+    fn new(capacity: usize) -> Self {
+        let buffer = if T::DATA_TYPE == DataType::Boolean {

Review comment:
       Note that after templating, this will be either `if a == a` or `if a == b`, which I am hoping the compiler to optimize out. I placed an `inline` to help the compiler a bit, but I am not sure it is really needed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8485: ARROW-10002: [Rust] Remove trait specialization

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8485:
URL: https://github.com/apache/arrow/pull/8485#discussion_r507072431



##########
File path: rust/arrow/src/array/equal.rs
##########
@@ -43,11 +43,15 @@ pub trait ArrayEqual {
 }
 
 impl<T: ArrowPrimitiveType> ArrayEqual for PrimitiveArray<T> {
-    default fn equals(&self, other: &dyn Array) -> bool {
+    fn equals(&self, other: &dyn Array) -> bool {
         if !base_equal(&self.data(), &other.data()) {
             return false;
         }
 
+        if T::DATA_TYPE == DataType::Boolean {

Review comment:
       Here we use the same principle: this will be a constant `if a == a` or `if a == b`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #8485: ARROW-10002: [Rust] Remove trait specialization from arrow crate

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #8485:
URL: https://github.com/apache/arrow/pull/8485#discussion_r535140816



##########
File path: rust/arrow/src/array/builder.rs
##########
@@ -253,8 +253,18 @@ pub trait BufferBuilderTrait<T: ArrowPrimitiveType> {
 }
 
 impl<T: ArrowPrimitiveType> BufferBuilderTrait<T> for BufferBuilder<T> {
-    default fn new(capacity: usize) -> Self {
-        let buffer = MutableBuffer::new(capacity * mem::size_of::<T::Native>());
+    #[inline]
+    fn new(capacity: usize) -> Self {
+        let buffer = if T::DATA_TYPE == DataType::Boolean {

Review comment:
       @jorgecarleitao 
   
   I see this in profiling for the BufferBuilderTrait append function:
   
   ![image](https://user-images.githubusercontent.com/163737/101012554-4c1dfc00-3563-11eb-9b00-f92eac06b7ae.png)
   
   Which accounts for ~9% of the instruction fetches according to callgrind when running the TPHC 12 benchmark.
   
   So it seems the `if T::DATA_TYPE == DataType::Boolean` is not being optimized out and goes into the partial eq implementation (or the profiler is wrong), maybe because the implementation there is not easy to optimize / specialize for this case. I will check if another implementation (pattern match?) solves this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on a change in pull request #8485: ARROW-10002: [Rust] Remove trait specialization from arrow crate

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #8485:
URL: https://github.com/apache/arrow/pull/8485#discussion_r507075762



##########
File path: rust/arrow/src/lib.rs
##########
@@ -124,7 +124,6 @@
 //!
 //! The parquet implementation is on a [separate crate](https://crates.io/crates/parquet)
 
-#![feature(specialization)]

Review comment:
       #arewestableyet? This is great, I've just compiled Arrow on stable Rust. I think it's just parquet that still uses it after this PR. If DataFusion doesn't use any nightly features, one could make parquet optional, and then Parquet will be the only one left.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8485: ARROW-10002: [Rust] Remove trait specialization

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8485:
URL: https://github.com/apache/arrow/pull/8485#discussion_r507071459



##########
File path: rust/arrow/src/array/builder.rs
##########
@@ -253,8 +253,18 @@ pub trait BufferBuilderTrait<T: ArrowPrimitiveType> {
 }
 
 impl<T: ArrowPrimitiveType> BufferBuilderTrait<T> for BufferBuilder<T> {
-    default fn new(capacity: usize) -> Self {

Review comment:
       Here, this implementation now takes care of both numerics and booleans, using `T::DATA_TYPE` to branch between them.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8485: ARROW-10002: [Rust] Remove trait specialization

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8485:
URL: https://github.com/apache/arrow/pull/8485#discussion_r507071721



##########
File path: rust/arrow/src/array/builder.rs
##########
@@ -253,8 +253,18 @@ pub trait BufferBuilderTrait<T: ArrowPrimitiveType> {
 }
 
 impl<T: ArrowPrimitiveType> BufferBuilderTrait<T> for BufferBuilder<T> {
-    default fn new(capacity: usize) -> Self {
-        let buffer = MutableBuffer::new(capacity * mem::size_of::<T::Native>());
+    #[inline]
+    fn new(capacity: usize) -> Self {
+        let buffer = if T::DATA_TYPE == DataType::Boolean {

Review comment:
       Note that after templating, this will be either `if a == a` or `if a != a`, which I am hoping the compiler to optimize out.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on pull request #8485: ARROW-10002: [Rust] Remove trait specialization from arrow crate

Posted by GitBox <gi...@apache.org>.
vertexclique commented on pull request #8485:
URL: https://github.com/apache/arrow/pull/8485#issuecomment-711156241


   For the sake of creating less friction at the user side, can you make get_data_type const fn with the same name rather than const with capitals? Still it will generate exactly the same code at the call site.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8485: ARROW-10002: [Rust] Remove trait specialization

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8485:
URL: https://github.com/apache/arrow/pull/8485#discussion_r507070561



##########
File path: rust/arrow/src/array/array.rs
##########
@@ -502,6 +502,98 @@ impl<T: ArrowNumericType> PrimitiveArray<T> {
     }
 }
 
+fn as_datetime<T: ArrowPrimitiveType>(v: i64) -> Option<NaiveDateTime> {

Review comment:
       This places these functions outside the `impl`, so that we can use them:
   
   When used with `ArrowNumericType`, it can convert itself to i64 and thus this is simple.
   
   When we use it with `PrimitiveType`, we use `to_usize` to convert to i64.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8485: ARROW-10002: [Rust] Remove trait specialization

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8485:
URL: https://github.com/apache/arrow/pull/8485#discussion_r507071721



##########
File path: rust/arrow/src/array/builder.rs
##########
@@ -253,8 +253,18 @@ pub trait BufferBuilderTrait<T: ArrowPrimitiveType> {
 }
 
 impl<T: ArrowPrimitiveType> BufferBuilderTrait<T> for BufferBuilder<T> {
-    default fn new(capacity: usize) -> Self {
-        let buffer = MutableBuffer::new(capacity * mem::size_of::<T::Native>());
+    #[inline]
+    fn new(capacity: usize) -> Self {
+        let buffer = if T::DATA_TYPE == DataType::Boolean {

Review comment:
       Note that after templating, this will be either `if a == a` or `if a != a`, which I am hoping the compiler to optimize out. I placed an `inline` to help the compiler a bit, but I am not sure it is really needed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on pull request #8485: ARROW-10002: [Rust] Remove trait specialization from arrow crate

Posted by GitBox <gi...@apache.org>.
vertexclique commented on pull request #8485:
URL: https://github.com/apache/arrow/pull/8485#issuecomment-711277364


   @nevi-me true :/. We go to datatype from generics. const fn also accepts only `Sized` any other trait bound won't work.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #8485: ARROW-10002: [Rust] Remove trait specialization from arrow crate

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #8485:
URL: https://github.com/apache/arrow/pull/8485#discussion_r535140816



##########
File path: rust/arrow/src/array/builder.rs
##########
@@ -253,8 +253,18 @@ pub trait BufferBuilderTrait<T: ArrowPrimitiveType> {
 }
 
 impl<T: ArrowPrimitiveType> BufferBuilderTrait<T> for BufferBuilder<T> {
-    default fn new(capacity: usize) -> Self {
-        let buffer = MutableBuffer::new(capacity * mem::size_of::<T::Native>());
+    #[inline]
+    fn new(capacity: usize) -> Self {
+        let buffer = if T::DATA_TYPE == DataType::Boolean {

Review comment:
       @jorgecarleitao 
   
   I see this in profiling for BufferBuilderTrait append:
   
   ![image](https://user-images.githubusercontent.com/163737/101012554-4c1dfc00-3563-11eb-9b00-f92eac06b7ae.png)
   
   Which accounts for 9% of the instruction fetches according to callgrind when running the TPHC 12 benchmark.
   
   So it seems the `if T::DATA_TYPE == DataType::Boolean` is not being optimized out and goes into the partial eq implementation (or the profiler is wrong), maybe because the implementation there is not easy to optimize / specialize for this case. I will check if another implementation (pattern match?) solves this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #8485: ARROW-10002: [Rust] Remove trait specialization from arrow crate

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #8485:
URL: https://github.com/apache/arrow/pull/8485#discussion_r507094172



##########
File path: rust/arrow/src/array/array.rs
##########
@@ -562,105 +617,36 @@ where
     ///
     /// `Date32` and `Date64` return UTC midnight as they do not have time resolution
     pub fn value_as_time(&self, i: usize) -> Option<NaiveTime> {
-        match self.data_type() {
-            DataType::Time32(unit) => {
-                // safe to immediately cast to u32 as `self.value(i)` is positive i32
-                let v = i64::from(self.value(i)) as u32;
-                match unit {
-                    TimeUnit::Second => {
-                        Some(NaiveTime::from_num_seconds_from_midnight(v, 0))
-                    }
-                    TimeUnit::Millisecond => {
-                        Some(NaiveTime::from_num_seconds_from_midnight(
-                            // extract seconds from milliseconds
-                            v / MILLISECONDS as u32,
-                            // discard extracted seconds and convert milliseconds to
-                            // nanoseconds
-                            v % MILLISECONDS as u32 * MICROSECONDS as u32,
-                        ))
-                    }
-                    _ => None,
-                }
-            }
-            DataType::Time64(unit) => {
-                let v = i64::from(self.value(i));
-                match unit {
-                    TimeUnit::Microsecond => {
-                        Some(NaiveTime::from_num_seconds_from_midnight(
-                            // extract seconds from microseconds
-                            (v / MICROSECONDS) as u32,
-                            // discard extracted seconds and convert microseconds to
-                            // nanoseconds
-                            (v % MICROSECONDS * MILLISECONDS) as u32,
-                        ))
-                    }
-                    TimeUnit::Nanosecond => {
-                        Some(NaiveTime::from_num_seconds_from_midnight(
-                            // extract seconds from nanoseconds
-                            (v / NANOSECONDS) as u32,
-                            // discard extracted seconds
-                            (v % NANOSECONDS) as u32,
-                        ))
-                    }
-                    _ => None,
-                }
-            }
-            DataType::Timestamp(_, _) => {
-                self.value_as_datetime(i).map(|datetime| datetime.time())
-            }
-            DataType::Date32(_) | DataType::Date64(_) => {
-                Some(NaiveTime::from_hms(0, 0, 0))
-            }
-            DataType::Interval(_) => None,
-            _ => None,
-        }
+        as_time::<T>(i64::from(self.value(i)))
     }
 }
 
 impl<T: ArrowPrimitiveType> fmt::Debug for PrimitiveArray<T> {
-    default fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
-        write!(f, "PrimitiveArray<{:?}>\n[\n", T::get_data_type())?;
-        print_long_array(self, f, |array, index, f| {
-            fmt::Debug::fmt(&array.value(index), f)
-        })?;
-        write!(f, "]")
-    }
-}
-
-impl<T: ArrowNumericType> fmt::Debug for PrimitiveArray<T> {
-    default fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
-        write!(f, "PrimitiveArray<{:?}>\n[\n", T::get_data_type())?;
-        print_long_array(self, f, |array, index, f| {
-            fmt::Debug::fmt(&array.value(index), f)
-        })?;
-        write!(f, "]")
-    }
-}
-
-impl<T: ArrowNumericType + ArrowTemporalType> fmt::Debug for PrimitiveArray<T>
-where
-    i64: std::convert::From<T::Native>,
-{
     fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
-        write!(f, "PrimitiveArray<{:?}>\n[\n", T::get_data_type())?;
-        print_long_array(self, f, |array, index, f| match T::get_data_type() {
+        write!(f, "PrimitiveArray<{:?}>\n[\n", T::DATA_TYPE)?;
+        print_long_array(self, f, |array, index, f| match T::DATA_TYPE {
             DataType::Date32(_) | DataType::Date64(_) => {
-                match array.value_as_date(index) {

Review comment:
       so the primary difference here is that all types get converted to i64 and then converted back to the appropriate (potentially smaller) native type by the code (and the compiler is presumably smart enough to make it all happen efficiently)
   
   And by structuring the code in that way, we only need a single type parameter (i64->T rather than all combinations of input and output types?)

##########
File path: rust/arrow/src/array/builder.rs
##########
@@ -253,8 +253,18 @@ pub trait BufferBuilderTrait<T: ArrowPrimitiveType> {
 }
 
 impl<T: ArrowPrimitiveType> BufferBuilderTrait<T> for BufferBuilder<T> {
-    default fn new(capacity: usize) -> Self {
-        let buffer = MutableBuffer::new(capacity * mem::size_of::<T::Native>());
+    #[inline]
+    fn new(capacity: usize) -> Self {
+        let buffer = if T::DATA_TYPE == DataType::Boolean {

Review comment:
       I think using an `if` for specialization rather than the compiler is totally legit




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on a change in pull request #8485: ARROW-10002: [Rust] Remove trait specialization from arrow crate

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #8485:
URL: https://github.com/apache/arrow/pull/8485#discussion_r507075762



##########
File path: rust/arrow/src/lib.rs
##########
@@ -124,7 +124,6 @@
 //!
 //! The parquet implementation is on a [separate crate](https://crates.io/crates/parquet)
 
-#![feature(specialization)]

Review comment:
       #arewestableyet? This is great. I think it's just parquet that still uses it after this PR.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org