Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/05/02 18:53:06 UTC

[GitHub] [arrow] vertexclique commented on a change in pull request #6898: ARROW-8399: [Rust] Extend memory alignments to include other architectures

vertexclique commented on a change in pull request #6898:
URL: https://github.com/apache/arrow/pull/6898#discussion_r418994231



##########
File path: rust/arrow/src/memory.rs
##########
@@ -21,7 +21,58 @@
 use std::alloc::Layout;
 use std::mem::align_of;
 
-pub const ALIGNMENT: usize = 64;
+#[cfg(target_arch = "x86")]
+pub const ALIGNMENT: usize = (1 << 6);
+
+#[cfg(target_arch = "x86_64")]
+pub const ALIGNMENT: usize = (1 << 7);
+
+#[cfg(target_arch = "mips")]
+pub const ALIGNMENT: usize = (1 << 5);
+
+#[cfg(target_arch = "mips64")]
+pub const ALIGNMENT: usize = (1 << 5);
+
+#[cfg(target_arch = "powerpc")]
+pub const ALIGNMENT: usize = (1 << 5);
+
+#[cfg(target_arch = "powerpc64")]
+pub const ALIGNMENT: usize = (1 << 6);
+
+#[cfg(target_arch = "riscv")]
+pub const ALIGNMENT: usize = (1 << 6);
+
+#[cfg(target_arch = "s390x")]
+pub const ALIGNMENT: usize = (1 << 8);
+
+#[cfg(target_arch = "sparc")]
+pub const ALIGNMENT: usize = (1 << 5);
+
+#[cfg(target_arch = "sparc64")]
+pub const ALIGNMENT: usize = (1 << 6);
+
+#[cfg(target_arch = "thumbv6")]
+pub const ALIGNMENT: usize = (1 << 5);
+
+#[cfg(target_arch = "thumbv7")]
+pub const ALIGNMENT: usize = (1 << 5);
+
+#[cfg(target_arch = "wasm32")]
+pub const ALIGNMENT: usize = FALLBACK_ALIGNMENT;

Review comment:
    Because the wasm32 target takes on the alignment of wherever it runs (the architecture of the host VM). On the current generation of processors that is mostly 64 bytes, though, as this PR shows, not necessarily.
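    As a minimal sketch (not part of this PR), the `FALLBACK_ALIGNMENT` referenced in the diff could be declared like this; the 64-byte value is an assumption about a typical host cache line, not something the PR pins down:
    
    ```rust
    // Hypothetical fallback for targets (such as wasm32) whose real cache-line
    // size is only known to the host VM at run time. 64 bytes is assumed here
    // because it matches the most common line size on current processors.
    pub const FALLBACK_ALIGNMENT: usize = 1 << 6;
    ```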
   
   > and what does it mean when a fetch is marked as shared?
   
    When a fetch occurs at the processor level, it loads 128 bytes at a time on Intel, for example, not 64. So when the data is not aligned to a multiple of that, you break the pipeline. These are pipeline breakers, and they slow down the RAM -> cache -> register loads by forcing extra round trips between cache and RAM. This prefetcher is called the spatial prefetcher and mostly exists in superscalar systems. That's why this alignment, combined with contiguously allocated memory, can bring everything into L2 without extra round trips.
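    To make the effect concrete, here is a small sketch (again, not part of the PR) of how an alignment constant like the x86_64 value above plugs into Rust's standard allocator via `std::alloc::Layout`:
    
    ```rust
    use std::alloc::{alloc, dealloc, Layout};
    
    const ALIGNMENT: usize = 1 << 7; // e.g. the x86_64 value from the diff
    
    fn main() {
        // Request 1 KiB aligned to the prefetch-friendly boundary.
        let size = 1024;
        let layout = Layout::from_size_align(size, ALIGNMENT).expect("bad layout");
        unsafe {
            let ptr = alloc(layout);
            assert!(!ptr.is_null());
            // The allocator guarantees the pointer sits on a 128-byte boundary,
            // so a spatial prefetcher pulling 128-byte chunks never straddles
            // the start of the buffer.
            assert_eq!(ptr as usize % ALIGNMENT, 0);
            dealloc(ptr, layout);
        }
    }
    ```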



