Posted to dev@arrow.apache.org by Fan Liya <li...@gmail.com> on 2019/04/28 09:49:46 UTC

[DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

Hi all,

We are proposing a new set of APIs in Arrow - unsafe vector APIs. The
general idea is described below, and is also accessible from our online
document
<https://docs.google.com/document/d/13oZFVS1EnNedZd_7udx-h10G2tRTjfgHe2ngp2ZWJ70/edit?usp=sharing>.
Please give your valuable comments by commenting directly in the online
document
<https://docs.google.com/document/d/13oZFVS1EnNedZd_7udx-h10G2tRTjfgHe2ngp2ZWJ70/edit?usp=sharing>,
or by replying to this email thread.

Thank you so much in advance.

Best,
Liya Fan

Support Fast/Unsafe Vector APIs for Arrow

Background

In our effort to support the columnar data format in Apache Flink, we chose
Apache Arrow as the basic data structure. Arrow greatly simplifies support
for the columnar data format. However, for many scenarios, we find the
performance unacceptable. Our investigation shows that the reason is the
many redundant checks and computations in the current Arrow API.



For example, the following figure shows that a single call to the
Float8Vector.get(int) method (one of the most frequently used APIs in Flink
computation) involves more than 20 method invocations.


[Figure: the call stack of a single Float8Vector.get(int) invocation]





There are many other APIs with similar problems. The redundant checks and
computations impact performance severely. According to our evaluation, the
performance may degrade by two or three orders of magnitude.
Our Proposal

For many scenarios, the checks can be avoided, provided that the application
developers can guarantee that all checks will pass. So our proposal is to
provide some light-weight APIs. We also call them *unsafe APIs*, in the
sense that they skip most of the checks (hence not safe) to improve
performance.



In the light-weight APIs, we perform only minimal checks, or no checks at
all. Application owners can still develop and debug their code using the
original safe APIs. Once all bugs have been fixed, they can switch to the
unsafe APIs in the final version of their products and enjoy the high
performance.
Our Design

Our goal is to include the unsafe vector APIs in the Arrow code base, and to
let our customers switch to the new unsafe APIs without noticing anything
except the higher performance. To achieve this goal, we make the following
design choices:
Vector Class Hierarchy

Each unsafe vector is a subclass of its safe counterpart. For example, the
unsafe Float8Vector is a subclass of org.apache.arrow.vector.Float8Vector:



package org.apache.arrow.vector.unsafe;



public class Float8Vector extends org.apache.arrow.vector.Float8Vector



So the safe vector acts as a façade of the unsafe vector, and through
polymorphism, users may not be aware of which type of vector they are
working with. In addition, the common logic can be reused in the unsafe
vectors; we only need to override the get/set related methods.
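The subclass relationship can be illustrated with a self-contained sketch. These are simplified stand-ins, not the real Arrow classes: a plain double[] plays the role of the off-heap value buffer, and only the index check is modeled.

```java
// Stand-in for the safe vector: get validates the index before reading.
class SafeFloat8Vector {
  protected final double[] values;

  SafeFloat8Vector(int capacity) {
    this.values = new double[capacity];
  }

  public void set(int index, double value) {
    values[index] = value;
  }

  public double get(int index) {
    // The safe flavor validates the index before reading.
    if (index < 0 || index >= values.length) {
      throw new IndexOutOfBoundsException("index: " + index);
    }
    return values[index];
  }
}

// Stand-in for the unsafe subclass: it inherits all common logic and
// only overrides get to skip the explicit validation, mirroring
// unsafe.Float8Vector extends org.apache.arrow.vector.Float8Vector.
class UnsafeFloat8Vector extends SafeFloat8Vector {
  UnsafeFloat8Vector(int capacity) {
    super(capacity);
  }

  @Override
  public double get(int index) {
    return values[index]; // no explicit validation
  }
}
```

Because client code can hold a SafeFloat8Vector reference while the runtime object is the unsafe subclass, the concrete flavor is invisible at the call site; this is the façade/polymorphism point above.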
Vector Creation

We use factory methods to create each type of vector. Compared with the
vector constructors, the factory methods take one more parameter, the
vectorType:



public class VectorFactory {

  public static Float8Vector createFloat8Vector(
      VectorType vectorType, String name, BufferAllocator allocator);

}



VectorType is an enum to separate safe vectors from unsafe ones:



public enum VectorType {

  SAFE,

  UNSAFE

}



With the factory methods, the old way of creating vectors through
constructors can be gradually deprecated.
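A minimal, self-contained sketch of the factory dispatch follows. The real factory signature also takes a name and a BufferAllocator, as shown above; the vector classes here are trivial stand-ins used only to show the selection pattern.

```java
// Enum separating safe vectors from unsafe ones, as in the proposal.
enum VectorType {
  SAFE,
  UNSAFE
}

// Trivial stand-in vector classes.
class Float8Vec {
  boolean isUnsafe() {
    return false;
  }
}

class UnsafeFloat8Vec extends Float8Vec {
  @Override
  boolean isUnsafe() {
    return true;
  }
}

class VectorFactory {
  // The vectorType parameter selects the concrete flavor; callers
  // only ever see the safe static type.
  static Float8Vec createFloat8Vector(VectorType vectorType) {
    return vectorType == VectorType.UNSAFE
        ? new UnsafeFloat8Vec()
        : new Float8Vec();
  }
}
```

Switching a whole application between safe and unsafe vectors then reduces to changing the single VectorType argument passed to the factory.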
Vector Implementation

As discussed above, the unsafe vectors mainly override the get/set methods.
For get methods, we operate directly on the off-heap memory, without any
checks:



public double get(int index) {

    return Double.longBitsToDouble(
        PlatformDependent.getLong(
            valueBuffer.memoryAddress() + (index << TYPE_LOG2_WIDTH)));

}



Note that the PlatformDependent API is only 2 stack layers above the
underlying UNSAFE method call.
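For readers unfamiliar with this layer: PlatformDependent is Netty's thin wrapper over sun.misc.Unsafe. The raw off-heap read/write that the unsafe get/set boil down to can be reproduced directly; the following is a sketch that obtains Unsafe reflectively (this reflective access works on most common JDKs, but is an internal API, so it is illustration only).

```java
import sun.misc.Unsafe;

import java.lang.reflect.Field;

class OffHeapDoubleDemo {
  // Writes and reads one double through raw off-heap memory,
  // mirroring the unsafe set/get pair described above.
  static double roundTrip(double value) {
    try {
      // Grab the Unsafe singleton reflectively.
      Field f = Unsafe.class.getDeclaredField("theUnsafe");
      f.setAccessible(true);
      Unsafe unsafe = (Unsafe) f.get(null);

      long address = unsafe.allocateMemory(8); // one Float8 slot
      try {
        unsafe.putLong(address, Double.doubleToRawLongBits(value));
        return Double.longBitsToDouble(unsafe.getLong(address));
      } finally {
        unsafe.freeMemory(address);
      }
    } catch (ReflectiveOperationException e) {
      throw new RuntimeException(e);
    }
  }
}
```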



For set methods, we still need to set the validity bit. However, this is
done through an unsafe method that directly sets the bit without checking:



public void set(int index, double value) {

    UnsafeBitVectorHelper.setValidityBitToOne(validityBuffer, index);

    PlatformDependent.putLong(
        valueBuffer.memoryAddress() + (index << TYPE_LOG2_WIDTH),
        Double.doubleToRawLongBits(value));

}



Method UnsafeBitVectorHelper.setValidityBitToOne is the unsafe version of
BitVectorHelper.setValidityBitToOne that avoids checks.
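UnsafeBitVectorHelper itself is not shown in this note. On a heap byte[] stand-in (the real helper writes to the validity buffer's off-heap memory), setting a validity bit presumably reduces to plain bit arithmetic, one bit per value:

```java
// Sketch of validity-bit manipulation on a byte[] stand-in.
// Byte index = index / 8, bit offset within the byte = index % 8.
class UnsafeBitHelper {
  static void setValidityBitToOne(byte[] validityBuffer, int index) {
    validityBuffer[index >> 3] |= (byte) (1 << (index & 7));
  }

  // Returns 1 if the value at index is set (non-null), 0 otherwise.
  static int isSet(byte[] validityBuffer, int index) {
    return (validityBuffer[index >> 3] >> (index & 7)) & 1;
  }
}
```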


Test Cases

We can reuse existing test cases by employing parameterized test classes to
test both safe and unsafe vectors.
Current Progress

We have opened a JIRA for this work item, ARROW-5200
<https://issues.apache.org/jira/browse/ARROW-5200>, and a PR
<https://github.com/apache/arrow/pull/4212> with an initial implementation
has been opened. We would appreciate it if you could give some comments.

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

Posted by Fan Liya <li...@gmail.com>.
Hi Micah,

Thanks a lot for your comments. Your solution sounds reasonable.
We opened a JIRA yesterday (ARROW-5290
<https://issues.apache.org/jira/browse/ARROW-5290>) for the static boolean
wrapper you mentioned (this was also suggested by Jacques).

Hopefully, it will solve the problem.
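For context, a static boolean wrapper in the style of BoundsChecking might look like the following sketch. The class and property names here are made up for illustration; they are not necessarily what ARROW-5290 ends up using.

```java
// Sketch of a BoundsChecking-style flag wrapper. Because the field is
// static final, the JIT can treat it as a compile-time constant and
// eliminate the guarded null check entirely when the flag is false.
final class NullCheckingForGet {
  public static final boolean NULL_CHECKING_ENABLED;

  static {
    // Hypothetical property name, defaulting to the safe behavior.
    String prop =
        System.getProperty("arrow.enable_null_check_for_get", "true");
    NULL_CHECKING_ENABLED = Boolean.parseBoolean(prop);
  }

  private NullCheckingForGet() {}
}
```

A get implementation would then guard its isSet test with `if (NullCheckingForGet.NULL_CHECKING_ENABLED) { ... }`, so no new methods need to be introduced.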

Best,
Liya Fan

On Fri, May 10, 2019 at 12:11 PM Micah Kornfield <em...@gmail.com>
wrote:

> Hi Liya Fan and Wes,
> TL;DR; I think we can either close
> https://issues.apache.org/jira/browse/ARROW-1833 or repurpose with a
> slightly different implementation proposed on one of the open pull requests
> [1].
>
> The new approach will add another final static boolean wrapper class (like
> the memory bounds checking [2]) to turn off the validation against the null
> bitmap ArrowBuf (via isSet) inside get* for use-cases that require the
> extra performance.
>
> This means no additional methods should need to be introduced.  Based on
> the numbers above it seems like this will have a non-trivial positive impact
> at the microbenchmark level.
>
> It is up to the caller to decide if they need to call isSet before (and can
> avoid it if the null-count is zero), but that is orthogonal.
>
> Thanks,
> Micah
>
> [1] https://github.com/apache/arrow/pull/4258
> [2]
>
> https://github.com/apache/arrow/blob/master/java/memory/src/main/java/org/apache/arrow/memory/BoundsChecking.java
>
> On Thu, May 9, 2019 at 7:22 PM Fan Liya <li...@gmail.com> wrote:
>
> > Hi Wes,
> >
> > I think the problem for ArrowBuf can be resolved by
> > disabling BoundsChecking.BOUNDS_CHECKING_ENABLED.
> > For example, this is the code of getInt:
> >
> >   public int getInt(int index) {
> >     chk(index, INT_SIZE);
> >     return PlatformDependent.getInt(addr(index));
> >   }
> >
> > The chk method makes bound check, which can be turned off by
> > BoundsChecking.BOUNDS_CHECKING_ENABLED.
> > I do not see null checking in ArrowBuf. Maybe you are talking about
> another
> > buffer class?
> >
> > Best,
> > Liya Fan
> >
> > On Thu, May 9, 2019 at 9:39 PM Wes McKinney <we...@gmail.com> wrote:
> >
> > > It has also been previously suggested to add a get* method that
> > > returns the value in the ArrowBuf without null checking, like
> > > getDirty. See
> > >
> > > https://issues.apache.org/jira/browse/ARROW-1833
> > >
> > > Any thoughts about that?
> > >
> > > On Thu, May 9, 2019 at 4:54 AM niki.lj <ni...@aliyun.com.invalid>
> > wrote:
> > > >
> > > > +1 on this proposal.
> > > >
> > > >
> > > > ------------------------------------------------------------------
> > > > 发件人:Fan Liya <li...@gmail.com>
> > > > 发送时间:2019年5月9日(星期四) 16:33
> > > > 收件人:dev <de...@arrow.apache.org>
> > > > 主 题:Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow
> > > >
> > > > Hi all,
> > > >
> > > > Our previous results on micro-benchmarks show that the original
> > > > Arrow API is 30% slower than the unsafe API.
> > > > After profiling, we found that, the performance overhead comes from
> the
> > > > null-checking in the get method. For example, the get method of
> > > > Float8Vector looks like this:
> > > >
> > > >   public double get(int index) throws IllegalStateException {
> > > >     if (isSet(index) == 0) {
> > > >       throw new IllegalStateException("Value at index is null");
> > > >     }
> > > >     return valueBuffer.getDouble(index * TYPE_WIDTH);
> > > >   }
> > > >
> > > > It first makes sure the value is not null, and then retrieve the
> value.
> > > >
> > > > In some cases, the first check is redundant, because the application
> > code
> > > > usually do the check before calling the get method. For such cases,
> the
> > > > first check can be skipped.
> > > > Therefore, @Jacques Nadeau suggests adding another flag to
> > enable/disable
> > > > such check. I think this is a good suggestion, because it solves the
> > > > performance problem, without introducing a new set of vector classes.
> > > What
> > > > do you think?
> > > >
> > > > I have opened a JIRA for that (ARROW-5290
> > > > <https://issues.apache.org/jira/browse/ARROW-5290>). Please give
> your
> > > > valuable comments.
> > > > Thanks a lot for your attention and valuable comments.
> > > > Special thanks to @Jacques Nadeau for all the suggestions and helpful
> > > > comments.
> > > >
> > > > Best,
> > > > Liya Fan
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, May 8, 2019 at 1:05 PM Fan Liya <li...@gmail.com>
> wrote:
> > > >
> > > > > Hi Jacques,
> > > > >
> > > > > Thanks a lot for your comments.
> > > > >
> > > > > I have evaluated the assembly code of original Arrow API, as well
> as
> > > the
> > > > > unsafe API in our PR <https://github.com/apache/arrow/pull/4212>
> > > > > Generally, the assembly code generated by the JIT for both APIs is of
> > > > > high quality, and in most cases the assembly code is almost the same.
> > > > >
> > > > > However, some checks can be further removed. The following figures
> > > give an
> > > > > example (the figures are too big to be attached, so I have attached
> > > them in
> > > > > a JIRA comment. Please see comment
> > > > > <
> > >
> >
> https://issues.apache.org/jira/browse/ARROW-5200?focusedCommentId=16835303&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16835303
> > >.
> > > Sorry
> > > > > for the inconvenience):
> > > > >
> > > > > The first figure shows the code of original Arrow API, while the
> > second
> > > > > shows the code for the unsafe API.
> > > > > It can be observed that for the unsafe API, the amounts of the
> > source,
> > > > > byte and assembly code are all smaller. So it can be expected that
> > the
> > > > > performance of unsafe API is better.
> > > > >
> > > > > Concerning this particular example for the Float8Vector, I think it
> > is
> > > > > reasonable to further remove the check in the get method:
> > > > > Before we call the get method, we must check if the value is null,
> so
> > > the
> > > > > check in the get method is redundant. And this is a typical
> scenario
> > of
> > > > > using Arrow API (check and then get), at least for our scenario
> > (please
> > > > > take a glimpse of our benchmark in PR
> > > > > <https://github.com/apache/arrow/pull/4198>).
> > > > >
> > > > > Concerning the other problem, about the real algorithm in our
> > > scenario. I
> > > > > want to make two points:
> > > > >
> > > > > 1. SQL engines are performance critical, so 30% is a large number
> for
> > > us.
> > > > > For the past year, it took our team several months just to improve
> > the
> > > > > performance of our runtime engine by around 15%.
> > > > >
> > > > > 2. The performance of engine heavily depends on the performance of
> > > Arrow.
> > > > > Most SQL engines are memory-intensive, so the performance of
> get/set
> > > > > methods is the key. To get a flavor of the algorithms in our
> engine,
> > > please
> > > > > refer to PR <https://github.com/apache/arrow/pull/4198>. That is
> the
> > > core
> > > > > algorithm of our operator, which is executed many times during the
> > > > > processing of a SQL query. You can find that the computation is
> > > relatively
> > > > > simple, and most method calls are memory accesses.
> > > > >
> > > > > Best,
> > > > > Liya Fan
> > > > >
> > > > > On Mon, May 6, 2019 at 5:52 PM Jacques Nadeau <ja...@apache.org>
> > > wrote:
> > > > >
> > > > >> I am still asking the same question: can you please analyze the
> > > assembly
> > > > >> the JIT is producing and look to identify why the disabled bounds
> > > checking
> > > > >> is at 30% and what types of things we can do to address. For
> > example,
> > > we
> > > > >> have talked before about a bytecode transformer that simply
> removes
> > > the
> > > > >> bounds checking when loading Arrow if you want that behavior. If
> > > > >> necessary,
> > > > >> that may be a big win from a code maintenance standpoint over
> having
> > > > >> duplicate interfaces.
> > > > >>
> > > > >> The static block seems like a non-problem. You could simply load
> > > another
> > > > >> class that system property before loading any Arrow code. If
> you're
> > > > >> proposing a code change to solve your problem today, this seems
> just
> > > as
> > > > >> feasible.
> > > > >>
> > > > >> The other question: in a real algorithm, how much does that 30%
> > > matter?
> > > > >> Your benchmarks are entirely about this one call whereas real
> > > workloads
> > > > >> are
> > > > >> impacted by many things and the time in writing/reading vectors is
> > > > >> miniscule versus other things.
> > > > >>
> > > > >> On Mon, May 6, 2019 at 1:16 PM Fan Liya <li...@gmail.com>
> > wrote:
> > > > >>
> > > > >> > Hi Jacques,
> > > > >> >
> > > > >> > Thank you so much for your kind reminder.
> > > > >> >
> > > > >> > To come up with some performance data, I have set up an
> > environment
> > > and
> > > > >> > run some micro-benchmarks.
> > > > >> > The server runs Linux, has 64 cores and has 256 GB memory.
> > > > >> > The benchmarks are simple iterations over some double vectors
> (the
> > > > >> source
> > > > >> > file is attached):
> > > > >> >
> > > > >> >   @Benchmark
> > > > >> >   @BenchmarkMode(Mode.AverageTime)
> > > > >> >   @OutputTimeUnit(TimeUnit.MICROSECONDS)
> > > > >> >   public void testSafe() {
> > > > >> >     safeSum = 0;
> > > > >> >     for (int i = 0; i < VECTOR_LENGTH; i++) {
> > > > >> >       safeVector.set(i, i + 10.0);
> > > > >> >       safeSum += safeVector.get(i);
> > > > >> >     }
> > > > >> >   }
> > > > >> >
> > > > >> >   @Benchmark
> > > > >> >   @BenchmarkMode(Mode.AverageTime)
> > > > >> >   @OutputTimeUnit(TimeUnit.MICROSECONDS)
> > > > >> >   public void testUnSafe() {
> > > > >> >     unSafeSum = 0;
> > > > >> >     for (int i = 0; i < VECTOR_LENGTH; i++) {
> > > > >> >       unsafeVector.set(i, i + 10.0);
> > > > >> >       unSafeSum += unsafeVector.get(i);
> > > > >> >     }
> > > > >> >   }
> > > > >> >
> > > > >> > The safe vector in the testSafe benchmark is from the original
> > Arrow
> > > > >> > implementation, whereas the unsafe vector in the testUnsafe
> > > benchmark is
> > > > >> > based on our initial implementation in PR
> > > > >> > <https://github.com/apache/arrow/pull/4212> (This is not the
> > final
> > > > >> > version. However, we believe much overhead has been removed).
> > > > >> > The evaluation is based on JMH framework (thanks to the
> suggestion
> > > from
> > > > >> > Jacques Nadeau). The benchmarks are run so many times by the
> > > framework
> > > > >> that
> > > > >> > the effects of JIT are well considered.
> > > > >> >
> > > > >> > In the first experiment, we use the default configuration
> > (boundary
> > > > >> > checking enabled), and the original Arrow vector is about 4
> times
> > > > >> slower:
> > > > >> >
> > > > >> > Benchmark                       Mode  Cnt   Score   Error  Units
> > > > >> > VectorAPIBenchmarks.testSafe    avgt    5  11.546 ± 0.012  us/op
> > > > >> > VectorAPIBenchmarks.testUnSafe  avgt    5   2.822 ± 0.006  us/op
> > > > >> >
> > > > >> > In the second experiment, we disable the boundary checking by
> JVM
> > > > >> options:
> > > > >> >
> > > > >> > -Ddrill.enable_unsafe_memory_access=true
> > > > >> > -Darrow.enable_unsafe_memory_access=true
> > > > >> >
> > > > >> > This time, the original Arrow vector is about 30% slower:
> > > > >> >
> > > > >> > Benchmark                       Mode  Cnt  Score   Error  Units
> > > > >> > VectorAPIBenchmarks.testSafe    avgt    5  4.069 ± 0.004  us/op
> > > > >> > VectorAPIBenchmarks.testUnSafe  avgt    5  2.819 ± 0.005  us/op
> > > > >> >
> > > > >> > This is a significant improvement, about 2.84x faster compared
> to
> > > when
> > > > >> > bound checking is enabled.
> > > > >> > However, in our scenario, we would still choose to bypass the Arrow
> > > > >> > APIs without hesitation, because such memory accesses are such
> > > > >> > frequent operations that a 30% performance degradation would easily
> > > > >> > cause us to lose our edge.
> > > > >> >
> > > > >> > The results can be attributed to the following factors:
> > > > >> > 1. Although the checks have been disabled, we still need to read
> > the
> > > > >> flag
> > > > >> > and check it repeatedly in the Arrow APIs, which accumulates to
> > > large
> > > > >> > performance overhead.
> > > > >> > 2. There is too much code in the call stacks, compared with the
> > > > >> > unsafe API. This will lead to a less efficient i-cache, even if the
> > > > >> > JIT can avoid the cost of stack frames by in-lining most method code.
> > > > >> >
> > > > >> > Another, maybe separate problem is that, the
> > > > >> > flag BoundsChecking#BOUNDS_CHECKING_ENABLED is final and
> > > initialized in
> > > > >> a
> > > > >> > static block. That means the only reliable way to override it is
> > to
> > > > >> > override system properties in the JVM command line. However, for
> > > some
> > > > >> > scenarios, we do not have access to the command line (e.g.
> running
> > > > >> Flink in
> > > > >> > Yarn). I think this deserves a separate issue.
> > > > >> >
> > > > >> > Best,
> > > > >> > Liya Fan
> > > > >> >
> > > > >> > On Mon, May 6, 2019 at 1:23 PM Jacques Nadeau <
> jacques@apache.org
> > >
> > > > >> wrote:
> > > > >> >
> > > > >> >> >
> > > > >> >> > Maybe I need to take a closer look at how the other SQL
> engines
> > > are
> > > > >> >> using
> > > > >> >> > Arrow. To see if they are also bypassing Arrow APIs.
> > > > >> >> > I agree that a random user should be able to protect
> > themselves,
> > > and
> > > > >> >> this
> > > > >> >> > is the utmost priority.
> > > > >> >> >
> > > > >> >> > According to my experience in Flink, JIT cannot optimize away
> > the
> > > > >> >> checks,
> > > > >> >> > and removing the checks addresses the issue.
> > > > >> >> > I want to illustrate this from two points:
> > > > >> >> >
> > > > >> >> > 1. Theoretical view point: JIT makes optimizations without
> > > changing
> > > > >> >> > semantics of the code, so it can never remove the checks
> > without
> > > > >> >> changing
> > > > >> >> > code semantics. To make it simple, if the JIT has witnessed the
> > > > >> >> > engine successfully process 1,000,000 records, how can it be sure
> > > > >> >> > that the 1,000,001th record will be successful?
> > > > >> >> >
> > > > >> >> > 2. Practical view point: we have evaluated our SQL engine on
> > > TPC-H
> > > > >> 1TB
> > > > >> >> data
> > > > >> >> > set. This is really a large number of records. So the JIT
> must
> > > have
> > > > >> done
> > > > >> >> > all it could to improve the code. According to the performance
> > > > >> >> > results, however, it could not eliminate the impact caused by the
> > > > >> >> > checks.
> > > > >> >> >
> > > > >> >>
> > > > >> >> I don't think you're following my point. There are two
> different
> > > > >> points it
> > > > >> >> seems like you want to discuss. Let's evaluate each separately:
> > > > >> >>
> > > > >> >> 1) Bounds checking for safety
> > > > >> >> 2) Supposed inefficiency of the call hierarchy.
> > > > >> >>
> > > > >> >> For #1 we provide a system level property that can disable
> these.
> > > The
> > > > >> JVM
> > > > >> >> should succesfully optimize away this operation if that flag is
> > > set.
> > > > >> >> Please
> > > > >> >> look at the JIT output to confirm whether this is true.
> > > > >> >>
> > > > >> >> For #2: We designed things to collapse so the call hierarchy
> > > shouldn't
> > > > >> be
> > > > >> >> a
> > > > >> >> problem. Please look at the JIT output to confirm.
> > > > >> >>
> > > > >> >> Please come with data around #1 and #2 to make an argument for
> a
> > > set of
> > > > >> >> changes.
> > > > >> >>
> > > > >> >> thanks
> > > > >> >>
> > > > >> >
> > > > >>
> > > > >
> > >
> >
>

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

Posted by Micah Kornfield <em...@gmail.com>.
Hi Liya Fan and Wes,
TL;DR; I think we can either close
https://issues.apache.org/jira/browse/ARROW-1833 or repurpose with a
slightly different implementation proposed on one of the open pull requests
[1].

The new approach will add another final static boolean wrapper class (like
the memory bounds checking [2]) to turn off the validation against the null
bitmap ArrowBuf (via isSet) inside get* for use-cases that require the
extra performance.

This means no additional methods should need to be introduced.  Based on
the numbers above it seem like this will have a non-trivial positive impact
at the microbenchmark level.

It is up to the caller to decide if they need to call isSet before (and can
avoid it if the null-count is zero), but that is orthogonal.

Thanks,
Micah

[1] https://github.com/apache/arrow/pull/4258
[2]
https://github.com/apache/arrow/blob/master/java/memory/src/main/java/org/apache/arrow/memory/BoundsChecking.java

On Thu, May 9, 2019 at 7:22 PM Fan Liya <li...@gmail.com> wrote:

> Hi Wes,
>
> I think the problem for ArrowBuf can be resolved by
> disabling BoundsChecking.BOUNDS_CHECKING_ENABLED.
> For example, this is the code of getInt:
>
>   public int getInt(int index) {
>     chk(index, INT_SIZE);
>     return PlatformDependent.getInt(addr(index));
>   }
>
> The chk method makes bound check, which can be turned off by
> BoundsChecking.BOUNDS_CHECKING_ENABLED.
> I do not see null checking in ArrowBuf. Maybe you are talking about another
> buffer class?
>
> Best,
> Liya Fan
>
> On Thu, May 9, 2019 at 9:39 PM Wes McKinney <we...@gmail.com> wrote:
>
> > It has also been previously suggested to add a get* method that
> > returns the value in the ArrowBuf without null checking, like
> > getDirty. See
> >
> > https://issues.apache.org/jira/browse/ARROW-1833
> >
> > Any thoughts about that?
> >
> > On Thu, May 9, 2019 at 4:54 AM niki.lj <ni...@aliyun.com.invalid>
> wrote:
> > >
> > > +1 on this proposal.
> > >
> > >
> > > ------------------------------------------------------------------
> > > 发件人:Fan Liya <li...@gmail.com>
> > > 发送时间:2019年5月9日(星期四) 16:33
> > > 收件人:dev <de...@arrow.apache.org>
> > > 主 题:Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow
> > >
> > > Hi all,
> > >
> > > Our previous results on micro-benchmarks show that, the original Arrow
> > API
> > > is 30% slower than the unsafe API.
> > > After profiling, we found that, the performance overhead comes from the
> > > null-checking in the get method. For example, the get method of
> > > Float8Vector looks like this:
> > >
> > >   public double get(int index) throws IllegalStateException {
> > >     if (isSet(index) == 0) {
> > >       throw new IllegalStateException("Value at index is null");
> > >     }
> > >     return valueBuffer.getDouble(index * TYPE_WIDTH);
> > >   }
> > >
> > > It first makes sure the value is not null, and then retrieve the value.
> > >
> > > In some cases, the first check is redundant, because the application
> code
> > > usually do the check before calling the get method. For such cases, the
> > > first check can be skipped.
> > > Therefore, @Jacques Nadeau suggests adding another flag to
> enable/disable
> > > such check. I think this is a good suggestion, because it solves the
> > > performance problem, without introducing a new set of vector classes.
> > What
> > > do you think?
> > >
> > > I have opened a JIRA for that (ARROW-5290
> > > <https://issues.apache.org/jira/browse/ARROW-5290>). Please give your
> > > valuable comments.
> > > Thanks a lot for your attention and valuable comments.
> > > Special thanks to @Jacques Nadeau for all the suggestions and helpful
> > > comments.
> > >
> > > Best,
> > > Liya Fan
> > >
> > >
> > >
> > >
> > > On Wed, May 8, 2019 at 1:05 PM Fan Liya <li...@gmail.com> wrote:
> > >
> > > > Hi Jacques,
> > > >
> > > > Thanks a lot for your comments.
> > > >
> > > > I have evaluated the assembly code of original Arrow API, as well as
> > the
> > > > unsafe API in our PR <https://github.com/apache/arrow/pull/4212>
> > > > Generally, the assembly code generated by JIT for both APIs are of
> high
> > > > quality, and for most cases, the assembly code are almost the same.
> > > >
> > > > However, some checks can be further removed. The following figures
> > give an
> > > > example (the figures are too big to be attached, so I have attached
> > them in
> > > > a JIRA comment. Please see comment
> > > > <
> >
> https://issues.apache.org/jira/browse/ARROW-5200?focusedCommentId=16835303&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16835303
> >.
> > Sorry
> > > > for the inconvenience):
> > > >
> > > > The first figure shows the code of original Arrow API, while the
> second
> > > > shows the code for the unsafe API.
> > > > It can be observed that for the unsafe API, the amounts of the
> source,
> > > > byte and assembly code are all smaller. So it can be expected that
> the
> > > > performance of unsafe API is better.
> > > >
> > > > Concerning this particular example for the Float8Vector, I think it
> is
> > > > reasonable to further remove the check in the get method:
> > > > Before we call the get method, we must check if the value is null, so
> > the
> > > > check in the get method is redundant. And this is a typical scenario
> of
> > > > using Arrow API (check and then get), at least for our scenario
> (please
> > > > take a glimpse of our benchmark in PR
> > > > <https://github.com/apache/arrow/pull/4198>).
> > > >
> > > > Concerning the other problem, about the real algorithm in our
> > scenario. I
> > > > want to make two points:
> > > >
> > > > 1. SQL engines are performance critical, so 30% is a large number for
> > us.
> > > > For the past year, it took our team several months just to improve
> the
> > > > performance of our runtime engine by around 15%.
> > > >
> > > > 2. The performance of engine heavily depends on the performance of
> > Arrow.
> > > > Most SQL engines are memory-intensive, so the performance of get/set
> > > > methods is the key. To get a flavor of the algorithms in our engine,
> > please
> > > > refer to PR <https://github.com/apache/arrow/pull/4198>. That is the
> > core
> > > > algorithm of our operator, which is executed many times during the
> > > > processing of a SQL query. You can find that the computation is
> > relatively
> > > > simple, and most method calls are memory accesses.
> > > >
> > > > Best,
> > > > Liya Fan
> > > >
> > > > On Mon, May 6, 2019 at 5:52 PM Jacques Nadeau <ja...@apache.org>
> > wrote:
> > > >
> > > >> I am still asking the same question: can you please analyze the
> > assembly
> > > >> the JIT is producing and look to identify why the disabled bounds
> > checking
> > > >> is at 30% and what types of things we can do to address. For
> example,
> > we
> > > >> have talked before about a bytecode transformer that simply removes
> > the
> > > >> bounds checking when loading Arrow if you want that behavior. If
> > > >> necessary,
> > > >> that may be a big win from a code maintenance standpoint over having
> > > >> duplicate interfaces.
> > > >>
> > > >> The static block seems like a non-problem. You could simply load
> > another
> > > >> class that system property before loading any Arrow code. If you're
> > > >> proposing a code change to solve your problem today, this seems just
> > as
> > > >> feasible.
> > > >>
> > > >> The other question: in a real algorithm, how much does that 30%
> > matter?
> > > >> Your benchmarks are entirely about this one call whereas real
> > workloads
> > > >> are
> > > >> impacted by many things and the time in writing/reading vectors is
> > > >> miniscule versus other things.
> > > >>
> > > >> On Mon, May 6, 2019 at 1:16 PM Fan Liya <li...@gmail.com>
> wrote:
> > > >>
> > > >> > Hi Jacques,
> > > >> >
> > > >> > Thank you so much for your kind reminder.
> > > >> >
> > > >> > To come up with some performance data, I have set up an
> environment
> > and
> > > >> > run some micro-benchmarks.
> > > >> > The server runs Linux, has 64 cores and has 256 GB memory.
> > > >> > The benchmarks are simple iterations over some double vectors (the
> > > >> source
> > > >> > file is attached):
> > > >> >
> > > >> >   @Benchmark
> > > >> >   @BenchmarkMode(Mode.AverageTime)
> > > >> >   @OutputTimeUnit(TimeUnit.MICROSECONDS)
> > > >> >   public void testSafe() {
> > > >> >     safeSum = 0;
> > > >> >     for (int i = 0; i < VECTOR_LENGTH; i++) {
> > > >> >       safeVector.set(i, i + 10.0);
> > > >> >       safeSum += safeVector.get(i);
> > > >> >     }
> > > >> >   }
> > > >> >
> > > >> >   @Benchmark
> > > >> >   @BenchmarkMode(Mode.AverageTime)
> > > >> >   @OutputTimeUnit(TimeUnit.MICROSECONDS)
> > > >> >   public void testUnSafe() {
> > > >> >     unSafeSum = 0;
> > > >> >     for (int i = 0; i < VECTOR_LENGTH; i++) {
> > > >> >       unsafeVector.set(i, i + 10.0);
> > > >> >       unSafeSum += unsafeVector.get(i);
> > > >> >     }
> > > >> >   }
> > > >> >
> > > >> > The safe vector in the testSafe benchmark is from the original
> Arrow
> > > >> > implementation, whereas the unsafe vector in the testUnsafe
> > benchmark is
> > > >> > based on our initial implementation in PR
> > > >> > <https://github.com/apache/arrow/pull/4212> (This is not the
> final
> > > >> > version. However, we believe much overhead has been removed).
> > > >> > The evaluation is based on JMH framework (thanks to the suggestion
> > from
> > > >> > Jacques Nadeau). The benchmarks are run so many times by the
> > framework
> > > >> that
> > > >> > the effects of JIT are well considered.
> > > >> >
> > > >> > In the first experiment, we use the default configuration
> (boundary
> > > >> > checking enabled), and the original Arrow vector is about 4 times
> > > >> slower:
> > > >> >
> > > >> > Benchmark                       Mode  Cnt   Score   Error  Units
> > > >> > VectorAPIBenchmarks.testSafe    avgt    5  11.546 ± 0.012  us/op
> > > >> > VectorAPIBenchmarks.testUnSafe  avgt    5   2.822 ± 0.006  us/op
> > > >> >
> > > >> > In the second experiment, we disable the boundary checking by JVM
> > > >> options:
> > > >> >
> > > >> > -Ddrill.enable_unsafe_memory_access=true
> > > >> > -Darrow.enable_unsafe_memory_access=true
> > > >> >
> > > >> > This time, the original Arrow vector is about 30% slower:
> > > >> >
> > > >> > Benchmark                       Mode  Cnt  Score   Error  Units
> > > >> > VectorAPIBenchmarks.testSafe    avgt    5  4.069 ± 0.004  us/op
> > > >> > VectorAPIBenchmarks.testUnSafe  avgt    5  2.819 ± 0.005  us/op
> > > >> >
> > > >> > This is a significant improvement, about 2.84x faster compared to
> > when
> > > >> > bound checking is enabled.
> > > >> > However, in our scenario, we would still chose to bypass Arrow
> APIs
> > > >> > without hesitation, because such memory accesses are so frequent
> > > >> > operations, that a 30% performance degradation will easily cause
> us
> > lose
> > > >> > edge.
> > > >> >
> > > >> > The results can be attributed to the following factors:
> > > >> > 1. Although the checks have been disabled, we still need to read the
> > > >> > flag and check it repeatedly in the Arrow APIs, which accumulates into
> > > >> > a large performance overhead.
> > > >> > 2. There is too much code in the call stacks, compared with the unsafe
> > > >> > API. This leads to less efficient i-cache usage, even if the JIT can
> > > >> > avoid the cost of stack frames by inlining most method code.
> > > >> >
> > > >> > Another, possibly separate, problem is that the flag
> > > >> > BoundsChecking#BOUNDS_CHECKING_ENABLED is final and initialized in a
> > > >> > static block. That means the only reliable way to override it is to
> > > >> > set the system properties on the JVM command line. However, for some
> > > >> > scenarios, we do not have access to the command line (e.g. running
> > > >> > Flink on Yarn). I think this deserves a separate issue.
> > > >> >
> > > >> > Best,
> > > >> > Liya Fan
> > > >> >
> > > >> > On Mon, May 6, 2019 at 1:23 PM Jacques Nadeau <jacques@apache.org
> >
> > > >> wrote:
> > > >> >
> > > >> >> >
> > > >> >> > Maybe I need to take a closer look at how the other SQL engines
> > are
> > > >> >> using
> > > >> >> > Arrow. To see if they are also bypassing Arrow APIs.
> > > >> >> > I agree that a random user should be able to protect
> themselves,
> > and
> > > >> >> this
> > > >> >> > is the utmost priority.
> > > >> >> >
> > > >> >> > According to my experience in Flink, JIT cannot optimize away
> the
> > > >> >> checks,
> > > >> >> > and removing the checks addresses the issue.
> > > >> >> > I want to illustrate this from two points:
> > > >> >> >
> > > >> >> > 1. Theoretical view point: the JIT makes optimizations without
> > > >> >> > changing the semantics of the code, so it can never remove the
> > > >> >> > checks without changing code semantics. To put it simply, if the
> > > >> >> > JIT has witnessed the engine successfully process 1,000,000
> > > >> >> > records, how can it be sure that the 1,000,001st record will also
> > > >> >> > succeed?
> > > >> >> >
> > > >> >> > 2. Practical view point: we have evaluated our SQL engine on the
> > > >> >> > TPC-H 1TB data set. This is a really large number of records, so
> > > >> >> > the JIT must have done all it could to improve the code. According
> > > >> >> > to the performance results, however, it could not eliminate the
> > > >> >> > impact caused by the checks.
> > > >> >> >
> > > >> >>
> > > >> >> I don't think you're following my point. There are two different
> > > >> points it
> > > >> >> seems like you want to discuss. Let's evaluate each separately:
> > > >> >>
> > > >> >> 1) Bounds checking for safety
> > > >> >> 2) Supposed inefficiency of the call hierarchy.
> > > >> >>
> > > >> >> For #1 we provide a system-level property that can disable these.
> > > >> >> The JVM should successfully optimize away this operation if that
> > > >> >> flag is set. Please look at the JIT output to confirm whether this
> > > >> >> is true.
> > > >> >>
> > > >> >> For #2: We designed things to collapse so the call hierarchy
> > shouldn't
> > > >> be
> > > >> >> a
> > > >> >> problem. Please look at the JIT output to confirm.
> > > >> >>
> > > >> >> Please come with data around #1 and #2 to make an argument for a
> > set of
> > > >> >> changes.
> > > >> >>
> > > >> >> thanks
> > > >> >>
> > > >> >
> > > >>
> > > >
> >
>

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

Posted by Fan Liya <li...@gmail.com>.
Hi Wes,

I think the problem for ArrowBuf can be resolved by
disabling BoundsChecking.BOUNDS_CHECKING_ENABLED.
For example, this is the code of getInt:

  public int getInt(int index) {
    chk(index, INT_SIZE);
    return PlatformDependent.getInt(addr(index));
  }

The chk method performs the bounds check, which can be turned off via
BoundsChecking.BOUNDS_CHECKING_ENABLED.
I do not see null checking in ArrowBuf. Maybe you are talking about another
buffer class?
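Since BOUNDS_CHECKING_ENABLED is read only once, in a static block, the system property has to be in place before the first Arrow class is loaded. Below is a minimal sketch of one way to guarantee that from application code; the class and method names are illustrative, not part of Arrow:

```java
// Illustrative sketch: set the properties from code that is guaranteed to
// run before any Arrow class (and thus the BoundsChecking static block)
// is loaded by the JVM.
public class ArrowBootstrap {

    /** Must be called before the first reference to any Arrow vector/buffer class. */
    public static void disableBoundsChecking() {
        System.setProperty("drill.enable_unsafe_memory_access", "true");
        System.setProperty("arrow.enable_unsafe_memory_access", "true");
    }

    public static void main(String[] args) {
        disableBoundsChecking();
        // Only after this point should Arrow classes be touched,
        // e.g. allocating vectors from a BufferAllocator.
    }
}
```

This is essentially the "load another class that sets the property" workaround; it does not help when the class-loading order is outside the application's control (e.g. a container that loads Arrow first).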

Best,
Liya Fan

On Thu, May 9, 2019 at 9:39 PM Wes McKinney <we...@gmail.com> wrote:

> It has also been previously suggested to add a get* method that
> returns the value in the ArrowBuf without null checking, like
> getDirty. See
>
> https://issues.apache.org/jira/browse/ARROW-1833
>
> Any thoughts about that?
>
> On Thu, May 9, 2019 at 4:54 AM niki.lj <ni...@aliyun.com.invalid> wrote:
> >
> > +1 on this proposal.
> >
> >
> > ------------------------------------------------------------------
> > From: Fan Liya <li...@gmail.com>
> > Date: Thursday, May 9, 2019, 16:33
> > To: dev <de...@arrow.apache.org>
> > Subject: Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow
> >
> > Hi all,
> >
> > Our previous results on micro-benchmarks show that, the original Arrow
> API
> > is 30% slower than the unsafe API.
> > After profiling, we found that, the performance overhead comes from the
> > null-checking in the get method. For example, the get method of
> > Float8Vector looks like this:
> >
> >   public double get(int index) throws IllegalStateException {
> >     if (isSet(index) == 0) {
> >       throw new IllegalStateException("Value at index is null");
> >     }
> >     return valueBuffer.getDouble(index * TYPE_WIDTH);
> >   }
> >
> > It first makes sure the value is not null, and then retrieves the value.
> >
> > In some cases, the first check is redundant, because the application code
> > usually does the check before calling the get method. For such cases, the
> > first check can be skipped.
> > Therefore, @Jacques Nadeau suggests adding another flag to enable/disable
> > such check. I think this is a good suggestion, because it solves the
> > performance problem, without introducing a new set of vector classes.
> What
> > do you think?
> >
> > I have opened a JIRA for that (ARROW-5290
> > <https://issues.apache.org/jira/browse/ARROW-5290>). Please give your
> > valuable comments.
> > Thanks a lot for your attention and valuable comments.
> > Special thanks to @Jacques Nadeau for all the suggestions and helpful
> > comments.
> >
> > Best,
> > Liya Fan
> >
> >
> >
> >
> > On Wed, May 8, 2019 at 1:05 PM Fan Liya <li...@gmail.com> wrote:
> >
> > > Hi Jacques,
> > >
> > > Thanks a lot for your comments.
> > >
> > > I have evaluated the assembly code of original Arrow API, as well as
> the
> > > unsafe API in our PR <https://github.com/apache/arrow/pull/4212>
> > > Generally, the assembly code generated by the JIT for both APIs is of high
> > > quality, and in most cases, the assembly code is almost the same.
> > >
> > > However, some checks can be further removed. The following figures
> give an
> > > example (the figures are too big to be attached, so I have attached
> them in
> > > a JIRA comment. Please see comment
> > > <
> https://issues.apache.org/jira/browse/ARROW-5200?focusedCommentId=16835303&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16835303>.
> Sorry
> > > for the inconvenience):
> > >
> > > The first figure shows the code of original Arrow API, while the second
> > > shows the code for the unsafe API.
> > > It can be observed that for the unsafe API, the amounts of source code,
> > > bytecode, and assembly code are all smaller. So it can be expected that
> > > the performance of the unsafe API is better.
> > >
> > > Concerning this particular example for the Float8Vector, I think it is
> > > reasonable to further remove the check in the get method:
> > > Before we call the get method, we must check if the value is null, so
> the
> > > check in the get method is redundant. And this is a typical scenario of
> > > using the Arrow API (check and then get), at least for our scenario (please
> > > take a look at our benchmark in PR
> > > <https://github.com/apache/arrow/pull/4198>).
> > >
> > > Concerning the other question, about the real algorithm in our scenario,
> > > I want to make two points:
> > >
> > > 1. SQL engines are performance critical, so 30% is a large number for
> us.
> > > For the past year, it took our team several months just to improve the
> > > performance of our runtime engine by around 15%.
> > >
> > > 2. The performance of the engine heavily depends on the performance of
> > > Arrow.
> > > Most SQL engines are memory-intensive, so the performance of get/set
> > > methods is the key. To get a flavor of the algorithms in our engine,
> please
> > > refer to PR <https://github.com/apache/arrow/pull/4198>. That is the
> core
> > > algorithm of our operator, which is executed many times during the
> > > processing of a SQL query. You can find that the computation is
> relatively
> > > simple, and most method calls are memory accesses.
> > >
> > > Best,
> > > Liya Fan
> > >
> > > On Mon, May 6, 2019 at 5:52 PM Jacques Nadeau <ja...@apache.org>
> wrote:
> > >
> > >> I am still asking the same question: can you please analyze the
> assembly
> > >> the JIT is producing and look to identify why the disabled bounds
> checking
> > >> is at 30% and what types of things we can do to address. For example,
> we
> > >> have talked before about a bytecode transformer that simply removes
> the
> > >> bounds checking when loading Arrow if you want that behavior. If
> > >> necessary,
> > >> that may be a big win from a code maintenance standpoint over having
> > >> duplicate interfaces.
> > >>
> > >> The static block seems like a non-problem. You could simply load another
> > >> class that sets that system property before loading any Arrow code. If
> > >> you're proposing a code change to solve your problem today, this seems
> > >> just as feasible.
> > >>
> > >> The other question: in a real algorithm, how much does that 30% matter?
> > >> Your benchmarks are entirely about this one call, whereas real workloads
> > >> are impacted by many things, and the time spent writing/reading vectors
> > >> is minuscule versus other things.
> > >>
> > >> On Mon, May 6, 2019 at 1:16 PM Fan Liya <li...@gmail.com> wrote:
> > >>
> > >> > Hi Jacques,
> > >> >
> > >> > Thank you so much for your kind reminder.
> > >> >
> > >> > To come up with some performance data, I have set up an environment
> and
> > >> > run some micro-benchmarks.
> > >> > The server runs Linux, has 64 cores and has 256 GB memory.
> > >> > The benchmarks are simple iterations over some double vectors (the
> > >> source
> > >> > file is attached):
> > >> >
> > >> >   @Benchmark
> > >> >   @BenchmarkMode(Mode.AverageTime)
> > >> >   @OutputTimeUnit(TimeUnit.MICROSECONDS)
> > >> >   public void testSafe() {
> > >> >     safeSum = 0;
> > >> >     for (int i = 0; i < VECTOR_LENGTH; i++) {
> > >> >       safeVector.set(i, i + 10.0);
> > >> >       safeSum += safeVector.get(i);
> > >> >     }
> > >> >   }
> > >> >
> > >> >   @Benchmark
> > >> >   @BenchmarkMode(Mode.AverageTime)
> > >> >   @OutputTimeUnit(TimeUnit.MICROSECONDS)
> > >> >   public void testUnSafe() {
> > >> >     unSafeSum = 0;
> > >> >     for (int i = 0; i < VECTOR_LENGTH; i++) {
> > >> >       unsafeVector.set(i, i + 10.0);
> > >> >       unSafeSum += unsafeVector.get(i);
> > >> >     }
> > >> >   }
> > >> >
> > >> > The safe vector in the testSafe benchmark is from the original Arrow
> > >> > implementation, whereas the unsafe vector in the testUnsafe
> benchmark is
> > >> > based on our initial implementation in PR
> > >> > <https://github.com/apache/arrow/pull/4212> (This is not the final
> > >> > version. However, we believe much overhead has been removed).
> > >> > The evaluation is based on the JMH framework (thanks to the suggestion
> > >> > from Jacques Nadeau). The benchmarks are run enough times by the
> > >> > framework that the effects of JIT are properly accounted for.
> > >> >
> > >> > In the first experiment, we use the default configuration (boundary
> > >> > checking enabled), and the original Arrow vector is about 4 times
> > >> slower:
> > >> >
> > >> > Benchmark                       Mode  Cnt   Score   Error  Units
> > >> > VectorAPIBenchmarks.testSafe    avgt    5  11.546 ± 0.012  us/op
> > >> > VectorAPIBenchmarks.testUnSafe  avgt    5   2.822 ± 0.006  us/op
> > >> >
> > >> > In the second experiment, we disable the boundary checking by JVM
> > >> options:
> > >> >
> > >> > -Ddrill.enable_unsafe_memory_access=true
> > >> > -Darrow.enable_unsafe_memory_access=true
> > >> >
> > >> > This time, the original Arrow vector is about 30% slower:
> > >> >
> > >> > Benchmark                       Mode  Cnt  Score   Error  Units
> > >> > VectorAPIBenchmarks.testSafe    avgt    5  4.069 ± 0.004  us/op
> > >> > VectorAPIBenchmarks.testUnSafe  avgt    5  2.819 ± 0.005  us/op
> > >> >
> > >> > This is a significant improvement, about 2.84x faster compared to
> when
> > >> > bound checking is enabled.
> > >> > However, in our scenario, we would still choose to bypass Arrow APIs
> > >> > without hesitation, because such memory accesses are such frequent
> > >> > operations that a 30% performance degradation would easily cost us our
> > >> > edge.
> > >> >
> > >> > The results can be attributed to the following factors:
> > >> > 1. Although the checks have been disabled, we still need to read the
> > >> > flag and check it repeatedly in the Arrow APIs, which accumulates into
> > >> > a large performance overhead.
> > >> > 2. There is too much code in the call stacks, compared with the unsafe
> > >> > API. This leads to less efficient i-cache usage, even if the JIT can
> > >> > avoid the cost of stack frames by inlining most method code.
> > >> >
> > >> > Another, possibly separate, problem is that the flag
> > >> > BoundsChecking#BOUNDS_CHECKING_ENABLED is final and initialized in a
> > >> > static block. That means the only reliable way to override it is to
> > >> > set the system properties on the JVM command line. However, for some
> > >> > scenarios, we do not have access to the command line (e.g. running
> > >> > Flink on Yarn). I think this deserves a separate issue.
> > >> >
> > >> > Best,
> > >> > Liya Fan
> > >> >
> > >> > On Mon, May 6, 2019 at 1:23 PM Jacques Nadeau <ja...@apache.org>
> > >> wrote:
> > >> >
> > >> >> >
> > >> >> > Maybe I need to take a closer look at how the other SQL engines
> are
> > >> >> using
> > >> >> > Arrow. To see if they are also bypassing Arrow APIs.
> > >> >> > I agree that a random user should be able to protect themselves,
> and
> > >> >> this
> > >> >> > is the utmost priority.
> > >> >> >
> > >> >> > According to my experience in Flink, JIT cannot optimize away the
> > >> >> checks,
> > >> >> > and removing the checks addresses the issue.
> > >> >> > I want to illustrate this from two points:
> > >> >> >
> > >> >> > 1. Theoretical view point: the JIT makes optimizations without
> > >> >> > changing the semantics of the code, so it can never remove the
> > >> >> > checks without changing code semantics. To put it simply, if the
> > >> >> > JIT has witnessed the engine successfully process 1,000,000
> > >> >> > records, how can it be sure that the 1,000,001st record will also
> > >> >> > succeed?
> > >> >> >
> > >> >> > 2. Practical view point: we have evaluated our SQL engine on the
> > >> >> > TPC-H 1TB data set. This is a really large number of records, so
> > >> >> > the JIT must have done all it could to improve the code. According
> > >> >> > to the performance results, however, it could not eliminate the
> > >> >> > impact caused by the checks.
> > >> >> >
> > >> >>
> > >> >> I don't think you're following my point. There are two different
> > >> points it
> > >> >> seems like you want to discuss. Let's evaluate each separately:
> > >> >>
> > >> >> 1) Bounds checking for safety
> > >> >> 2) Supposed inefficiency of the call hierarchy.
> > >> >>
> > >> >> For #1 we provide a system-level property that can disable these.
> > >> >> The JVM should successfully optimize away this operation if that
> > >> >> flag is set. Please look at the JIT output to confirm whether this
> > >> >> is true.
> > >> >>
> > >> >> For #2: We designed things to collapse so the call hierarchy
> shouldn't
> > >> be
> > >> >> a
> > >> >> problem. Please look at the JIT output to confirm.
> > >> >>
> > >> >> Please come with data around #1 and #2 to make an argument for a
> set of
> > >> >> changes.
> > >> >>
> > >> >> thanks
> > >> >>
> > >> >
> > >>
> > >
>

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

Posted by Wes McKinney <we...@gmail.com>.
It has also been previously suggested to add a get* method that
returns the value in the ArrowBuf without null checking, like
getDirty. See

https://issues.apache.org/jira/browse/ARROW-1833

Any thoughts about that?
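To make the shape of that suggestion concrete, here is a small self-contained sketch (toy classes only; not Arrow's actual API): a checked get() mirroring the current behavior, next to a getDirty()-style accessor that leaves the null check to the caller.

```java
import java.util.BitSet;

// Toy sketch, not Arrow code: a fixed-width vector with a validity bitmap,
// offering both a checked get() and an unchecked getDirty() for callers
// that have already tested isSet().
class ToyFloat8Vector {
    private final double[] values;
    private final BitSet validity;

    ToyFloat8Vector(int capacity) {
        values = new double[capacity];
        validity = new BitSet(capacity);
    }

    void set(int index, double v) {
        values[index] = v;
        validity.set(index);
    }

    int isSet(int index) {
        return validity.get(index) ? 1 : 0;
    }

    // Checked accessor: throws if the slot is null, as Float8Vector.get does.
    double get(int index) {
        if (isSet(index) == 0) {
            throw new IllegalStateException("Value at index is null");
        }
        return values[index];
    }

    // Unchecked accessor: the caller is responsible for calling isSet() first.
    double getDirty(int index) {
        return values[index];
    }
}
```

A caller following the usual check-then-get pattern (`if (v.isSet(i) != 0) sum += v.getDirty(i);`) would then skip the redundant branch inside the accessor.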

On Thu, May 9, 2019 at 4:54 AM niki.lj <ni...@aliyun.com.invalid> wrote:
>
> +1 on this proposal.
>
>
> ------------------------------------------------------------------
> From: Fan Liya <li...@gmail.com>
> Date: Thursday, May 9, 2019, 16:33
> To: dev <de...@arrow.apache.org>
> Subject: Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow
>
> Hi all,
>
> Our previous results on micro-benchmarks show that, the original Arrow API
> is 30% slower than the unsafe API.
> After profiling, we found that, the performance overhead comes from the
> null-checking in the get method. For example, the get method of
> Float8Vector looks like this:
>
>   public double get(int index) throws IllegalStateException {
>     if (isSet(index) == 0) {
>       throw new IllegalStateException("Value at index is null");
>     }
>     return valueBuffer.getDouble(index * TYPE_WIDTH);
>   }
>
> It first makes sure the value is not null, and then retrieves the value.
>
> In some cases, the first check is redundant, because the application code
> usually does the check before calling the get method. For such cases, the
> first check can be skipped.
> Therefore, @Jacques Nadeau suggests adding another flag to enable/disable
> such check. I think this is a good suggestion, because it solves the
> performance problem, without introducing a new set of vector classes. What
> do you think?
>
> I have opened a JIRA for that (ARROW-5290
> <https://issues.apache.org/jira/browse/ARROW-5290>). Please give your
> valuable comments.
> Thanks a lot for your attention and valuable comments.
> Special thanks to @Jacques Nadeau for all the suggestions and helpful
> comments.
>
> Best,
> Liya Fan
>
>
>
>
> On Wed, May 8, 2019 at 1:05 PM Fan Liya <li...@gmail.com> wrote:
>
> > Hi Jacques,
> >
> > Thanks a lot for your comments.
> >
> > I have evaluated the assembly code of original Arrow API, as well as the
> > unsafe API in our PR <https://github.com/apache/arrow/pull/4212>
> > Generally, the assembly code generated by the JIT for both APIs is of high
> > quality, and in most cases, the assembly code is almost the same.
> >
> > However, some checks can be further removed. The following figures give an
> > example (the figures are too big to be attached, so I have attached them in
> > a JIRA comment. Please see comment
> > <https://issues.apache.org/jira/browse/ARROW-5200?focusedCommentId=16835303&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16835303>. Sorry
> > for the inconvenience):
> >
> > The first figure shows the code of original Arrow API, while the second
> > shows the code for the unsafe API.
> > It can be observed that for the unsafe API, the amounts of source code,
> > bytecode, and assembly code are all smaller. So it can be expected that
> > the performance of the unsafe API is better.
> >
> > Concerning this particular example for the Float8Vector, I think it is
> > reasonable to further remove the check in the get method:
> > Before we call the get method, we must check if the value is null, so the
> > check in the get method is redundant. And this is a typical scenario of
> > using the Arrow API (check and then get), at least for our scenario (please
> > take a look at our benchmark in PR
> > <https://github.com/apache/arrow/pull/4198>).
> >
> > Concerning the other question, about the real algorithm in our scenario,
> > I want to make two points:
> >
> > 1. SQL engines are performance critical, so 30% is a large number for us.
> > For the past year, it took our team several months just to improve the
> > performance of our runtime engine by around 15%.
> >
> > 2. The performance of the engine heavily depends on the performance of Arrow.
> > Most SQL engines are memory-intensive, so the performance of get/set
> > methods is the key. To get a flavor of the algorithms in our engine, please
> > refer to PR <https://github.com/apache/arrow/pull/4198>. That is the core
> > algorithm of our operator, which is executed many times during the
> > processing of a SQL query. You can find that the computation is relatively
> > simple, and most method calls are memory accesses.
> >
> > Best,
> > Liya Fan
> >
> > On Mon, May 6, 2019 at 5:52 PM Jacques Nadeau <ja...@apache.org> wrote:
> >
> >> I am still asking the same question: can you please analyze the assembly
> >> the JIT is producing and look to identify why the disabled bounds checking
> >> is at 30% and what types of things we can do to address. For example, we
> >> have talked before about a bytecode transformer that simply removes the
> >> bounds checking when loading Arrow if you want that behavior. If
> >> necessary,
> >> that may be a big win from a code maintenance standpoint over having
> >> duplicate interfaces.
> >>
> >> The static block seems like a non-problem. You could simply load another
> >> class that sets that system property before loading any Arrow code. If you're
> >> proposing a code change to solve your problem today, this seems just as
> >> feasible.
> >>
> >> The other question: in a real algorithm, how much does that 30% matter?
> >> Your benchmarks are entirely about this one call, whereas real workloads
> >> are impacted by many things, and the time spent writing/reading vectors
> >> is minuscule versus other things.
> >>
> >> On Mon, May 6, 2019 at 1:16 PM Fan Liya <li...@gmail.com> wrote:
> >>
> >> > Hi Jacques,
> >> >
> >> > Thank you so much for your kind reminder.
> >> >
> >> > To come up with some performance data, I have set up an environment and
> >> > run some micro-benchmarks.
> >> > The server runs Linux, has 64 cores and has 256 GB memory.
> >> > The benchmarks are simple iterations over some double vectors (the
> >> source
> >> > file is attached):
> >> >
> >> >   @Benchmark
> >> >   @BenchmarkMode(Mode.AverageTime)
> >> >   @OutputTimeUnit(TimeUnit.MICROSECONDS)
> >> >   public void testSafe() {
> >> >     safeSum = 0;
> >> >     for (int i = 0; i < VECTOR_LENGTH; i++) {
> >> >       safeVector.set(i, i + 10.0);
> >> >       safeSum += safeVector.get(i);
> >> >     }
> >> >   }
> >> >
> >> >   @Benchmark
> >> >   @BenchmarkMode(Mode.AverageTime)
> >> >   @OutputTimeUnit(TimeUnit.MICROSECONDS)
> >> >   public void testUnSafe() {
> >> >     unSafeSum = 0;
> >> >     for (int i = 0; i < VECTOR_LENGTH; i++) {
> >> >       unsafeVector.set(i, i + 10.0);
> >> >       unSafeSum += unsafeVector.get(i);
> >> >     }
> >> >   }
> >> >
> >> > The safe vector in the testSafe benchmark is from the original Arrow
> >> > implementation, whereas the unsafe vector in the testUnsafe benchmark is
> >> > based on our initial implementation in PR
> >> > <https://github.com/apache/arrow/pull/4212> (This is not the final
> >> > version. However, we believe much overhead has been removed).
> >> > The evaluation is based on the JMH framework (thanks to the suggestion
> >> > from Jacques Nadeau). The benchmarks are run enough times by the
> >> > framework that the effects of JIT are properly accounted for.
> >> >
> >> > In the first experiment, we use the default configuration (boundary
> >> > checking enabled), and the original Arrow vector is about 4 times
> >> slower:
> >> >
> >> > Benchmark                       Mode  Cnt   Score   Error  Units
> >> > VectorAPIBenchmarks.testSafe    avgt    5  11.546 ± 0.012  us/op
> >> > VectorAPIBenchmarks.testUnSafe  avgt    5   2.822 ± 0.006  us/op
> >> >
> >> > In the second experiment, we disable the boundary checking by JVM
> >> options:
> >> >
> >> > -Ddrill.enable_unsafe_memory_access=true
> >> > -Darrow.enable_unsafe_memory_access=true
> >> >
> >> > This time, the original Arrow vector is about 30% slower:
> >> >
> >> > Benchmark                       Mode  Cnt  Score   Error  Units
> >> > VectorAPIBenchmarks.testSafe    avgt    5  4.069 ± 0.004  us/op
> >> > VectorAPIBenchmarks.testUnSafe  avgt    5  2.819 ± 0.005  us/op
> >> >
> >> > This is a significant improvement, about 2.84x faster compared to when
> >> > bound checking is enabled.
> >> > However, in our scenario, we would still choose to bypass Arrow APIs
> >> > without hesitation, because such memory accesses are such frequent
> >> > operations that a 30% performance degradation would easily cost us our
> >> > edge.
> >> >
> >> > The results can be attributed to the following factors:
> >> > 1. Although the checks have been disabled, we still need to read the
> >> > flag and check it repeatedly in the Arrow APIs, which accumulates into
> >> > a large performance overhead.
> >> > 2. There is too much code in the call stacks, compared with the unsafe
> >> > API. This leads to less efficient i-cache usage, even if the JIT can
> >> > avoid the cost of stack frames by inlining most method code.
> >> >
> >> > Another, possibly separate, problem is that the flag
> >> > BoundsChecking#BOUNDS_CHECKING_ENABLED is final and initialized in a
> >> > static block. That means the only reliable way to override it is to
> >> > set the system properties on the JVM command line. However, for some
> >> > scenarios, we do not have access to the command line (e.g. running
> >> > Flink on Yarn). I think this deserves a separate issue.
> >> >
> >> > Best,
> >> > Liya Fan
> >> >
> >> > On Mon, May 6, 2019 at 1:23 PM Jacques Nadeau <ja...@apache.org>
> >> wrote:
> >> >
> >> >> >
> >> >> > Maybe I need to take a closer look at how the other SQL engines are
> >> >> using
> >> >> > Arrow. To see if they are also bypassing Arrow APIs.
> >> >> > I agree that a random user should be able to protect themselves, and
> >> >> this
> >> >> > is the utmost priority.
> >> >> >
> >> >> > According to my experience in Flink, JIT cannot optimize away the
> >> >> checks,
> >> >> > and removing the checks addresses the issue.
> >> >> > I want to illustrate this from two points:
> >> >> >
> >> >> > 1. Theoretical view point: the JIT makes optimizations without
> >> >> > changing the semantics of the code, so it can never remove the
> >> >> > checks without changing code semantics. To put it simply, if the
> >> >> > JIT has witnessed the engine successfully process 1,000,000
> >> >> > records, how can it be sure that the 1,000,001st record will also
> >> >> > succeed?
> >> >> >
> >> >> > 2. Practical view point: we have evaluated our SQL engine on the
> >> >> > TPC-H 1TB data set. This is a really large number of records, so
> >> >> > the JIT must have done all it could to improve the code. According
> >> >> > to the performance results, however, it could not eliminate the
> >> >> > impact caused by the checks.
> >> >> >
> >> >>
> >> >> I don't think you're following my point. There are two different
> >> points it
> >> >> seems like you want to discuss. Let's evaluate each separately:
> >> >>
> >> >> 1) Bounds checking for safety
> >> >> 2) Supposed inefficiency of the call hierarchy.
> >> >>
> >> >> For #1 we provide a system-level property that can disable these.
> >> >> The JVM should successfully optimize away this operation if that
> >> >> flag is set. Please look at the JIT output to confirm whether this
> >> >> is true.
> >> >>
> >> >> For #2: We designed things to collapse so the call hierarchy shouldn't
> >> be
> >> >> a
> >> >> problem. Please look at the JIT output to confirm.
> >> >>
> >> >> Please come with data around #1 and #2 to make an argument for a set of
> >> >> changes.
> >> >>
> >> >> thanks
> >> >>
> >> >
> >>
> >

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

Posted by "niki.lj" <ni...@aliyun.com.INVALID>.
+1 on this proposal.


------------------------------------------------------------------
From: Fan Liya <li...@gmail.com>
Date: Thursday, May 9, 2019, 16:33
To: dev <de...@arrow.apache.org>
Subject: Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

Hi all,

Our previous results on micro-benchmarks show that, the original Arrow API
is 30% slower than the unsafe API.
After profiling, we found that, the performance overhead comes from the
null-checking in the get method. For example, the get method of
Float8Vector looks like this:

  public double get(int index) throws IllegalStateException {
    if (isSet(index) == 0) {
      throw new IllegalStateException("Value at index is null");
    }
    return valueBuffer.getDouble(index * TYPE_WIDTH);
  }

It first makes sure the value is not null, and then retrieves the value.

In some cases, the first check is redundant, because the application code
usually does the check before calling the get method. For such cases, the
first check can be skipped.
Therefore, @Jacques Nadeau suggests adding another flag to enable/disable
such check. I think this is a good suggestion, because it solves the
performance problem, without introducing a new set of vector classes. What
do you think?

I have opened a JIRA for that (ARROW-5290
<https://issues.apache.org/jira/browse/ARROW-5290>). Please give your
valuable comments.
Thanks a lot for your attention and valuable comments.
Special thanks to @Jacques Nadeau for all the suggestions and helpful
comments.

Best,
Liya Fan




On Wed, May 8, 2019 at 1:05 PM Fan Liya <li...@gmail.com> wrote:

> Hi Jacques,
>
> Thanks a lot for your comments.
>
> I have evaluated the assembly code of original Arrow API, as well as the
> unsafe API in our PR <https://github.com/apache/arrow/pull/4212>
> Generally, the assembly code generated by the JIT for both APIs is of high
> quality, and in most cases, the assembly code is almost the same.
>
> However, some checks can be further removed. The following figures give an
> example (the figures are too big to be attached, so I have attached them in
> a JIRA comment. Please see comment
> <https://issues.apache.org/jira/browse/ARROW-5200?focusedCommentId=16835303&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16835303>. Sorry
> for the inconvenience):
>
> The first figure shows the code of original Arrow API, while the second
> shows the code for the unsafe API.
> It can be observed that for the unsafe API, the amounts of source code,
> bytecode, and assembly code are all smaller. So it can be expected that
> the performance of the unsafe API is better.
>
> Concerning this particular example for the Float8Vector, I think it is
> reasonable to further remove the check in the get method:
> Before we call the get method, we must check if the value is null, so the
> check in the get method is redundant. And this is a typical scenario of
> using the Arrow API (check and then get), at least for our scenario (please
> take a look at our benchmark in PR
> <https://github.com/apache/arrow/pull/4198>).
>
> Concerning the other question, about the real algorithm in our scenario,
> I want to make two points:
>
> 1. SQL engines are performance critical, so 30% is a large number for us.
> For the past year, it took our team several months just to improve the
> performance of our runtime engine by around 15%.
>
> 2. The performance of the engine heavily depends on the performance of Arrow.
> Most SQL engines are memory-intensive, so the performance of get/set
> methods is the key. To get a flavor of the algorithms in our engine, please
> refer to PR <https://github.com/apache/arrow/pull/4198>. That is the core
> algorithm of our operator, which is executed many times during the
> processing of a SQL query. You can find that the computation is relatively
> simple, and most method calls are memory accesses.
>
> Best,
> Liya Fan

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

Posted by Fan Liya <li...@gmail.com>.
Hi all,

Our previous results on micro-benchmarks show that the original Arrow API
is about 30% slower than the unsafe API.
After profiling, we found that the performance overhead comes from the
null check in the get method. For example, the get method of
Float8Vector looks like this:

  public double get(int index) throws IllegalStateException {
    if (isSet(index) == 0) {
      throw new IllegalStateException("Value at index is null");
    }
    return valueBuffer.getDouble(index * TYPE_WIDTH);
  }

It first makes sure the value is not null, and then retrieves the value.

In some cases, the first check is redundant, because the application code
usually performs the check before calling the get method. In such cases,
the check can be skipped.
Therefore, @Jacques Nadeau suggests adding another flag to enable/disable
the check. I think this is a good suggestion, because it solves the
performance problem without introducing a new set of vector classes. What
do you think?
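A minimal sketch of what such a flag-guarded check could look like. The flag name NULL_CHECKING_ENABLED, the property name, and the mock vector class are illustrative assumptions for this discussion, not the actual Arrow implementation:

```java
import java.util.BitSet;

public class NullCheckSketch {

    // Hypothetical flag, analogous to BOUNDS_CHECKING_ENABLED; the real
    // property name would be decided in ARROW-5290.
    static final boolean NULL_CHECKING_ENABLED =
        !"true".equals(System.getProperty("arrow.enable_null_check_disable"));

    // Mock stand-in for Float8Vector, backed by a plain array plus a
    // validity bitmap, just to illustrate the guarded check.
    static class MockFloat8Vector {
        private final double[] values;
        private final BitSet validity = new BitSet();

        MockFloat8Vector(int capacity) {
            values = new double[capacity];
        }

        void set(int index, double value) {
            values[index] = value;
            validity.set(index);
        }

        double get(int index) {
            // The null check runs only when the flag is on; callers that
            // have already tested for null can disable it to skip the branch.
            if (NULL_CHECKING_ENABLED && !validity.get(index)) {
                throw new IllegalStateException("Value at index is null");
            }
            return values[index];
        }
    }

    public static void main(String[] args) {
        MockFloat8Vector v = new MockFloat8Vector(4);
        v.set(0, 10.5);
        System.out.println(v.get(0)); // prints 10.5
    }
}
```

With the flag off, get reduces to a single buffer read, which is the behavior the proposal aims for.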

I have opened a JIRA for it (ARROW-5290
<https://issues.apache.org/jira/browse/ARROW-5290>). Please share your
comments there.
Thanks a lot for your attention.
Special thanks to @Jacques Nadeau for all the suggestions and helpful
comments.

Best,
Liya Fan





Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

Posted by Fan Liya <li...@gmail.com>.
Hi Jacques,

Thanks a lot for your comments.

I have evaluated the assembly code of the original Arrow API, as well as
the unsafe API in our PR <https://github.com/apache/arrow/pull/4212>.
Generally, the assembly code generated by the JIT for both APIs is of high
quality, and in most cases the assembly code is almost the same.

However, some checks can be further removed. The following figures give an
example (the figures are too large to attach here, so I have posted them in
a JIRA comment; please see comment
<https://issues.apache.org/jira/browse/ARROW-5200?focusedCommentId=16835303&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16835303>.
Sorry for the inconvenience):

The first figure shows the code of the original Arrow API, while the second
shows the code for the unsafe API.
It can be observed that for the unsafe API, the amount of source, byte,
and assembly code is smaller in each case. So it can be expected that the
performance of the unsafe API is better.

Concerning this particular example for the Float8Vector, I think it is
reasonable to further remove the check in the get method:
before we call the get method, we must check whether the value is null, so
the check in the get method is redundant. And this is a typical pattern
when using the Arrow API (check, then get), at least in our scenario
(please take a look at our benchmark in PR
<https://github.com/apache/arrow/pull/4198>).
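The check-then-get pattern can be sketched as follows; the array-plus-bitmap pair is a simplified stand-in for an Arrow vector, used only to show why a second null check inside get would be redundant:

```java
import java.util.BitSet;

public class CheckThenGet {

    // Sums only the non-null slots. The explicit application-level null
    // check here is exactly why a second check inside get() is redundant.
    public static double sum(double[] values, BitSet validity) {
        double total = 0;
        for (int i = 0; i < values.length; i++) {
            if (validity.get(i)) {      // application-level null check
                total += values[i];     // safe: slot is known to be set
            }
        }
        return total;
    }

    public static void main(String[] args) {
        double[] values = {1.0, 2.0, 3.0};
        BitSet validity = new BitSet();
        validity.set(0);
        validity.set(2);                // index 1 stays "null"
        System.out.println(sum(values, validity)); // prints 4.0
    }
}
```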

Concerning the other question, about the real algorithm in our scenario, I
want to make two points:

1. SQL engines are performance-critical, so 30% is a large number for us.
Over the past year, it took our team several months just to improve the
performance of our runtime engine by around 15%.

2. The performance of the engine depends heavily on the performance of
Arrow. Most SQL engines are memory-intensive, so the performance of the
get/set methods is key. To get a flavor of the algorithms in our engine,
please refer to PR <https://github.com/apache/arrow/pull/4198>. That is the
core algorithm of our operator, which is executed many times during the
processing of a SQL query. You can see that the computation is relatively
simple, and most method calls are memory accesses.

Best,
Liya Fan


Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

Posted by Jacques Nadeau <ja...@apache.org>.
I am still asking the same question: can you please analyze the assembly
the JIT is producing and identify why disabled bounds checking still shows
a 30% gap, and what types of things we can do to address it. For example, we
have talked before about a bytecode transformer that simply removes the
bounds checking when loading Arrow if you want that behavior. If necessary,
that may be a big win from a code-maintenance standpoint over having
duplicate interfaces.

The static block seems like a non-problem. You could simply load another
class that sets that system property before loading any Arrow code. If
you're proposing a code change to solve your problem today, this seems just
as feasible.
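A sketch of that workaround, using the arrow.enable_unsafe_memory_access property mentioned earlier in the thread. The class name is hypothetical, and this only works if it runs before any Arrow class triggers the BoundsChecking static initializer:

```java
// Hypothetical bootstrap class: set the property programmatically before
// any Arrow class is loaded, so the BoundsChecking static block sees it.
public class ArrowBootstrap {

    public static void disableBoundsChecking() {
        // Must happen before Arrow classes are initialized; once the
        // static block has read the property, later changes are ignored.
        System.setProperty("arrow.enable_unsafe_memory_access", "true");
    }

    public static void main(String[] args) {
        disableBoundsChecking();
        // Only now touch Arrow classes, e.g. allocate a Float8Vector.
        System.out.println(System.getProperty("arrow.enable_unsafe_memory_access"));
    }
}
```

This sidesteps the command-line restriction in environments such as Flink on YARN, at the cost of requiring careful class-loading order.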

The other question: in a real algorithm, how much does that 30% matter?
Your benchmarks are entirely about this one call, whereas real workloads
are impacted by many things, and the time spent writing/reading vectors is
minuscule compared to everything else.

On Mon, May 6, 2019 at 1:16 PM Fan Liya <li...@gmail.com> wrote:

> Hi Jacques,
>
> Thank you so much for your kind reminder.
>
> To come up with some performance data, I have set up an environment and
> run some micro-benchmarks.
> The server runs Linux and has 64 cores and 256 GB of memory.
> The benchmarks are simple iterations over some double vectors (the source
> file is attached):
>
>   @Benchmark
>   @BenchmarkMode(Mode.AverageTime)
>   @OutputTimeUnit(TimeUnit.MICROSECONDS)
>   public void testSafe() {
>     safeSum = 0;
>     for (int i = 0; i < VECTOR_LENGTH; i++) {
>       safeVector.set(i, i + 10.0);
>       safeSum += safeVector.get(i);
>     }
>   }
>
>   @Benchmark
>   @BenchmarkMode(Mode.AverageTime)
>   @OutputTimeUnit(TimeUnit.MICROSECONDS)
>   public void testUnSafe() {
>     unSafeSum = 0;
>     for (int i = 0; i < VECTOR_LENGTH; i++) {
>       unsafeVector.set(i, i + 10.0);
>       unSafeSum += unsafeVector.get(i);
>     }
>   }
>
> The safe vector in the testSafe benchmark is from the original Arrow
> implementation, whereas the unsafe vector in the testUnsafe benchmark is
> based on our initial implementation in PR
> <https://github.com/apache/arrow/pull/4212> (this is not the final
> version, but we believe much of the overhead has been removed).
> The evaluation is based on the JMH framework (thanks to the suggestion
> from Jacques Nadeau). The benchmarks are run enough times by the
> framework that the effects of JIT are fully accounted for.
>
> In the first experiment, we use the default configuration (boundary
> checking enabled), and the original Arrow vector is about 4 times slower:
>
> Benchmark                       Mode  Cnt   Score   Error  Units
> VectorAPIBenchmarks.testSafe    avgt    5  11.546 ± 0.012  us/op
> VectorAPIBenchmarks.testUnSafe  avgt    5   2.822 ± 0.006  us/op
>
> In the second experiment, we disable the boundary checking by JVM options:
>
> -Ddrill.enable_unsafe_memory_access=true
> -Darrow.enable_unsafe_memory_access=true
>
> This time, the original Arrow vector is about 30% slower:
>
> Benchmark                       Mode  Cnt  Score   Error  Units
> VectorAPIBenchmarks.testSafe    avgt    5  4.069 ± 0.004  us/op
> VectorAPIBenchmarks.testUnSafe  avgt    5  2.819 ± 0.005  us/op
>
> This is a significant improvement: about 2.84x (11.546 / 4.069) faster
> than when bounds checking is enabled.
> However, in our scenario, we would still choose to bypass the Arrow APIs
> without hesitation, because these memory accesses are such frequent
> operations that a 30% performance degradation would easily cost us our
> edge.
>
> The results can be attributed to the following factors:
> 1. Although the checks have been disabled, we still need to read the flag
> and test it repeatedly in the Arrow APIs, which accumulates into a large
> performance overhead.
> 2. There is too much code in the call stacks compared with the unsafe
> API. This leads to less efficient i-cache usage, even if the JIT avoids
> the cost of stack frames by inlining most method code.
>
> Another, possibly separate, problem is that the
> flag BoundsChecking#BOUNDS_CHECKING_ENABLED is final and initialized in a
> static block. That means the only reliable way to override it is to set
> the system property on the JVM command line. However, in some scenarios,
> we do not have access to the command line (e.g., running Flink on YARN).
> I think this deserves a separate issue.
>
> Best,
> Liya Fan
>
> On Mon, May 6, 2019 at 1:23 PM Jacques Nadeau <ja...@apache.org> wrote:
>
>> >
>> > Maybe I need to take a closer look at how the other SQL engines are
>> > using Arrow, to see whether they are also bypassing the Arrow APIs.
>> > I agree that a random user should be able to protect themselves, and
>> > this is the utmost priority.
>> >
>> > According to my experience in Flink, the JIT cannot optimize away the
>> > checks, and removing the checks addresses the issue.
>> > I want to illustrate this from two points:
>> >
>> > 1. Theoretical view point: the JIT makes optimizations without
>> > changing the semantics of the code, so it can never remove the checks
>> > without changing code semantics. To put it simply, if the JIT has
>> > witnessed the engine successfully process 1,000,000 records, how can
>> > it be sure that the 1,000,001st record will be successful?
>> >
>> > 2. Practical view point: we have evaluated our SQL engine on a TPC-H
>> > 1 TB data set. That is a very large number of records, so the JIT must
>> > have done all it could to improve the code. According to the
>> > performance results, however, it could not eliminate the impact caused
>> > by the checks.
>> >
>>
>> I don't think you're following my point. There are two different points it
>> seems like you want to discuss. Let's evaluate each separately:
>>
>> 1) Bounds checking for safety
>> 2) Supposed inefficiency of the call hierarchy.
>>
>> For #1 we provide a system level property that can disable these. The JVM
>> should successfully optimize away this operation if that flag is set.
>> Please look at the JIT output to confirm whether this is true.
>>
>> For #2: We designed things to collapse so the call hierarchy shouldn't be
>> a
>> problem. Please look at the JIT output to confirm.
>>
>> Please come with data around #1 and #2 to make an argument for a set of
>> changes.
>>
>> thanks
>>
>

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

Posted by Fan Liya <li...@gmail.com>.
Hi Jacques,

Thank you so much for your kind reminder.

To come up with some performance data, I have set up an environment and run
some micro-benchmarks.
The server runs Linux, has 64 cores and has 256 GB memory.
The benchmarks are simple iterations over some double vectors (the source
file is attached):

  @Benchmark
  @BenchmarkMode(Mode.AverageTime)
  @OutputTimeUnit(TimeUnit.MICROSECONDS)
  public void testSafe() {
    safeSum = 0;
    for (int i = 0; i < VECTOR_LENGTH; i++) {
      safeVector.set(i, i + 10.0);
      safeSum += safeVector.get(i);
    }
  }

  @Benchmark
  @BenchmarkMode(Mode.AverageTime)
  @OutputTimeUnit(TimeUnit.MICROSECONDS)
  public void testUnSafe() {
    unSafeSum = 0;
    for (int i = 0; i < VECTOR_LENGTH; i++) {
      unsafeVector.set(i, i + 10.0);
      unSafeSum += unsafeVector.get(i);
    }
  }

The safe vector in the testSafe benchmark is from the original Arrow
implementation, whereas the unsafe vector in the testUnSafe benchmark is
based on our initial implementation in PR
<https://github.com/apache/arrow/pull/4212> (This is not the final version.
However, we believe much overhead has been removed).
The evaluation is based on the JMH framework (thanks to the suggestion from
Jacques Nadeau). The framework runs the benchmarks enough times that the
effects of JIT compilation are taken into account.
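For readers who want to see the shape of this comparison without pulling in JMH or Arrow, here is a minimal, self-contained sketch. All class and member names below are hypothetical stand-ins, and a plain double[] plays the role of the off-heap buffer; it only contrasts a bounds-checked accessor with an unchecked one, which is the essence of what the benchmark measures:

```java
// Hypothetical stand-ins for the safe and unsafe vectors; a plain double[]
// plays the role of the off-heap buffer so the sketch stays self-contained.
public class CheckedVsUnchecked {
  static final int VECTOR_LENGTH = 1024;

  static final class SafeVector {
    private final double[] data = new double[VECTOR_LENGTH];
    void set(int i, double v) {
      if (i < 0 || i >= data.length) {        // the check the unsafe API skips
        throw new IndexOutOfBoundsException("index: " + i);
      }
      data[i] = v;
    }
    double get(int i) {
      if (i < 0 || i >= data.length) {
        throw new IndexOutOfBoundsException("index: " + i);
      }
      return data[i];
    }
  }

  static final class UnsafeVector {
    private final double[] data = new double[VECTOR_LENGTH];
    void set(int i, double v) { data[i] = v; } // no check
    double get(int i) { return data[i]; }      // no check
  }

  public static void main(String[] args) {
    SafeVector safe = new SafeVector();
    UnsafeVector unsafe = new UnsafeVector();
    double safeSum = 0, unsafeSum = 0;
    for (int i = 0; i < VECTOR_LENGTH; i++) {
      safe.set(i, i + 10.0);
      safeSum += safe.get(i);
      unsafe.set(i, i + 10.0);
      unsafeSum += unsafe.get(i);
    }
    // Both variants must compute the same sum; only the cost differs.
    System.out.println(safeSum == unsafeSum); // true
  }
}
```

The two variants are functionally identical on in-bounds indices; any measured difference comes from the checks and the extra branches they introduce.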

In the first experiment, we use the default configuration (boundary
checking enabled), and the original Arrow vector is about 4 times slower:

Benchmark                       Mode  Cnt   Score   Error  Units
VectorAPIBenchmarks.testSafe    avgt    5  11.546 ± 0.012  us/op
VectorAPIBenchmarks.testUnSafe  avgt    5   2.822 ± 0.006  us/op

In the second experiment, we disable the boundary checking by JVM options:

-Ddrill.enable_unsafe_memory_access=true
-Darrow.enable_unsafe_memory_access=true

This time, the original Arrow vector is about 30% slower:

Benchmark                       Mode  Cnt  Score   Error  Units
VectorAPIBenchmarks.testSafe    avgt    5  4.069 ± 0.004  us/op
VectorAPIBenchmarks.testUnSafe  avgt    5  2.819 ± 0.005  us/op

This is a significant improvement, about 2.84x faster compared to when
bounds checking is enabled.
However, in our scenario, we would still choose to bypass the Arrow APIs
without hesitation, because these memory accesses are such frequent
operations that a 30% performance degradation would easily cost us our
competitive edge.

The results can be attributed to the following factors:
1. Although the checks have been disabled, we still need to read the flag
and check it repeatedly in the Arrow APIs, which accumulates to large
performance overhead.
2. There is too much code in the call stacks, compared with the unsafe API.
This will lead to a less efficient i-cache, even if the JIT can avoid the
cost of stack frames by inlining most method code.

Another, maybe separate, problem is that the
flag BoundsChecking#BOUNDS_CHECKING_ENABLED is final and initialized in a
static block. That means the only reliable way to override it is to set the
system property on the JVM command line. However, for some scenarios, we do
not have access to the command line (e.g. running Flink on Yarn). I think
this deserves a separate issue.
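To illustrate why such a flag cannot be overridden reliably from application code, here is a small self-contained sketch (the class name is hypothetical; the property name is the one mentioned above). A final flag captured in a static initializer is frozen before main() ever gets a chance to call System.setProperty:

```java
// Mimics the pattern of BoundsChecking#BOUNDS_CHECKING_ENABLED: a final
// flag read from a system property inside a static initializer.
public class FlagInitSketch {
  // Checking is on unless -Darrow.enable_unsafe_memory_access=true was
  // passed on the command line before this class was initialized.
  static final boolean BOUNDS_CHECKING_ENABLED =
      !"true".equals(System.getProperty("arrow.enable_unsafe_memory_access"));

  public static void main(String[] args) {
    // Too late: this class was initialized before main() started running,
    // so the flag above has already been computed.
    System.setProperty("arrow.enable_unsafe_memory_access", "true");
    System.out.println(BOUNDS_CHECKING_ENABLED); // still true
  }
}
```

Setting the property programmatically would only work if it happened before the flag-holding class is loaded, which is fragile; hence the point that environments like Yarn, where the launch command line is not under the application's control, are stuck.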

Best,
Liya Fan

On Mon, May 6, 2019 at 1:23 PM Jacques Nadeau <ja...@apache.org> wrote:

> >
> > Maybe I need to take a closer look at how the other SQL engines are using
> > Arrow. To see if they are also bypassing Arrow APIs.
> > I agree that a random user should be able to protect themselves, and this
> > is the utmost priority.
> >
> > According to my experience in Flink, JIT cannot optimize away the checks,
> > and removing the checks addresses the issue.
> > I want to illustrate this from two points:
> >
> > 1. Theoretical view point: JIT makes optimizations without changing
> > semantics of the code, so it can never remove the checks without changing
> > code semantics. To make it simple, if the JIT has witnessed the engine
> > successfully process 1,000,000 records, how can it be sure that the
> > 1,000,001st record will be successful?
> >
> > 2. Practical view point: we have evaluated our SQL engine on TPC-H 1TB
> data
> > set. This is really a large number of records. So the JIT must have done
> > all it could to improve the code. According to the performance results,
> > however, it could not eliminate the impact caused by the checks.
> >
>
> I don't think you're following my point. There are two different points it
> seems like you want to discuss. Let's evaluate each separately:
>
> 1) Bounds checking for safety
> 2) Supposed inefficiency of the call hierarchy.
>
> For #1 we provide a system level property that can disable these. The JVM
> should successfully optimize away this operation if that flag is set. Please
> look at the JIT output to confirm whether this is true.
>
> For #2: We designed things to collapse so the call hierarchy shouldn't be a
> problem. Please look at the JIT output to confirm.
>
> Please come with data around #1 and #2 to make an argument for a set of
> changes.
>
> thanks
>

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

Posted by Jacques Nadeau <ja...@apache.org>.
>
> Maybe I need to take a closer look at how the other SQL engines are using
> Arrow. To see if they are also bypassing Arrow APIs.
> I agree that a random user should be able to protect themselves, and this
> is the utmost priority.
>
> According to my experience in Flink, JIT cannot optimize away the checks,
> and removing the checks addresses the issue.
> I want to illustrate this from two points:
>
> 1. Theoretical view point: JIT makes optimizations without changing
> semantics of the code, so it can never remove the checks without changing
> code semantics. To make it simple, if the JIT has witnessed the engine
> successfully process 1,000,000 records, how can it be sure that the
> 1,000,001st record will be successful?
>
> 2. Practical view point: we have evaluated our SQL engine on TPC-H 1TB data
> set. This is really a large number of records. So the JIT must have done
> all it could to improve the code. According to the performance results,
> however, it could not eliminate the impact caused by the checks.
>

I don't think you're following my point. There are two different points it
seems like you want to discuss. Let's evaluate each separately:

1) Bounds checking for safety
2) Supposed inefficiency of the call hierarchy.

For #1 we provide a system level property that can disable these. The JVM
should successfully optimize away this operation if that flag is set. Please
look at the JIT output to confirm whether this is true.

For #2: We designed things to collapse so the call hierarchy shouldn't be a
problem. Please look at the JIT output to confirm.

Please come with data around #1 and #2 to make an argument for a set of
changes.

thanks

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

Posted by Fan Liya <li...@gmail.com>.
Hi Jacques,

Thanks a lot for your kind reply.
Please see my comments in line.

Best,
Liya Fan

>
>
> 1. How much slower is the current Arrow API, compared to directly
accessing
> off-heap memory?
>
> According to my (intuitive) experience in vectorizing Flink, the current
> API is much slower, at least one or two orders of magnitude slower.
> I am sorry I do not have the exact number. However, the conclusion can be
> expected to hold true: Parth's experience on Drill also confirms the
> conclusion.
> In fact, we are working on it. ARROW-5209 is about introducing performance
> benchmarks and once that is done, the number will be clear.
>

Are you comparing to a situation where you can crash the JVM versus one
where you cannot? Let's make sure we're comparing apples to apples.

-------

I agree with you that it does not make much sense to compare safe/unsafe
APIs directly.
There is no doubt that the safe API is slow but avoids JVM crashes, whereas
the unsafe API is fast but may cause JVM crashes.

Our goal is not to compare apples with apples. Our goal is to make the best
of both worlds.
Let me illustrate how to achieve this in the scenario of a SQL engine:

1. We develop our engine through the safe API. It is OK even if the
performance is not good. Meantime, we will find many bugs in the code.
2. Once all bugs have been fixed, we switch to the unsafe API through a
single flag, and deliver our product. We are confident that few or no JVM
crashes will happen.
3. If we actually encounter a JVM crash (this should happen rarely), we
switch back to the safe API, find and fix the bug, and switch back to the
unsafe API.
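The switch described in step 2 could be sketched as follows. The interface, classes, and factory below are hypothetical illustrations, not Arrow's actual API; they only show how one flag can select between a checked and an unchecked implementation behind a common interface, so application code never changes:

```java
// One interface, two implementations; a single flag picks between them.
interface DoubleVector {
  void set(int i, double v);
  double get(int i);
}

final class SafeDoubleVector implements DoubleVector {
  private final double[] data;
  SafeDoubleVector(int capacity) { data = new double[capacity]; }
  public void set(int i, double v) {
    if (i < 0 || i >= data.length) throw new IndexOutOfBoundsException();
    data[i] = v;
  }
  public double get(int i) {
    if (i < 0 || i >= data.length) throw new IndexOutOfBoundsException();
    return data[i];
  }
}

final class UnsafeDoubleVector implements DoubleVector {
  private final double[] data;
  UnsafeDoubleVector(int capacity) { data = new double[capacity]; }
  public void set(int i, double v) { data[i] = v; } // caller guarantees bounds
  public double get(int i) { return data[i]; }      // caller guarantees bounds
}

final class VectorFactory {
  // During development the flag stays false; for the release build it is
  // flipped to true and every call site gets the unchecked implementation.
  static DoubleVector create(boolean unsafe, int capacity) {
    return unsafe ? new UnsafeDoubleVector(capacity)
                  : new SafeDoubleVector(capacity);
  }
}
```

If a crash does occur in production (step 3), flipping the flag back restores the checked implementation everywhere at once, which is what makes the debug/release round trip cheap.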

-------

>
> 2. Why is current Arrow APIs so slow?
>
> I think the main reason is too many function calls. I believe each
function
> call is highly optimized and only carries out simple work. However, the
> number of calls is large.
> The example in our online doc gives a simple example: a single call to
> Float8Vector.get method (which is an API fundamental enough) involves
> nearly 30 method calls. That is just too much overhead, especially for
> performance-critical scenarios, like SQL engines.
>

Are they? Who is asking that? I haven't heard that feedback at all and we
use the Arrow APIs extensively in Dremio and compete very well with other
SQL engines. The APIs were designed with the perspective that they need to
protect themselves in the context of the JVM so that a random user doesn't
hurt themselves. It sounds like maybe you don't agree with that.

It would be good for you to outline the 30 methods you see as being called
in FloatVector.get method.

In general, I think we should be more focused on the compiled code once it
has been optimized, not the methods. Have you looked at the assembly for
this method that the JIT outputs? The get method should collapse to a very
small number of instructions. If it isn't, we should address that. Have you
done that analysis? Has disabling the bounds checking addressed the issue
for you?

-------

Maybe I need to take a closer look at how the other SQL engines are using
Arrow. To see if they are also bypassing Arrow APIs.
I agree that a random user should be able to protect themselves, and this
is the utmost priority.

According to my experience in Flink, JIT cannot optimize away the checks,
and removing the checks addresses the issue.
I want to illustrate this from two points:

1. Theoretical view point: JIT makes optimizations without changing
semantics of the code, so it can never remove the checks without changing
code semantics. To make it simple, if the JIT has witnessed the engine
successfully process 1,000,000 records, how can it be sure that the
1,000,001st record will be successful?

2. Practical view point: we have evaluated our SQL engine on TPC-H 1TB data
set. This is really a large number of records. So the JIT must have done
all it could to improve the code. According to the performance results,
however, it could not eliminate the impact caused by the checks.

-------

> 3. Can we live without Arrow, and just directly access the off-heap memory
> (e.g. by the UNSAFE instance)?
>
> I guess the answer is absolutely, yes.
> Parth is doing this (bypassing Arrow API) with Drill, and this is exactly
> what we are doing with Flink. My point is that, providing light-weight
APIs
> will make it easier to use Arrow. Without such APIs, Parth may need to
> provide a library of Arrow wrappers in Drill, and we will need to provide
a
> library of Arrow wrappers in Flink, and so on. That's redundant work, and
> it may reduce the popularity of Arrow.


How are you going to come up with a set of APIs that protect the user or
unroll checks? Or are you just arguing that the user should not be protected?

-------

Our users should be protected, and we should allow our users to protect
themselves, if they want to.
Formerly, we only gave them option A. Now we also give them option B.
It is up to the users to make their own choice, according to their specific
requirements.

I know the change seems abrupt. Just think about it. This is a real
requirement from real users.

On Sun, May 5, 2019 at 6:01 PM Jacques Nadeau <ja...@apache.org> wrote:

> >
> >
> > 1. How much slower is the current Arrow API, compared to directly
> accessing
> > off-heap memory?
> >
> > According to my (intuitive) experience in vectorizing Flink, the current
> > API is much slower, at least one or two orders of magnitude slower.
> > I am sorry I do not have the exact number. However, the conclusion can be
> > expected to hold true: Parth's experience on Drill also confirms the
> > conclusion.
> > In fact, we are working on it. ARROW-5209 is about introducing
> performance
> > benchmarks and once that is done, the number will be clear.
> >
>
> Are you comparing to a situation where you can crash the JVM versus one
> where you cannot? Let's make sure we're comparing apples to apples.
>
>
> >
> > 2. Why is current Arrow APIs so slow?
> >
> > I think the main reason is too many function calls. I believe each
> function
> > call is highly optimized and only carries out simple work. However, the
> > number of calls is large.
> > The example in our online doc gives a simple example: a single call to
> > Float8Vector.get method (which is an API fundamental enough) involves
> > nearly 30 method calls. That is just too much overhead, especially for
> > performance-critical scenarios, like SQL engines.
> >
>
> Are they? Who is asking that? I haven't heard that feedback at all and we
> use the Arrow APIs extensively in Dremio and compete very well with other
> SQL engines. The APIs were designed with the perspective that they need to
> protect themselves in the context of the JVM so that a random user doesn't
> hurt themselves. It sounds like maybe you don't agree with that.
>
> It would be good for you to outline the 30 methods you see as being called
> in FloatVector.get method.
>
> In general, I think we should be more focused on the compiled code once it
> has been optimized, not the methods. Have you looked at the assembly for
> this method that the JIT outputs? The get method should collapse to a very
> small number of instructions. If it isn't, we should address that. Have you
> done that analysis? Has disabling the bounds checking addressed the issue
> for you?
>
>
> > 3. Can we live without Arrow, and just directly access the off-heap
> memory
> > (e.g. by the UNSAFE instance)?
> >
> > I guess the answer is absolutely, yes.
> > Parth is doing this (bypassing Arrow API) with Drill, and this is exactly
> > what we are doing with Flink. My point is that, providing light-weight
> APIs
> > will make it easier to use Arrow. Without such APIs, Parth may need to
> > provide a library of Arrow wrappers in Drill, and we will need to
> provide a
> > library of Arrow wrappers in Flink, and so on. That's redundant work, and
> > it may reduce the popularity of Arrow.
>
>
> How are you going to come up with a set of APIs that protect the user or
> unroll checks? Or are you just arguing that the user should not be protected?
>

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

Posted by Jacques Nadeau <ja...@apache.org>.
>
>
> 1. How much slower is the current Arrow API, compared to directly accessing
> off-heap memory?
>
> According to my (intuitive) experience in vectorizing Flink, the current
> API is much slower, at least one or two orders of magnitude slower.
> I am sorry I do not have the exact number. However, the conclusion can be
> expected to hold true: Parth's experience on Drill also confirms the
> conclusion.
> In fact, we are working on it. ARROW-5209 is about introducing performance
> benchmarks and once that is done, the number will be clear.
>

Are you comparing to a situation where you can crash the JVM versus one
where you cannot? Let's make sure we're comparing apples to apples.


>
> 2. Why is current Arrow APIs so slow?
>
> I think the main reason is too many function calls. I believe each function
> call is highly optimized and only carries out simple work. However, the
> number of calls is large.
> The example in our online doc gives a simple example: a single call to
> Float8Vector.get method (which is an API fundamental enough) involves
> nearly 30 method calls. That is just too much overhead, especially for
> performance-critical scenarios, like SQL engines.
>

Are they? Who is asking that? I haven't heard that feedback at all and we
use the Arrow APIs extensively in Dremio and compete very well with other
SQL engines. The APIs were designed with the perspective that they need to
protect themselves in the context of the JVM so that a random user doesn't
hurt themselves. It sounds like maybe you don't agree with that.

It would be good for you to outline the 30 methods you see as being called
in FloatVector.get method.

In general, I think we should be more focused on the compiled code once it
has been optimized, not the methods. Have you looked at the assembly for
this method that the JIT outputs? The get method should collapse to a very
small number of instructions. If it isn't, we should address that. Have you
done that analysis? Has disabling the bounds checking addressed the issue
for you?


> 3. Can we live without Arrow, and just directly access the off-heap memory
> (e.g. by the UNSAFE instance)?
>
> I guess the answer is absolutely, yes.
> Parth is doing this (bypassing Arrow API) with Drill, and this is exactly
> what we are doing with Flink. My point is that, providing light-weight APIs
> will make it easier to use Arrow. Without such APIs, Parth may need to
> provide a library of Arrow wrappers in Drill, and we will need to provide a
> library of Arrow wrappers in Flink, and so on. That's redundant work, and
> it may reduce the popularity of Arrow.


How are you going to come up with a set of APIs that protect the user or
unroll checks? Or are you just arguing that the user should not be protected?

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

Posted by Fan Liya <li...@gmail.com>.
Hi Micah,

Thank you so much for your kind reply.

I don't like a parallel set of vector classes, either, and I believe a flag
to turn boundary checking on and off is a good suggestion.

However, I am not sure if it is acceptable for performance-critical
scenarios, because we still need to test the condition, and this test may
be evaluated many times (once or several times per row for a SQL engine),
which may impact performance.
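The flag test in question follows a pattern like the one below (names hypothetical). One mitigating factor, which Jacques also raises in this thread, is that when the flag is a static final field, the JIT can treat the branch as a constant after class initialization and compile out the dead side:

```java
// Sketch of a bounds check guarded by a static final flag. Because the
// flag never changes after class initialization, the JIT can fold the
// branch away entirely when CHECKS_ENABLED is false.
public class GuardedCheckSketch {
  static final boolean CHECKS_ENABLED =
      !"true".equals(System.getProperty("example.disable.checks"));

  static double get(double[] data, int i) {
    if (CHECKS_ENABLED && (i < 0 || i >= data.length)) {
      throw new IndexOutOfBoundsException("index: " + i);
    }
    return data[i];
  }

  public static void main(String[] args) {
    double[] data = {1.0, 2.0, 3.0};
    System.out.println(get(data, 1)); // 2.0
  }
}
```

Whether this constant-folding actually eliminates the overhead in practice is exactly the empirical question the benchmark numbers earlier in the thread speak to.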

I agree with you that we should try our best to hide such implementation
complexity, so that the users can simply create vectors through our facade
APIs, without being aware of the implementation details.

Best,
Liya Fan



On Sun, May 5, 2019 at 5:28 PM Fan Liya <li...@gmail.com> wrote:

> Hi all,
>
> Thank you so much for your attention and valuable feedback.
>
> Please let me try to address some common questions, before answering
> individual ones.
>
> 1. How much slower is the current Arrow API, compared to directly
> accessing off-heap memory?
>
> According to my (intuitive) experience in vectorizing Flink, the current
> API is much slower, at least one or two orders of magnitude slower.
> I am sorry I do not have the exact number. However, the conclusion can be
> expected to hold true: Parth's experience on Drill also confirms the
> conclusion.
> In fact, we are working on it. ARROW-5209 is about introducing performance
> benchmarks and once that is done, the number will be clear.
>
> 2. Why is current Arrow APIs so slow?
>
> I think the main reason is too many function calls. I believe each
> function call is highly optimized and only carries out simple work.
> However, the number of calls is large.
> The example in our online doc gives a simple example: a single call to
> Float8Vector.get method (which is an API fundamental enough) involves
> nearly 30 method calls. That is just too much overhead, especially for
> performance-critical scenarios, like SQL engines.
>
> 3. Can we live without Arrow, and just directly access the off-heap memory
> (e.g. by the UNSAFE instance)?
>
> I guess the answer is absolutely, yes.
> Parth is doing this (bypassing Arrow API) with Drill, and this is exactly
> what we are doing with Flink. My point is that, providing light-weight APIs
> will make it easier to use Arrow. Without such APIs, Parth may need to
> provide a library of Arrow wrappers in Drill, and we will need to provide a
> library of Arrow wrappers in Flink, and so on. That's redundant work, and
> it may reduce the popularity of Arrow.
>
> Best,
> Liya Fan
>
>
> On Fri, May 3, 2019 at 4:01 AM Jacques Nadeau <ja...@apache.org> wrote:
>
>> If someone wants to run without bounds checking, why don't they simply
>> flip
>> the system property? Are they seeing that code not get eliminated if
>> they set that? I think people are optimizing the wrong things in this
>> discussion. The memory address is available. Per Parth's comments, if
>> you're working on a specific application, write directly to the memory.
>> That's the whole point of the reliable memory format. If something isn't
>> working right with the elimination of bounds checking, we can find another
>> solution to that and lets make that the ticket.
>>
>> My other comment on the PR4186 still stands: This doesn't have to be in
>> the
>> ArrowBuf interface. Because we're factoring out memory as a very simple
>> concept, it should be easy to simply create a wrapper object that provides
>> this functionality with no impact on performance. We specifically expose
>> the memory address directly for exactly this type of use. The reality
>> is:
>> if you want unsafe access, that basically means you don't want guardrails.
>> Direct memory access is the simplest/cleanest way to expose exactly that.
>>
>> On Thu, May 2, 2019 at 8:18 AM Siddharth Teotia <si...@dremio.com>
>> wrote:
>>
>> > Looks like there are 2 PRs for this work --
>> > https://github.com/apache/arrow/pull/4186 this PR adds new
>> get<type>Unsafe
>> > type APIs to ArrowBuf that don't do checkIndex() before calling
>> > PlatformDependent.get(memory address). So the access will go through
>> > vector.get() -> buffer.get() -> PlatformDependent.get() -> UNSAFE.get
>> which
>> > is what we do today but without doing any bounds checking
>> >
>> > I believe the proposal suggested here and the WIP PR --
>> > https://github.com/apache/arrow/pull/4212 adds new versions of vectors
>> > where the call to vector.get() bypasses the call to ArrowBuf and
>> directly
>> > invokes PlatformDependent with absolute address at which we want to
>> > read/write. Correct? Looks like the call to ArrowBuf is still needed to
>> get
>> > the starting address of buffer before computing the absolute address
>> >
>> > I am wondering if much of the overhead is coming from conditions and
>> > branches inside bounds checking or just the chain of method calls? If
>> > it is bounds checking then I think the first PR would probably suffice.
>> >
>> > On Tue, Apr 30, 2019 at 9:46 AM Parth Chandra <pa...@apache.org>
>> wrote:
>> >
>> > > FWIW, in Drill's Value Vector code, we found that bounds checking was
>> a
>> > > major performance bottleneck in operators that wrote to vectors.
>> Scans,
>> > as
>> > > a result, were particularly affected. Another bottleneck was the
>> zeroing of
>> > > vectors.
>> > > There were many unnecessary bounds checks. For example in a varchar
>> > vector,
>> > > there is one check while writing the data, one while writing the
>> validity
>> > > bit, one more in the buffer allocator for the data buffer, one more in
>> > the
>> > > buffer allocator for the validity bit buffer, one more each in the
>> > > underlying ByteBuf implementation. It gets worse with repeated/array
>> > types.
>> > > Some code paths in Drill were optimized to get rid of these bounds
>> checks
>> > > (eventually I suppose all of them will be updated). The approach was
>> to
>> > > bypass the ValueVector API and write directly to the Drill(/Arrow)Buf.
>> > > Writing to the memory address directly, as is being proposed by Liya
>> Fan,
>> > > was initially tried but did not have any measurable performance
>> > > improvements. BTW, writing to the memory address would also conflict
>> with
>> > > ARROW-3191.
>> > > Note that the performance tests were for Drill queries, not Vectors,
>> so
>> > > writing to memory directly may still have a noticeable performance
>> > benefit
>> > > for different use cases.
>> > > Sorry, I don't have actual numbers with me to share and I'm not sure
>> how
>> > > much Arrow has diverged from the original Drill implementation, but
>> the
>> > > Drill experience would suggest that this proposal certainly has merit.
>> > >
>> > > Parth
>> > >
>> > > On Mon, Apr 29, 2019 at 11:18 AM Wes McKinney <we...@gmail.com>
>> > wrote:
>> > >
>> > > > I'm also curious which APIs are particularly problematic for
>> > > > performance. In ARROW-1833 [1] and some related discussions there
>> was
>> > > > the suggestion of adding methods like getUnsafe, so this would be
>> like
>> > > > get(i) [2] but without checking the validity bitmap
>> > > >
>> > > > [1] : https://issues.apache.org/jira/browse/ARROW-1833
>> > > > [2]:
>> > > >
>> > >
>> >
>> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/Float8Vector.java#L99
>> > > >
>> > > > On Mon, Apr 29, 2019 at 1:05 PM Micah Kornfield <
>> emkornfield@gmail.com
>> > >
>> > > > wrote:
>> > > > >
>> > > > > Thanks for the design. Personally, I'm not a huge fan of creating
>> > > > > parallel classes for every vector type; this ends up being
>> > > > > confusing for developers and adds a lot of boilerplate. I wonder
>> > > > > if you could use a
>> > > > > similar approach that the memory module uses for turning bounds
>> > > checking
>> > > > > on/off [1].
>> > > > >
>> > > > > Also, I think there was a comment on the JIRA, but are there any
>> > > > benchmarks
>> > > > > to show the expected improvements?  My limited understanding is
>> that
>> > > for
>> > > > > small methods the JVM's JIT should inline them anyway [2], so it
>> > > > > is not clear how much this will improve performance.
>> > > > >
>> > > > >
>> > > > > Thanks,
>> > > > > Micah
>> > > > >
>> > > > >
>> > > > > [1]
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/arrow/blob/master/java/memory/src/main/java/org/apache/arrow/memory/BoundsChecking.java
>> > > > > [2]
>> > > > >
>> > > >
>> > >
>> >
>> https://stackoverflow.com/questions/24923040/do-modern-java-compilers-jvm-inline-functions-methods-which-are-called-exactly-f
>> > > > >
>> > > > > On Sun, Apr 28, 2019 at 2:50 AM Fan Liya <li...@gmail.com>
>> > wrote:
>> > > > >
>> > > > > > Hi all,
>> > > > > >
>> > > > > > We are proposing a new set of APIs in Arrow - unsafe vector APIs.
>> > > > > > The general idea is attached below, and also accessible from our
>> > > > > > online document
>> > > > > > <https://docs.google.com/document/d/13oZFVS1EnNedZd_7udx-h10G2tRTjfgHe2ngp2ZWJ70/edit?usp=sharing>.
>> > > > > > Please give your valuable comments by directly commenting in our
>> > > > > > online document
>> > > > > > <https://docs.google.com/document/d/13oZFVS1EnNedZd_7udx-h10G2tRTjfgHe2ngp2ZWJ70/edit?usp=sharing>,
>> > > > > > or by replying to this email thread.
>> > > > > >
>> > > > > > Thank you so much in advance.
>> > > > > >
>> > > > > > Best,
>> > > > > > Liya Fan
>> > > > > >
>> > > > > > Support Fast/Unsafe Vector APIs for Arrow Background
>> > > > > >
>> > > > > > In our effort to support columnar data format in Apache Flink,
>> > > > > > we chose Apache Arrow as the basic data structure. Arrow greatly
>> > > > > > simplifies the support of the columnar data format. However, for
>> > > > > > many scenarios, we find the performance unacceptable. Our
>> > > > > > investigation shows the reason is that there are too many
>> > > > > > redundant checks and computations in the current Arrow API.
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > For example, the following figure shows that in a single call to
>> > > > > > the Float8Vector.get(int) method (this is one of the most
>> > > > > > frequently used APIs in Flink computation), there are 20+ method
>> > > > > > invocations.
>> > > > > >
>> > > > > >
>> > > > > > [image: image.png]
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > There are many other APIs with similar problems. The redundant
>> > > > > > checks and computations impact performance severely. According to
>> > > > > > our evaluation, the performance may degrade by two or three
>> > > > > > orders of magnitude.
>> > > > > > Our Proposal
>> > > > > >
>> > > > > > For many scenarios, the checks can be avoided, if the application
>> > > > > > developers can guarantee that all checks will pass. So our
>> > > > > > proposal is to provide some light-weight APIs. The APIs are also
>> > > > > > named *unsafe APIs*, in the sense that they skip most of the
>> > > > > > checks (not safe) to improve the performance.
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > In the light-weight APIs, we only provide minimum checks, or
>> > > > > > avoid checks altogether. The application owner can still develop
>> > > > > > and debug their code using the original safe APIs. Once all bugs
>> > > > > > have been fixed, they can switch to the unsafe APIs in the final
>> > > > > > version of their products and enjoy the high performance.
>> > > > > > Our Design
>> > > > > >
>> > > > > > Our goal is to include unsafe vector APIs in the Arrow code
>> > > > > > base, and allow our customers to switch to the new unsafe APIs
>> > > > > > without being aware of it, except for the high performance. To
>> > > > > > achieve this goal, we make the following design choices:
>> > > > > > Vector Class Hierarchy
>> > > > > >
>> > > > > > Each unsafe vector is a subclass of the safe vector. For
>> > > > > > example, the unsafe Float8Vector is a subclass of
>> > > > > > org.apache.arrow.vector.Float8Vector:
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > package org.apache.arrow.vector.unsafe;
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > public class Float8Vector extends
>> > > org.apache.arrow.vector.Float8Vector
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > So the safe vector acts as a façade of the unsafe vector, and
>> > > > > > through polymorphism, users may not be aware of which type of
>> > > > > > vector they are working with. In addition, the common logic can
>> > > > > > be reused in the unsafe vectors, and we only need to override the
>> > > > > > get/set related methods.
>> > > > > > Vector Creation
>> > > > > >
>> > > > > > We use factory methods to create each type of vector. Compared
>> > > > > > with the vector constructors, the factory methods take one more
>> > > > > > parameter, the vectorType:
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > public class VectorFactory {
>> > > > > >
>> > > > > >   public static Float8Vector createFloat8Vector(VectorType
>> > > vectorType,
>> > > > > > String name, BufferAllocator allocator);
>> > > > > >
>> > > > > > }
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > VectorType is an enum to separate safe vectors from unsafe ones:
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > public enum VectorType {
>> > > > > >
>> > > > > >   SAFE,
>> > > > > >
>> > > > > >   UNSAFE
>> > > > > >
>> > > > > > }
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > With the factory methods, the old way of creating vectors by
>> > > > constructors
>> > > > > > can be gradually depreciated.
>> > > > > > Vector Implementation
>> > > > > >
>> > > > > > As discussed above, unsafe vectors mainly override get/set
>> methods.
>> > > For
>> > > > > > get methods, we directly operate on the off-heap memory, without
>> > any
>> > > > check:
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > public double get(int index) {
>> > > > > >
>> > > > > >     return
>> > > > > >
>> > > >
>> > >
>> >
>> Double.longBitsToDouble(PlatformDependent.getLong(valueBuffer.memoryAddress()
>> > > > > > + (index << TYPE_LOG2_WIDTH)));
>> > > > > >
>> > > > > > }
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > Note that the PlatformDependent API is only 2 stack layers above
>> > the
>> > > > > > underlying UNSAFE method call.
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > For set methods, we still need to set the validity bit. However,
>> > this
>> > > > is
>> > > > > > through an unsafe method that directly sets the bits without
>> > > checking:
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >          public void set(int index, double value) {
>> > > > > >
>> > > > > >       UnsafeBitVectorHelper.setValidityBitToOne(validityBuffer,
>> > > index);
>> > > > > >
>> > > > > > PlatformDependent.putLong(
>> > > > > >
>> > > > > >             valueBuffer.memoryAddress() + (index <<
>> > TYPE_LOG2_WIDTH),
>> > > > > > Double.doubleToRawLongBits(value));
>> > > > > >
>> > > > > > }
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > Method UnsafeBitVectorHelper.setValidityBitToOne is the unsafe
>> > > version
>> > > > of
>> > > > > > BitVectorHelper.setValidityBitToOne that avoids checks.
>> > > > > >
>> > > > > >
>> > > > > > Test Cases
>> > > > > >
>> > > > > > We can reuse existing test cases by employing parameterized test
>> > > > classes
>> > > > > > to test both safe and unsafe vectors.
>> > > > > > Current Progress
>> > > > > >
>> > > > > > We have opened a JIRA for this work item FlINK-5200
>> > > > > > <https://issues.apache.org/jira/browse/ARROW-5200>, and a PR
>> > > > > > <https://github.com/apache/arrow/pull/4212> with initial
>> > > > implementations
>> > > > > > have been opened. We would appreciate if you could give some
>> > > comments.
>> > > > > >
>> > > >
>> > >
>> >
>>
>
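To make the validity-bit trick concrete, here is a standalone sketch of what
an unchecked setValidityBitToOne might look like. It is only an illustration:
a plain byte[] stands in for the off-heap validity ArrowBuf, and the class
name is invented, not the actual Arrow implementation.

```java
// Illustrative sketch only: models the bit arithmetic an unchecked
// setValidityBitToOne would perform. A real implementation would write
// through PlatformDependent to off-heap memory; a byte[] stands in here.
public class ValidityBitSketch {

    // Set the validity bit for `index` to 1, with no bounds checks.
    public static void setValidityBitToOne(byte[] validityBuffer, int index) {
        int byteIndex = index >> 3;   // which byte holds the bit
        int bitIndex = index & 7;     // which bit within that byte
        validityBuffer[byteIndex] |= (byte) (1 << bitIndex);
    }

    // Read the validity bit back (1 = non-null, 0 = null).
    public static int getValidityBit(byte[] validityBuffer, int index) {
        return (validityBuffer[index >> 3] >>> (index & 7)) & 1;
    }
}
```

The point is that the whole operation is a few arithmetic instructions plus
one store; the safe path adds index checks on top of this.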

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

Posted by Fan Liya <li...@gmail.com>.
Hi all,

Thank you so much for your attention and valuable feedback.

Please let me try to address some common questions, before answering
individual ones.

1. How much slower is the current Arrow API, compared to directly accessing
off-heap memory?

According to my (intuitive) experience in vectorizing Flink, the current
API is much slower, by at least one or two orders of magnitude.
I am sorry I do not have the exact numbers. However, the conclusion can be
expected to hold: Parth's experience with Drill also confirms it.
In fact, we are working on this: ARROW-5209 is about introducing
performance benchmarks, and once that is done, the numbers will be clear.

2. Why are the current Arrow APIs so slow?

I think the main reason is too many function calls. I believe each function
call is highly optimized and only carries out simple work; however, the
number of calls is large.
Our online doc gives a simple example: a single call to the
Float8Vector.get method (a fundamental API) involves nearly 30 method
calls. That is just too much overhead, especially for performance-critical
scenarios like SQL engines.
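For comparison, the essence of the unchecked read/write path can be
reproduced without any Arrow classes. The sketch below uses a direct
java.nio.ByteBuffer in place of PlatformDependent and raw addresses; the
class is hypothetical, but the offset arithmetic (TYPE_LOG2_WIDTH = 3,
because a double occupies 2^3 = 8 bytes) mirrors the snippet in our doc.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of the unchecked double read/write path over off-heap memory.
// java.nio.ByteBuffer stands in for PlatformDependent; no null check,
// no bounds check: just an offset computation plus an 8-byte load/store.
public class UnsafeGetSketch {
    static final int TYPE_LOG2_WIDTH = 3;  // log2 of the 8-byte value width
    private final ByteBuffer valueBuffer;

    public UnsafeGetSketch(int valueCount) {
        this.valueBuffer =
            ByteBuffer.allocateDirect(valueCount << TYPE_LOG2_WIDTH)
                      .order(ByteOrder.nativeOrder());
    }

    public double get(int index) {
        return Double.longBitsToDouble(
            valueBuffer.getLong(index << TYPE_LOG2_WIDTH));
    }

    public void set(int index, double value) {
        valueBuffer.putLong(index << TYPE_LOG2_WIDTH,
            Double.doubleToRawLongBits(value));
    }
}
```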

3. Can we live without Arrow, and just directly access the off-heap memory
(e.g. by the UNSAFE instance)?

I guess the answer is: absolutely, yes.
Parth is doing this (bypassing the Arrow API) with Drill, and this is
exactly what we are doing with Flink. My point is that providing
light-weight APIs will make Arrow easier to use. Without such APIs, Parth
may need to provide a library of Arrow wrappers in Drill, we will need to
provide a library of Arrow wrappers in Flink, and so on. That is redundant
work, and it may reduce the popularity of Arrow.

Best,
Liya Fan


On Fri, May 3, 2019 at 4:01 AM Jacques Nadeau <ja...@apache.org> wrote:


Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

Posted by Jacques Nadeau <ja...@apache.org>.
If someone wants to run without bounds checking, why don't they simply flip
the system property? Are they seeing that code not get eliminated if
they set that? I think people are optimizing the wrong things in this
discussion. The memory address is available. Per Parth's comments, if
you're working on a specific application, write directly to the memory.
That's the whole point of the reliable memory format. If something isn't
working right with the elimination of bounds checking, we can find another
solution to that, and let's make that the ticket.

My other comment on PR 4186 still stands: this doesn't have to be in the
ArrowBuf interface. Because we're factoring out memory as a very simple
concept, it should be easy to simply create a wrapper object that provides
this functionality with no impact on performance. We specifically expose
the memory address directly for exactly this type of use. The reality is:
if you want unsafe access, that basically means you don't want guardrails.
Direct memory access is the simplest/cleanest way to expose exactly that.
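The system property referred to here is Arrow's existing bounds-checking
switch in org.apache.arrow.memory.BoundsChecking. The pattern works roughly
like the sketch below: a static final flag read from a system property at
class-load time, so that when checking is disabled the JIT can prove the
check branch dead and eliminate it. The property name here is an assumption
modeled on the memory module; check BoundsChecking for the actual name.

```java
// Sketch of the static-final flag pattern: because the field is final and
// initialized from a system property when the class loads, a disabled flag
// lets the JIT compile checkIndex down to nothing.
public class BoundsCheckSketch {
    public static final boolean BOUNDS_CHECKING_ENABLED =
        !"true".equals(System.getProperty("arrow.enable_unsafe_memory_access"));

    public static void checkIndex(int index, int capacity) {
        if (BOUNDS_CHECKING_ENABLED) {
            if (index < 0 || index >= capacity) {
                throw new IndexOutOfBoundsException(
                    "index: " + index + ", capacity: " + capacity);
            }
        }
    }
}
```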

On Thu, May 2, 2019 at 8:18 AM Siddharth Teotia <si...@dremio.com>
wrote:


Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

Posted by Siddharth Teotia <si...@dremio.com>.
Looks like there are 2 PRs for this work --
https://github.com/apache/arrow/pull/4186 adds new get<type>Unsafe
APIs to ArrowBuf that don't do checkIndex() before calling
PlatformDependent.get(memory address). So the access will go through
vector.get() -> buffer.get() -> PlatformDependent.get() -> UNSAFE.get,
which is what we do today, but without doing any bounds checking.

I believe the proposal suggested here and the WIP PR --
https://github.com/apache/arrow/pull/4212 -- add new versions of vectors
where the call to vector.get() bypasses ArrowBuf and directly invokes
PlatformDependent with the absolute address at which we want to
read/write. Correct? Looks like the call to ArrowBuf is still needed to get
the starting address of the buffer before computing the absolute address.

I am wondering whether much of the overhead is coming from the conditions
and branches inside bounds checking, or just from the chain of method
calls? If it is bounds checking, then I think the first PR would probably
suffice.
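The two suspects — redundant checks versus call-chain depth — can be
separated empirically by benchmarking a checked, layered path against a
flat, unchecked one in isolation. The toy sketch below (all names
hypothetical, a plain on-heap array standing in for an ArrowBuf) shows the
two shapes a JMH benchmark could compare:

```java
// Toy model of the two access paths being distinguished: a layered read
// with a bounds check at each layer, versus a flat unchecked read. Only
// illustrates the shapes; a real comparison needs a JMH benchmark over
// actual Arrow vectors.
public class AccessPaths {
    private final double[] data;  // stands in for off-heap memory

    public AccessPaths(double[] data) {
        this.data = data;
    }

    // Path 1: full chain, with a (redundant) bounds check at each layer.
    public double getChecked(int index) {
        if (index < 0 || index >= data.length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return bufferGet(index);
    }

    private double bufferGet(int index) {
        if (index < 0 || index >= data.length) {  // re-checked at this layer
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return data[index];
    }

    // Path 2: no checks, no intermediate layer.
    public double getUnchecked(int index) {
        return data[index];
    }
}
```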

On Tue, Apr 30, 2019 at 9:46 AM Parth Chandra <pa...@apache.org> wrote:

> FWIW, in Drill's Value Vector code, we found that bounds checking was a
> major performance bottleneck in operators that wrote to vectors. Scans, as
> a result, were particularly affected. Another bottleneck was the zeroing of
> vectors.
> There were many unnecessary bounds checks. For example, in a varchar vector,
> there is one check while writing the data, one while writing the validity
> bit, one more in the buffer allocator for the data buffer, one more in the
> buffer allocator for the validity bit buffer, one more each in the
> underlying ByteBuf implementation. It gets worse with repeated/array types.
> Some code paths in Drill were optimized to get rid of these bounds checks
> (eventually I suppose all of them will be updated). The approach was to
> bypass the ValueVector API and write directly to the Drill(/Arrow)Buf.
> Writing to the memory address directly, as is being proposed by Liya Fan,
> was initially tried but did not have any measurable performance
> improvements. BTW, writing to the memory address would also conflict with
> ARROW-3191.
> Note that the performance tests were for Drill queries, not Vectors, so
> writing to memory directly may still have a noticeable performance benefit
> for different use cases.
> Sorry, I don't have actual numbers with me to share and I'm not sure how
> much Arrow has diverged from the original Drill implementation, but the
> Drill experience would suggest that this proposal certainly has merit.
>
> Parth
>
> On Mon, Apr 29, 2019 at 11:18 AM Wes McKinney <we...@gmail.com> wrote:
>
> > I'm also curious which APIs are particularly problematic for
> > performance. In ARROW-1833 [1] and some related discussions there was
> > the suggestion of adding methods like getUnsafe, so this would be like
> > get(i) [2] but without checking the validity bitmap
> >
> > [1] : https://issues.apache.org/jira/browse/ARROW-1833
> > [2]:
> >
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/Float8Vector.java#L99
> >
> > On Mon, Apr 29, 2019 at 1:05 PM Micah Kornfield <em...@gmail.com>
> > wrote:
> > >
> > > Thanks for the design.   Personally, I'm not a huge fan of creating a
> > > parallel classes for every vector type, this ends up being confusing
> for
> > > developers and adds a lot of boiler plate.  I wonder if you could use a
> > > similar approach that the memory module uses for turning bounds
> checking
> > > on/off [1].
> > >
> > > Also, I think there was a comment on the JIRA, but are there any
> > benchmarks
> > > to show the expected improvements?  My limited understanding is that
> for
> > > small methods the JVM's JIT should inline them anyways [2] , so it is
> not
> > > clear how much this will improve performance.
> > >
> > >
> > > Thanks,
> > > Micah
> > >
> > >
> > > [1]
> > >
> >
> https://github.com/apache/arrow/blob/master/java/memory/src/main/java/org/apache/arrow/memory/BoundsChecking.java
> > > [2]
> > >
> >
> https://stackoverflow.com/questions/24923040/do-modern-java-compilers-jvm-inline-functions-methods-which-are-called-exactly-f
> > >
> > > On Sun, Apr 28, 2019 at 2:50 AM Fan Liya <li...@gmail.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > We are proposing a new set of APIs in Arrow - unsafe vector APIs. The
> > > > general ideas is attached below, and also accessible from our online
> > > > document
> > > > <
> >
> https://docs.google.com/document/d/13oZFVS1EnNedZd_7udx-h10G2tRTjfgHe2ngp2ZWJ70/edit?usp=sharing
> > >.
> > > > Please give your valuable comments by directly commenting in our
> online
> > > > document
> > > > <
> >
> https://docs.google.com/document/d/13oZFVS1EnNedZd_7udx-h10G2tRTjfgHe2ngp2ZWJ70/edit?usp=sharing
> > >,
> > > > or relaying this email thread.
> > > >
> > > > Thank you so much in advance.
> > > >
> > > > Best,
> > > > Liya Fan
> > > >
> > > > Support Fast/Unsafe Vector APIs for Arrow Background
> > > >
> > > > In our effort to support columnar data format in Apache Flink, we
> chose
> > > > Apache Arrow as the basic data structure. Arrow greatly simplifies
> the
> > > > support of the columnar data format. However, for many scenarios, we
> > find
> > > > the performance unacceptable. Our investigation shows the reason is
> > that,
> > > > there are too many redundant checks and computations in current Arrow
> > API.
> > > >
> > > >
> > > >
> > > > For example, the following figures shows that in a single call to
> > > > Float8Vector.get(int) method (this is one of the most frequently used
> > APIs
> > > > in Flink computation),  there are 20+ method invocations.
> > > >
> > > >
> > > > [image: image.png]
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > There are many other APIs with similar problems. The redundant checks
> > and
> > > > computations impact performance severely. According to our
> evaluation,
> > the
> > > > performance may degrade by two or three orders of magnitude.
> > > > Our Proposal
> > > >
> > > > For many scenarios, the checks can be avoided, if the application
> > > > developers can guarantee that all checks will pass. So our proposal
> is
> > to
> > > > provide some light-weight APIs. The APIs are also named *unsafe
> APIs*,
> > in
> > > > the sense that that skip most of the checks (not safe) to improve the
> > > > performance.
> > > >
> > > >
> > > >
> > > > In the light-weight APIs, we only provide minimum checks, or avoid
> > checks
> > > > at all. The application owner can still develop and debug their code
> > using
> > > > the original safe APIs. Once all bugs have been fixed, they can
> switch
> > to
> > > > unsafe APIs in the final version of their products and enjoy the high
> > > > performance.
> > > > Our Design
> > > >
> > > > Our goal is to include unsafe vector APIs in Arrow code base, and
> allow
> > > > our customers switching to the new unsafe APIs, without being aware
> of
> > it,
> > > > except for the high performance. To achieve this goal, we make the
> > > > following design choices:
> > > > Vector Class Hierarchy
> > > >
> > > > Each unsafe vector is the subclass of the safe vector. For example,
> the
> > > > unsafe Float8Vector is a subclass of
> > org.apache.arrow.vector.Float8Vector:
> > > >
> > > >
> > > >
> > > > package org.apache.arrow.vector.unsafe;
> > > >
> > > >
> > > >
> > > > public class Float8Vector extends
> org.apache.arrow.vector.Float8Vector
> > > >
> > > >
> > > >
> > > > So the safe vector acts as a façade of the unsafe vector, and through
> > > > polymorphism, the users may not be aware of which type of vector
> > he/she is
> > > > working with. In addition, the common logics can be reused in the
> > unsafe
> > > > vectors, and we only need to override get/set related methods.
> > > > Vector Creation
> > > >
> > > > We use factory methods to create each type of vectors. Compared with
> > > > vector constructors, the factory methods take one more parameter, the
> > > > vectorType:
> > > >
> > > >
> > > >
> > > > public class VectorFactory {
> > > >
> > > >   public static Float8Vector createFloat8Vector(VectorType
> vectorType,
> > > > String name, BufferAllocator allocator);
> > > >
> > > > }
> > > >
> > > >
> > > >
> > > > VectorType is an enum to separate safe vectors from unsafe ones:
> > > >
> > > >
> > > >
> > > > public enum VectorType {
> > > >
> > > >   SAFE,
> > > >
> > > >   UNSAFE
> > > >
> > > > }
> > > >
> > > >
> > > >
> > > > With the factory methods, the old way of creating vectors by
> > constructors
> > > > can be gradually depreciated.
> > > > Vector Implementation
> > > >
> > > > As discussed above, unsafe vectors mainly override get/set methods.
> For
> > > > get methods, we directly operate on the off-heap memory, without any
> > check:
> > > >
> > > >
> > > >
> > > > public double get(int index) {
> > > >
> > > >     return
> > > >
> >
> Double.longBitsToDouble(PlatformDependent.getLong(valueBuffer.memoryAddress()
> > > > + (index << TYPE_LOG2_WIDTH)));
> > > >
> > > > }
> > > >
> > > >
> > > >
> > > > Note that the PlatformDependent API is only 2 stack layers above the
> > > > underlying UNSAFE method call.
> > > >
> > > >
> > > >
> > > > For set methods, we still need to set the validity bit. However, this
> > is
> > > > through an unsafe method that directly sets the bits without
> checking:
> > > >
> > > >
> > > >
> > > >          public void set(int index, double value) {
> > > >
> > > >       UnsafeBitVectorHelper.setValidityBitToOne(validityBuffer,
> index);
> > > >
> > > > PlatformDependent.putLong(
> > > >
> > > >             valueBuffer.memoryAddress() + (index << TYPE_LOG2_WIDTH),
> > > > Double.doubleToRawLongBits(value));
> > > >
> > > > }
> > > >
> > > >
> > > >
> > > > Method UnsafeBitVectorHelper.setValidityBitToOne is the unsafe
> version
> > of
> > > > BitVectorHelper.setValidityBitToOne that avoids checks.
> > > >
> > > >
> > > > Test Cases
> > > >
> > > > We can reuse existing test cases by employing parameterized test
> > classes
> > > > to test both safe and unsafe vectors.
> > > > Current Progress
> > > >
> > > > We have opened a JIRA for this work item FlINK-5200
> > > > <https://issues.apache.org/jira/browse/ARROW-5200>, and a PR
> > > > <https://github.com/apache/arrow/pull/4212> with initial
> > implementations
> > > > have been opened. We would appreciate if you could give some
> comments.
> > > >
> >
>

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

Posted by Parth Chandra <pa...@apache.org>.
FWIW, in Drill's Value Vector code, we found that bounds checking was a
major performance bottleneck in operators that wrote to vectors. Scans, as
a result, were particularly affected. Another bottleneck was the zeroing of
vectors.
There were many unnecessary bounds checks. For example, in a varchar vector,
there is one check while writing the data, one while writing the validity
bit, one more in the buffer allocator for the data buffer, one more in the
buffer allocator for the validity bit buffer, one more each in the
underlying ByteBuf implementation. It gets worse with repeated/array types.
Some code paths in Drill were optimized to get rid of these bounds checks
(eventually I suppose all of them will be updated). The approach was to
bypass the ValueVector API and write directly to the Drill(/Arrow)Buf.
Writing to the memory address directly, as is being proposed by Liya Fan,
was initially tried but did not have any measurable performance
improvements. BTW, writing to the memory address would also conflict with
ARROW-3191.
Note that the performance tests were for Drill queries, not Vectors, so
writing to memory directly may still have a noticeable performance benefit
for different use cases.
Sorry, I don't have actual numbers with me to share and I'm not sure how
much Arrow has diverged from the original Drill implementation, but the
Drill experience would suggest that this proposal certainly has merit.
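The layered checks described above can be counted in a toy model. Nothing below is Drill or Arrow code -- plain ByteBuffers stand in for the data and validity buffers, and every class and method name is invented -- but it shows how one logical varchar write triggers a separate bounds check at each layer it passes through.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Toy model of a layered varchar write; names are illustrative only.
public class VarcharWriteSketch {
    static int checksPerformed = 0;

    // Layer 3: the backing buffer re-validates the range before storing.
    static void bufferPut(ByteBuffer buf, int offset, byte[] data) {
        checksPerformed++;                       // bounds check #3
        if (offset < 0 || offset + data.length > buf.capacity()) {
            throw new IndexOutOfBoundsException();
        }
        for (int i = 0; i < data.length; i++) {
            buf.put(offset + i, data[i]);
        }
    }

    // Layer 2: the validity bitmap does its own check.
    static void setValidityBit(ByteBuffer validity, int index) {
        checksPerformed++;                       // bounds check #2
        if (index < 0 || (index >> 3) >= validity.capacity()) {
            throw new IndexOutOfBoundsException();
        }
        int b = validity.get(index >> 3);
        validity.put(index >> 3, (byte) (b | (1 << (index & 7))));
    }

    // Layer 1: the vector-level API checks first, then delegates.
    static void vectorSet(ByteBuffer data, ByteBuffer validity, int index,
                          int offset, String value) {
        checksPerformed++;                       // bounds check #1
        if (index < 0) {
            throw new IndexOutOfBoundsException();
        }
        setValidityBit(validity, index);
        bufferPut(data, offset, value.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        ByteBuffer data = ByteBuffer.allocate(64);
        ByteBuffer validity = ByteBuffer.allocate(8);
        vectorSet(data, validity, 0, 0, "hello");
        System.out.println("checks for one write: " + checksPerformed);
    }
}
```

In the real stack the allocator and the underlying ByteBuf add further checks on top of these, which is why the count per write grows as Parth describes.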

Parth

On Mon, Apr 29, 2019 at 11:18 AM Wes McKinney <we...@gmail.com> wrote:

> I'm also curious which APIs are particularly problematic for
> performance. In ARROW-1833 [1] and some related discussions there was
> the suggestion of adding methods like getUnsafe, so this would be like
> get(i) [2] but without checking the validity bitmap
>
> [1] : https://issues.apache.org/jira/browse/ARROW-1833
> [2]:
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/Float8Vector.java#L99
>
> On Mon, Apr 29, 2019 at 1:05 PM Micah Kornfield <em...@gmail.com>
> wrote:
> >
> > Thanks for the design.   Personally, I'm not a huge fan of creating a
> > parallel classes for every vector type, this ends up being confusing for
> > developers and adds a lot of boiler plate.  I wonder if you could use a
> > similar approach that the memory module uses for turning bounds checking
> > on/off [1].
> >
> > Also, I think there was a comment on the JIRA, but are there any
> benchmarks
> > to show the expected improvements?  My limited understanding is that for
> > small methods the JVM's JIT should inline them anyways [2] , so it is not
> > clear how much this will improve performance.
> >
> >
> > Thanks,
> > Micah
> >
> >
> > [1]
> >
> https://github.com/apache/arrow/blob/master/java/memory/src/main/java/org/apache/arrow/memory/BoundsChecking.java
> > [2]
> >
> https://stackoverflow.com/questions/24923040/do-modern-java-compilers-jvm-inline-functions-methods-which-are-called-exactly-f
> >
> > On Sun, Apr 28, 2019 at 2:50 AM Fan Liya <li...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > We are proposing a new set of APIs in Arrow - unsafe vector APIs. The
> > > general ideas is attached below, and also accessible from our online
> > > document
> > > <
> https://docs.google.com/document/d/13oZFVS1EnNedZd_7udx-h10G2tRTjfgHe2ngp2ZWJ70/edit?usp=sharing
> >.
> > > Please give your valuable comments by directly commenting in our online
> > > document
> > > <
> https://docs.google.com/document/d/13oZFVS1EnNedZd_7udx-h10G2tRTjfgHe2ngp2ZWJ70/edit?usp=sharing
> >,
> > > or relaying this email thread.
> > >
> > > Thank you so much in advance.
> > >
> > > Best,
> > > Liya Fan
> > >
> > > Support Fast/Unsafe Vector APIs for Arrow Background
> > >
> > > In our effort to support columnar data format in Apache Flink, we chose
> > > Apache Arrow as the basic data structure. Arrow greatly simplifies the
> > > support of the columnar data format. However, for many scenarios, we
> find
> > > the performance unacceptable. Our investigation shows the reason is
> that,
> > > there are too many redundant checks and computations in current Arrow
> API.
> > >
> > >
> > >
> > > For example, the following figures shows that in a single call to
> > > Float8Vector.get(int) method (this is one of the most frequently used
> APIs
> > > in Flink computation),  there are 20+ method invocations.
> > >
> > >
> > > [image: image.png]
> > >
> > >
> > >
> > >
> > >
> > > There are many other APIs with similar problems. The redundant checks
> and
> > > computations impact performance severely. According to our evaluation,
> the
> > > performance may degrade by two or three orders of magnitude.
> > > Our Proposal
> > >
> > > For many scenarios, the checks can be avoided, if the application
> > > developers can guarantee that all checks will pass. So our proposal is
> to
> > > provide some light-weight APIs. The APIs are also named *unsafe APIs*,
> in
> > > the sense that that skip most of the checks (not safe) to improve the
> > > performance.
> > >
> > >
> > >
> > > In the light-weight APIs, we only provide minimum checks, or avoid
> checks
> > > at all. The application owner can still develop and debug their code
> using
> > > the original safe APIs. Once all bugs have been fixed, they can switch
> to
> > > unsafe APIs in the final version of their products and enjoy the high
> > > performance.
> > > Our Design
> > >
> > > Our goal is to include unsafe vector APIs in Arrow code base, and allow
> > > our customers switching to the new unsafe APIs, without being aware of
> it,
> > > except for the high performance. To achieve this goal, we make the
> > > following design choices:
> > > Vector Class Hierarchy
> > >
> > > Each unsafe vector is the subclass of the safe vector. For example, the
> > > unsafe Float8Vector is a subclass of
> org.apache.arrow.vector.Float8Vector:
> > >
> > >
> > >
> > > package org.apache.arrow.vector.unsafe;
> > >
> > >
> > >
> > > public class Float8Vector extends org.apache.arrow.vector.Float8Vector
> > >
> > >
> > >
> > > So the safe vector acts as a façade of the unsafe vector, and through
> > > polymorphism, the users may not be aware of which type of vector
> he/she is
> > > working with. In addition, the common logics can be reused in the
> unsafe
> > > vectors, and we only need to override get/set related methods.
> > > Vector Creation
> > >
> > > We use factory methods to create each type of vectors. Compared with
> > > vector constructors, the factory methods take one more parameter, the
> > > vectorType:
> > >
> > >
> > >
> > > public class VectorFactory {
> > >
> > >   public static Float8Vector createFloat8Vector(VectorType vectorType,
> > > String name, BufferAllocator allocator);
> > >
> > > }
> > >
> > >
> > >
> > > VectorType is an enum to separate safe vectors from unsafe ones:
> > >
> > >
> > >
> > > public enum VectorType {
> > >
> > >   SAFE,
> > >
> > >   UNSAFE
> > >
> > > }
> > >
> > >
> > >
> > > With the factory methods, the old way of creating vectors by
> constructors
> > > can be gradually depreciated.
> > > Vector Implementation
> > >
> > > As discussed above, unsafe vectors mainly override get/set methods. For
> > > get methods, we directly operate on the off-heap memory, without any
> check:
> > >
> > >
> > >
> > > public double get(int index) {
> > >
> > >     return
> > >
> Double.longBitsToDouble(PlatformDependent.getLong(valueBuffer.memoryAddress()
> > > + (index << TYPE_LOG2_WIDTH)));
> > >
> > > }
> > >
> > >
> > >
> > > Note that the PlatformDependent API is only 2 stack layers above the
> > > underlying UNSAFE method call.
> > >
> > >
> > >
> > > For set methods, we still need to set the validity bit. However, this
> is
> > > through an unsafe method that directly sets the bits without checking:
> > >
> > >
> > >
> > >          public void set(int index, double value) {
> > >
> > >       UnsafeBitVectorHelper.setValidityBitToOne(validityBuffer, index);
> > >
> > > PlatformDependent.putLong(
> > >
> > >             valueBuffer.memoryAddress() + (index << TYPE_LOG2_WIDTH),
> > > Double.doubleToRawLongBits(value));
> > >
> > > }
> > >
> > >
> > >
> > > Method UnsafeBitVectorHelper.setValidityBitToOne is the unsafe version
> of
> > > BitVectorHelper.setValidityBitToOne that avoids checks.
> > >
> > >
> > > Test Cases
> > >
> > > We can reuse existing test cases by employing parameterized test
> classes
> > > to test both safe and unsafe vectors.
> > > Current Progress
> > >
> > > We have opened a JIRA for this work item FlINK-5200
> > > <https://issues.apache.org/jira/browse/ARROW-5200>, and a PR
> > > <https://github.com/apache/arrow/pull/4212> with initial
> implementations
> > > have been opened. We would appreciate if you could give some comments.
> > >
>

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

Posted by Wes McKinney <we...@gmail.com>.
I'm also curious which APIs are particularly problematic for
performance. In ARROW-1833 [1] and some related discussions there was
the suggestion of adding methods like getUnsafe; this would be like
get(i) [2] but without checking the validity bitmap.

[1]: https://issues.apache.org/jira/browse/ARROW-1833
[2]: https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/Float8Vector.java#L99
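The getUnsafe idea from ARROW-1833 can be sketched as follows. Heap arrays stand in for Arrow's off-heap value and validity buffers, and the class name is invented; the point is that get(i) pays for a validity-bitmap lookup on every call, while getUnsafe(i) shifts that guarantee to the caller.

```java
// Minimal sketch of a Float8-style vector with and without the validity
// check; not Arrow's real implementation.
public class Float8GetSketch {
    private final byte[] validityBits;
    private final double[] values;

    public Float8GetSketch(int capacity) {
        this.validityBits = new byte[(capacity + 7) / 8];
        this.values = new double[capacity];
    }

    public void set(int index, double value) {
        validityBits[index >> 3] |= (1 << (index & 7));  // mark non-null
        values[index] = value;
    }

    // What get(i) does today: consult the validity bitmap before reading.
    public double get(int index) {
        if ((validityBits[index >> 3] & (1 << (index & 7))) == 0) {
            throw new IllegalStateException(
                "value at index " + index + " is null");
        }
        return values[index];
    }

    // The proposed getUnsafe(i): the caller guarantees the slot is non-null.
    public double getUnsafe(int index) {
        return values[index];
    }
}
```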

On Mon, Apr 29, 2019 at 1:05 PM Micah Kornfield <em...@gmail.com> wrote:
>
> Thanks for the design.   Personally, I'm not a huge fan of creating a
> parallel classes for every vector type, this ends up being confusing for
> developers and adds a lot of boiler plate.  I wonder if you could use a
> similar approach that the memory module uses for turning bounds checking
> on/off [1].
>
> Also, I think there was a comment on the JIRA, but are there any benchmarks
> to show the expected improvements?  My limited understanding is that for
> small methods the JVM's JIT should inline them anyways [2] , so it is not
> clear how much this will improve performance.
>
>
> Thanks,
> Micah
>
>
> [1]
> https://github.com/apache/arrow/blob/master/java/memory/src/main/java/org/apache/arrow/memory/BoundsChecking.java
> [2]
> https://stackoverflow.com/questions/24923040/do-modern-java-compilers-jvm-inline-functions-methods-which-are-called-exactly-f
>
> On Sun, Apr 28, 2019 at 2:50 AM Fan Liya <li...@gmail.com> wrote:
>
> > Hi all,
> >
> > We are proposing a new set of APIs in Arrow - unsafe vector APIs. The
> > general ideas is attached below, and also accessible from our online
> > document
> > <https://docs.google.com/document/d/13oZFVS1EnNedZd_7udx-h10G2tRTjfgHe2ngp2ZWJ70/edit?usp=sharing>.
> > Please give your valuable comments by directly commenting in our online
> > document
> > <https://docs.google.com/document/d/13oZFVS1EnNedZd_7udx-h10G2tRTjfgHe2ngp2ZWJ70/edit?usp=sharing>,
> > or relaying this email thread.
> >
> > Thank you so much in advance.
> >
> > Best,
> > Liya Fan
> >
> > Support Fast/Unsafe Vector APIs for Arrow Background
> >
> > In our effort to support columnar data format in Apache Flink, we chose
> > Apache Arrow as the basic data structure. Arrow greatly simplifies the
> > support of the columnar data format. However, for many scenarios, we find
> > the performance unacceptable. Our investigation shows the reason is that,
> > there are too many redundant checks and computations in current Arrow API.
> >
> >
> >
> > For example, the following figures shows that in a single call to
> > Float8Vector.get(int) method (this is one of the most frequently used APIs
> > in Flink computation),  there are 20+ method invocations.
> >
> >
> > [image: image.png]
> >
> >
> >
> >
> >
> > There are many other APIs with similar problems. The redundant checks and
> > computations impact performance severely. According to our evaluation, the
> > performance may degrade by two or three orders of magnitude.
> > Our Proposal
> >
> > For many scenarios, the checks can be avoided, if the application
> > developers can guarantee that all checks will pass. So our proposal is to
> > provide some light-weight APIs. The APIs are also named *unsafe APIs*, in
> > the sense that that skip most of the checks (not safe) to improve the
> > performance.
> >
> >
> >
> > In the light-weight APIs, we only provide minimum checks, or avoid checks
> > at all. The application owner can still develop and debug their code using
> > the original safe APIs. Once all bugs have been fixed, they can switch to
> > unsafe APIs in the final version of their products and enjoy the high
> > performance.
> > Our Design
> >
> > Our goal is to include unsafe vector APIs in Arrow code base, and allow
> > our customers switching to the new unsafe APIs, without being aware of it,
> > except for the high performance. To achieve this goal, we make the
> > following design choices:
> > Vector Class Hierarchy
> >
> > Each unsafe vector is the subclass of the safe vector. For example, the
> > unsafe Float8Vector is a subclass of org.apache.arrow.vector.Float8Vector:
> >
> >
> >
> > package org.apache.arrow.vector.unsafe;
> >
> >
> >
> > public class Float8Vector extends org.apache.arrow.vector.Float8Vector
> >
> >
> >
> > So the safe vector acts as a façade of the unsafe vector, and through
> > polymorphism, the users may not be aware of which type of vector he/she is
> > working with. In addition, the common logics can be reused in the unsafe
> > vectors, and we only need to override get/set related methods.
> > Vector Creation
> >
> > We use factory methods to create each type of vectors. Compared with
> > vector constructors, the factory methods take one more parameter, the
> > vectorType:
> >
> >
> >
> > public class VectorFactory {
> >
> >   public static Float8Vector createFloat8Vector(VectorType vectorType,
> > String name, BufferAllocator allocator);
> >
> > }
> >
> >
> >
> > VectorType is an enum to separate safe vectors from unsafe ones:
> >
> >
> >
> > public enum VectorType {
> >
> >   SAFE,
> >
> >   UNSAFE
> >
> > }
> >
> >
> >
> > With the factory methods, the old way of creating vectors by constructors
> > can be gradually depreciated.
> > Vector Implementation
> >
> > As discussed above, unsafe vectors mainly override get/set methods. For
> > get methods, we directly operate on the off-heap memory, without any check:
> >
> >
> >
> > public double get(int index) {
> >
> >     return
> > Double.longBitsToDouble(PlatformDependent.getLong(valueBuffer.memoryAddress()
> > + (index << TYPE_LOG2_WIDTH)));
> >
> > }
> >
> >
> >
> > Note that the PlatformDependent API is only 2 stack layers above the
> > underlying UNSAFE method call.
> >
> >
> >
> > For set methods, we still need to set the validity bit. However, this is
> > through an unsafe method that directly sets the bits without checking:
> >
> >
> >
> >          public void set(int index, double value) {
> >
> >       UnsafeBitVectorHelper.setValidityBitToOne(validityBuffer, index);
> >
> > PlatformDependent.putLong(
> >
> >             valueBuffer.memoryAddress() + (index << TYPE_LOG2_WIDTH),
> > Double.doubleToRawLongBits(value));
> >
> > }
> >
> >
> >
> > Method UnsafeBitVectorHelper.setValidityBitToOne is the unsafe version of
> > BitVectorHelper.setValidityBitToOne that avoids checks.
> >
> >
> > Test Cases
> >
> > We can reuse existing test cases by employing parameterized test classes
> > to test both safe and unsafe vectors.
> > Current Progress
> >
> > We have opened a JIRA for this work item FlINK-5200
> > <https://issues.apache.org/jira/browse/ARROW-5200>, and a PR
> > <https://github.com/apache/arrow/pull/4212> with initial implementations
> > have been opened. We would appreciate if you could give some comments.
> >

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

Posted by Micah Kornfield <em...@gmail.com>.
Thanks for the design.  Personally, I'm not a huge fan of creating
parallel classes for every vector type; this ends up being confusing for
developers and adds a lot of boilerplate.  I wonder if you could use an
approach similar to the one the memory module uses for turning bounds
checking on/off [1].
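The flag-based approach in BoundsChecking boils down to a static final boolean fixed at class-load time, which lets the JIT compile the guard away entirely when checking is disabled -- no parallel class hierarchy needed. The sketch below captures only that shape; the class name and the system property name are illustrative, not the ones Arrow actually uses.

```java
// Illustrative load-time bounds-checking switch; the property name
// "sketch.enable_unsafe_memory_access" is made up for this example.
public final class BoundsCheckingSketch {
    // Fixed once at class initialization; when the property is "true",
    // the JIT can treat the guard below as dead code and eliminate it.
    public static final boolean BOUNDS_CHECKING_ENABLED =
        !"true".equals(System.getProperty("sketch.enable_unsafe_memory_access"));

    private BoundsCheckingSketch() {}

    public static void checkIndex(long index, long length) {
        if (BOUNDS_CHECKING_ENABLED && (index < 0 || index >= length)) {
            throw new IndexOutOfBoundsException(
                "index: " + index + ", length: " + length);
        }
    }
}
```

With this scheme the same vector classes serve both modes, and the choice is made once per JVM rather than per instance.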

Also, I think there was a comment on the JIRA, but are there any benchmarks
to show the expected improvements?  My limited understanding is that for
small methods the JVM's JIT should inline them anyway [2], so it is not
clear how much this will improve performance.


Thanks,
Micah


[1]
https://github.com/apache/arrow/blob/master/java/memory/src/main/java/org/apache/arrow/memory/BoundsChecking.java
[2]
https://stackoverflow.com/questions/24923040/do-modern-java-compilers-jvm-inline-functions-methods-which-are-called-exactly-f

On Sun, Apr 28, 2019 at 2:50 AM Fan Liya <li...@gmail.com> wrote:

> Hi all,
>
> We are proposing a new set of APIs in Arrow - unsafe vector APIs. The
> general idea is attached below, and also accessible from our online
> document
> <https://docs.google.com/document/d/13oZFVS1EnNedZd_7udx-h10G2tRTjfgHe2ngp2ZWJ70/edit?usp=sharing>.
> Please give your valuable comments by directly commenting in our online
> document
> <https://docs.google.com/document/d/13oZFVS1EnNedZd_7udx-h10G2tRTjfgHe2ngp2ZWJ70/edit?usp=sharing>,
> or relaying this email thread.
>
> Thank you so much in advance.
>
> Best,
> Liya Fan
>
> Support Fast/Unsafe Vector APIs for Arrow Background
>
> In our effort to support columnar data format in Apache Flink, we chose
> Apache Arrow as the basic data structure. Arrow greatly simplifies the
> support of the columnar data format. However, for many scenarios, we find
> the performance unacceptable. Our investigation shows the reason is that,
> there are too many redundant checks and computations in current Arrow API.
>
>
>
> For example, the following figure shows that a single call to the
> Float8Vector.get(int) method (one of the most frequently used APIs
> in Flink computation) involves 20+ method invocations.
>
>
> [image: image.png]
>
>
>
>
>
> There are many other APIs with similar problems. The redundant checks and
> computations impact performance severely. According to our evaluation, the
> performance may degrade by two or three orders of magnitude.
> Our Proposal
>
> For many scenarios, the checks can be avoided if the application
> developers can guarantee that all checks will pass. So our proposal is to
> provide some light-weight APIs. The APIs are also named *unsafe APIs*, in
> the sense that they skip most of the checks (not safe) to improve
> performance.
>
>
>
> In the light-weight APIs, we only provide minimum checks, or avoid checks
> at all. The application owner can still develop and debug their code using
> the original safe APIs. Once all bugs have been fixed, they can switch to
> unsafe APIs in the final version of their products and enjoy the high
> performance.
> Our Design
>
> Our goal is to include unsafe vector APIs in the Arrow code base, and allow
> our customers to switch to the new unsafe APIs without being aware of it,
> except for the high performance. To achieve this goal, we make the
> following design choices:
> Vector Class Hierarchy
>
> Each unsafe vector is the subclass of the safe vector. For example, the
> unsafe Float8Vector is a subclass of org.apache.arrow.vector.Float8Vector:
>
>
>
> package org.apache.arrow.vector.unsafe;
>
>
>
> public class Float8Vector extends org.apache.arrow.vector.Float8Vector
>
>
>
> So the safe vector acts as a façade of the unsafe vector, and through
> polymorphism, the users may not be aware of which type of vector he/she is
> working with. In addition, the common logics can be reused in the unsafe
> vectors, and we only need to override get/set related methods.
> Vector Creation
>
> We use factory methods to create each type of vector. Compared with
> vector constructors, the factory methods take one more parameter, the
> vectorType:
>
>
>
> public class VectorFactory {
>
>   public static Float8Vector createFloat8Vector(VectorType vectorType,
> String name, BufferAllocator allocator);
>
> }
>
>
>
> VectorType is an enum to separate safe vectors from unsafe ones:
>
>
>
> public enum VectorType {
>
>   SAFE,
>
>   UNSAFE
>
> }
>
>
>
> With the factory methods, the old way of creating vectors by constructors
> can be gradually deprecated.
> Vector Implementation
>
> As discussed above, unsafe vectors mainly override get/set methods. For
> get methods, we directly operate on the off-heap memory, without any check:
>
>
>
> public double get(int index) {
>
>     return Double.longBitsToDouble(
>         PlatformDependent.getLong(
>             valueBuffer.memoryAddress() + (index << TYPE_LOG2_WIDTH)));
>
> }
>
>
>
> Note that the PlatformDependent API is only 2 stack layers above the
> underlying UNSAFE method call.
>
>
>
> For set methods, we still need to set the validity bit. However, this is
> through an unsafe method that directly sets the bits without checking:
>
>
>
> public void set(int index, double value) {
>
>     UnsafeBitVectorHelper.setValidityBitToOne(validityBuffer, index);
>
>     PlatformDependent.putLong(
>         valueBuffer.memoryAddress() + (index << TYPE_LOG2_WIDTH),
>         Double.doubleToRawLongBits(value));
>
> }
>
>
>
> Method UnsafeBitVectorHelper.setValidityBitToOne is the unsafe version of
> BitVectorHelper.setValidityBitToOne that avoids checks.
>
>
> Test Cases
>
> We can reuse existing test cases by employing parameterized test classes
> to test both safe and unsafe vectors.
> Current Progress
>
> We have opened a JIRA for this work item, ARROW-5200
> <https://issues.apache.org/jira/browse/ARROW-5200>, and a PR
> <https://github.com/apache/arrow/pull/4212> with an initial implementation
> has been opened. We would appreciate it if you could give some comments.
>
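Tying the quoted design together, the subclass-plus-factory scheme could look roughly like the sketch below. Heap arrays stand in for Arrow's off-heap buffers and all method bodies are invented; only the shape -- an unsafe vector extending the safe one and overriding the accessors, with a factory keyed on a VectorType enum -- follows the proposal.

```java
// Rough sketch of the proposal's class hierarchy and factory; not Arrow code.
public class VectorFactorySketch {
    public enum VectorType { SAFE, UNSAFE }

    public static class SafeFloat8 {
        protected final double[] values;
        protected int valueCount;

        SafeFloat8(int capacity) {
            this.values = new double[capacity];
        }

        public void setSafe(int index, double v) {
            if (index < 0 || index >= values.length) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            values[index] = v;
            valueCount = Math.max(valueCount, index + 1);
        }

        public double get(int index) {
            if (index < 0 || index >= valueCount) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            return values[index];
        }
    }

    // The unsafe variant reuses everything and overrides only the accessor.
    public static class UnsafeFloat8 extends SafeFloat8 {
        UnsafeFloat8(int capacity) {
            super(capacity);
        }

        @Override
        public double get(int index) {
            return values[index];  // no checks
        }
    }

    // Factory: one extra parameter selects the implementation, and callers
    // work against the safe type through polymorphism.
    public static SafeFloat8 createFloat8(VectorType type, int capacity) {
        return type == VectorType.SAFE
            ? new SafeFloat8(capacity)
            : new UnsafeFloat8(capacity);
    }
}
```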