Posted to dev@spark.apache.org by Ludovic Henry <lu...@microsoft.com.INVALID> on 2020/12/15 14:29:33 UTC

Usage of JDK Vector API in ML/MLLib

Hello,

Over the past few days, I’ve looked into using the new Vector API [1] to accelerate some BLAS operations directly from Java. The gist at [2] contains most of the changes, which are in mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala.

To measure performance, I’ve added a BLASBenchmark.scala [3] at mllib-local/src/test/scala/org/apache/spark/ml/linalg/BLASBenchmark.scala. I do see some promising speedups, especially compared to F2jBLAS. Unfortunately, I have not been able to install OpenBLAS locally to compare against native performance, but I would still expect native to be faster, especially on large inputs. See [4] for an f2j vs. vector performance comparison.

The primary blocker is that the Vector API is only available as an incubator module, starting with JDK 16. Checking at run-time whether the vectorized BLAS can be used is easy. Compiling the vectorized BLAS class, however, requires JDK 16+. Spark 3.0+ does compile with JDK 16 (it works locally), but I don’t know how to selectively compile sources based on the JDK version used at compile-time.
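
For illustration, a run-time check along these lines might look like the sketch below. It is not taken from the gist or from Spark; the class and method names are hypothetical. It assumes that "usable" means both running on JDK 16+ and having the jdk.incubator.vector module resolved (incubator modules are not resolved by default and need something like --add-modules jdk.incubator.vector on the command line).

```java
import java.util.Optional;

// Hypothetical helper; the class and method names are illustrative only.
final class VectorApiCheck {

    // Name of the incubator module introduced by JEP 338.
    static final String VECTOR_MODULE = "jdk.incubator.vector";

    // True only when running on JDK 16+ AND the incubator module has been
    // resolved into the boot layer (e.g. via --add-modules jdk.incubator.vector).
    static boolean vectorApiAvailable() {
        if (Runtime.version().feature() < 16) {
            return false;
        }
        Optional<Module> module = ModuleLayer.boot().findModule(VECTOR_MODULE);
        return module.isPresent();
    }

    public static void main(String[] args) {
        System.out.println("JDK feature version: " + Runtime.version().feature());
        System.out.println("Vector API usable:   " + vectorApiAvailable());
    }
}
```

Checking for the module rather than only the version number covers the case where a recent JDK is in use but the incubator module was not enabled.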

More importantly, I want to get your feedback before I explore this idea further. Technically, it is feasible, and we’ll see a speedup whenever native BLAS is not installed. I am focusing solely on ML/MLLib for now. However, there is also graphx (I haven’t checked whether anything there is vectorizable), and even more explicit use of the Vector API in catalyst, which would be a much bigger project.

Thank you,

Ludovic Henry

[1] https://openjdk.java.net/jeps/338

[2] https://gist.github.com/luhenry/6b24ac146a110143ad31736caf7250e6#file-blas-scala

[3] https://gist.github.com/luhenry/6b24ac146a110143ad31736caf7250e6#file-blasbenchmark-scala

[4] https://gist.github.com/luhenry/6b24ac146a110143ad31736caf7250e6#file-f2j-vs-vector-log

RE: Usage of JDK Vector API in ML/MLLib

Posted by Ludovic Henry <lu...@microsoft.com.INVALID>.

I’ve submitted https://github.com/apache/spark/pull/30810. It uses a profile to selectively compile VectorizedBLAS, tries to load it via reflection at run-time, and falls back to F2jBLAS if that fails.
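
As a rough sketch of the reflection-plus-fallback pattern described here (not the code from the PR): the Blas interface, ScalarBlas, BlasLoader, and the class name passed to load are all hypothetical stand-ins, with ScalarBlas playing the role that F2jBLAS plays in Spark.

```java
// Hypothetical abstraction; Spark's real BLAS classes have a much larger surface.
interface Blas {
    double ddot(int n, double[] x, double[] y);
}

// Stand-in for the pure-Java fallback (the role F2jBLAS plays in this thread).
final class ScalarBlas implements Blas {
    @Override
    public double ddot(int n, double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }
}

final class BlasLoader {

    // Try to instantiate the named implementation reflectively; fall back to
    // the scalar implementation if the class is missing, cannot be linked
    // (e.g. compiled for a newer JDK), or does not implement Blas.
    static Blas load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return (Blas) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | LinkageError | ClassCastException e) {
            return new ScalarBlas();
        }
    }

    public static void main(String[] args) {
        // "org.example.VectorizedBlas" is a made-up name, so this falls back.
        Blas blas = load("org.example.VectorizedBlas");
        double dot = blas.ddot(3, new double[] {1, 2, 3}, new double[] {4, 5, 6});
        System.out.println(blas.getClass().getSimpleName() + " ddot = " + dot);
    }
}
```

Catching LinkageError covers both NoClassDefFoundError and UnsupportedClassVersionError, so the same loader should behave sensibly on a JVM too old to load the vectorized class.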

I’m currently running the benchmarks on my x86 machine and will post the results on the PR.

From: Sean Owen<ma...@gmail.com>
Sent: Wednesday, 16 December 2020 14:23
To: Ludovic Henry<ma...@microsoft.com>
Cc: Erik Krogen<ma...@apache.org>; dev@spark.apache.org<ma...@spark.apache.org>; Bernhard Urban-Forster<ma...@microsoft.com>
Subject: Re: Usage of JDK Vector API in ML/MLLib

It's fine to prototype it. Because users can also get BLAS support by enabling a profile already, I think it bears understanding if perf is at least comparable before adding it as another option.
Or it could simply be an extra module / library until that time if it's desirable to release.
This may be a nice testing ground to see how much the API can substitute in for BLAS operations.

On Wed, Dec 16, 2020 at 4:41 AM Ludovic Henry <lu...@microsoft.com> wrote:
Hi,

Thank you for the feedback. I’ll work on the profile-based approach to selectively compile this VectorBLAS class in. As for the run-time, I haven’t used a reflection-based approach specifically, but a simpler `try { new VectorBLAS() } catch { case _: NoClassDefFoundError => new F2jBLAS() }`. I’ll submit a PR against github.com/apache/spark with this change. Should I also file an issue in Jira?

On a side note, yesterday I worked on extracting this code into a standalone project [1]. The goal is not so much for Spark to depend on it (though that would be possible) as to make it easier for me to develop, test, and benchmark new implementations.

Thank you,
Ludovic

[1] https://github.com/luhenry/blas

From: Erik Krogen<ma...@apache.org>
Sent: Tuesday, 15 December 2020 17:33
To: Sean Owen<ma...@gmail.com>
Cc: Ludovic Henry<ma...@microsoft.com>; dev@spark.apache.org<ma...@spark.apache.org>; Bernhard Urban-Forster<ma...@microsoft.com>
Subject: Re: Usage of JDK Vector API in ML/MLLib

Regarding selective compilation, you can hide sources behind a Maven profile such as `-Pvectorized`. Check out what we do to switch between the `hive-1.2` and `hive-2.3` profiles where different source directories are grabbed at compile-time (the hive-1.2 profile was recently removed so you might have to go back a little in git history). This won't do it automatically based on JDK version, but it's probably good enough. At runtime you can more easily do a JDK version check -- I agree with Sean on loading via reflection.

Personally, I see no reason not to start adding this support in preparation for broader adoption of JDK 16, provided that it is properly protected behind flags. This could be a big win for installations which haven't gone through the process of installing native BLAS libs.

On Tue, Dec 15, 2020 at 7:10 AM Sean Owen <sr...@gmail.com> wrote:
Yes, it's intriguing, though as you say not readily available in the wild yet.
I would also expect native BLAS to outperform f2j, so yeah, that's the interesting question: whether this is a win over native code or not.
I suppose the upside is that, eventually, we may expect this API to be available in all JVMs, not just those with native libraries added at runtime.

I wonder if a short-term goal would be to ensure that these calls are simply abstracted away, which they should already be, so it's easy to plug in this new 'BLAS' implementation. I'm sure it's possible to load this selectively via reflection, as that's what the current libraries do.
And there may be additional code paths that could benefit from these operations that don't already.

Re: Usage of JDK Vector API in ML/MLLib

Posted by Sean Owen <sr...@gmail.com>.

It's fine to prototype it. Because users can also get BLAS support by
enabling a profile already, I think it bears understanding if perf is at
least comparable before adding it as another option.
Or it could simply be an extra module / library until that time if it's
desirable to release.
This may be a nice testing ground to see how much the API can substitute in
for BLAS operations.

RE: Usage of JDK Vector API in ML/MLLib

Posted by Ludovic Henry <lu...@microsoft.com.INVALID>.

Hi,

Thank you for the feedback. I’ll work on the profile-based approach to selectively compile this VectorBLAS class in. As for the run-time, I haven’t used a reflection-based approach specifically, but a simpler `try { new VectorBLAS() } catch { case _: NoClassDefFoundError => new F2jBLAS() }`. I’ll submit a PR against github.com/apache/spark with this change. Should I also file an issue in Jira?

On a side note, yesterday I worked on extracting this code into a standalone project [1]. The goal is not so much for Spark to depend on it (though that would be possible) as to make it easier for me to develop, test, and benchmark new implementations.

Thank you,
Ludovic

[1] https://github.com/luhenry/blas

Re: Usage of JDK Vector API in ML/MLLib

Posted by Erik Krogen <xk...@apache.org>.

Regarding selective compilation, you can hide sources behind a Maven
profile such as `-Pvectorized`. Check out what we do to switch between the
`hive-1.2` and `hive-2.3` profiles where different source directories are
grabbed at compile-time (the hive-1.2 profile was recently removed so you
might have to go back a little in git history). This won't do it
automatically based on JDK version, but it's probably good enough. At
runtime you can more easily do a JDK version check -- I agree with Sean on
loading via reflection.

Personally, I see no reason not to start adding this support in preparation
for broader adoption of JDK 16, provided that it is properly protected
behind flags. This could be a big win for installations which haven't gone
through the process of installing native BLAS libs.

Re: Usage of JDK Vector API in ML/MLLib

Posted by Sean Owen <sr...@gmail.com>.

Yes, it's intriguing, though as you say not readily available in the wild
yet.
I would also expect native BLAS to outperform f2j, so yeah, that's the
interesting question: whether this is a win over native code or not.
I suppose the upside is that, eventually, we may expect this API to be
available in all JVMs, not just those with native libraries added at runtime.

I wonder if a short-term goal would be to ensure that these calls are
simply abstracted away, which they should already be, so it's easy to plug
in this new 'BLAS' implementation. I'm sure it's possible to load this
selectively via reflection, as that's what the current libraries do.
And there may be additional code paths that could benefit from these
operations that don't already.
