Posted to issues@kudu.apache.org by "Piotr Balcer (Jira)" <ji...@apache.org> on 2019/11/05 10:05:00 UTC

[jira] [Comment Edited] (KUDU-2990) Kudu can't distribute libnuma (dependency of memkind)

    [ https://issues.apache.org/jira/browse/KUDU-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967391#comment-16967391 ] 

Piotr Balcer edited comment on KUDU-2990 at 11/5/19 10:04 AM:
--------------------------------------------------------------

I'm Michal's colleague from the Persistent Memory Development Kit (PMDK) team.

The vast majority of libraries shipped as part of PMDK, including memkind, have non-optional LGPL dependencies on Linux-based platforms. The reason is simple: the convenience user-space interfaces to the Linux kernel are themselves sometimes distributed as LGPL libraries, as is the case with libnuma.

In this concrete case, libmemkind needs to call mbind(), which is necessary for any NUMA-based memory manipulation. The mbind() syscall interface is declared and implemented in libnuma, and without this dependency we would have to resort to raw syscalls and re-defining the kernel interfaces (see [https://github.com/numactl/numactl/blob/master/syscall.c]). It's unlikely that we would be persuaded to do so without a very important reason.
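
For illustration only, here is a minimal sketch of what "resorting to raw syscalls" could look like. It is not code from memkind or numactl: the names raw_mbind, reset_policy_on_fresh_page and RAW_MPOL_DEFAULT are hypothetical, and the policy constant is re-declared by hand precisely to show the duplication of kernel interfaces described above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hand-copied value of MPOL_DEFAULT from the kernel's <linux/mempolicy.h>.
 * Re-declaring it here is the "re-defining the kernel interfaces" part. */
#define RAW_MPOL_DEFAULT 0

/* Hypothetical wrapper: invokes the mbind syscall directly instead of going
 * through libnuma. Returns 0 on success or a negative errno value. */
static long raw_mbind(void *addr, unsigned long len, int mode,
                      const unsigned long *nodemask, unsigned long maxnode,
                      unsigned int flags)
{
    long rc = syscall(SYS_mbind, addr, len, mode, nodemask, maxnode, flags);
    return rc == 0 ? 0 : -errno;
}

/* Hypothetical demo: map one anonymous page and reset its memory policy to
 * the default. Returns 0 on success or a negative errno value. */
long reset_policy_on_fresh_page(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    void *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -errno;

    /* An empty node mask is required with the default policy. */
    long rc = raw_mbind(page, len, RAW_MPOL_DEFAULT, NULL, 0, 0);

    munmap(page, len);
    return rc;
}
```

The call can legitimately fail, for example with ENOSYS on kernels built without NUMA support or EPERM under a restrictive seccomp profile, so callers of such a wrapper would need to handle both cases.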

Also, note that libnuma isn't some obscure library. All software that wishes to explicitly manage NUMA-awareness on multi-socket systems needs to have a libnuma dependency. This means that no ASF project that ships on Linux can do so without re-implementing Linux user-space interfaces.

The same problem occurs with our other software that relies on Linux kernel interfaces that ship in the form of LGPL-licensed libraries. For example, the entire Linux libnvdimm subsystem is controlled through the LGPL-licensed libndctl ([https://github.com/pmem/ndctl]). This is a hard dependency on Linux for most of PMDK's libraries. And obviously, there's glibc...

Having said all that, long-term, we could modularize libmemkind so that the various dependencies are optional and functionality ships as separate plugins. For example, to allocate persistent memory, you'd need base libmemkind and a libmemkind-pmem plugin. This would solve the licensing issue *in this particular case*, but it's just an idea right now and it would take a considerable amount of time to implement.


> Kudu can't distribute libnuma (dependency of memkind)
> -----------------------------------------------------
>
>                 Key: KUDU-2990
>                 URL: https://issues.apache.org/jira/browse/KUDU-2990
>             Project: Kudu
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 1.10.0, 1.11.0, 1.12.0
>            Reporter: Adar Dembo
>            Assignee: Adar Dembo
>            Priority: Blocker
>
> I noticed in [this commit|https://github.com/apache/kudu/commit/973e5cdf8fbcedcdcc659d980f3a3a69dc4f109f] that libnuma (a dependency of memkind) is licensed under the LGPL. This means that we can't distribute it as per the [ASF 3rd party license policy|https://www.apache.org/legal/resolved.html#category-x].
> Some background: memkind was added as a new thirdparty dependency in 1.10.0. It replaced the libraries provided by [PMDK|https://pmem.io/pmdk/], and is used to power our generic non-volatile memory cache implementation, which can be configured as a replacement for the standard DRAM-based block cache.
> I spent some time looking into whether our use of memkind actually calls into libnuma and unfortunately I think the answer is yes: when we map a pmem region via memkind, it creates an arena with which to do allocations, and that allocates some per-CPU data structures. The precise number of structures is derived from a call into libnuma.
> We'll need to find a creative solution to this problem. Some ideas:
> # Restrict libnuma to build time and expect it on the host system at runtime. We do this for some libraries already, like libsasl. I see libnuma installed on my laptop (Ubuntu 18) as well as on CentOS 6.6 and 7.3 machines we use for development. On my laptop the reverse dependencies look significant enough that it's likely installed by default, but I can't guarantee that everywhere, nor is it guaranteed for all sorts of funky container images users will no doubt put Kudu in.
> # Like #1 but also patch memkind to dlopen() libnuma so that if it can't be found, whatever memkind function is currently running returns an error. That's a much better failure mode than "the Kudu process can't start", but it's unclear how much work this would be.
> # Make the NVM cache implementation fully optional and excise it from the default Kudu distribution. I say "fully optional" because it's already somewhat optional: the CMake logic allows for it (and memkind, and libnuma) to not exist on macOS where that stuff apparently just doesn't work. Still, this would be frustrating for users who wish to use the NVM cache out of the box.
> I'm not sure what needs to happen to 1.10.0 (the first release with the libnuma dependency) and to 1.11.0 (imminently releasing). Could someone with more experience in ASF legal matters weigh in?
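
The second idea in the quoted list (loading libnuma with dlopen() so that a missing library becomes a recoverable error) could be sketched roughly as follows. This is an assumption-laden illustration, not a memkind patch: the helper name numa_cpu_count_or_error is hypothetical, and the issue does not say which libnuma function memkind actually calls, so numa_num_configured_cpus() is used here only as a plausible stand-in.

```c
#include <dlfcn.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper: resolves libnuma at runtime instead of linking to it.
 * Returns the configured CPU count, or a negative errno value:
 *   -ENOENT  if libnuma.so.1 cannot be loaded,
 *   -ENOSYS  if the expected symbols are missing,
 *   -ENOTSUP if libnuma reports that NUMA is unavailable on this system. */
int numa_cpu_count_or_error(void)
{
    void *handle = dlopen("libnuma.so.1", RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL)
        return -ENOENT;

    /* POSIX guarantees that dlsym() results convert to function pointers. */
    int (*available)(void) = (int (*)(void))dlsym(handle, "numa_available");
    int (*num_cpus)(void) =
        (int (*)(void))dlsym(handle, "numa_num_configured_cpus");

    int result;
    if (available == NULL || num_cpus == NULL)
        result = -ENOSYS;
    else if (available() < 0)
        result = -ENOTSUP;
    else
        result = num_cpus();

    dlclose(handle);
    return result;
}
```

With this shape, a host without libnuma gets an error from the one function that needs it rather than a process that fails to start. On glibc versions older than 2.34 the program must also be linked with -ldl.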


