Posted to dev@datasketches.apache.org by Jon Malkin <jm...@apache.org> on 2019/10/29 23:57:04 UTC

Python packages

Hi again,

We currently have Python wrappers for all of the sketches in C++, and since
they're very thin wrappers around the C++ headers it seems excessive to
pull those classes into an entirely separate repo.

Right now, however, one must invoke cmake with a flag like -DWITH_PYTHON=1
(and have pybind11 available).

There's a setup.py script so one can install by giving the repo path, but
we would ideally allow 'pip install datasketches' to just work (modulo the
cmake version). Is there a standard/common way of doing that? Just make
sure we have an explicit vote on the C++ repo with Python and then manually
upload a pip package to the PyPI repo?

Guidance is welcome!

Thanks,
  jon

Re: Python packages

Posted by Robert Bradshaw <ro...@google.com.INVALID>.
I'm not quite following the original question, but can speak to this generally.

The easiest option for your users is for you to build and ship wheels
that statically link in your library, for all platforms you want to
support. In this case the user runs "pip install <pypi-package>" and it
will just work.
The downsides of statically linking things are, of course, redundancy
if you have to link it into several independent Python modules, and
the lack of shared state between them (and any other users of the
library). But this can be the best option for simple libraries.
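
To make "all platforms you want to support" concrete: a wheel's filename
encodes the interpreter, ABI, and platform it was built for (the naming
scheme is defined in PEP 427), so shipping binary wheels means publishing
one file per combination. Below is a minimal sketch of reading those tags.
The datasketches filenames and the version number are made up for
illustration and are not real release artifacts.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the python tag.
        dist, version, build, python, abi, platform = parts
    else:
        raise ValueError("unexpected number of fields in %r" % filename)
    return WheelTags(dist, version, build, python, abi, platform)


# Hypothetical filenames: one wheel per platform for a single interpreter.
EXAMPLE_WHEELS = [
    "datasketches-1.0.0-cp37-cp37m-manylinux1_x86_64.whl",
    "datasketches-1.0.0-cp37-cp37m-macosx_10_9_x86_64.whl",
    "datasketches-1.0.0-cp37-cp37m-win_amd64.whl",
]

platforms = [parse_wheel_filename(name).platform for name in EXAMPLE_WHEELS]
```

pip picks the wheel whose tags match the installing machine, which is why
a missing platform falls back to the sdist (or fails if none is published).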

After that, the next easiest is to build and ship wheels that
dynamically link your library, and assume (or instruct) the user has
it installed and available. For some distributions (e.g. conda) you
can easily declare C/C++ libraries as dependencies that will get
installed when a user installs your package. If you go this route, be
very careful to note that C++ does not have a stable ABI and strange
things can happen if it doesn't line up exactly between your build
farm and the user's computer.
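
If you go the dynamic-linking route, the Python package can at least fail
early with a readable message when the native library is missing. Below is
a rough sketch using the standard library's ctypes.util.find_library. The
library name "datasketches" is an assumption for illustration only (the
original message describes the C++ code as headers, so a shared library
may not exist). This checks presence only; it cannot detect the ABI
mismatches described above.

```python
import ctypes.util
from typing import Optional


def find_native_library(name: str) -> Optional[str]:
    """Return the platform's name/path for a shared library, or None."""
    return ctypes.util.find_library(name)


def require_native_library(name: str) -> str:
    """Raise a readable ImportError if the shared library is not installed."""
    found = find_native_library(name)
    if found is None:
        raise ImportError(
            "native library %r was not found; install it before "
            "importing this package" % name
        )
    return found


# Hypothetical usage at package import time:
#     require_native_library("datasketches")
```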

The next step, easier for you and (possibly) harder for your user, is
to just ship an sdist (basically, a tarball of your sources) with
the C/C++ source files included, and a setup.py that will use
setuptools/distutils to invoke the compilation on the user's machine.
Forcing the user to re-compile works around any possible ABI issues
(though API ones may remain).
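
A sketch of what such a setup.py might contain, assuming setuptools is
installed. Every path, the module name, and the version string are
placeholders rather than the actual layout of the datasketches repo, and
the compiler flag assumes a GCC/Clang-style toolchain.

```python
def extension_options():
    """Keyword arguments for setuptools.Extension (all paths hypothetical)."""
    return {
        "name": "datasketches",
        # The binding sources travel inside the sdist and are compiled
        # on the user's machine at install time.
        "sources": ["python/src/datasketches.cpp"],
        # Header-only C++ code just needs to be on the include path.
        # pybind11's own headers would have to be added as well,
        # e.g. via pybind11.get_include().
        "include_dirs": ["common/include"],
        "language": "c++",
        # GCC/Clang spelling; MSVC would need a different flag.
        "extra_compile_args": ["-std=c++11"],
    }


def run_setup():
    """Entry point; a real setup.py would end with a call to run_setup()."""
    # Imported here so the options above can be inspected without setuptools.
    from setuptools import Extension, setup

    setup(
        name="datasketches",
        version="0.0.0",  # placeholder
        ext_modules=[Extension(**extension_options())],
    )
```

With a file like this in place, "python setup.py sdist" produces the
tarball, and pip runs the compilation when the user installs it.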

Lastly, you could require your users to do some custom step using some
third-party build system like cmake to get the bindings. Expect a huge
dropoff in Python users if you go this route.
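
For comparison, the custom step looks roughly like the following when
scripted. The -DWITH_PYTHON=1 flag is the one quoted in the original
message; the directory arguments are placeholders, and this only wraps
the cmake command line rather than replacing it.

```python
import os
import subprocess
from typing import List


def cmake_commands(source_dir: str) -> List[List[str]]:
    """Commands to configure and build, meant to run inside the build dir."""
    return [
        # Configure step, enabling the Python wrappers.
        ["cmake", source_dir, "-DWITH_PYTHON=1"],
        # Build step, using whatever generator cmake selected.
        ["cmake", "--build", "."],
    ]


def build_bindings(source_dir: str, build_dir: str) -> None:
    """Run the configure and build steps; requires cmake on the PATH."""
    os.makedirs(build_dir, exist_ok=True)
    # Use an absolute source path because the commands run from build_dir.
    for command in cmake_commands(os.path.abspath(source_dir)):
        subprocess.check_call(command, cwd=build_dir)
```

Each user has to have cmake, a compiler, and pybind11 set up before this
can succeed, which is where the dropoff comes from.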

I wouldn't, in general, expect a separate repo for the bindings alone
(assuming they're maintained by the same group maintaining the
library).

On Wed, Oct 30, 2019 at 9:44 PM Kenneth Knowles <ke...@apache.org> wrote:
>
> I don't think general@incubator is a good venue for this topic, actually. It is a large audience, yes, but this is a technical question more suited to communities blending Python and C++.
>
> +Robert Bradshaw who is someone I trust to have some good insights into building combined Python/C++/Cython libraries.
>
> I expect there are some cultural expectations around whether a pypi package includes a variety of native libs, or whether it assumes the native lib has been installed on a machine by the users. In either case, my personal opinion is that a separate repo is probably excessive.
>
> Kenn


Re: Python packages

Posted by Kenneth Knowles <ke...@apache.org>.
I don't think general@incubator is a good venue for this topic, actually. It
is a large audience, yes, but this is a technical question more suited to
communities blending Python and C++.

+Robert Bradshaw <ro...@google.com> who is someone I trust to have some
good insights into building combined Python/C++/Cython libraries.

I expect there are some cultural expectations around whether a pypi package
includes a variety of native libs, or whether it assumes the native lib has
been installed on a machine by the users. In either case, my personal
opinion is that a separate repo is probably excessive.

Kenn

On Wed, Oct 30, 2019 at 8:36 AM Jon Malkin <jm...@apache.org> wrote:

> This is what happens when I write on my phone but don't remember to
> proofread every single word. Sorry. Will fix it and move to the general
> list. Using a real keyboard.
>
>   jon

Re: Python packages

Posted by Jon Malkin <jm...@apache.org>.
This is what happens when I write on my phone but don't remember to
proofread every single word. Sorry. Will fix it and move to the general
list. Using a real keyboard.

  jon

On Tue, Oct 29, 2019, 11:37 PM leerho <le...@gmail.com> wrote:

> Jon, read your message again.  I think you meant “thin wrapper” and “repo
> path”.  The last sentence doesn’t make sense to me.  What do you want to
> vote on?
>
> If this is hard for me to parse, ESL folks will find it much much harder.
> I would suggest rewriting it and posting it on general where you are more
> likely to find someone with experience in Python/C++ bindings and pip
> install scripts.
>
> Cheers,
> Lee
>
> PS: ESL: English as a Second Language.

Re: Python packages

Posted by leerho <le...@gmail.com>.
Jon, read your message again.  I think you meant “thin wrapper” and “repo
path”.  The last sentence doesn’t make sense to me.  What do you want to
vote on?

If this is hard for me to parse, ESL folks will find it much much harder.
I would suggest rewriting it and posting it on general where you are more
likely to find someone with experience in Python/C++ bindings and pip
install scripts.

Cheers,
Lee

PS: ESL: English as a Second Language.




On Tue, Oct 29, 2019 at 4:57 PM Jon Malkin <jm...@apache.org> wrote:

> Hi again,
>
> We currently have python wrappers for all of the sketches in C++, and
> since they're very think wrappers around the C++ headers it seems excessive
> to pull those classes into an entirely separate repo.
>
> Right now, however, one must invoke cmake with a flag like -DWITH_PYTHON=1
> (and have pybind11 available).
>
> There's a setup.py script so one can install by giving the repo parth, but
> we would ideally allow 'pip install datasketches' to just work (modulo the
> cmake version). Is there a standard/common way of doing that? Just make
> sure we have an explicit vote on the c++ repo with python and then manually
> upload a pip package for to the pypi repo?
>
> Guidance is welcome!
>
> Thanks,
>   jon