Posted to dev@datasketches.apache.org by Jon Malkin <jm...@apache.org> on 2020/01/29 09:00:53 UTC

Binary compatibility

Here's a question for the group.

I'm writing unit tests for the C++ version of varopt sampling. (Finally,
the main code is feature complete! But until it's tested I assume there are
bugs to be found.)

Anyway, in Java we have a few 32-bit sizes that are actually constrained
to <2^31 since Java has no unsigned types -- to allow larger values we'd
need to represent them with the lower 32 bits of a 64-bit value, which we
have not done. It's not clear why you'd be creating a sample of >2 billion
items, but this is more about the concept of how we define binary
compatibility.

In C++, we do have native unsigned types, meaning it'd actually be possible
to create larger sketches that would be valid in C++ but not in Java. That
seems to break the idea of language portability of binary images.

Do we want to allow users the option to create non-portable sketches? Or
should we explicitly limit ourselves in such cases?

  jon
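
To make the size question above concrete, here is a minimal C++ sketch of
the arithmetic involved. The function names are hypothetical and are not
taken from the DataSketches code; the sketch only assumes a size stored in a
32-bit field. It shows the largest value Java can read as a non-negative
int, what a larger unsigned value looks like when read as a signed int, and
the "lower 32 bits of a 64-bit value" workaround mentioned above (the same
masking that Java's Integer.toUnsignedLong performs).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Largest value a Java `int` can hold: 2^31 - 1.
constexpr std::uint32_t JAVA_INT_MAX =
    static_cast<std::uint32_t>(std::numeric_limits<std::int32_t>::max());

// True if a size stored in a 32-bit unsigned field can be read by Java as a
// non-negative `int`. (Hypothetical helper, for illustration only.)
constexpr bool fits_in_java_int(std::uint32_t size) {
  return size <= JAVA_INT_MAX;
}

// The value a reader would see if it interpreted the same 32 bits as a
// signed two's-complement integer, as Java does. Computed in 64 bits so
// the arithmetic is well-defined in every C++ standard.
constexpr std::int64_t as_signed_32(std::uint32_t bits) {
  return bits <= JAVA_INT_MAX
             ? static_cast<std::int64_t>(bits)
             : static_cast<std::int64_t>(bits) - (std::int64_t{1} << 32);
}

// The workaround described above: widen the signed 32-bit value to 64 bits
// and keep only the lower 32 bits to recover the unsigned value.
constexpr std::int64_t widen_to_unsigned(std::int32_t raw) {
  assert(sizeof(raw) == 4);  // sanity check on the field width
  return static_cast<std::int64_t>(raw) & 0xFFFFFFFFLL;
}
```

For example, a size of 3000000000 is valid as a uint32_t, but the same bits
read as a signed 32-bit integer give -1294967296; widening and masking
recovers 3000000000.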

Re: Binary compatibility

Posted by Gian Merlino <gi...@apache.org>.
Another thing to think about: if you think it'd be good to encourage
workflows where people share serialized sketches, then _not_ having an
option for non-portable sketches might help with that. Otherwise things
could become fragmented as users opt in to whatever non-portable features
are provided.

On Thu, Jan 30, 2020 at 4:47 PM Gian Merlino <gi...@apache.org> wrote:

> Speaking mostly as a user of DataSketches, the behavior that I'd most
> enjoy would be (1) the default across all languages is that on-disk
> sketches are always portable, meaning the library restricts itself to a
> lowest-common-denominator feature set when it comes to serialized sketches;
> (2) but if there's a good reason for wanting a non-portable feature, users
> should be able to enable them, knowing and accepting the consequences.
> (Maybe they won't care about portability because they're only using the
> sketches in memory, or because they have no intention of sharing them with
> other programs.)
>
> My 2¢.
>
> On Wed, Jan 29, 2020 at 11:04 AM leerho <le...@gmail.com> wrote:
>
>> IMHO, I would not artificially limit the versatility of a C++ sketch
>> solely on the basis that there is no equivalent in Java due to language
>> limitations of Java.  Developers choose C++ for many reasons and one of
>> them is that C++ is so versatile with rich language capabilities.  This
>> kind of issue can be dealt with by good documentation.
>>
>> A more complicated scenario is if we develop a feature in C++ that is
>> reflected in the binary and we have not implemented that feature in Java.
>> This might just be an issue of timing. But we should make some effort to
>> strive to have comparable features in all languages that make sense and are
>> reflected in the binaries.
>>
>> There will be features that exist only in one language.  The concept of
>> off-heap, for example, only makes sense in Java, not C++.  But this feature
>> is not represented in the binary -- there is no "off-heap" flag that is
>> retained in the binary image.
>>
>>
>>
>> On Wed, Jan 29, 2020 at 1:01 AM Jon Malkin <jm...@apache.org> wrote:
>>
>>> Here's a question for the group.
>>>
>>> I'm writing unit tests for the c++ version of varopt sampling. (Finally,
>>> the main code is feature complete! But until it's tested I assume there are
>>> bugs to be found.)
>>>
>>> Anyway, in java we have a few 32-bit sizes we use that are actually
>>> constrained to <2^31 since java has no unsigned types -- we'd need to
>>> represent them with the lower 32 bits of a 64-bit value to allow going
>>> larger, which we have not done. It's not clear why you'd be creating a
>>> sample of >2 billion items, but this is more about the concept of how we
>>> define binary compatibility.
>>>
>>> In c++, we do have native unsigned types, meaning it'd actually be
>>> possible to create larger sketches that would be valid in c++ but not in
>>> java. That seems to break the idea of language portability of binary images.
>>>
>>> Do we want to allow users the option to create non-portable sketches? Or
>>> should we explicitly limit ourselves in such cases?
>>>
>>>   jon
>>>
>>

Re: Binary compatibility

Posted by Gian Merlino <gi...@apache.org>.
Speaking mostly as a user of DataSketches, the behavior that I'd most enjoy
would be (1) the default across all languages is that on-disk sketches are
always portable, meaning the library restricts itself to a
lowest-common-denominator feature set when it comes to serialized sketches;
(2) but if there's a good reason for wanting a non-portable feature, users
should be able to enable it, knowing and accepting the consequences.
(Maybe they won't care about portability because they're only using the
sketches in memory, or because they have no intention of sharing them with
other programs.)

My 2¢.
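
One way the "portable by default, opt in to non-portable" behavior could
look in C++ is sketched below. This is an illustration only: the
portability enum and checked_size_field function are hypothetical names and
are not part of the DataSketches API. It assumes the only portability
concern is a 32-bit size field that Java reads as a signed int.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical policy switch; not part of any real DataSketches API.
enum class portability { portable, allow_non_portable };

// Returns the size to write into a 32-bit header field. By default it
// rejects values that Java could not read as a non-negative int; callers
// who accept the consequences can opt in to non-portable output.
inline std::uint32_t checked_size_field(
    std::uint32_t size, portability mode = portability::portable) {
  constexpr std::uint32_t java_int_max =
      static_cast<std::uint32_t>(std::numeric_limits<std::int32_t>::max());
  if (size > java_int_max && mode == portability::portable) {
    throw std::invalid_argument(
        "size exceeds 2^31 - 1; serialized sketch would not be readable "
        "from Java");
  }
  return size;
}
```

With this shape, checked_size_field(3000000000u) throws, while
checked_size_field(3000000000u, portability::allow_non_portable) returns
the value unchanged, so the non-portable path is always an explicit choice
at the call site.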

On Wed, Jan 29, 2020 at 11:04 AM leerho <le...@gmail.com> wrote:

> IMHO, I would not artificially limit the versatility of a C++ sketch
> solely on the basis that there is no equivalent in Java due to language
> limitations of Java.  Developers choose C++ for many reasons and one of
> them is that C++ is so versatile with rich language capabilities.  This
> kind of issue can be dealt with by good documentation.
>
> A more complicated scenario is if we develop a feature in C++ that is
> reflected in the binary and we have not implemented that feature in Java.
> This might just be an issue of timing. But we should make some effort to
> strive to have comparable features in all languages that make sense and are
> reflected in the binaries.
>
> There will be features that exist only in one language.  The concept of
> off-heap, for example, only makes sense in Java, not C++.  But this feature
> is not represented in the binary -- there is no "off-heap" flag that is
> retained in the binary image.
>
>
>
> On Wed, Jan 29, 2020 at 1:01 AM Jon Malkin <jm...@apache.org> wrote:
>
>> Here's a question for the group.
>>
>> I'm writing unit tests for the c++ version of varopt sampling. (Finally,
>> the main code is feature complete! But until it's tested I assume there are
>> bugs to be found.)
>>
>> Anyway, in java we have a few 32-bit sizes we use that are actually
>> constrained to <2^31 since java has no unsigned types -- we'd need to
>> represent them with the lower 32 bits of a 64-bit value to allow going
>> larger, which we have not done. It's not clear why you'd be creating a
>> sample of >2 billion items, but this is more about the concept of how we
>> define binary compatibility.
>>
>> In c++, we do have native unsigned types, meaning it'd actually be
>> possible to create larger sketches that would be valid in c++ but not in
>> java. That seems to break the idea of language portability of binary images.
>>
>> Do we want to allow users the option to create non-portable sketches? Or
>> should we explicitly limit ourselves in such cases?
>>
>>   jon
>>
>

Re: Binary compatibility

Posted by leerho <le...@gmail.com>.
IMHO, I would not artificially limit the versatility of a C++ sketch solely
on the basis that there is no equivalent in Java due to language
limitations of Java.  Developers choose C++ for many reasons and one of
them is that C++ is so versatile with rich language capabilities.  This
kind of issue can be dealt with by good documentation.

A more complicated scenario is if we develop a feature in C++ that is
reflected in the binary and we have not implemented that feature in Java.
This might just be an issue of timing. But we should strive to have
comparable features in all languages where they make sense and are
reflected in the binaries.

There will be features that exist only in one language.  The concept of
off-heap, for example, only makes sense in Java, not C++.  But this feature
is not represented in the binary -- there is no "off-heap" flag that is
retained in the binary image.



On Wed, Jan 29, 2020 at 1:01 AM Jon Malkin <jm...@apache.org> wrote:

> Here's a question for the group.
>
> I'm writing unit tests for the c++ version of varopt sampling. (Finally,
> the main code is feature complete! But until it's tested I assume there are
> bugs to be found.)
>
> Anyway, in java we have a few 32-bit sizes we use that are actually
> constrained to <2^31 since java has no unsigned types -- we'd need to
> represent them with the lower 32 bits of a 64-bit value to allow going
> larger, which we have not done. It's not clear why you'd be creating a
> sample of >2 billion items, but this is more about the concept of how we
> define binary compatibility.
>
> In c++, we do have native unsigned types, meaning it'd actually be
> possible to create larger sketches that would be valid in c++ but not in
> java. That seems to break the idea of language portability of binary images.
>
> Do we want to allow users the option to create non-portable sketches? Or
> should we explicitly limit ourselves in such cases?
>
>   jon
>