You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@lucy.apache.org by Nick Wellnhofer <we...@aevum.de> on 2015/07/04 15:30:11 UTC

[lucy-dev] Clownfish Num API

Lucifers,

When reviewing the Clownfish Num classes, I made the following observations:

- FloatNum_Compare_To isn't correct for large IntNums.
- IntNum_Compare_To and Int64_Equals aren't correct for large
   FloatNums.
- Num_Equals doesn't handle the corner case of IntNums near
   INT64_MAX correctly.
- Converting FloatNums to IntNums doesn't check for overflow.

This is fixed in my `num_compare` branch:

     https://github.com/nwellnhof/lucy-clownfish/commits/num_compare

Then there's an asymmetry when comparing BoolNums.

    BoolNum_Equals(CFISH_FALSE, Int32_new(0)) == false
    Int32_Equals(Int32_new(0), CFISH_FALSE) == true

My original fix was to remove the specialization of BoolNum_Equals. But 
thinking more about it, it might be better to make BoolNum not inherit from 
Num at all. This would mean that a Boolean never equals a Num and that 
Compare_To throws when comparing Booleans and Nums.

Nick

Re: [lucy-dev] Clownfish Num API

Posted by Nick Wellnhofer <we...@aevum.de>.

On 11/07/2015 04:01, Marvin Humphrey wrote:
> Wow.  I thought we had disagreed about this in the past, so I was prepping
> for some constructive debate. ;)  But now, not only have we reached agreement
> on all important issues, you've coded the whole thing up!

Yeah, at some point I was concerned about removing Int32. Back then, I didn't 
fully understand the purpose of the Num classes. I thought removing Int32 
would force users to always use 64-bit integers, but this isn't the case.

> I also checked the conversion code in Lucy's SortFieldWriter, here:
>
> https://github.com/nwellnhof/lucy/commit/8144f5c7f3806873b58a9f1e751243f2c294eb13#diff-6c25c769e6f97d81aa1663f8781965c2L276
>
> The lossy cast is a decent approach.  It would also have been reasonable to
> throw an exception on overflow, but the chances of truncation-on-overflow
> causing a serious problem seem remote.

I think SortFieldWriter will never see an out-of-bounds integer in an INT32 
field. But maybe we should add a range check to Inverter.

Nick

Re: [lucy-dev] Clownfish Num API

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Thu, Jul 9, 2015 at 7:46 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> On 09/07/2015 04:53, Marvin Humphrey wrote:
>>
>> In retrospect, I regret the Num inheritance hierarchy.  I wish that
>> instead, we had only Boolean, Integer, and Float, matching the types
>> available in JSON.  That wouldn't solve all our problems, it would just
>> give us a different set of problems -- but I feel like that set of problems
>> is more appropriate for Clownfish to have.
>
> I agree. Here is a branch that implements most of your ideas.
>
>     https://github.com/nwellnhof/lucy-clownfish/commits/num_hierarchy
>     https://github.com/nwellnhof/lucy/commits/num_hierarchy

Wow.  I thought we had disagreed about this in the past, so I was prepping
for some constructive debate. ;)  But now, not only have we reached agreement
on all important issues, you've coded the whole thing up!

I've reviewed the branches and the changesets all look sound.  +1 to merge!

The one change I'm a little leery of is this:

     int64_t
     Float64_To_I64_IMP(Float64 *self) {
    +    if (self->value < -POW_2_63 || self->value >= POW_2_63) {
    +        THROW(ERR, "Float64 out of range: %f64", self->value);
    +    }
     return (int64_t)self->value;
     }

I understand the rationale, though, and it's better to add the exception while
we're focused on this code than it is to leave it be.  Perhaps we can convey
the error via a MAYBE return type in the future...

I also checked the conversion code in Lucy's SortFieldWriter, here:

https://github.com/nwellnhof/lucy/commit/8144f5c7f3806873b58a9f1e751243f2c294eb13#diff-6c25c769e6f97d81aa1663f8781965c2L276

The lossy cast is a decent approach.  It would also have been reasonable to
throw an exception on overflow, but the chances of truncation-on-overflow
causing a serious problem seem remote.

> Execpt for:
>
>> *   Integer, Float, and Boolean would all live in their own files rather
>>     than in Num.cfh/Num.c.

OK, no big deal.

> Boolean is immutable already. Making Integer and Float immutable requires
> some adjustments to Lucy's Inverter similar to the changes for immutable
> Strings.

Looks good!

Marvin Humphrey

Re: [lucy-dev] Clownfish Num API

Posted by Nick Wellnhofer <we...@aevum.de>.

On 09/07/2015 04:53, Marvin Humphrey wrote:
> In retrospect, I regret the Num inheritance hierarchy.  I wish that instead,
> we had only Boolean, Integer, and Float, matching the types available in JSON.
> That wouldn't solve all our problems, it would just give us a different set of
> problems -- but I feel like that set of problems is more appropriate for
> Clownfish to have.

I agree. Here is a branch that implements most of your ideas.

     https://github.com/nwellnhof/lucy-clownfish/commits/num_hierarchy
     https://github.com/nwellnhof/lucy/commits/num_hierarchy

Execpt for:

> *   Integer, Float, and Boolean would all live in their own files rather than
>      in Num.cfh/Num.c.

Implementing Integer and Float in a single file allows to share some static 
functions.

> *   Boolean, Integer, and Float would all be immutable.

Boolean is immutable already. Making Integer and Float immutable requires some 
adjustments to Lucy's Inverter similar to the changes for immutable Strings.

> This wouldn't impact much because the box types for numerics are infrequently
> encountered in Clownfish-flavored C.

Yes, boxed objects are expensive anyway, so the additional overhead shouldn't 
matter.

Nick

Re: [lucy-dev] Clownfish Num API

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Sat, Jul 4, 2015 at 6:30 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> Lucifers,
>
> When reviewing the Clownfish Num classes, I made the following observations:
>
> - FloatNum_Compare_To isn't correct for large IntNums.
> - IntNum_Compare_To and Int64_Equals aren't correct for large
>   FloatNums.
> - Num_Equals doesn't handle the corner case of IntNums near
>   INT64_MAX correctly.
> - Converting FloatNums to IntNums doesn't check for overflow.
>
> This is fixed in my `num_compare` branch:
>
>     https://github.com/nwellnhof/lucy-clownfish/commits/num_compare

This is definitely a nice improvement over what we have now,
so +1 to merge!

> Then there's an asymmetry when comparing BoolNums.
>
>    BoolNum_Equals(CFISH_FALSE, Int32_new(0)) == false
>    Int32_Equals(Int32_new(0), CFISH_FALSE) == true
>
> My original fix was to remove the specialization of BoolNum_Equals. But
> thinking more about it, it might be better to make BoolNum not inherit from
> Num at all.

+1

I like not inheriting from Num.  And I agree that to eliminate the assymetry
of Equals, it makes sense to require explicit conversion to Boolean first.

> This would mean that a Boolean never equals a Num

As a consequence, that means Equals() will fail when we take a JSON-izable
data structure which includes a Boolean and round-trip it through a host which
does not offer a dedicated boolean type -- such as Perl.

    Boolean -> IOK SV -> IntNum

I think that's for the best.  We definitely are not goving to get
Equals-after-round-tripping-deserialized-JSON working across all hosts, so the
sooner we forswear that idiom and find alternatives, the better.

> and that
> Compare_To throws when comparing Booleans and Nums.

That seems acceptable -- it doesn't prevent us from evaluating the truthiness
of non-boolean numeric types (via To_Bool), it just stops us from using
Compare_To.

In retrospect, I regret the Num inheritance hierarchy.  I wish that instead,
we had only Boolean, Integer, and Float, matching the types available in JSON.
That wouldn't solve all our problems, it would just give us a different set of
problems -- but I feel like that set of problems is more appropriate for
Clownfish to have.

*   Num, IntNum, and FloatNum would all be eliminated, and Boolean, Integer,
    and Float would all descend from Obj directly.
*   Integer, Float, and Boolean would all live in their own files rather than
    in Num.cfh/Num.c.
*   Integer would have an internal `int64_t` value and FloatNum would have an
    internal `double` value.
*   Boolean, Integer, and Float would all be immutable.
*   The Equals() and Compare_To() methods of Integer and Float would allow
    cross-comparisons (using your new algorithms).

This wouldn't impact much because the box types for numerics are infrequently
encountered in Clownfish-flavored C.

Marvin Humphrey