You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@parquet.apache.org by GitBox <gi...@apache.org> on 2022/08/26 21:21:16 UTC

[GitHub] [parquet-format] anjakefala opened a new pull request, #184: PARQUET-758: Add Float16/Half-float logical type

anjakefala opened a new pull request, #184:
URL: https://github.com/apache/parquet-format/pull/184

   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [X] My PR addresses the following [Parquet Jira 1](https://issues.apache.org/jira/browse/PARQUET-758) and [2](https://issues.apache.org/jira/browse/PARQUET-1647) issues and references them in the PR title. 
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [X] In case of new functionality, my PR adds documentation that describes how to use it.
     - All the public functions and the classes in the PR contain Javadoc that explain what it does
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] PARQUET-758: Add Float16/Half-float logical type [parquet-format]

Posted by "anjakefala (via GitHub)" <gi...@apache.org>.
anjakefala commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1778048700

   @gszadovszky What is the merging process once it has approval and passed voting? =)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] PARQUET-758: Add Float16/Half-float logical type [parquet-format]

Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1781779546

   For the record, I've announced the vote's passing in the original ML thread itself (apologies if the `[RESULT]` thread wasn't sufficient).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] PARQUET-758: Add Float16/Half-float logical type [parquet-format]

Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1747339124

   @anjakefala Agreed that everything seems to be in place. I'll be starting the vote on the ML later today.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] pitrou commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1412476439

   > @shangxinli are there guidelines for what needs to happen to accept this addition?
   
   I suppose it needs a discussion and then a formal vote on the ML?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] pitrou commented on a diff in pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #184:
URL: https://github.com/apache/parquet-format/pull/184#discussion_r957060962


##########
LogicalTypes.md:
##########
@@ -245,6 +245,18 @@ comparison.
 To support compatibility with older readers, implementations of parquet-format should
 write `DecimalType` precision and scale into the corresponding SchemaElement field in metadata.
 
+### Half-precision floating-point
+
+Also known as `float16`. Used in contexts where precision is traded off for performance and efficient storage.

Review Comment:
   ```suggestion
   ### FLOAT16
   
   The `FLOAT16` annotation represents IEEE 16-bit half-precision floating-point numbers .
   Used in contexts where precision is traded off for smaller footprint and potentially better performance.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
emkornfield commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1341697557

   @anjakefala  IIUC, I think the prior objection was around not properly floating point sorting for statistics correctly.  I think the main thing is to update the specification to say this requires the same sorting logic as float32 and float64.  I need to rereview the current state of things, but then I think we can probably try to vote on the mailing list to see if this type is acceptable.  I'm not sure on the exact process here (I don't know if implementations are required before a vote).  @gszadovszky thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] PARQUET-758: Add Float16/Half-float logical type [parquet-format]

Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1765067676

   @pitrou @gszadovszky @julienledem 
   
   Given that the [vote for this has just passed](https://lists.apache.org/thread/odm5pmxssyd9kw1wvgdkg8hd044czqnk), I believe we should be good to merge this now? (pending a final review pass, of course)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
emkornfield commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1236263820

   I think Parquet C++ probably has the same issue.  Let me reread [PARQUET-1222](https://issues.apache.org/jira/browse/PARQUET-1222). to see what the current state is and if we can push it forward.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
emkornfield commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1231185256

   We should probably specify that using the [Byte Split Encodings](https://github.com/apache/parquet-format/blob/master/Encodings.md#byte-stream-split-byte_stream_split--9) can be used for this type as well?
   
   Also, in general, if possible try to avoid force pushing, as it makes it harder to compare iterative changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] gszadovszky commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by "gszadovszky (via GitHub)" <gi...@apache.org>.
gszadovszky commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1592472334

   I think that compatibility in Parquet file format is such a strong requirement that extending primitive types is simply not an option. (I agree though, if I would introduce a new file format I would specify only 3 primitive types: FIXED_LEN, VAR_LEN, BIT. But we already have what we have.)
   Meanwhile, I don't see why the Float16 physical type would be a requirement to use BYTE_STREAM_SPLIT. I don't think it is widely used so we can update the related spec to allow this encoding to be used for any primitive type (except BOOLEAN). Then, it is up to the implementations to use it for `FIXED_LEN_BYTE_ARRAY[2] (FLOAT16)`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] PARQUET-758: Add Float16/Half-float logical type [parquet-format]

Posted by "anjakefala (via GitHub)" <gi...@apache.org>.
anjakefala commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1749367647

   @pitrou @emkornfield @gszadovszky @JFinis @julienledem @shangxinli 
   
   The vote has been started by @benibus here: https://lists.apache.org/thread/gyvqcx9ssxkjlrwogqwy7n4z6ofdm871 Please chime in! I would also appreciate anyone forwarding the vote to the private listserv. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] pitrou commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1239282706

   I agree with @emkornfield that ordering issues seem largely orthogonal, as they also affect FLOAT32 and FLOAT64 types.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
emkornfield commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1232420353

   > t is not that trivial. For the half-precision floating point numbers we do not have native support for either cpp or java so we can define the total ordering as we want. But we shall do the same for the existing floating point numbers that most languages have native support. Even though they are following the same standard the total ordering either does not exist or have different implementations. See [PARQUET-1222](https://issues.apache.org/jira/browse/PARQUET-1222) for details.
   
   I think these are orthogonal.  I might be missing something but it seems like it would not be to hard to case float16 to float in java/cpp and do the comparison in that space and cast it back down.  This might not be the most efficient implementation but would be straightforward? I am probably missing something.  It would be nice to resolve [PARQUET-1222](https://issues.apache.org/jira/browse/PARQUET-1222) so the same semantics would apply to all floating point numbers.  
   
   > The tricky thing will be the implementations. Even though parquet-mr does not really care about converting the values according to their logical types we still need to care about the logical types at the ordering (min/max values in the statistics).
   
   It seems this would require parquet implementations to null out statistics for logical types that they don't support, does parquet-mr do that today?
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] shangxinli commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by "shangxinli (via GitHub)" <gi...@apache.org>.
shangxinli commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1412526562

   As @julienledem mentioned in the email discussion, let's have the corresponding PRs for support in the Java and C++ implementation ready before we merge this PR. We would like to have implementation support when the new type is released. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] PARQUET-758: Add Float16/Half-float logical type [parquet-format]

Posted by "gszadovszky (via GitHub)" <gi...@apache.org>.
gszadovszky commented on code in PR #184:
URL: https://github.com/apache/parquet-format/pull/184#discussion_r1361571897


##########
LogicalTypes.md:
##########
@@ -245,6 +245,16 @@ comparison.
 To support compatibility with older readers, implementations of parquet-format should
 write `DecimalType` precision and scale into the corresponding SchemaElement field in metadata.
 
+### FLOAT16
+
+The `FLOAT16` annotation represents half-precision floating-point numbers in the 2-byte IEEE little-endian format.
+
+Used in contexts where precision is traded off for smaller footprint and potentially better performance.
+
+The primitive type is a 2-byte fixed length binary.
+
+The sort order for `FLOAT16` is signed (with special handling of NANs and signed zeros); it uses the same [logic](https://github.com/apache/parquet-format#sort-order) as `FLOAT32` and `FLOAT64`.

Review Comment:
   The related primitive types are called `FLOAT` and `DOUBLE`. `FLOAT32` and `FLOAT64` are unknown types for Parquet.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] PARQUET-758: Add Float16/Half-float logical type [parquet-format]

Posted by "gszadovszky (via GitHub)" <gi...@apache.org>.
gszadovszky commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1782400892

   Sorry, @benibus. My bad. Thank you for managing the vote!
   I'm pushing this PR...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] PARQUET-758: Add Float16/Half-float logical type [parquet-format]

Posted by "gszadovszky (via GitHub)" <gi...@apache.org>.
gszadovszky merged PR #184:
URL: https://github.com/apache/parquet-format/pull/184


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] pitrou commented on a diff in pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #184:
URL: https://github.com/apache/parquet-format/pull/184#discussion_r957060962


##########
LogicalTypes.md:
##########
@@ -245,6 +245,18 @@ comparison.
 To support compatibility with older readers, implementations of parquet-format should
 write `DecimalType` precision and scale into the corresponding SchemaElement field in metadata.
 
+### Half-precision floating-point
+
+Also known as `float16`. Used in contexts where precision is traded off for performance and efficient storage.

Review Comment:
   ```suggestion
   ### FLOAT16
   
   The `FLOAT16` annotation represents half-precision floating-point numbers
   in the 2-byte IEEE little-endian format.
   Used in contexts where precision is traded off for smaller footprint and potentially better performance.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] emkornfield commented on a diff in pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
emkornfield commented on code in PR #184:
URL: https://github.com/apache/parquet-format/pull/184#discussion_r1042516177


##########
src/main/thrift/parquet.thrift:
##########
@@ -232,6 +232,7 @@ struct MapType {}     // see LogicalTypes.md
 struct ListType {}    // see LogicalTypes.md
 struct EnumType {}    // allowed for BINARY, must be encoded with UTF-8
 struct DateType {}    // allowed for INT32
+struct Float16Type {} // allowed for FIXED[2], must encoded raw FLOAT16 bytes

Review Comment:
   yes. BYTE_STREAM_SPLIT



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] pitrou commented on a diff in pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #184:
URL: https://github.com/apache/parquet-format/pull/184#discussion_r964739579


##########
src/main/thrift/parquet.thrift:
##########
@@ -232,6 +232,7 @@ struct MapType {}     // see LogicalTypes.md
 struct ListType {}    // see LogicalTypes.md
 struct EnumType {}    // allowed for BINARY, must be encoded with UTF-8
 struct DateType {}    // allowed for INT32
+struct Float16Type {} // allowed for FIXED[2], must encoded raw FLOAT16 bytes

Review Comment:
   @emkornfield What do you mean here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] julienledem commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by "julienledem (via GitHub)" <gi...@apache.org>.
julienledem commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1428363226

   @emkornfield apologies for this, I realize I replied to a thread on the private list. @pitrou emailed private@ to get more eyes on this. Which worked. But yes, this discussion didn't need to be private.
   I replied: "I would recommend having the corresponding PRs for support in the Java and C++ implementation ready as well for this to be merged so that we don't release a new type that does not have support in the implementation."
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] PARQUET-758: Add Float16/Half-float logical type [parquet-format]

Posted by "anjakefala (via GitHub)" <gi...@apache.org>.
anjakefala commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1747321345

   It seems that both the [Java implementation](https://github.com/apache/parquet-mr/pull/1142) and the [C++ implementation](https://github.com/apache/arrow/pull/36073) are in a state of readiness.
   
   Has the vote started? Can anyone with visibility update me on the status?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] PARQUET-758: Add Float16/Half-float logical type [parquet-format]

Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1767050894

   > I've suggested the name `FLOAT_16` in the vote like we already have logical types `INT_8` etc
   
   I think this was only the convention for legacy `ConvertedType` enums. We could theoretically deviate from that since there are no sized integral logical types (they're all rolled into `INTEGER`/`IntType`).
   
   As for `BYTE_STREAM_SPLIT`, my feeling is that we'll _probably_ want it (for parity with `FLOAT`/`DOUBLE`, at least), but it could be added as a follow-up - along with an implementation + benchmarks, if necessary. There's also some ambiguity about whether to support BSS for `FixedLenByteArray` generally, which may warrant a separate discussion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] gszadovszky commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
gszadovszky commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1231323733

   > > It would not be too easy to implement the half-precision floating point comparison logic since java does not have such a primitive type.
   > 
   > While not effortless, it should be relatively easy to adapt one of the routines that's available from other open source projects, such as Numpy: https://github.com/numpy/numpy/blob/8a0859835d3e6002858b9ffd9a232b059cf9ea6c/numpy/core/src/npymath/halffloat.c#L169-L190 (`npy_half` is just an unsigned 16-bit integer in this context)
   
   It is not that trivial. For the half-precision floating point numbers we do not have native support for either cpp or java so we can define the total ordering as we want. But we shall do the same for the existing floating point numbers that most languages have native support. Even though they are following the same standard the total ordering either does not exist or have different implementations. See [PARQUET-1222(https://issues.apache.org/jira/browse/PARQUET-1222) for details.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] gszadovszky commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
gszadovszky commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1232491783

   I've came up with this ordering thing because we specify it for every logical types. (Unfortunately we don't do this for primitive types.) Therefore, I would expect to have the order specified for this new logical type as well which is not trivial and requires to solve [PARQUET-1222](https://issues.apache.org/jira/browse/PARQUET-1222). At least we should add a note about this issue.
   
   > It seems this would require parquet implementations to null out statistics for logical types that they don't support, does parquet-mr do that today?
   
   I do not have the proper environment to test it but based on the code we do not handle unknown logical types well in parquet-mr. I think it handles unknown logical types as if they were not there at all which is fine from the data point of view but we would blindly use the statistics which may cause data loss. Created [PARQUET-2182](https://issues.apache.org/jira/browse/PARQUET-2182) to track this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] anjakefala commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
anjakefala commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1341655934

   Thanks so much for the update @emkornfield! 
   
   What is the next step I can take?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] PARQUET-758: Add Float16/Half-float logical type [parquet-format]

Posted by "anjakefala (via GitHub)" <gi...@apache.org>.
anjakefala commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1765583315

   @wgtmac 
   
   > Should we merge the PR in parquet-format first? My point is that it would be weird if this change commits with an unreleased and even uncommitted change of parquet.thrift.
   
   This is the parquet-format PR!
   
   There are too many PRs. xD


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] PARQUET-758: Add Float16/Half-float logical type [parquet-format]

Posted by "anjakefala (via GitHub)" <gi...@apache.org>.
anjakefala commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1783306935

   I really appreciate everyone who took time out of their lives to give this PR attention! :)) Thanks for the final merge @gszadovszky! And yes, my apache arrow JIRA handle is the same as github `@anjakefala`. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] anjakefala commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
anjakefala commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1376199292

   Hey @emkornfield! Is it reasonable for me to send a proposal to the mailing list for a vote? It seems  @gszadovszky is not available for insight; is there anyone else that can provide it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by "emkornfield (via GitHub)" <gi...@apache.org>.
emkornfield commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1412470881

   @shangxinli are there guidelines for what needs to happen to accept this addition?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] pitrou commented on a diff in pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #184:
URL: https://github.com/apache/parquet-format/pull/184#discussion_r964742903


##########
src/main/thrift/parquet.thrift:
##########
@@ -232,6 +232,7 @@ struct MapType {}     // see LogicalTypes.md
 struct ListType {}    // see LogicalTypes.md
 struct EnumType {}    // allowed for BINARY, must be encoded with UTF-8
 struct DateType {}    // allowed for INT32
+struct Float16Type {} // allowed for FIXED[2], must encoded raw FLOAT16 bytes

Review Comment:
   Ah, perhaps you mean the BYTE_STREAM_SPLIT encoding?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
emkornfield commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1341351021

   Sorry for the delay but PARQUET-1222 has now been merged, so I think this is unblocked.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by "emkornfield (via GitHub)" <gi...@apache.org>.
emkornfield commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1427471772

   > As @julienledem mentioned in the email discussion, let's have the corresponding PRs for support in the Java and C++ implementation ready before we merge this PR. We would like to have implementation support when the new type is released.
   
   It might have missed it but I didn't see Julien's reply on the dev mailing list.  This seems reasonable though.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
emkornfield commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1231187345

   It isn't clear to me if this should be a logical type or a physical type. We would need understand if there is different handling for forward compatibility purposes (what do we want the desired behavior to be be).  I think C++ might be lenient here, but don't know about parquet-mr @gszadovszky thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] anjakefala commented on a diff in pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
anjakefala commented on code in PR #184:
URL: https://github.com/apache/parquet-format/pull/184#discussion_r957822642


##########
src/main/thrift/parquet.thrift:
##########
@@ -342,6 +343,7 @@ union LogicalType {
   12: JsonType JSON           // use ConvertedType JSON
   13: BsonType BSON           // use ConvertedType BSON
   14: UUIDType UUID           // no compatible ConvertedType
+  15: Float16Type FLOAT16     // no compatible ConvertedType

Review Comment:
   What is the ConvertedType?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] JFinis commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by "JFinis (via GitHub)" <gi...@apache.org>.
JFinis commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1587367949

   > > It isn't clear to me if this should be a logical type or a physical type. We would need understand if there is different handling for forward compatibility purposes (what do we want the desired behavior to be be). I think C++ might be lenient here, but don't know about parquet-mr @gszadovszky thoughts?
   > 
   > I think the basic idea behind having physical and logical types is to support forward compatibility since we can always represent (somehow) a long-existing physical type while logical types are getting extended. Parquet-mr should work fine with "unknown" logical types by reading it back as an un-annotated physical vale (a `Binary` with two bytes in this case). So, if the community supports having a half-precision floating point type I would vote on specifying it as a logical type.
   > 
   > The tricky thing will be the implementations. Even though parquet-mr does not really care about converting the values according to their logical types we still need to care about the logical types at the ordering (min/max values in the statistics). It would not be too easy to implement the half-precision floating point comparison logic since java does not have such a primitive type. (BTW the sorting order of floating point numbers are still an open issue: [PARQUET-1222](https://issues.apache.org/jira/browse/PARQUET-1222))
   
   FWIW, I rather think it should be a physical type for the following reasons:
   
   * encodings are currently only defined on the physical type, not the logical one. So allowing BYTE_STREAM_SPLIT for this type would actually break this if it is a logical type.
   * Having this be a logical type while float and double are physical types seems inconsistent.
   * There might eventually be hardware support or native language support for this for this type. In this case, having it as physical type would allow easier to leverage this hardware / language support, as most libraries instantiate encoders/decoders based on the physical type. Again, having now one exception where you would need a decoder based on a *logical* type would break this pattern and require additional effort. If Java and C++ had a float16 type, I guess more people would agree that it should be a physical type. So is the intuition of this being a logical type just based on the yet missing language support for this?
   * IMHO, the basic idea behind physical and logical types is not to support forward compatibility; that is just a byproduct. Otherwise, there should just be one or two physical types in the first place (FIXED_LEN_BYTE_ARRAY and BYTE_ARRAY). The basic idea is rather to make a distinction between physical representation and what the values logically mean. In my mental model it is rather a layered approach: There are layers that only care about the physical types (e.g., the encoders/decoders) and then further layers that also care about the logical type (e.g. the statistics maintenance code). And here again, this would break this layering.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] PARQUET-758: Add Float16/Half-float logical type [parquet-format]

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1765496311

   > @pitrou @gszadovszky @julienledem
   > 
   > Given that the [vote for this has just passed](https://lists.apache.org/thread/odm5pmxssyd9kw1wvgdkg8hd044czqnk), I believe we should be good to merge this now? (pending a final review pass, of course)
   
   Should we merge the PR in parquet-format first?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] PARQUET-758: Add Float16/Half-float logical type [parquet-format]

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1765601784

   > @wgtmac
   > 
   > > Should we merge the PR in parquet-format first? My point is that it would be weird if this change commits with an unreleased and even uncommitted change of parquet.thrift.
   > 
   > This is the parquet-format PR!
   > 
   > There are too many PRs. xD
   
   My bad! I got lost in these PRs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] PARQUET-758: Add Float16/Half-float logical type [parquet-format]

Posted by "gszadovszky (via GitHub)" <gi...@apache.org>.
gszadovszky commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1778641456

   @benibus, could you officially close the vote on the mailing list so it is clear that it has passed?
   @anjakefala, since we have 3 approvals already on this PR, any committer can push it. I would wait for the official closing of the vote, thought.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] emkornfield commented on a diff in pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
emkornfield commented on code in PR #184:
URL: https://github.com/apache/parquet-format/pull/184#discussion_r958042831


##########
src/main/thrift/parquet.thrift:
##########
@@ -232,6 +232,7 @@ struct MapType {}     // see LogicalTypes.md
 struct ListType {}    // see LogicalTypes.md
 struct EnumType {}    // allowed for BINARY, must be encoded with UTF-8
 struct DateType {}    // allowed for INT32
+struct Float16Type {} // allowed for FIXED[2], must encoded raw FLOAT16 bytes

Review Comment:
   why not allow bit splitting?  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] pitrou commented on a diff in pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #184:
URL: https://github.com/apache/parquet-format/pull/184#discussion_r957063147


##########
LogicalTypes.md:
##########
@@ -245,6 +245,18 @@ comparison.
 To support compatibility with older readers, implementations of parquet-format should
 write `DecimalType` precision and scale into the corresponding SchemaElement field in metadata.
 
+### Half-precision floating-point
+
+Also known as `float16`. Used in contexts where precision is traded off for performance and efficient storage.
+
+It is stored as a two-byte fixed-length binary, following the 16-bit IEEE standards:
+
+* 1 sign bit
+* 5 bits exponent
+* 10 bits mantissa/significand
+
+sign (mantissa) * 10^exponent

Review Comment:
   ```suggestion
   The primitive type is a 2-byte fixed length binary.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] pitrou commented on a diff in pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #184:
URL: https://github.com/apache/parquet-format/pull/184#discussion_r957064862


##########
src/main/thrift/parquet.thrift:
##########
@@ -416,6 +417,7 @@ enum Encoding {
    * BOOLEAN - 1 bit per value. 0 is false; 1 is true.
    * INT32 - 4 bytes per value.  Stored as little-endian.
    * INT64 - 8 bytes per value.  Stored as little-endian.
+   * FLOAT16 - 2 bytes per value. IEEE. Stored as little-endian.

Review Comment:
   You shouldn't mention logical types in this comment, only physical types. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] gszadovszky commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
gszadovszky commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1231284535

   > It isn't clear to me if this should be a logical type or a physical type. We would need understand if there is different handling for forward compatibility purposes (what do we want the desired behavior to be be). I think C++ might be lenient here, but don't know about parquet-mr @gszadovszky thoughts?
   
   I think the basic idea behind having physical and logical types is to support forward compatibility since we can always represent (somehow) a long-existing physical type while logical types are getting extended. Parquet-mr should work fine with "unknown" logical types by reading it back as an un-annotated physical vale (a `Binary` with two bytes in this case).
   So, if the community supports having a half-precision floating point type I would vote on specifying it as a logical type.
   
   The tricky thing will be the implementations. Even though parquet-mr does not really care about converting the values according to their logical types we still need to care about the logical types at the ordering (min/max values in the statistics). It would not be too easy to implement the half-precision floating point comparison logic since java does not have such a primitive type. (BTW the sorting order of floating point numbers are still an open issue: [PARQUET-1222](https://issues.apache.org/jira/browse/PARQUET-1222))
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] anjakefala commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
anjakefala commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1352295517

   Thank you @emkornfield! I added the sort order to the spec.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] pitrou commented on a diff in pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #184:
URL: https://github.com/apache/parquet-format/pull/184#discussion_r1092267094


##########
src/main/thrift/parquet.thrift:
##########
@@ -232,6 +232,7 @@ struct MapType {}     // see LogicalTypes.md
 struct ListType {}    // see LogicalTypes.md
 struct EnumType {}    // allowed for BINARY, must be encoded with UTF-8
 struct DateType {}    // allowed for INT32
+struct Float16Type {} // allowed for FIXED[2], must encoded raw FLOAT16 bytes

Review Comment:
   Well, I guess it wouldn't cost much to allow it (implementations would not support it at the start anyway).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] pitrou commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1591494344

   > FWIW, I rather think it should be a physical type for the following reasons:
   > 
   > * encodings are currently only defined on the physical type, not the logical one. So allowing BYTE_STREAM_SPLIT for this type would actually break this if it is a logical type.
   
   This seems reasonable, but BYTE_STREAM_SPLIT is the only relevant encoding here (and it's probably not widely used yet).
   
   > * Having this be a logical type while float and double are physical types seems inconsistent.
   
   While I agree it seems inconsistent, this is an idealistic argument. If Float16 had been included from scratch, it would be logical ( :-) ) to make it a physical type because there would not be any compatibility problem.
   
   > * There might eventually be hardware support or native language support for this for this type. In this case, having it as physical type would allow easier to leverage this hardware / language support, as most libraries instantiate encoders/decoders based on the physical type.
   
   I'm not sure by how much making Float16 a physical type would make encoding/decoding easier. The main change would be that bytewidth is known at compile time, instead of at runtime. Other than that, having HW support for Float16 would not change the ease of copying data two bytes at a time...
   
   > * IMHO, the basic idea behind physical and logical types is not to support forward compatibility; that is just a byproduct. Otherwise, there should just be one or two physical types in the first place (FIXED_LEN_BYTE_ARRAY and BYTE_ARRAY).
   
   That's partly true, but once the Parquet format is widely used, forward compatibility becomes a significant concern.
   
   If Float16 is a new physical type, existing systems will probably not be able to read data with such a column _at all_. If Float16 is a new logical type, existing systems should be able to read the data; they just won't be able to draw any insight from the Float16 columns (but will be able to process the other columns).
   
   To sum it up:
   * making Float16 a logical type would make adoption of the new type faster as compatibility with existing systems would be preserved
   * making Float16 a physical type would be more consistent overall, and would allow applying the BYTE_STREAM_SPLIT encoding
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] pitrou commented on a diff in pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #184:
URL: https://github.com/apache/parquet-format/pull/184#discussion_r957064370


##########
src/main/thrift/parquet.thrift:
##########
@@ -34,6 +34,7 @@ enum Type {
   INT32 = 1;
   INT64 = 2;
   INT96 = 3;  // deprecated, only used by legacy implementations.
+  FLOAT16 = 2;

Review Comment:
   `Type` is for physical types, so you shouldn't add anything here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] pitrou commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1231300374

   > It would not be too easy to implement the half-precision floating point comparison logic since java does not have such a primitive type.
   
   While not effortless, it should be relatively easy to adapt one of the routines that's available from other open source projects, such as Numpy:
   https://github.com/numpy/numpy/blob/main/numpy/core/src/npymath/halffloat.c#L169-L190
   (`npy_half` is just an unsigned 16-bit integer in this context)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] pitrou commented on a diff in pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #184:
URL: https://github.com/apache/parquet-format/pull/184#discussion_r957065065


##########
src/main/thrift/parquet.thrift:
##########
@@ -889,6 +891,7 @@ union ColumnOrder {
    *   INT32 - signed comparison
    *   INT64 - signed comparison
    *   INT96 (only used for legacy timestamps) - undefined
+   *   FLOAT16 - signed comparison of the represented value (*)

Review Comment:
   Same here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] pitrou commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1229983463

   @anjakefala You need to add to the `LogicalType` union, not to the `Type` enum (which is for physical types).
   
   Also cc @emkornfield 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
emkornfield commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1263116115

   Sorry for the delay, it sounds like PARQUET-1222 is blocker, let me make a proposal there and see if we can at least come to consensus on approach and maybe this feature can be the first test-case for it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] anjakefala commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by GitBox <gi...@apache.org>.
anjakefala commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1262879211

   @pitrou @emkornfield @gszadovszky 
   
   Is there anything I can do to move this addition forward? Can I help with any code?
   
   My understanding from reading the comments is that @gszadovszky brought up an ordering concern (valid, but not a blocker?), and that a decision needs to be made on whether float16 would be implemented as a logical or physical type? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] pitrou commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1427745167

   > It might have missed it but I didn't see Julien's reply on the dev mailing list. This seems reasonable though.
   
   For full disclosure, it was a discussion involving the Parquet PMC and myself after I messaged the PMC to inquire about the way forward on this proposal.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] PARQUET-758: Add Float16/Half-float logical type [parquet-format]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1765791144

   > I agree with @emkornfield that we should allow the encoding `BYTE_STREAM_SPLIT` to be used for this new logical type. It is fine to handle it separately, though.
   
   I would contend that perhaps `BYTE_STREAM_SPLIT` wouldn't yield very interesting results on FLOAT16. It would be interesting to get numbers.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] anjakefala commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

Posted by "anjakefala (via GitHub)" <gi...@apache.org>.
anjakefala commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1593249596

   PR for C++ float16 implementation in Parquet: https://github.com/apache/arrow/pull/36073


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] PARQUET-758: Add Float16/Half-float logical type [parquet-format]

Posted by "gszadovszky (via GitHub)" <gi...@apache.org>.
gszadovszky commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1782407229

   @anjakefala, do you have a jira user so I can assign it to you?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org