Posted to issues@arrow.apache.org by "danepitkin (via GitHub)" <gi...@apache.org> on 2023/05/24 19:48:58 UTC

[GitHub] [arrow] danepitkin opened a new issue, #35746: [Parquet] Bump the format version from 2.4 -> 2.6

danepitkin opened a new issue, #35746:
URL: https://github.com/apache/arrow/issues/35746

   ### Describe the enhancement requested
   
   Parquet format version 2.6 introduces the NanoSecond time unit for Time and Timestamp logical types.
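   
   To illustrate why the unit matters, here is a minimal, self-contained sketch of what is lost when a nanosecond timestamp has to be stored at microsecond resolution. `NanosToMicros` and `SurvivesMicrosRoundTrip` are hypothetical helper names, not Arrow or Parquet APIs, and the sketch assumes that a writer limited to pre-2.6 units falls back to microseconds by truncation.
   
   ```c++
   #include <cassert>
   #include <chrono>
   #include <cstdint>
   
   // Hypothetical helper (not an Arrow/Parquet API): truncate a nanosecond
   // count since the epoch to whole microseconds.
   inline std::int64_t NanosToMicros(std::int64_t nanos_since_epoch) {
     // duration_cast truncates toward zero; this sketch only considers
     // timestamps at or after the epoch.
     assert(nanos_since_epoch >= 0);
     using namespace std::chrono;
     return duration_cast<microseconds>(nanoseconds(nanos_since_epoch)).count();
   }
   
   // Returns true if the value is unchanged by a nanos -> micros -> nanos
   // round trip, i.e. it carries no sub-microsecond component.
   inline bool SurvivesMicrosRoundTrip(std::int64_t nanos_since_epoch) {
     return NanosToMicros(nanos_since_epoch) * 1000 == nanos_since_epoch;
   }
   ```
   
   For example, `NanosToMicros(1684957738123456789)` drops the trailing `789` nanoseconds, so that value does not survive the round trip, while `1684957738123456000` does.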
   
   ### Component(s)
   
   Parquet


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on issue #35746: [Parquet][C++][Python] Bump the default format version from 2.4 -> 2.6

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #35746:
URL: https://github.com/apache/arrow/issues/35746#issuecomment-1562318838

   Yes, we do already support that version, but we _default_ to 2.4 at the moment. I edited the title to make clear that the issue is about changing the default version.
   
   The default in C++:
   
   https://github.com/apache/arrow/blob/6d2df074e624a6a4462a1539be8dadf19cf39df4/cpp/src/parquet/properties.h#L207-L221
   
   And similarly in the Python bindings the default is also "2.4".
   




[GitHub] [arrow] jorisvandenbossche commented on issue #35746: [Parquet][C++][Python] Bump the default format version from 2.4 -> 2.6

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #35746:
URL: https://github.com/apache/arrow/issues/35746#issuecomment-1593384883

   For reference (and to see where things need to be changed), here is the commit from the previous time the version was bumped: https://github.com/apache/arrow/commit/797c88a9a0ec73fa4d24554c3af83b841c205681




[GitHub] [arrow] wgtmac commented on issue #35746: [Parquet][C++][Python] Bump the default format version from 2.4 -> 2.6

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on issue #35746:
URL: https://github.com/apache/arrow/issues/35746#issuecomment-1562331856

   It seems that parquet-cpp has implemented features (e.g. modular encryption and BYTE_STREAM_SPLIT encoding) beyond version 2.6. We probably need to update supported versions and manage features based on the version.
   
   https://github.com/apache/parquet-format/blob/master/CHANGES.md




[GitHub] [arrow] mapleFU commented on issue #35746: [Parquet] Bump the format version from 2.4 -> 2.6

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on issue #35746:
URL: https://github.com/apache/arrow/issues/35746#issuecomment-1562207107

   I found that our implementation already supports `Timestamp`:
   
   ```c++
   /// \brief Allowed for physical type INT64.
   class PARQUET_EXPORT TimestampLogicalType : public LogicalType {
    public:
     static std::shared_ptr<const LogicalType> Make(bool is_adjusted_to_utc,
                                                    LogicalType::TimeUnit::unit time_unit,
                                                    bool is_from_converted_type = false,
                                                    bool force_set_converted_type = false);
     bool is_adjusted_to_utc() const;
     LogicalType::TimeUnit::unit time_unit() const;
   
     /// \brief If true, will not set LogicalType in Thrift metadata
     bool is_from_converted_type() const;
   
     /// \brief If true, will set ConvertedType for micros and millis
     /// resolution in legacy ConvertedType Thrift metadata
     bool force_set_converted_type() const;
   
    private:
     TimestampLogicalType() = default;
   };
   ```
   
   And `TimeUnit` has `NANOS`:
   
   ```c++
     struct TimeUnit {
       enum unit { UNKNOWN = 0, MILLIS = 1, MICROS, NANOS };
     };
   ```
   
   Do we already support them?
   
   /cc @wgtmac 




[GitHub] [arrow] mapleFU commented on issue #35746: [Parquet][C++][Python] Bump the default format version from 2.4 -> 2.6

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on issue #35746:
URL: https://github.com/apache/arrow/issues/35746#issuecomment-1562373760

   > It seems that parquet-cpp has implemented features (e.g. modular encryption and BYTE_STREAM_SPLIT encoding) beyond version 2.6. We probably need to update supported versions and manage features based on the version.
   > 
   > https://github.com/apache/parquet-format/blob/master/CHANGES.md
   
   We can open an issue about this; I can take time to fix it.




[GitHub] [arrow] jorisvandenbossche closed issue #35746: [Parquet][C++][Python] Bump the default format version from 2.4 -> 2.6

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche closed issue #35746: [Parquet][C++][Python] Bump the default format version from 2.4 -> 2.6
URL: https://github.com/apache/arrow/issues/35746




[GitHub] [arrow] mapleFU commented on issue #35746: [Parquet] Bump the format version from 2.4 -> 2.6

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on issue #35746:
URL: https://github.com/apache/arrow/issues/35746#issuecomment-1562208290

   Have you tried:
   
   ```c++
   ::parquet::WriterProperties::Builder()
       .version(::parquet::ParquetVersion::PARQUET_2_6)
       ->build();
   ```
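   
   The mix of `.` and `->` in that call follows from the builder's setters returning a pointer to the builder rather than a reference. A minimal, self-contained sketch of the pattern is below; `FormatVersion`, `WriterProps` and `MakeProps26` are hypothetical stand-ins for illustration only, not the real classes from `parquet/properties.h`, and the 2.4 default simply mirrors what is described in this thread.
   
   ```c++
   #include <cassert>
   #include <memory>
   
   // Hypothetical stand-in for the format version enum.
   enum class FormatVersion { V2_4, V2_6 };
   
   // Hypothetical stand-in for immutable writer properties.
   class WriterProps {
    public:
     class Builder {
      public:
       // The setter returns a pointer to the builder, so further calls
       // are chained with -> instead of a dot.
       Builder* version(FormatVersion v) {
         version_ = v;
         return this;
       }
   
       std::shared_ptr<WriterProps> build() const {
         return std::shared_ptr<WriterProps>(new WriterProps(version_));
       }
   
      private:
       FormatVersion version_ = FormatVersion::V2_4;  // assumed default
     };
   
     FormatVersion version() const { return version_; }
   
    private:
     explicit WriterProps(FormatVersion v) : version_(v) {}
     FormatVersion version_;
   };
   
   inline std::shared_ptr<WriterProps> MakeProps26() {
     // The temporary Builder lives until the end of the full expression,
     // so the pointer returned by version() is still valid for build().
     auto props = WriterProps::Builder().version(FormatVersion::V2_6)->build();
     assert(props->version() == FormatVersion::V2_6);
     return props;
   }
   ```
   
   Calling `WriterProps::Builder().build()` without a setter in this sketch yields the assumed 2.4 default, which is the behaviour this issue proposes to change.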




[GitHub] [arrow] pitrou commented on issue #35746: [Parquet][C++][Python] Bump the default format version from 2.4 -> 2.6

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #35746:
URL: https://github.com/apache/arrow/issues/35746#issuecomment-1593245667

   Ok, so parquet-mr implemented nanosecond precision timestamps in 2018: https://github.com/apache/parquet-mr/pull/519
   while we did so in 2018 (basic support) and 2019 (Arrow roundtrip):
   https://github.com/apache/arrow/pull/4185
   https://github.com/apache/arrow/pull/4421
   
   I think this makes it ok to bump the default to 2.6.
   

