Posted to issues@orc.apache.org by "Martin Loncaric (Jira)" <ji...@apache.org> on 2022/05/26 17:21:00 UTC
[jira] [Updated] (ORC-1191) Benchmark Taxi CSV Dataset No Longer Exists
[ https://issues.apache.org/jira/browse/ORC-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Martin Loncaric updated ORC-1191:
---------------------------------
Description:
New York TLC has replaced their CSV dataset with a Parquet version, so we should switch to that.
Since 5/12, the NYC Taxi dataset used in the benchmarks is no longer available as CSV files; it has been replaced with Parquet.
https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page
bq. On 05/13/2022, we are making the following changes to trip record files: All files will be stored in the Parquet format. Please see the ‘Working With Parquet Format’ under the Data Dictionaries and MetaData section.
was:New York TLC has replaced their CSV dataset with a Parquet version, so we should switch to that.
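The files at the TLC page are now Parquet rather than CSV, so a benchmark step that expects CSV will receive data in a different format. The sketch below shows one way to detect that case. It is a minimal sketch and is not part of the ORC benchmark code; the class `TaxiFormatCheck` and the method `looksLikeParquet` are hypothetical names. The check relies on one property of the Parquet format: every Parquet file starts and ends with the 4-byte magic string `PAR1`, and a CSV file does not.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of ORC: distinguishes Parquet files from CSV by magic bytes.
public class TaxiFormatCheck {
    // Every Parquet file begins and ends with the 4-byte magic "PAR1".
    private static final byte[] PARQUET_MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the file has the Parquet magic at both its head and its tail. */
    static boolean looksLikeParquet(String path) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            long len = f.length();
            // Smallest possible layout: head magic + 4-byte footer length + tail magic.
            if (len < 12) {
                return false;
            }
            byte[] head = new byte[4];
            byte[] tail = new byte[4];
            f.readFully(head);
            f.seek(len - 4);
            f.readFully(tail);
            return Arrays.equals(head, PARQUET_MAGIC) && Arrays.equals(tail, PARQUET_MAGIC);
        }
    }

    public static void main(String[] args) throws IOException {
        // Report the detected format of each file given on the command line.
        for (String path : args) {
            System.out.println(path + ": " + (looksLikeParquet(path) ? "Parquet" : "not Parquet (e.g. CSV)"));
        }
    }
}
```

A check like this would only detect the format change. The benchmark would still have to read Parquet input, or convert it, before writing ORC.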
> Benchmark Taxi CSV Dataset No Longer Exists
> -------------------------------------------
>
> Key: ORC-1191
> URL: https://issues.apache.org/jira/browse/ORC-1191
> Project: ORC
> Issue Type: Bug
> Reporter: Martin Loncaric
> Priority: Minor
--
This message was sent by Atlassian Jira
(v8.20.7#820007)