Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/13 15:44:47 UTC

[GitHub] [arrow-datafusion] Jimexist edited a comment on issue #1441: Incorrect results in datafusion

Jimexist edited a comment on issue #1441:
URL: https://github.com/apache/arrow-datafusion/issues/1441#issuecomment-992607329


   Thanks for the update.
   
   To people reading this: I'm still trying to minimize the reproduction steps, so here is a simpler set of statements:
   
   ```
   CREATE EXTERNAL TABLE stop_parquet STORED AS PARQUET LOCATION './parquets/stops';
   CREATE EXTERNAL TABLE stop_csv (time TEXT, trip_tid TEXT, trip_line TEXT, stop_name TEXT) STORED AS CSV LOCATION './csvs/stop.csv';
   ```
   
   ```
   ❯ select distinct stop_name from stop_csv;
   +------------------------------+
   | stop_name                    |
   +------------------------------+
   | Szczęśliwice                 |
   | Wawelska                     |
   | Bolesławicka                 |
   ...
   | Ceramiczna                   |
   | Czołgistów                   |
   +------------------------------+
   134 rows in set. Query took 0.015 seconds.
   ```
   
   ```
   ❯ select distinct stop_name from stop_parquet;
   +------------------------------+
   | stop_name                    |
   +------------------------------+
   | Milenijna                    |
   | Park Praski                  |
   | Wolności                     |
   ...
   | Dzika                        |
   | Budowlana                    |
   | PKP Płudy                    |
   | Bolesławicka                 |
   | Marcelin                     |
   +------------------------------+
   112 rows in set. Query took 0.008 seconds.
   ```
   
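   i.e. even without a join, we can tell that the Parquet and CSV tables are read differently: the same `select distinct stop_name` query returns 134 rows from `stop_csv` but only 112 from `stop_parquet`.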
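   
   A possible next step for narrowing this down is sketched below. It was not run as part of this thread and is offered only as a hypothetical diagnostic. It assumes both tables hold the same underlying data (which the comparison above already presumes) and uses only basic aggregates. If the total row counts differ, the Parquet scan is likely dropping rows; if they match, the values themselves are likely being read differently.
   
   ```sql
   -- Total row counts: a mismatch suggests rows are being dropped,
   -- a match suggests values are being misread.
   SELECT COUNT(*) FROM stop_csv;
   SELECT COUNT(*) FROM stop_parquet;
   
   -- Per-name counts, sorted so the two outputs can be diffed line by line.
   SELECT stop_name, COUNT(*) AS n FROM stop_csv GROUP BY stop_name ORDER BY stop_name;
   SELECT stop_name, COUNT(*) AS n FROM stop_parquet GROUP BY stop_name ORDER BY stop_name;
   ```
   
   Diffing the sorted per-name outputs would show which stop names are missing from, or only present in, the Parquet results.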

