You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/13 14:57:46 UTC

[GitHub] [arrow-datafusion] alamb edited a comment on issue #1441: Incorrect results in datafusion

alamb edited a comment on issue #1441:
URL: https://github.com/apache/arrow-datafusion/issues/1441#issuecomment-1038174703


   I reran the queries from @franeklubi at 2e918184c502a40a041fb0163702cb6ab8de0af9:
   
   # Setup
   ```sql
   CREATE EXTERNAL TABLE stop (time TEXT, trip_tid TEXT, trip_line TEXT, stop_name TEXT) STORED AS CSV WITH HEADER ROW LOCATION '/Users/alamb/Downloads/issue_data/csvs/stop.csv';
   CREATE EXTERNAL TABLE trip (tid TEXT, line TEXT, base_day TEXT) STORED AS CSV WITH HEADER ROW LOCATION '/Users/alamb/Downloads/issue_data/csvs/trip.csv';
   
   CREATE EXTERNAL TABLE stop_parquet STORED AS PARQUET LOCATION '/Users/alamb/Downloads/issue_data/parquets/stops/';
   CREATE EXTERNAL TABLE trip_parquet STORED AS PARQUET LOCATION '/Users/alamb/Downloads/issue_data/parquets/trips/';
   ```
   
   Now the results are consistent when using csv and parquet:
   
   ## Parquet
   ```sql
   ❯ SELECT DISTINCT stop_name FROM stop_parquet INNER JOIN trip_parquet ON tid = trip_tid WHERE line = '176' ORDER BY stop_name NULLS LAST;
   +----------------------+
   | stop_name            |
   +----------------------+
   | Bartnicza            |
   | Bazyliańska          |
   | Bolesławicka         |
   | Brzezińska           |
   | Budowlana            |
   | Choszczówka          |
   | Chłodnia             |
   | Daniszewska          |
   | Fabryka Pomp         |
   | Insurekcji           |
   | Marcelin             |
   | Marywilska-Las       |
   | Ołówkowa             |
   | PKP Płudy            |
   | PKP Żerań            |
   | Parowozowa           |
   | Pelcowizna           |
   | Polnych Kwiatów      |
   | Raciborska           |
   | Rembielińska         |
   | Sadkowska            |
   | Smugowa              |
   | Starego Dębu         |
   | Zyndrama z Maszkowic |
   | os.Marywilska        |
   | Śpiewaków            |
   |                      |
   +----------------------+
   27 rows in set. Query took 0.042 seconds.
   ```
   
   ## Csv
   
   ```sql
   ❯ SELECT DISTINCT stop_name FROM stop INNER JOIN trip ON tid = trip_tid WHERE line = '176' ORDER BY stop_name NULLS LAST;
   +----------------------+
   | stop_name            |
   +----------------------+
   |                      |
   | Bartnicza            |
   | Bazyliańska          |
   | Bolesławicka         |
   | Brzezińska           |
   | Budowlana            |
   | Choszczówka          |
   | Chłodnia             |
   | Daniszewska          |
   | Fabryka Pomp         |
   | Insurekcji           |
   | Marcelin             |
   | Marywilska-Las       |
   | Ołówkowa             |
   | PKP Płudy            |
   | PKP Żerań            |
   | Parowozowa           |
   | Pelcowizna           |
   | Polnych Kwiatów      |
   | Raciborska           |
   | Rembielińska         |
   | Sadkowska            |
   | Smugowa              |
   | Starego Dębu         |
   | Zyndrama z Maszkowic |
   | os.Marywilska        |
   | Śpiewaków            |
   +----------------------+
   27 rows in set. Query took 0.116 seconds.
   ```
   
   Interestingly, the CSV results don't seem to have the `NULLS LAST` 🤔 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org