Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/26 06:12:36 UTC

[GitHub] [arrow-datafusion] matthewmturner commented on issue #1488: Float column type inferred as utf8 when reading csv

matthewmturner commented on issue #1488:
URL: https://github.com/apache/arrow-datafusion/issues/1488#issuecomment-1001111740


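   Interestingly, it sometimes seems to work. For one of the tables (`medium`) I get the expected result, with `v2` inferred as Float64, while the value column of the other two tables is inferred as Utf8: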
   
   ```
   ❯ CREATE EXTERNAL TABLE x STORED AS CSV WITH HEADER ROW LOCATION "data/J1_1e7_NA_0_0.csv";
   0 rows in set. Query took 5.536 seconds.
   ❯ SHOW COLUMNS FROM x;
   +---------------+--------------+------------+-------------+-----------+-------------+
   | table_catalog | table_schema | table_name | column_name | data_type | is_nullable |
   +---------------+--------------+------------+-------------+-----------+-------------+
   | datafusion    | public       | x          | id1         | Int64     | NO          |
   | datafusion    | public       | x          | id2         | Int64     | NO          |
   | datafusion    | public       | x          | id3         | Int64     | NO          |
   | datafusion    | public       | x          | id4         | Utf8      | NO          |
   | datafusion    | public       | x          | id5         | Utf8      | NO          |
   | datafusion    | public       | x          | id6         | Utf8      | NO          |
   | datafusion    | public       | x          | v1          | Utf8      | NO          |
   +---------------+--------------+------------+-------------+-----------+-------------+
   7 rows in set. Query took 0.002 seconds.
   ❯ CREATE EXTERNAL TABLE large STORED AS CSV WITH HEADER ROW LOCATION "data/J1_1e7_1e7_0_0.csv";
   0 rows in set. Query took 5.410 seconds.
   ❯ SHOW COLUMNS FROM large;
   +---------------+--------------+------------+-------------+-----------+-------------+
   | table_catalog | table_schema | table_name | column_name | data_type | is_nullable |
   +---------------+--------------+------------+-------------+-----------+-------------+
   | datafusion    | public       | large      | id1         | Int64     | NO          |
   | datafusion    | public       | large      | id2         | Int64     | NO          |
   | datafusion    | public       | large      | id3         | Int64     | NO          |
   | datafusion    | public       | large      | id4         | Utf8      | NO          |
   | datafusion    | public       | large      | id5         | Utf8      | NO          |
   | datafusion    | public       | large      | id6         | Utf8      | NO          |
   | datafusion    | public       | large      | v2          | Utf8      | NO          |
   +---------------+--------------+------------+-------------+-----------+-------------+
   7 rows in set. Query took 0.003 seconds.
   ❯ CREATE EXTERNAL TABLE medium STORED AS CSV WITH HEADER ROW LOCATION "data/J1_1e7_1e4_0_0.csv";
   0 rows in set. Query took 0.010 seconds.
   ❯ SHOW COLUMNS FROM medium;
   +---------------+--------------+------------+-------------+-----------+-------------+
   | table_catalog | table_schema | table_name | column_name | data_type | is_nullable |
   +---------------+--------------+------------+-------------+-----------+-------------+
   | datafusion    | public       | medium     | id1         | Int64     | NO          |
   | datafusion    | public       | medium     | id2         | Int64     | NO          |
   | datafusion    | public       | medium     | id4         | Utf8      | NO          |
   | datafusion    | public       | medium     | id5         | Utf8      | NO          |
   | datafusion    | public       | medium     | v2          | Float64   | NO          |
   +---------------+--------------+------------+-------------+-----------+-------------+
   5 rows in set. Query took 0.002 seconds.
   ```
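
   A possible workaround while inference behaves this way is to declare the column types explicitly instead of relying on inference. This is only a sketch: it assumes `v1` holds floating-point values (as the issue title suggests) and reuses the other column types reported by `SHOW COLUMNS` above. The table name `x_typed` is made up for illustration.

   ```
   -- Hypothetical table name; column types are assumptions taken from the output above,
   -- except v1, which is declared as DOUBLE rather than the inferred Utf8.
   CREATE EXTERNAL TABLE x_typed (
       id1 BIGINT NOT NULL,
       id2 BIGINT NOT NULL,
       id3 BIGINT NOT NULL,
       id4 VARCHAR NOT NULL,
       id5 VARCHAR NOT NULL,
       id6 VARCHAR NOT NULL,
       v1  DOUBLE NOT NULL
   )
   STORED AS CSV WITH HEADER ROW LOCATION "data/J1_1e7_NA_0_0.csv";
   ```

   If the declared types match the data, `SHOW COLUMNS FROM x_typed;` should then report `v1` as Float64.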

