Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/11 19:58:15 UTC

[GitHub] [arrow-datafusion] Miyake-Diogo opened a new issue, #3800: [RUST][Datafusion] What causes "Error: Execution("file size of 4 is less than footer")" error?

Miyake-Diogo opened a new issue, #3800:
URL: https://github.com/apache/arrow-datafusion/issues/3800

   When I try to read a CSV file and write it as Parquet, I get this error at runtime: `Error: Execution("file size of 4 is less than footer")`.
   I couldn't find anything about it in the docs or on the internet.
   My code:
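   ```rust
   use datafusion::prelude::*;
   
   #[tokio::main]
   async fn main() -> datafusion::error::Result<()> {
       let ctx: SessionContext = SessionContext::new();
       let raw_covid_path: &str = "data/minilake/raw/covid/owid-covid-data.csv";
       // Placeholder: `stage_covid_path` was not defined in the snippet as posted;
       // this output location is an assumption.
       let stage_covid_path: &str = "data/minilake/stage/covid";
   
       // Read the CSV with an inferred schema (default options).
       let covid_df = ctx.read_csv(raw_covid_path, CsvReadOptions::new()).await?;
       // Write the DataFrame out as Parquet with default writer properties.
       covid_df.write_parquet(stage_covid_path, None).await?;
       Ok(())
   }
   ```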
   The COVID dataset is available at this [link](https://ourworldindata.org/explorers/coronavirus-data-explorer?facet=none&Metric=Confirmed+deaths&Interval=Cumulative+%28last+12+months%29&Relative+to+Population=true&Color+by+test+positivity=false).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Dandandan commented on issue #3800: [RUST][Datafusion] What causes "Error: Execution("file size of 4 is less than footer")" error?

Posted by GitBox <gi...@apache.org>.
Dandandan commented on issue #3800:
URL: https://github.com/apache/arrow-datafusion/issues/3800#issuecomment-1275222538

   On the null issue, I think schema inference should default to assuming everything is nullable?




[GitHub] [arrow-datafusion] andygrove closed issue #3800: [RUST][Datafusion] What causes "Error: Execution("file size of 4 is less than footer")" error?

Posted by GitBox <gi...@apache.org>.
andygrove closed issue #3800: [RUST][Datafusion] What causes "Error: Execution("file size of 4 is less than footer")" error?
URL: https://github.com/apache/arrow-datafusion/issues/3800




[GitHub] [arrow-datafusion] andygrove commented on issue #3800: [RUST][Datafusion] What causes "Error: Execution("file size of 4 is less than footer")" error?

Posted by GitBox <gi...@apache.org>.
andygrove commented on issue #3800:
URL: https://github.com/apache/arrow-datafusion/issues/3800#issuecomment-1275274890

   @Miyake-Diogo as with https://github.com/apache/arrow-datafusion/issues/3783, if you specify the schema rather than inferring the schema you will get past this issue.




[GitHub] [arrow-datafusion] andygrove commented on issue #3800: [RUST][Datafusion] What causes "Error: Execution("file size of 4 is less than footer")" error?

Posted by GitBox <gi...@apache.org>.
andygrove commented on issue #3800:
URL: https://github.com/apache/arrow-datafusion/issues/3800#issuecomment-1275234819

   > On the null issue, I think schema inference should default to assuming everything is nullable?
   
   That would seem safer to me
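
A minimal sketch of why a nullable default is the safer choice. This is not DataFusion's inference code: the helper names `infer_nullable` and `first_violation`, the sample values, and the rule that an empty field counts as null are all assumptions made for illustration.

```rust
/// Hypothetical inference rule: a column is nullable only if one of the
/// first `max_records` sampled values is empty (treated here as null).
fn infer_nullable(column: &[&str], max_records: usize) -> bool {
    column.iter().take(max_records).any(|v| v.is_empty())
}

/// Returns the index of the first row that violates a non-nullable
/// declaration, or None if the column is declared nullable or has no nulls.
fn first_violation(column: &[&str], nullable: bool) -> Option<usize> {
    if nullable {
        return None;
    }
    column.iter().position(|v| v.is_empty())
}

fn main() {
    // Made-up 'population' values: the null appears after the sampled prefix.
    let population = ["100", "200", "300", "", "500"];

    // Sampling only the first 3 rows misses the null in row 3.
    let inferred = infer_nullable(&population, 3);
    println!("inferred nullable: {}", inferred);
    println!(
        "violation with inferred schema: {:?}",
        first_violation(&population, inferred)
    );

    // Defaulting to nullable cannot produce this class of error.
    println!(
        "violation with nullable default: {:?}",
        first_violation(&population, true)
    );
}
```

A sampled prefix can only prove that a column is nullable, never that it is non-nullable, which is why assuming nullable unless told otherwise avoids the mismatch.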




[GitHub] [arrow-datafusion] andygrove commented on issue #3800: [RUST][Datafusion] What causes "Error: Execution("file size of 4 is less than footer")" error?

Posted by GitBox <gi...@apache.org>.
andygrove commented on issue #3800:
URL: https://github.com/apache/arrow-datafusion/issues/3800#issuecomment-1275214834

   The plot thickens... I ran this unit test in df:
   
   ```rust
   use datafusion::prelude::*;
   
   #[tokio::test]
   async fn main() -> datafusion::error::Result<()> {
       let ctx: SessionContext = SessionContext::new();
       let raw_covid_path: &str = "/mnt/bigdata/covid/owid-covid-data.csv";
       // Read the CSV with an inferred schema (default options).
       let covid_df = ctx.read_csv(raw_covid_path, CsvReadOptions::new()).await?;
       // Collect all batches in memory instead of writing to Parquet.
       let batches = covid_df.collect().await?;
       for batch in &batches {
           println!("{}", batch.num_rows());
       }
       Ok(())
   }
   ```
   
   fails with:
   
   ```
   Error: ArrowError(InvalidArgumentError("Column 'population' is declared as non-nullable but contains null values"))
   ```
   
   So it appears we have two separate issues:
   
   - Schema inference on a partial scan declares the field non-nullable, but the data contains nulls (maybe that is fine).
   - Errors seem to be consumed and ignored when reading CSV and writing to another format.
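
One way to read the original "file size of 4 is less than footer" message: a Parquet file starts with the 4-byte magic `PAR1` and ends with a 4-byte little-endian metadata length followed by the same magic, so a reader needs at least 8 trailing bytes. A 4-byte file is consistent with a writer that emitted only the leading magic before the swallowed error stopped it, although that interpretation is an assumption and was not confirmed in this thread. The sketch below uses only the standard library; `diagnose` and `Diagnosis` are hypothetical names, not part of the parquet crate.

```rust
// Parquet magic bytes; a complete file both starts and ends with them.
const MAGIC: &[u8] = b"PAR1";
// Trailing footer: 4-byte little-endian metadata length + 4-byte magic.
const FOOTER_LEN: usize = 8;

#[derive(Debug, PartialEq)]
enum Diagnosis {
    /// File is shorter than the 8-byte footer; `only_magic` is true when
    /// the content is exactly the leading "PAR1".
    TooShort { len: usize, only_magic: bool },
    /// File is long enough but does not end with "PAR1".
    BadTrailingMagic,
    /// Footer is present; `metadata_len` is the declared metadata size.
    LooksComplete { metadata_len: u32 },
}

/// Hypothetical helper: classifies raw file bytes by inspecting the footer.
fn diagnose(bytes: &[u8]) -> Diagnosis {
    if bytes.len() < FOOTER_LEN {
        return Diagnosis::TooShort {
            len: bytes.len(),
            only_magic: bytes == MAGIC,
        };
    }
    let tail = &bytes[bytes.len() - FOOTER_LEN..];
    if &tail[4..] != MAGIC {
        return Diagnosis::BadTrailingMagic;
    }
    let metadata_len = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]);
    Diagnosis::LooksComplete { metadata_len }
}

fn main() {
    // A 4-byte file holding only the leading magic, as in the reported error.
    println!("{:?}", diagnose(b"PAR1"));
}
```

Running a check like this against the file left behind by a failed `write_parquet` call would show whether the output was truncated right after the header.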

