Posted to jira@arrow.apache.org by "Andy Grove (Jira)" <ji...@apache.org> on 2020/10/09 04:11:00 UTC

[jira] [Commented] (ARROW-10242) Parquet reader thread terminated due to error: ExecutionError("sending on a disconnected channel")

    [ https://issues.apache.org/jira/browse/ARROW-10242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210600#comment-17210600 ] 

Andy Grove commented on ARROW-10242:
------------------------------------

Hi [~joshx], and thanks for the bug report. I was unable to reproduce the issue with any of the Parquet data sets that I usually test with, but those are simple data sets containing primitive types. My first guess is that something in the files is not supported by DataFusion and the underlying error message is being suppressed, but this is just a guess. Do your files contain nested types?

Do you see any other errors before the disconnected channel error?
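
For context, here is a minimal sketch of how a "send on a disconnected channel" style error can arise in general: a producer thread keeps sending after the consumer side of the channel has gone away, so the send failure is only a symptom and the reason the consumer stopped is what matters. This uses std::sync::mpsc purely for illustration and is not DataFusion's actual reader code; the function name and the batch type are hypothetical, and the wording of the standard library's error differs from the message in this report.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a reader thread: tries to send one batch
/// after the receiving side has already been dropped, and returns the
/// text of the resulting send error (if any).
fn send_after_receiver_dropped() -> Option<String> {
    let (tx, rx) = mpsc::channel::<Vec<i32>>();

    // The consumer goes away first, for example because it hit an
    // error of its own and returned early.
    drop(rx);

    // The producer only notices when it tries to send.
    let reader = thread::spawn(move || {
        tx.send(vec![1, 2, 3]).err().map(|e| e.to_string())
    });

    reader.join().expect("reader thread panicked")
}

fn main() {
    match send_after_receiver_dropped() {
        Some(msg) => println!("reader thread terminated due to error: {}", msg),
        None => println!("send succeeded"),
    }
}
```

If something similar is happening here, an earlier error on the consuming side would explain the message, which is why any errors logged before it would be useful to see.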

> Parquet reader thread terminated due to error: ExecutionError("sending on a disconnected channel")
> --------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-10242
>                 URL: https://issues.apache.org/jira/browse/ARROW-10242
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Rust, Rust - DataFusion
>    Affects Versions: 2.0.0
>            Reporter: Josh Taylor
>            Assignee: Andy Grove
>            Priority: Major
>
> *Running the latest code from GitHub for DataFusion & Parquet.*
> When trying to read a directory of about 210 Parquet files (3.2 GB total, each file around 13-18 MB), doing the following:
> {code:java}
> let mut ctx = ExecutionContext::new();
> // register parquet file with the execution context
> ctx.register_parquet(
>  "something",
>  "/home/josh/dev/pat/fff/"
> )?;
> // execute the query
> let df = ctx.sql(
>  "select * from something",
> )?;
> let results = df.collect().await?;
>  
> {code}
> I get the following error about 204 times:
> {code:java}
> Parquet reader thread terminated due to error: ExecutionError("sending on a disconnected channel"){code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)