Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/27 17:05:31 UTC

[GitHub] [arrow-rs] oersted opened a new issue, #2952: Subtle compatibility issue with serve_arrow

oersted opened a new issue, #2952:
URL: https://github.com/apache/arrow-rs/issues/2952

   **Describe the bug**
   
   `serde_arrow::from_record_batch` expects `arrow::record_batch::RecordBatch` and `arrow::schema::Schema` as inputs, while `ParquetRecordBatchStream` seems to yield `arrow_array::record_batch::RecordBatch` and similarly `::schema()` returns `arrow_schema::schema::Schema`.
   
   As far as I can tell, the implementations of these types are exactly the same, but the compiler treats them as different types.
   
   **To Reproduce**
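   ```
   use futures::StreamExt; // provides `.next()` on the batch stream
   use parquet::arrow::ParquetRecordBatchStreamBuilder;
   use serde::Deserialize;
   use serde_arrow::from_record_batch;
   use tokio::fs::File;
   
   #[derive(Deserialize)]
   struct S {}
   
   // Inside an async function that receives `path`:
   let file = File::open(path.as_ref()).await?;
   let mut batches = ParquetRecordBatchStreamBuilder::new(file).await?
       .with_batch_size(1024)  // Default
       .build()?;
   let schema = batches.schema().as_ref();
   
   while let Some(batch) = batches.next().await {
       let batch = batch?;
       // Fails to compile: see the E0308 error below.
       from_record_batch::<S>(&batch, schema);
   }
   ```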
   
   ```
   error[E0308]: arguments to this function are incorrect
     --> src/sign.rs:28:17
      |
   28 |                 from_record_batch::<S>(&batch, schema);
      |                 ^^^^^^^^^^^^^^^^^^^^^^^^^ ------  ------ expected struct `Schema`, found struct `arrow_schema::schema::Schema`
      |                                           |
      |                                           expected struct `arrow::record_batch::RecordBatch`, found struct `arrow_array::record_batch::RecordBatch`
      |
      = note: expected reference `&arrow::record_batch::RecordBatch`
                 found reference `&arrow_array::record_batch::RecordBatch`
      = note: expected reference `&Schema`
                 found reference `&arrow_schema::schema::Schema`
   ```
   
   **Expected behavior**
   
   I would expect the API to return the types exposed by the main `arrow` namespace rather than `arrow_array` and `arrow_schema`. At least it should be possible to convert between them with `From`.
   
   Because of this, as far as I can tell, it isn't possible (or at least ergonomic) to use `serde` to deserialise from `parquet` directly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] oersted commented on issue #2952: Subtle compatibility issue with serve_arrow

Posted by GitBox <gi...@apache.org>.
oersted commented on issue #2952:
URL: https://github.com/apache/arrow-rs/issues/2952#issuecomment-1295068850

   Thanks, I hadn't considered that. Here is the output; is it what you would expect?
   ```
   │   ├── arrow v25.0.0
   │   │   ├── arrow-array v25.0.0
   │   │   │   ├── arrow-buffer v25.0.0
   │   │   │   ├── arrow-data v25.0.0
   │   │   │   │   ├── arrow-buffer v25.0.0 (*)
   │   │   │   │   ├── arrow-schema v25.0.0
   │   │   │   ├── arrow-schema v25.0.0
   │   │   ├── arrow-buffer v25.0.0 (*)
   │   │   ├── arrow-data v25.0.0 (*)
   │   │   ├── arrow-schema v25.0.0
   ├── serde_arrow v0.5.0
   │   ├── arrow v16.0.0
   ```
   
   Have you managed to replicate the issue? Does this work fine for you?




[GitHub] [arrow-rs] tustvold commented on issue #2952: Subtle compatibility issue with serve_arrow

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #2952:
URL: https://github.com/apache/arrow-rs/issues/2952#issuecomment-1293903593

   Could you run `cargo tree | grep arrow` for your crate? It sounds like you might have two versions of the `arrow` crate in your workspace.
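   
   As background on why two versions would produce this error: Rust identifies a type by the crate (and crate version) that defines it, so structurally identical structs from two versions of one crate are unrelated types. A minimal sketch, assuming two hypothetical modules (`arrow_v16`, `arrow_v25`) as stand-ins for the two versions; none of this is real arrow code:
   
   ```rust
   // Hypothetical stand-ins for two versions of the same crate that both
   // ended up in one dependency graph. Neither module is real arrow code.
   mod arrow_v16 {
       #[derive(Debug, Clone, PartialEq)]
       pub struct Schema {
           pub fields: Vec<String>,
       }
   }
   
   mod arrow_v25 {
       #[derive(Debug, Clone, PartialEq)]
       pub struct Schema {
           pub fields: Vec<String>,
       }
   }
   
   // A function compiled against the older "version" only accepts that
   // version's `Schema`, even though the newer one has the same layout.
   fn field_count(schema: &arrow_v16::Schema) -> usize {
       schema.fields.len()
   }
   
   // Hand-written bridge between the two definitions (hypothetical helper).
   // Passing `&arrow_v25::Schema` to `field_count` directly would fail with
   // error[E0308]: mismatched types.
   fn downgrade(schema: &arrow_v25::Schema) -> arrow_v16::Schema {
       arrow_v16::Schema {
           fields: schema.fields.clone(),
       }
   }
   ```
   
   With these definitions, `field_count(&downgrade(&new_schema))` compiles for an `arrow_v25::Schema`, while `field_count(&new_schema)` does not. In a real project the usual remedy is to align the dependency versions rather than to write such a bridge.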




[GitHub] [arrow-rs] oersted commented on issue #2952: Subtle compatibility issue with serve_arrow

Posted by GitBox <gi...@apache.org>.
oersted commented on issue #2952:
URL: https://github.com/apache/arrow-rs/issues/2952#issuecomment-1295090237

   Never mind: it seems `Schema` is a separate type defined by `serde_arrow`. It implements `From<arrow::datatypes::Schema>` as well as `fn from_record_batch(record_batch: &RecordBatch) -> Result<Self>`, so that's fine.
   
   I'll close this, but I'll open another issue at `serde_arrow` so that they at least update their dependency to the most recent version.
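   
   The conversion pattern described above can be sketched as follows. This is a hedged illustration only: `arrow_like::Schema` stands in for `arrow::datatypes::Schema`, and the local `Schema` struct, its `columns` field, and `describe` are hypothetical names, not the actual `serde_arrow` definitions.
   
   ```rust
   // Stand-in for `arrow::datatypes::Schema`; not the real arrow type.
   mod arrow_like {
       pub struct Schema {
           pub field_names: Vec<String>,
       }
   }
   
   // Hypothetical crate-local schema type, playing the role of the
   // `Schema` that the error message expected.
   pub struct Schema {
       columns: Vec<String>,
   }
   
   // Implementing `From` lets callers convert with `.into()` or
   // `Schema::from(...)` at the call site.
   impl From<arrow_like::Schema> for Schema {
       fn from(schema: arrow_like::Schema) -> Self {
           Schema {
               columns: schema.field_names,
           }
       }
   }
   
   // A function that, like the one in the error message, takes the
   // crate-local `&Schema` rather than the arrow one.
   fn describe(schema: &Schema) -> String {
       format!("{} column(s): {}", schema.columns.len(), schema.columns.join(", "))
   }
   ```
   
   A caller holding the arrow-side schema would first convert it, for example `let schema: Schema = arrow_schema.into();`, and then pass `&schema` to the function.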




[GitHub] [arrow-rs] oersted closed issue #2952: Subtle compatibility issue with serve_arrow

Posted by GitBox <gi...@apache.org>.
oersted closed issue #2952: Subtle compatibility issue with serve_arrow
URL: https://github.com/apache/arrow-rs/issues/2952




[GitHub] [arrow-rs] oersted commented on issue #2952: Subtle compatibility issue with serve_arrow

Posted by GitBox <gi...@apache.org>.
oersted commented on issue #2952:
URL: https://github.com/apache/arrow-rs/issues/2952#issuecomment-1295079651

   I pinned `parquet = "16.0"`, since `serde_arrow` requires that version of `arrow`. That seemed to improve the situation, but I still get the following error.
   
   ```
   error[E0308]: mismatched types
     --> src/sign.rs:28:51
      |
   28 |                 from_record_batch::<Sign>(&batch, schema);
      |                 -------------------------         ^^^^^^ expected struct `Schema`, found struct `arrow::datatypes::schema::Schema`
      |                 |
      |                 arguments to this function are incorrect
      |
      = note: expected reference `&Schema`
                 found reference `&arrow::datatypes::schema::Schema`
   ```
   
   ```
   │   ├── arrow v16.0.0
   ├── serde_arrow v0.5.0
   │   ├── arrow v16.0.0 (*)
   ```
   
   Nevertheless, it shouldn't be necessary to make this adjustment by hand. Can you think of a way to enforce matching versions?

