Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/18 15:36:55 UTC

[GitHub] [arrow-rs] mohitreddy1996 opened a new issue, #3136: arrow to and from pyarrow conversion results in changes in schema

mohitreddy1996 opened a new issue, #3136:
URL: https://github.com/apache/arrow-rs/issues/3136

   **Describe the bug**
   
   Converting a Rust `RecordBatch` to a pyarrow `RecordBatch` and then back to a Rust `RecordBatch` yields a schema that differs from the original.
   
   
   **To Reproduce**
   
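   ```
   use std::sync::Arc;

   use arrow::array::{ArrayRef, Int32Array, StringArray};
   use arrow::pyarrow::PyArrowConvert;
   use arrow::record_batch::RecordBatch;
   use pyo3::prelude::*;

   #[pyfunction]
   fn lookup(py: Python<'_>, keys: PyObject) -> PyResult<PyObject> {
       // Input is a pyarrow RecordBatch; convert it to a Rust RecordBatch.
       let keys = RecordBatch::from_pyarrow(keys.as_ref(py))?;
       println!("keys: {:?}", keys);
       // Convert it back to a pyarrow RecordBatch.
       keys.to_pyarrow(py)
   }

   #[test]
   fn test_conversion() {
       let a: ArrayRef = Arc::new(Int32Array::from(vec![1, 2]));
       let b: ArrayRef = Arc::new(StringArray::from(vec!["a", "b"]));
       let input = RecordBatch::try_from_iter(vec![("a", a), ("b", b)]).unwrap();
       println!("input: {:?}", input);

       // Round trip: Rust -> pyarrow -> Rust (inside `lookup`) -> pyarrow -> Rust.
       let res = pyo3::Python::with_gil(|py| {
           let x = lookup(py, input.to_pyarrow(py).unwrap()).unwrap();
           RecordBatch::from_pyarrow(x.as_ref(py)).unwrap()
       });

       // Compare the round-tripped batch with the original.
       assert_eq!(input, res);
   }
   ```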
   
   Output:
   
   ```
   input: RecordBatch { schema: Schema { fields: [Field { name: "a", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "b", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {} }, columns: [PrimitiveArray<Int32>
   [
     1,
     2,
   ], StringArray
   [
     "a",
     "b",
   ]], row_count: 2 }
   
   keys: RecordBatch { schema: Schema { fields: [Field { name: "a", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "b", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {} }, columns: [PrimitiveArray<Int32>
   [
     1,
     2,
   ], StringArray
   [
     "a",
     "b",
   ]], row_count: 2 }
   ```
   
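   The only difference between the two schemas is nullability: the input fields have `nullable: false`, while the fields after conversion have `nullable: true`.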
   
   **Expected behavior**
   
   **Additional context**
   
   Package versions used:
   
   ```
   arrow = { version = "25.0.0", features = ["pyarrow"] }
   pyo3 = { version = "0.17.1" }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] doki23 commented on issue #3136: arrow to and from pyarrow conversion results in changes in schema

Posted by GitBox <gi...@apache.org>.
doki23 commented on issue #3136:
URL: https://github.com/apache/arrow-rs/issues/3136#issuecomment-1326964690

   Yes, we could pass the schema of the `RecordBatch` to the `from_arrays` function of pyarrow's `RecordBatch`.




[GitHub] [arrow-rs] tustvold closed issue #3136: arrow to and from pyarrow conversion results in changes in schema

Posted by GitBox <gi...@apache.org>.
tustvold closed issue #3136: arrow to and from pyarrow conversion results in changes in schema
URL: https://github.com/apache/arrow-rs/issues/3136

