
Posted to issues@arrow.apache.org by "einsone (via GitHub)" <gi...@apache.org> on 2023/05/04 06:21:50 UTC

[GitHub] [arrow] einsone opened a new issue, #35424: pyarrow.lib.ArrowInvalid: Schema at index 1 was different

einsone opened a new issue, #35424:
URL: https://github.com/apache/arrow/issues/35424

   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   Why does a different column order result in a different schema?
   
   The following code raises:
   
   pyarrow.lib.ArrowInvalid: Schema at index 1 was different:
   
   ```python
   import pandas as pd
   import pyarrow as pa
   
   pa.show_info()  # prints the version info itself and returns None
   
   df1 = pd.DataFrame({
       "col1": [1,2,3,4,5],
       "col2": ["A", "B", "C", "D", "E"],
   })
   
   df2 = pd.DataFrame({
       "col2": ["A", "B", "C", "D", "E"],
       "col1": [1,2,3,4,5],
   })
   
   
   tbl1 = pa.Table.from_pandas(df1, preserve_index=False)
   tbl2 = pa.Table.from_pandas(df2, preserve_index=False)
   
   tbl3 = pa.concat_tables([tbl1, tbl2])
   ```
   
   ### Component(s)
   
   C++, Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] westonpace commented on issue #35424: [Python] pyarrow.concat_tables raises error about different Schema if columns have different order

Posted by "westonpace (via GitHub)" <gi...@apache.org>.

westonpace commented on issue #35424:
URL: https://github.com/apache/arrow/issues/35424#issuecomment-1538967610

   AFAIK, there is no way for Arrow to consistently determine the correct order.  In Arrow, columns are allowed to have duplicate names so something like this would be allowed:
   
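   ```python
   import pyarrow as pa

   # A Python dict literal keeps only the last value for a repeated key, so
   # build the tables from arrays to really get two columns named "col".
   tab1 = pa.Table.from_arrays(
       [pa.array([1, 2, 3, 4, 5]), pa.array([6, 7, 8, 9, 10])],
       names=["col", "col"],
   )

   tab2 = pa.Table.from_arrays(
       [pa.array([6, 7, 8, 9, 10]), pa.array([1, 2, 3, 4, 5])],
       names=["col", "col"],
   )
   ```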
   
   Two tables with different schemas can't be combined.  You will need to normalize the schema in your code (or perhaps pandas) before providing it to Arrow.



[GitHub] [arrow] einsone commented on issue #35424: pyarrow.lib.ArrowInvalid: Schema at index 1 was different

Posted by "einsone (via GitHub)" <gi...@apache.org>.

einsone commented on issue #35424:
URL: https://github.com/apache/arrow/issues/35424#issuecomment-1534151357

   pyarrow version info
   --------------------
   Package kind              : python-wheel-manylinux2014
   Arrow C++ library version : 12.0.0  
   Arrow C++ compiler        : GNU 10.2.1
   Arrow C++ compiler flags  :  -fdiagnostics-color=always
   Arrow C++ git revision    :         
   Arrow C++ git description :         
   Arrow C++ build type      : release 
   
   Platform:
     OS / Arch           : Linux x86_64
     SIMD Level          : avx2    
     Detected SIMD Level : avx2    
   
   Memory:
     Default backend     : jemalloc
     Bytes allocated     : 0 bytes 
     Max memory          : 0 bytes 
     Supported Backends  : jemalloc, mimalloc, system
   
   Optional modules:
     csv                 : Enabled 
     cuda                : -       
     dataset             : Enabled 
     feather             : Enabled 
     flight              : Enabled 
     fs                  : Enabled 
     gandiva             : -       
     json                : Enabled 
     orc                 : Enabled 
     parquet             : Enabled 
   
   Filesystems:
     GcsFileSystem       : Enabled 
     HadoopFileSystem    : Enabled 
     S3FileSystem        : Enabled 
   
   Compression Codecs:
     brotli              : Enabled 
     bz2                 : Enabled 
     gzip                : Enabled 
     lz4_frame           : Enabled 
     lz4                 : Enabled 
     snappy              : Enabled 
     zstd                : Enabled

