Posted to issues@arrow.apache.org by "einsone (via GitHub)" <gi...@apache.org> on 2023/05/04 06:21:50 UTC
[GitHub] [arrow] einsone opened a new issue, #35424: pyarrow.lib.ArrowInvalid: Schema at index 1 was different
einsone opened a new issue, #35424:
URL: https://github.com/apache/arrow/issues/35424
### Describe the usage question you have. Please include as many useful details as possible.
Why does a different column order result in a different schema?
The following code raises:
pyarrow.lib.ArrowInvalid: Schema at index 1 was different:
```python
import pandas as pd
import pyarrow as pa
pa.show_info()  # prints the version info itself and returns None
df1 = pd.DataFrame({
    "col1": [1, 2, 3, 4, 5],
    "col2": ["A", "B", "C", "D", "E"],
})
df2 = pd.DataFrame({
    "col2": ["A", "B", "C", "D", "E"],
    "col1": [1, 2, 3, 4, 5],
})
tbl1 = pa.Table.from_pandas(df1, preserve_index=False)
tbl2 = pa.Table.from_pandas(df2, preserve_index=False)
tbl3 = pa.concat_tables([tbl1, tbl2])
```
### Component(s)
C++, Python
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] westonpace commented on issue #35424: [Python] pyarrow.concat_tables raises error about different Schema if columns have different order
Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #35424:
URL: https://github.com/apache/arrow/issues/35424#issuecomment-1538967610
AFAIK, there is no way for Arrow to consistently determine the correct order. In Arrow, columns are allowed to have duplicate names so something like this would be allowed:
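```
import pyarrow as pa

# A Python dict literal keeps only the last value for a repeated key,
# so pass the duplicate column names explicitly via from_arrays.
tab1 = pa.Table.from_arrays(
    [pa.array([1, 2, 3, 4, 5]), pa.array([6, 7, 8, 9, 10])],
    names=["col", "col"],
)
tab2 = pa.Table.from_arrays(
    [pa.array([6, 7, 8, 9, 10]), pa.array([1, 2, 3, 4, 5])],
    names=["col", "col"],
)
```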
Two tables with different schemas can't be combined. You will need to normalize the schema in your code (or perhaps pandas) before providing it to Arrow.
[GitHub] [arrow] einsone commented on issue #35424: pyarrow.lib.ArrowInvalid: Schema at index 1 was different
Posted by "einsone (via GitHub)" <gi...@apache.org>.
einsone commented on issue #35424:
URL: https://github.com/apache/arrow/issues/35424#issuecomment-1534151357
pyarrow version info
--------------------
Package kind : python-wheel-manylinux2014
Arrow C++ library version : 12.0.0
Arrow C++ compiler : GNU 10.2.1
Arrow C++ compiler flags : -fdiagnostics-color=always
Arrow C++ git revision :
Arrow C++ git description :
Arrow C++ build type : release
Platform:
OS / Arch : Linux x86_64
SIMD Level : avx2
Detected SIMD Level : avx2
Memory:
Default backend : jemalloc
Bytes allocated : 0 bytes
Max memory : 0 bytes
Supported Backends : jemalloc, mimalloc, system
Optional modules:
csv : Enabled
cuda : -
dataset : Enabled
feather : Enabled
flight : Enabled
fs : Enabled
gandiva : -
json : Enabled
orc : Enabled
parquet : Enabled
Filesystems:
GcsFileSystem : Enabled
HadoopFileSystem : Enabled
S3FileSystem : Enabled
Compression Codecs:
brotli : Enabled
bz2 : Enabled
gzip : Enabled
lz4_frame : Enabled
lz4 : Enabled
snappy : Enabled
zstd : Enabled