Posted to issues@drill.apache.org by "Anton Waniek (Jira)" <ji...@apache.org> on 2021/05/14 11:02:00 UTC
[jira] [Created] (DRILL-7927) NullPointerException when trying to write UNIONTYPE to Parquet
Anton Waniek created DRILL-7927:
-----------------------------------
Summary: NullPointerException when trying to write UNIONTYPE to Parquet
Key: DRILL-7927
URL: https://issues.apache.org/jira/browse/DRILL-7927
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.18.0
Environment: *Docker:*
Client: Docker Engine - Community
 Cloud integration: 1.0.14
 Version:           20.10.6
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        370c289
 Built:             Fri Apr 9 22:46:45 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true
Server: Docker Engine - Community
 Engine:
  Version:          20.10.6
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       8728dd2
  Built:            Fri Apr 9 22:44:56 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.4
  GitCommit:        05f951a3781f4f2c1911b05e61c160e9c30eaa8e
 runc:
  Version:          1.0.0-rc93
  GitCommit:        12644e614e25b05da6fd08a38ffa0cfe1903fdec
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
Running on Windows under WSL2.
Reporter: Anton Waniek
The Parquet format does not support the "union type" data type, so Drill should handle the error that occurs when a user attempts to write this type to Parquet. Currently, a NullPointerException is thrown.
Reproducing this bug takes a few steps, but the process is straightforward.
To summarize the commands in advance: to get a table with a column of the UNION type, first enable the union type option, then run a query over a MongoDB collection in which a field holds inhomogeneous types (e.g. strings and numbers). Finally, try to write the resulting table to Parquet. (*N.B.* there may be a simpler way to obtain a union type table, but I am not aware of one.)
First start MongoDB and store appropriate data inside:
{code:bash}
docker run --rm -it -d --name mongo-uniontype mongo:4.4
# wait for mongo a bit
sleep 1
create_coll='db.uniontype_table.insertMany([{"column": 1},{"column": "string"}])'
# quote the variable so the shell does not split the script on whitespace
docker exec -it mongo-uniontype mongo example --eval "$create_coll"
# check the outcome
docker exec -it mongo-uniontype mongo example --eval 'db.uniontype_table.find()'{code}
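The fixed {{sleep 1}} above can be too short on a slow machine. As a rough sketch (not part of the original reproduction), a small retry helper can poll until MongoDB answers. The name {{wait_for}} is hypothetical, and the commented example assumes the container name used above and that the {{mongo}} shell in the image accepts a {{ping}} command.

```shell
# Hypothetical helper: run a command repeatedly until it succeeds.
# Usage: wait_for <max_attempts> <delay_seconds> <command> [args...]
wait_for() {
  wf_attempts=$1
  wf_delay=$2
  shift 2
  wf_i=0
  while [ "$wf_i" -lt "$wf_attempts" ]; do
    # Succeed as soon as the probed command returns exit status 0.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    wf_i=$((wf_i + 1))
    sleep "$wf_delay"
  done
  # All attempts failed.
  return 1
}

# Example (assumption: container name from the steps above):
# wait_for 30 1 docker exec mongo-uniontype mongo --quiet --eval 'db.runCommand({ping: 1})'
```

The helper only looks at the exit status of the probed command, so the same function could be used to wait for Drill's web port with a {{curl}} probe.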
Run Drill and configure the Mongo storage plugin:
{code:bash}
docker run --rm -it -d --name drill-uniontype -p 8047:8047 \
apache/drill:latest /bin/bash
mongo_ip=$(docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' mongo-uniontype)
mongo_conf() {
cat <<EOF
{
"name": "mongo",
"config": {"type":"mongo", "connection":"mongodb://$mongo_ip:27017/", "enabled":"true"}
}
EOF
}
sleep 5 # wait a little for Drill
curl -X POST -H "Content-Type: application/json" \
http://localhost:8047/storage/mongo.json --data "$(mongo_conf)"{code}
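To confirm that the plugin was actually registered before attaching, the stored configuration can be read back from the same REST endpoint. The sketch below is an assumption on my side, not something I ran as part of this report: {{plugin_enabled}} is a hypothetical helper, and it only does a crude text match on the returned JSON rather than parsing it.

```shell
# Hypothetical helper: read a storage plugin config (JSON) on stdin and
# succeed if it contains "enabled": true (boolean or string form).
plugin_enabled() {
  grep -Eq '"enabled"[[:space:]]*:[[:space:]]*"?true'
}

# Example (assumes Drill's web UI is reachable on localhost:8047 as above):
# curl -s http://localhost:8047/storage/mongo.json | plugin_enabled \
#   && echo "mongo plugin is enabled"
```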
Finally, attach to the freshly configured Drill, set the relevant option, and run the query:
{code:bash}
docker attach drill-uniontype
{code}
then in the resulting *sqlline* command line:
{code:sql}
use mongo.example;
SET `exec.enable_union_type` = true;
CREATE TABLE `dfs.tmp`.`problem_is_here.parquet` AS (SELECT * FROM `uniontype_table`);{code}
The last statement (the CREATE TABLE) raises the NullPointerException.
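For a scripted reproduction without attaching to sqlline, the same statements could probably be submitted through Drill's REST endpoint {{/query.json}}. This is an untested sketch: {{query_payload}} is a hypothetical helper that wraps a SQL string in the JSON body, and the commented calls use {{ALTER SYSTEM}} instead of {{SET}} on the assumption that unauthenticated REST requests do not share a session. I have not checked that this path produces the same stack trace.

```shell
# Hypothetical helper: wrap a SQL statement in the JSON body for /query.json.
query_payload() {
  # Escape backslashes and double quotes so the SQL is a valid JSON string.
  qp_escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"queryType": "SQL", "query": "%s"}\n' "$qp_escaped"
}

# Example calls (assumption: Drill reachable on localhost:8047 as above):
# curl -s -X POST -H "Content-Type: application/json" \
#   http://localhost:8047/query.json \
#   --data "$(query_payload 'ALTER SYSTEM SET `exec.enable_union_type` = true')"
# curl -s -X POST -H "Content-Type: application/json" \
#   http://localhost:8047/query.json \
#   --data "$(query_payload 'CREATE TABLE `dfs.tmp`.`problem_is_here.parquet` AS (SELECT * FROM mongo.example.`uniontype_table`)')"
```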