Posted to issues@drill.apache.org by "Anton Waniek (Jira)" <ji...@apache.org> on 2021/05/14 11:02:00 UTC

[jira] [Created] (DRILL-7927) NullPointerException when trying to write UNIONTYPE to Parquet

Anton Waniek created DRILL-7927:
-----------------------------------

             Summary: NullPointerException when trying to write UNIONTYPE to Parquet
                 Key: DRILL-7927
                 URL: https://issues.apache.org/jira/browse/DRILL-7927
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.18.0
         Environment: *Docker:*
Client: Docker Engine - Community
 Cloud integration: 1.0.14
 Version: 20.10.6
 API version: 1.41
 Go version: go1.13.15
 Git commit: 370c289
 Built: Fri Apr 9 22:46:45 2021
 OS/Arch: linux/amd64
 Context: default
 Experimental: true

Server: Docker Engine - Community
 Engine:
 Version: 20.10.6
 API version: 1.41 (minimum version 1.12)
 Go version: go1.13.15
 Git commit: 8728dd2
 Built: Fri Apr 9 22:44:56 2021
 OS/Arch: linux/amd64
 Experimental: false
 containerd:
 Version: 1.4.4
 GitCommit: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
 runc:
 Version: 1.0.0-rc93
 GitCommit: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
 docker-init:
 Version: 0.19.0
 GitCommit: de40ad0

Running on Windows under WSL2.
            Reporter: Anton Waniek


The Parquet format does not support the "union type" data type, so Drill should handle the failure gracefully when the user attempts to write this type to Parquet. Currently, a NullPointerException is thrown instead.

Reproducing this bug takes a few steps, but the process is straightforward.

To summarize the commands in advance: to obtain a table whose columns use the UNION type, one must first enable the union type option and then run a query over a MongoDB collection in which a field holds heterogeneous types (e.g. strings and numbers). (*n.b.* there may be a simpler way to obtain a union-typed table, but I am not aware of one.) One must then try to write the table to Parquet.

First, start MongoDB and insert two documents whose "column" field holds a number in one and a string in the other:
{code:bash}
docker run --rm -it -d --name mongo-uniontype mongo:4.4
# wait for mongo a bit
sleep 1
create_coll='db.uniontype_table.insertMany([{"column": 1},{"column": "string"}])'
# quote the variable so the shell passes the whole script as a single --eval argument
docker exec -it mongo-uniontype mongo example --eval "$create_coll"

# check the outcome
docker exec -it mongo-uniontype mongo example --eval 'db.uniontype_table.find()'{code}
Run Drill and configure the Mongo storage plugin:
{code:bash}
docker run --rm -it -d --name drill-uniontype -p 8047:8047 \
  apache/drill:latest /bin/bash
mongo_ip=$(docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' mongo-uniontype)
mongo_conf() {
cat <<EOF
{
  "name": "mongo",
  "config": {"type":"mongo", "connection":"mongodb://$mongo_ip:27017/", "enabled":"true"}
}
EOF
}
sleep 5  # wait a little for Drill
curl -X POST -H "Content-Type: application/json" \
  http://localhost:8047/storage/mongo.json --data "$(mongo_conf)"{code}
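The fixed sleeps above are timing guesses and may be too short on a slow machine. A minimal sketch of a polling alternative follows. The helper name wait_for is made up for illustration, and the two probe commands in the usage comments are assumptions about what signals readiness (a MongoDB ping, and a GET on Drill's /storage.json); they have not been checked against this exact setup.

```shell
# Hypothetical helper: run a command repeatedly until it succeeds.
# Usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND exits with status 0, 1 if every attempt fails.
# Note: plain POSIX sh, so the loop variables are global rather than local.
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # discard the probe's output, only its exit status matters
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Assumed readiness probes, to be used instead of "sleep 1" and "sleep 5":
# wait_for 30 1 docker exec mongo-uniontype mongo --quiet --eval 'db.runCommand({ping: 1})'
# wait_for 60 1 curl -sf http://localhost:8047/storage.json
```

Each call gives up after ATTEMPTS tries, so a container that never comes up produces a non-zero exit status rather than a hang.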
Finally, attach to the freshly configured Drill, set the relevant option, and run the query:
{code:bash}
docker attach drill-uniontype
{code}
Then, at the resulting *sqlline* prompt:
{code:sql}
use mongo.example;
SET `exec.enable_union_type` = true;
CREATE TABLE `dfs.tmp`.`problem_is_here.parquet` AS (SELECT * FROM `uniontype_table`);{code}
The last statement raises the NullPointerException.
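It may also be possible to trigger the failure without attaching to the container, by posting the statements to Drill's REST endpoint /query.json. The following is an untested sketch under several assumptions: drill_payload and drill_query are hypothetical helper names, DRILL_URL is a made-up variable, and because separate unauthenticated REST requests are, as far as I know, not guaranteed to share a session, it uses ALTER SYSTEM rather than SET and a fully qualified table name rather than "use mongo.example". Note that ALTER SYSTEM changes the option for the whole Drill instance.

```shell
# Hypothetical helper: wrap a SQL string in the JSON body that /query.json expects.
# Only backslashes and double quotes are escaped, which suffices for the statements here.
drill_payload() {
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"queryType": "SQL", "query": "%s"}' "$escaped"
}

# Hypothetical helper: POST one statement to Drill (DRILL_URL defaults to localhost:8047).
drill_query() {
  curl -s -X POST -H "Content-Type: application/json" \
    "${DRILL_URL:-http://localhost:8047}/query.json" --data "$(drill_payload "$1")"
}

# Usage (requires the two containers from the steps above to be running):
# drill_query 'ALTER SYSTEM SET `exec.enable_union_type` = true'
# drill_query 'CREATE TABLE `dfs.tmp`.`problem_is_here.parquet` AS (SELECT * FROM mongo.example.`uniontype_table`)'
```

The single quotes around each statement keep the shell from treating the backticks as command substitution.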

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)