Posted to issues@drill.apache.org by "Anton Waniek (Jira)" <ji...@apache.org> on 2021/05/14 11:54:00 UTC

[jira] [Updated] (DRILL-7927) NullPointerException when trying to write UNIONTYPE to Parquet

     [ https://issues.apache.org/jira/browse/DRILL-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anton Waniek updated DRILL-7927:
--------------------------------
    Attachment:     (was: drillErr.png)

> NullPointerException when trying to write UNIONTYPE to Parquet
> --------------------------------------------------------------
>
>                 Key: DRILL-7927
>                 URL: https://issues.apache.org/jira/browse/DRILL-7927
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.18.0
>         Environment: *Docker:*
> Client: Docker Engine - Community
>  Cloud integration: 1.0.14
>  Version: 20.10.6
>  API version: 1.41
>  Go version: go1.13.15
>  Git commit: 370c289
>  Built: Fri Apr 9 22:46:45 2021
>  OS/Arch: linux/amd64
>  Context: default
>  Experimental: true
> Server: Docker Engine - Community
>  Engine:
>  Version: 20.10.6
>  API version: 1.41 (minimum version 1.12)
>  Go version: go1.13.15
>  Git commit: 8728dd2
>  Built: Fri Apr 9 22:44:56 2021
>  OS/Arch: linux/amd64
>  Experimental: false
>  containerd:
>  Version: 1.4.4
>  GitCommit: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
>  runc:
>  Version: 1.0.0-rc93
>  GitCommit: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
>  docker-init:
>  Version: 0.19.0
>  GitCommit: de40ad0
> Running on Windows under WSL2.
>            Reporter: Anton Waniek
>            Priority: Minor
>
> The union type is not supported by the Parquet format, so Drill should handle the failure gracefully when a user attempts to write a column of this type to Parquet. Instead, a NullPointerException is currently thrown.
> Reproducing the bug takes a few steps, but the process is straightforward.
> In summary: to obtain a table with a UNION-typed column, first enable the union type option, then query a MongoDB collection whose field holds values of mixed types (e.g. strings and numbers). (*N.b.* there may be a simpler way to get hold of a union type table, but I am not aware of it.) Then try to write the table to Parquet.
> First start MongoDB and store appropriate data inside:
> {code:bash}
> docker run --rm -it -d --name mongo-uniontype mongo:4.4
> # wait for mongo a bit
> sleep 1
> create_coll='db.uniontype_table.insertMany([{"column": 1},{"column": "string"}])'
> docker exec -it mongo-uniontype mongo example --eval "$create_coll"
> # check the outcome
> docker exec -it mongo-uniontype mongo example --eval 'db.uniontype_table.find()'{code}
> Run Drill and configure the Mongo storage plugin:
> {code:bash}
> docker run --rm -it -d --name drill-uniontype -p 8047:8047 \
>   apache/drill:latest /bin/bash
> mongo_ip=$(docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' mongo-uniontype)
> mongo_conf() {
> cat <<EOF
> {
>   "name": "mongo",
>   "config": {"type":"mongo", "connection":"mongodb://$mongo_ip:27017/", "enabled":"true"}
> }
> EOF
> }
> sleep 5  # wait a little for Drill
> curl -X POST -H "Content-Type: application/json" \
>   http://localhost:8047/storage/mongo.json --data "$(mongo_conf)"{code}
> Finally, attach to the freshly configured Drill, set the relevant option, and run the query:
> {code:bash}
> docker attach drill-uniontype
> {code}
> then in the resulting *sqlline* command line:
> {code:sql}
> use mongo.example;
> SET `exec.enable_union_type` = true;
> CREATE TABLE `dfs.tmp`.`problem_is_here.parquet` AS (SELECT * FROM `uniontype_table`);{code}
> The last statement raises the NullPointerException.
>  
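The fixed `sleep 1` and `sleep 5` waits in the reproduction steps above may be too short on slower machines. As a minimal sketch (the `wait_for` helper below is hypothetical and not part of the original report), the waits could be replaced by retrying a readiness check until it succeeds:

```shell
# Hypothetical helper (not from the original report): retry a command
# until it succeeds or the attempt budget is exhausted.
# Usage: wait_for <max_attempts> <delay_seconds> <command> [args...]
wait_for() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  # Run the command quietly; stop as soon as it exits with status 0.
  while ! "$@" >/dev/null 2>&1; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1  # gave up: the command never succeeded
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}
```

For example, `wait_for 30 1 docker exec mongo-uniontype mongo --quiet --eval 'db.runCommand({ping: 1})'` could stand in for `sleep 1`, and `wait_for 30 1 curl -sf http://localhost:8047/status` for `sleep 5`. Both readiness checks are assumptions about what signals that each service is ready in this setup, and the attempt counts are arbitrary.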



--
This message was sent by Atlassian Jira
(v8.3.4#803005)