Posted to user@drill.apache.org by Nathan <na...@gmail.com> on 2020/05/07 08:41:36 UTC

Error reading Parquet file from Azure Blob Storage using Apache Drill

Hey there,

I trust you are well. I’m currently working on a POC to connect our end-user application to Azure Blob Storage. I’ve been experimenting with using Apache Drill to connect to Blob storage and read a Parquet file. I've added azure-storage-8.6.3.jar and hadoop-azure-3.2.1.jar to my installation (I’ve also tried the combinations of jar files suggested here: https://drill.apache.org/docs/azure-blob-storage-plugin/).

I'm able to read a JSON file stored in Blob storage (see first screenshot below). However, when I try to read the Parquet file, I get the following error:

ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query: SELECT * FROM az.default../CLTYP/CLTYP_2020_04_29_09_57.parquet LIMIT 100 [30038]Query execution error. Details:[ SYSTEM ERROR: StorageException: The requested operation is not allowed in the current state of the entity.
Please, refer to logs for more information.

I then downloaded the Parquet file to my laptop and was able to explore it without any issues (see second screenshot below).

I'm new to Drill and not sure how to proceed. I'm also not sure why the JSON read works while the Parquet read doesn't. I spent some time searching for the specific error I'm seeing, but without any luck. Any assistance on this would be greatly appreciated.

I'm running Apache Drill 1.17.0 on Windows 10 with MapR Drill ODBC Driver version 1.3.22.1055.

JSON file read from Blob storage – No error



Parquet file read after first being saved to disk, rather than read directly from storage – No error