Posted to issues@drill.apache.org by "RONIERI MARQUES RAMALHO (Jira)" <ji...@apache.org> on 2020/02/22 00:23:00 UTC

[jira] [Commented] (DRILL-7512) Parquet Reading or Writing does not work with ADLS Gen 2

    [ https://issues.apache.org/jira/browse/DRILL-7512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042285#comment-17042285 ] 

RONIERI MARQUES RAMALHO commented on DRILL-7512:
------------------------------------------------

I don't know if it helps, but I have a very similar problem using just Azure tools (ADF and AzCopy) to read from Azure Data Lake Storage Gen 2, so I don't think Apache Drill is to blame.

I opened a case with Microsoft and am waiting for a response; for now we believe something in the data files is corrupted.

The workaround was to read those files with Databricks PySpark dataframes and then rewrite the data to the Data Lake.

But since this "corruption" has happened twice in a four-month interval, I'll wait for the investigation to find the cause, in hopes of avoiding it.

 

Regards

> Parquet Reading or Writing does not work with ADLS Gen 2
> --------------------------------------------------------
>
>                 Key: DRILL-7512
>                 URL: https://issues.apache.org/jira/browse/DRILL-7512
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.16.0, 1.17.0
>            Reporter: Greg Shomette
>            Priority: Minor
>
> I can query delimited files in ADLS Gen 2 using the wasb blob storage plugin; I can list files and see parquet files, but I cannot read or write them.
> I can use a DFS plugin to read and write parquet locally, but not with Gen 2. ADLS Gen 1 works fine for both reading and writing.
> I have tried two versions of Drill, the recommended jar files, and older versions, and still no luck.
> Does Drill support Data Lake Gen 2 with parquet files?
> This query produces the following error:
> CREATE TABLE az.tmp.sampleparquet AS (SELECT * FROM az.`/Conformed/DimGeo.psv`)
> (java.lang.RuntimeException) java.lang.NoSuchMethodError: com.microsoft.azure.storage.blob.CloudBlob.startCopyFromBlob(Ljava/net/URI;Lcom/microsoft/azure/storage/AccessCondition;Lcom/microsoft/azure/storage/AccessCondition;Lcom/microsoft/azure/storage/blob/BlobRequestOptions;Lcom/microsoft/azure/storage/OperationContext;)Ljava/lang/String;
>     org.apache.drill.common.DeferredException.addThrowable():101
>     org.apache.drill.exec.work.fragment.FragmentExecutor.fail():475
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():317
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>  
> This query produces the following error:
> select * from az.`region.parquet`
> SYSTEM ERROR: StorageException: The requested operation is not allowed in the current state of the entity.
> Please, refer to logs for more information.
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception during fragment initialization: Error while applying rule DrillScanRule, args [rel#44:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[az, region.parquet])]
>     org.apache.drill.exec.work.foreman.Foreman.run():302
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>     java.lang.Thread.run():748



--
This message was sent by Atlassian Jira
(v8.3.4#803005)