Posted to users@nifi.apache.org by Austin Heyne <ah...@ccri.com> on 2017/03/28 17:15:37 UTC

GetHDFS from Azure Blob

Hi all,

Thanks for all the help you've given me so far. Today I'm trying to pull 
files from an Azure blob store. I've done some reading on this, and from 
previous tickets [1] and guides [2] it seems the recommended approach is 
to place the required jars (for the HDFS Azure protocol) in 'Additional 
Classpath Resources' and the Hadoop core-site and hdfs-site configs into 
the 'Hadoop Configuration Resources'. I have my local HDFS properly 
configured to access wasb URLs: I'm able to ls, copy to and from, etc. 
without problem. Using the same HDFS config files, and trying both all 
the jars in my hadoop-client/lib directory (HDP) and the jars 
recommended in [1], I'm still seeing the 
"java.lang.IllegalArgumentException: Wrong FS: " error in my NiFi logs 
and am unable to pull files from Azure blob storage.

Interestingly, it seems the processor is spinning up way too fast; the 
errors appear in the log as soon as I start the processor. I'm not sure 
how it could be loading all of those jars that quickly.

Does anyone have any experience with this or recommendations to try?

Thanks,
Austin

[1] https://issues.apache.org/jira/browse/NIFI-1922
[2] 
https://community.hortonworks.com/articles/71916/connecting-to-azure-data-lake-from-a-nifi-dataflow.html



Re: GetHDFS from Azure Blob

Posted by Austin Heyne <ah...@ccri.com>.
For the record,

The way we figured out to fix this is to create a new XML file for each 
root-level container that we use (tentatively named fs.xml). The fs.xml 
looks like the following:

<configuration>
     <property>
       <name>fs.defaultFS</name>
       <value>wasb://container@accountName.blob.core.windows.net/</value>
     </property>
</configuration>

We then include core-site.xml, hdfs-site.xml and fs.xml in the 'Hadoop 
Configuration Resources' property, ensuring fs.xml comes last. Because 
the resources are applied in order, this overrides the fs.defaultFS value 
set in core-site.xml.
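
For example (file names and paths below are just illustrative), the 
processor pulling from the csv container would point its 'Hadoop 
Configuration Resources' at something like:

/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml,/opt/nifi/conf/fs-csv.xml

with fs-csv.xml containing only the one property:

<configuration>
     <property>
       <name>fs.defaultFS</name>
       <value>wasb://csv@accountName.blob.core.windows.net/</value>
     </property>
</configuration>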

Thanks everyone for the help,
Austin

On 03/28/2017 06:11 PM, Austin Heyne wrote:
> Thanks Bryan,
>
> We're only working with one account here but with multiple root level 
> containers. e.g.
>
> wasb://csv@accountName.blob.core.windows.net/
> wasb://xml@accountName.blob.core.windows.net/
> wasb://json@accountName.blob.core.windows.net/
>
> The thing that stands out to me the most is why would the defaultFS 
> need to be set at all if we're always providing complete wasb://... 
> paths? Almost seems like a bug or oversight.
>
> If anyone has any input on how we could work around this please let me 
> know.
>
> Thanks for your help,
> Austin
>
> On 03/28/2017 04:39 PM, Bryan Bende wrote:
>> Austin,
>>
>> I think you are correct that it's <containername>@<accountname>, I
>> hadn't looked at this config in a long time and was reading too
>> quickly before :)
>>
>> That would line up with the other property
>> fs.azure.account.key.<accountname>.blob.core.windows.net where you
>> specify the key for that account.
>>
>> I have no idea if this will work, but let's say you had three different
>> WASB file systems, presumably each with their own account name and
>> key, you might be able to define these in core-site.xml:
>>
>>   <property>
>> <name>fs.azure.account.key.ACCOUNT1.blob.core.windows.net</name>
>>        <value>KEY1</value>
>>      </property>
>>
>>   <property>
>> <name>fs.azure.account.key.ACCOUNT2.blob.core.windows.net</name>
>>        <value>KEY2</value>
>>      </property>
>>
>>   <property>
>> <name>fs.azure.account.key.ACCOUNT3.blob.core.windows.net</name>
>>        <value>KEY3</value>
>>      </property>
>>
>> Then in your HDFS processor in NiFi you point at this core-site.xml
>> and use a specific directory like
>> wasb://container@ACCOUNT3.blob.core.windows.net/<path> and I'm hoping
>> it would know how to use the key for ACCOUNT3.
>>
>> Not really sure if that helps your situation.
>>
>> -Bryan
>>
>>
>> On Tue, Mar 28, 2017 at 4:14 PM, Austin Heyne <ah...@ccri.com> wrote:
>>> Bryan,
>>>
>>> So I initially didn't think much of it (assumed it was a typo, etc.) but 
>>> you've
>>> said that the access url for wasb that you've been using is
>>> wasb://YOUR_USER@YOUR_HOST/. However, this has never worked for us 
>>> and I'm
>>> wondering if we have a different configuration somewhere. What we 
>>> have to
>>> use is 
>>> wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>
>>> which seems to be in line with the Azure blob storage GUI and is 
>>> what is
>>> outlined here [1]. Is there some other way this connector is being 
>>> setup? It
>>> would make much more sense using your access pattern as then each 
>>> container
>>> wouldn't need to have its own core-site.xml.
>>>
>>> Thanks,
>>> Austin
>>>
>>> [1a]
>>> https://hadoop.apache.org/docs/current/hadoop-azure/index.html#Accessing_wasb_URLs 
>>>
>>> [1b]
>>> https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-use-blob-storage 
>>>
>>>
>>>
>>>
>>>
>>> On 03/28/2017 03:55 PM, Bryan Bende wrote:
>>>> Austin,
>>>>
>>>> I believe the default FS is only used when you write to a path that
>>>> doesn't specify the filesystem. Meaning, if you set the directory of
>>>> PutHDFS to /data then it will use the default FS, but if you specify
>>>> wasb://user@wasb2/data then it will go to /data in a different
>>>> filesystem.
>>>>
>>>> The problem here is that I don't see a way to specify different keys
>>>> for each WASB filesystem in the core-site.xml.
>>>>
>>>> Admittedly I have never tried to setup something like this with many
>>>> different filesystems.
>>>>
>>>> -Bryan
>>>>
>>>>
>>>> On Tue, Mar 28, 2017 at 3:50 PM, Austin Heyne <ah...@ccri.com> wrote:
>>>>> Hi Andre,
>>>>>
>>>>> Yes, I'm aware of that configuration property, it's what I have been
>>>>> using
>>>>> to set the core-site.xml and hdfs-site.xml. For testing this I didn't
>>>>> modify
>>>>> the core-site located in the HADOOP_CONF_DIR but rather copied and
>>>>> modified
>>>>> it and then pointed the processor to the copy. The problem with 
>>>>> this is
>>>>> that
>>>>> we'll end up with a large number of core-site.xml copies that will 
>>>>> all
>>>>> have
>>>>> to be maintained separately. Ideally we'd be able to specify the
>>>>> defaultFS
>>>>> in the processor config or have the processor behave like the hdfs
>>>>> command
>>>>> line tools. The command line tools don't require the defaultFS to 
>>>>> be set
>>>>> to
>>>>> a wasb url in order to use wasb urls.
>>>>>
>>>>> The key idea here is long term maintainability and using Ambari to
>>>>> maintain
>>>>> the configuration. If we need to change any other setting in the
>>>>> core-site.xml we'd have to change it in a bunch of different files
>>>>> manually.
>>>>>
>>>>> Thanks,
>>>>> Austin
>>>>>
>>>>>
>>>>> On 03/28/2017 03:34 PM, Andre wrote:
>>>>>
>>>>> Austin,
>>>>>
>>>>> Perhaps that wasn't explicit but the settings don't need to be system
>>>>> wide,
>>>>> instead the defaultFS may be changed just for a particular processor,
>>>>> while
>>>>> the others may use configurations.
>>>>>
>>>>> The *HDFS processor documentation mentions it allows you to set
>>>>> particular
>>>>> hadoop configurations:
>>>>>
>>>>> " A file or comma separated list of files which contains the 
>>>>> Hadoop file
>>>>> system configuration. Without this, Hadoop will search the 
>>>>> classpath for
>>>>> a
>>>>> 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default
>>>>> configuration"
>>>>>
>>>>> Have you tried using this field to point to a file as described by 
>>>>> Bryan?
>>>>>
>>>>> Cheers
>>>>>
>>>>> On 29 Mar 2017 05:21, "Austin Heyne" <ah...@ccri.com> wrote:
>>>>>
>>>>> Thanks Bryan,
>>>>>
>>>>> Working with the configuration you sent what I needed to change 
>>>>> was to
>>>>> set
>>>>> the fs.defaultFS to the wasb url that we're working from. 
>>>>> Unfortunately
>>>>> this
>>>>> is a less than ideal solution since we'll be pulling files from 
>>>>> multiple
>>>>> wasb urls and ingesting them into an Accumulo datastore. Changing the
>>>>> defaultFS I'm pretty certain would mess with our local 
>>>>> HDFS/Accumulo
>>>>> install. In addition we're trying to maintain all of this 
>>>>> configuration
>>>>> with
>>>>> Ambari, which from what I can tell only supports one core-site
>>>>> configuration
>>>>> file.
>>>>>
>>>>> Is the only solution here to maintain multiple core-site.xml files 
>>>>> or is
>>>>> there another way we configure this?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Austin
>>>>>
>>>>>
>>>>>
>>>>> On 03/28/2017 01:41 PM, Bryan Bende wrote:
>>>>>> Austin,
>>>>>>
>>>>>> Can you provide the full error message and stacktrace for the
>>>>>> IllegalArgumentException from nifi-app.log?
>>>>>>
>>>>>> When you start the processor it creates a FileSystem instance 
>>>>>> based on
>>>>>> the config files provided to the processor, which in turn causes all
>>>>>> of the corresponding classes to load.
>>>>>>
>>>>>> I'm not that familiar with Azure, but if "Azure blob store" is WASB,
>>>>>> then I have successfully done the following...
>>>>>>
>>>>>> In core-site.xml:
>>>>>>
>>>>>> <configuration>
>>>>>>
>>>>>>        <property>
>>>>>>          <name>fs.defaultFS</name>
>>>>>> <value>wasb://YOUR_USER@YOUR_HOST/</value>
>>>>>>        </property>
>>>>>>
>>>>>>        <property>
>>>>>> <name>fs.azure.account.key.nifi.blob.core.windows.net</name>
>>>>>>          <value>YOUR_KEY</value>
>>>>>>        </property>
>>>>>>
>>>>>>        <property>
>>>>>> <name>fs.AbstractFileSystem.wasb.impl</name>
>>>>>> <value>org.apache.hadoop.fs.azure.Wasb</value>
>>>>>>        </property>
>>>>>>
>>>>>>        <property>
>>>>>>          <name>fs.wasb.impl</name>
>>>>>> <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
>>>>>>        </property>
>>>>>>
>>>>>>        <property>
>>>>>>          <name>fs.azure.skip.metrics</name>
>>>>>>          <value>true</value>
>>>>>>        </property>
>>>>>>
>>>>>> </configuration>
>>>>>>
>>>>>> In Additional Resources property of an HDFS processor, point to a
>>>>>> directory with:
>>>>>>
>>>>>> azure-storage-2.0.0.jar
>>>>>> commons-codec-1.6.jar
>>>>>> commons-lang3-3.3.2.jar
>>>>>> commons-logging-1.1.1.jar
>>>>>> guava-11.0.2.jar
>>>>>> hadoop-azure-2.7.3.jar
>>>>>> httpclient-4.2.5.jar
>>>>>> httpcore-4.2.4.jar
>>>>>> jackson-core-2.2.3.jar
>>>>>> jsr305-1.3.9.jar
>>>>>> slf4j-api-1.7.5.jar
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Bryan
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 28, 2017 at 1:15 PM, Austin Heyne <ah...@ccri.com> 
>>>>>> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Thanks for all the help you've given me so far. Today I'm trying to
>>>>>>> pull
>>>>>>> files from an Azure blob store. I've done some reading on this 
>>>>>>> and from
>>>>>>> previous tickets [1] and guides [2] it seems the recommended 
>>>>>>> approach
>>>>>>> is
>>>>>>> to
>>>>>>> place the required jars, to use the HDFS Azure protocol, in 
>>>>>>> 'Additional
>>>>>>> Classpath Resources' and the hadoop core-site and hdfs-site 
>>>>>>> configs into
>>>>>>> the
>>>>>>> 'Hadoop Configuration Resources'. I have my local HDFS properly
>>>>>>> configured
>>>>>>> to access wasb urls. I'm able to ls, copy to and from, etc. without
>>>>>>> problem.
>>>>>>> Using the same HDFS config files and trying both all the jars in my
>>>>>>> hadoop-client/lib directory (hdp) and using the jars recommended 
>>>>>>> in [1]
>>>>>>> I'm
>>>>>>> still seeing the "java.lang.IllegalArgumentException: Wrong FS: 
>>>>>>> " error
>>>>>>> in
>>>>>>> my NiFi logs and am unable to pull files from Azure blob storage.
>>>>>>>
>>>>>>> Interestingly, it seems the processor is spinning up way too 
>>>>>>> fast, the
>>>>>>> errors
>>>>>>> appear in the log as soon as I start the processor. I'm not sure 
>>>>>>> how it
>>>>>>> could be loading all of those jars that quickly.
>>>>>>>
>>>>>>> Does anyone have any experience with this or recommendations to 
>>>>>>> try?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Austin
>>>>>>>
>>>>>>> [1] https://issues.apache.org/jira/browse/NIFI-1922
>>>>>>> [2]
>>>>>>>
>>>>>>>
>>>>>>> https://community.hortonworks.com/articles/71916/connecting-to-azure-data-lake-from-a-nifi-dataflow.html 
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>


Re: GetHDFS from Azure Blob

Posted by Austin Heyne <ah...@ccri.com>.
Thanks Bryan,

We're only working with one account here but with multiple root level 
containers. e.g.

wasb://csv@accountName.blob.core.windows.net/
wasb://xml@accountName.blob.core.windows.net/
wasb://json@accountName.blob.core.windows.net/
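
Since it's all one account, if I'm reading the Hadoop docs right, only one 
key entry should be needed in core-site.xml, something like (account name 
illustrative):

<property>
  <name>fs.azure.account.key.accountName.blob.core.windows.net</name>
  <value>ACCOUNT_KEY</value>
</property>

and that single key should cover every container in the account, since the 
key is scoped to the account rather than the container.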

The thing that stands out to me the most is: why would the defaultFS need 
to be set at all if we're always providing complete wasb://... paths? It 
almost seems like a bug or oversight.

If anyone has any input on how we could work around this please let me know.

Thanks for your help,
Austin

On 03/28/2017 04:39 PM, Bryan Bende wrote:
> Austin,
>
> I think you are correct that its <containername>@<accountname>, I
> hadn't looked at this config in a long time and was reading too
> quickly before :)
>
> That would line up with the other property
> fs.azure.account.key.<accountname>.blob.core.windows.net where you
> specify the key for that account.
>
> I have no idea if this will work, but lets say you had three different
> WASB file systems, presumably each with their own account name and
> key, you might be able to define these in core-site.xml:
>
>   <property>
>        <name>fs.azure.account.key.ACCOUNT1.blob.core.windows.net</name>
>        <value>KEY1</value>
>      </property>
>
>   <property>
>        <name>fs.azure.account.key.ACCOUNT2.blob.core.windows.net</name>
>        <value>KEY2</value>
>      </property>
>
>   <property>
>        <name>fs.azure.account.key.ACCOUNT3.blob.core.windows.net</name>
>        <value>KEY3</value>
>      </property>
>
> Then in your HDFS processor in NiFi you point at this core-site.xml
> and use a specific directory like
> wasb://container@ACCOUNT3.blob.core.windows.net/<path> and I'm hoping
> it would know how to use the key for ACCOUNT3.
>
> Not really sure if that helps your situation.
>
> -Bryan
>
>
> On Tue, Mar 28, 2017 at 4:14 PM, Austin Heyne <ah...@ccri.com> wrote:
>> Bryan,
>>
>> So I initially didn't think much of it (assumed it a typo, etc) but you've
>> said that the access url for wasb that you've been using is
>> wasb://YOUR_USER@YOUR_HOST/. However, this has never worked for us and I'm
>> wondering if we have a difference configuration somewhere. What we have to
>> use is wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>
>> which seems to be in line with the Azure blob storage GUI and is what is
>> outlined here [1]. Is there some other way this connector is being setup? It
>> would make much more sense using your access pattern as then each container
>> wouldn't need to have it's own core-site.xml.
>>
>> Thanks,
>> Austin
>>
>> [1a]
>> https://hadoop.apache.org/docs/current/hadoop-azure/index.html#Accessing_wasb_URLs
>> [1b]
>> https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-use-blob-storage
>>
>>
>>
>>
>> On 03/28/2017 03:55 PM, Bryan Bende wrote:
>>> Austin,
>>>
>>> I believe the default FS is only used when you write to a path that
>>> doesn't specify the filesystem. Meaning, if you set the directory of
>>> PutHDFS to /data then it will use the default FS, but if you specify
>>> wasb://user@wasb2/data then it will go to /data in a different
>>> filesystem.
>>>
>>> The problem here is that I don't see a way to specify different keys
>>> for each WASB filesystem in the core-site.xml.
>>>
>>> Admittedly I have never tried to setup something like this with many
>>> different filesystems.
>>>
>>> -Bryan
>>>
>>>
>>> On Tue, Mar 28, 2017 at 3:50 PM, Austin Heyne <ah...@ccri.com> wrote:
>>>> Hi Andre,
>>>>
>>>> Yes, I'm aware of that configuration property, it's what I have been
>>>> using
>>>> to set the core-site.xml and hdfs-site.xml. For testing this I didn't
>>>> modify
>>>> the core-site located in the HADOOP_CONF_DIR but rather copied and
>>>> modified
>>>> it and the pointed the processor to the copy. The problem with this is
>>>> that
>>>> we'll end up with a large number of core-site.xml copies that will all
>>>> have
>>>> to be maintained separately. Ideally we'd be able to specify the
>>>> defaultFS
>>>> in the processor config or have the processor behave like the hdfs
>>>> command
>>>> line tools. The command line tools don't require the defaultFS to be set
>>>> to
>>>> a wasb url in order to use wasb urls.
>>>>
>>>> The key idea here is long term maintainability and using Ambari to
>>>> maintain
>>>> the configuration. If we need to change any other setting in the
>>>> core-site.xml we'd have to change it in a bunch of different files
>>>> manually.
>>>>
>>>> Thanks,
>>>> Austin
>>>>
>>>>
>>>> On 03/28/2017 03:34 PM, Andre wrote:
>>>>
>>>> Austin,
>>>>
>>>> Perhaps that wasn't explicit but the settings don't need to be system
>>>> wide,
>>>> instead the defaultFS may be changed just for a particular processor,
>>>> while
>>>> the others may use configurations.
>>>>
>>>> The *HDFS processor documentation mentions it allows yout to set
>>>> particular
>>>> hadoop configurations:
>>>>
>>>> " A file or comma separated list of files which contains the Hadoop file
>>>> system configuration. Without this, Hadoop will search the classpath for
>>>> a
>>>> 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default
>>>> configuration"
>>>>
>>>> Have you tried using this field to point to a file as described by Bryan?
>>>>
>>>> Cheers
>>>>
>>>> On 29 Mar 2017 05:21, "Austin Heyne" <ah...@ccri.com> wrote:
>>>>
>>>> Thanks Bryan,
>>>>
>>>> Working with the configuration you sent what I needed to change was to
>>>> set
>>>> the fs.defaultFS to the wasb url that we're working from. Unfortunately
>>>> this
>>>> is a less than ideal solution since we'll be pulling files from multiple
>>>> wasb urls and ingesting them into an Accumulo datastore. Changing the
>>>> defaultFS I'm pretty certainly would mess with our local HDFS/Accumulo
>>>> install. In addition we're trying to maintain all of this configuration
>>>> with
>>>> Ambari, which from what I can tell only supports one core-site
>>>> configuration
>>>> file.
>>>>
>>>> Is the only solution here to maintain multiple core-site.xml files or is
>>>> there another way we configure this?
>>>>
>>>> Thanks,
>>>>
>>>> Austin
>>>>
>>>>
>>>>
>>>> On 03/28/2017 01:41 PM, Bryan Bende wrote:
>>>>> Austin,
>>>>>
>>>>> Can you provide the full error message and stacktrace for  the
>>>>> IllegalArgumentException from nifi-app.log?
>>>>>
>>>>> When you start the processor it creates a FileSystem instance based on
>>>>> the config files provided to the processor, which in turn causes all
>>>>> of the corresponding classes to load.
>>>>>
>>>>> I'm not that familiar with Azure, but if "Azure blob store" is WASB,
>>>>> then I have successfully done the following...
>>>>>
>>>>> In core-site.xml:
>>>>>
>>>>> <configuration>
>>>>>
>>>>>        <property>
>>>>>          <name>fs.defaultFS</name>
>>>>>          <value>wasb://YOUR_USER@YOUR_HOST/</value>
>>>>>        </property>
>>>>>
>>>>>        <property>
>>>>>          <name>fs.azure.account.key.nifi.blob.core.windows.net</name>
>>>>>          <value>YOUR_KEY</value>
>>>>>        </property>
>>>>>
>>>>>        <property>
>>>>>          <name>fs.AbstractFileSystem.wasb.impl</name>
>>>>>          <value>org.apache.hadoop.fs.azure.Wasb</value>
>>>>>        </property>
>>>>>
>>>>>        <property>
>>>>>          <name>fs.wasb.impl</name>
>>>>>          <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
>>>>>        </property>
>>>>>
>>>>>        <property>
>>>>>          <name>fs.azure.skip.metrics</name>
>>>>>          <value>true</value>
>>>>>        </property>
>>>>>
>>>>> </configuration>
>>>>>
>>>>> In Additional Resources property of an HDFS processor, point to a
>>>>> directory with:
>>>>>
>>>>> azure-storage-2.0.0.jar
>>>>> commons-codec-1.6.jar
>>>>> commons-lang3-3.3.2.jar
>>>>> commons-logging-1.1.1.jar
>>>>> guava-11.0.2.jar
>>>>> hadoop-azure-2.7.3.jar
>>>>> httpclient-4.2.5.jar
>>>>> httpcore-4.2.4.jar
>>>>> jackson-core-2.2.3.jar
>>>>> jsr305-1.3.9.jar
>>>>> slf4j-api-1.7.5.jar
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Bryan
>>>>>
>>>>>
>>>>> On Tue, Mar 28, 2017 at 1:15 PM, Austin Heyne <ah...@ccri.com> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> Thanks for all the help you've given me so far. Today I'm trying to
>>>>>> pull
>>>>>> files from an Azure blob store. I've done some reading on this and from
>>>>>> previous tickets [1] and guides [2] it seems the recommended approach
>>>>>> is
>>>>>> to
>>>>>> place the required jars, to use the HDFS Azure protocol, in 'Additional
>>>>>> Classpath Resoures' and the hadoop core-site and hdfs-site configs into
>>>>>> the
>>>>>> 'Hadoop Configuration Resources'. I have my local HDFS properly
>>>>>> configured
>>>>>> to access wasb urls. I'm able to ls, copy to and from, etc with out
>>>>>> problem.
>>>>>> Using the same HDFS config files and trying both all the jars in my
>>>>>> hadoop-client/lib directory (hdp) and using the jars recommend in [1]
>>>>>> I'm
>>>>>> still seeing the "java.lang.IllegalArgumentException: Wrong FS: " error
>>>>>> in
>>>>>> my NiFi logs and am unable to pull files from Azure blob storage.
>>>>>>
>>>>>> Interestingly, it seems the processor is spinning up way to fast, the
>>>>>> errors
>>>>>> appear in the log as soon as I start the processor. I'm not sure how it
>>>>>> could be loading all of those jars that quickly.
>>>>>>
>>>>>> Does anyone have any experience with this or recommendations to try?
>>>>>>
>>>>>> Thanks,
>>>>>> Austin
>>>>>>
>>>>>> [1] https://issues.apache.org/jira/browse/NIFI-1922
>>>>>> [2]
>>>>>>
>>>>>>
>>>>>> https://community.hortonworks.com/articles/71916/connecting-to-azure-data-lake-from-a-nifi-dataflow.html
>>>>>>
>>>>>>
>>>>


Re: GetHDFS from Azure Blob

Posted by Bryan Bende <bb...@gmail.com>.
Austin,

I think you are correct that it's <containername>@<accountname>, I
hadn't looked at this config in a long time and was reading too
quickly before :)

That would line up with the other property
fs.azure.account.key.<accountname>.blob.core.windows.net where you
specify the key for that account.

I have no idea if this will work, but let's say you had three different
WASB file systems, presumably each with their own account name and
key, you might be able to define these in core-site.xml:

  <property>
    <name>fs.azure.account.key.ACCOUNT1.blob.core.windows.net</name>
    <value>KEY1</value>
  </property>

  <property>
    <name>fs.azure.account.key.ACCOUNT2.blob.core.windows.net</name>
    <value>KEY2</value>
  </property>

  <property>
    <name>fs.azure.account.key.ACCOUNT3.blob.core.windows.net</name>
    <value>KEY3</value>
  </property>

Then in your HDFS processor in NiFi you point at this core-site.xml
and use a specific directory like
wasb://container@ACCOUNT3.blob.core.windows.net/<path> and I'm hoping
it would know how to use the key for ACCOUNT3.
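
In other words, each processor's Directory would carry the full URL, e.g. 
(container and path names made up):

wasb://logs@ACCOUNT1.blob.core.windows.net/incoming
wasb://events@ACCOUNT2.blob.core.windows.net/incoming
wasb://archive@ACCOUNT3.blob.core.windows.net/incoming

and presumably the account name in the URL is what gets matched against the 
corresponding fs.azure.account.key.* property.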

Not really sure if that helps your situation.

-Bryan


On Tue, Mar 28, 2017 at 4:14 PM, Austin Heyne <ah...@ccri.com> wrote:
> Bryan,
>
> So I initially didn't think much of it (assumed it a typo, etc) but you've
> said that the access url for wasb that you've been using is
> wasb://YOUR_USER@YOUR_HOST/. However, this has never worked for us and I'm
> wondering if we have a difference configuration somewhere. What we have to
> use is wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>
> which seems to be in line with the Azure blob storage GUI and is what is
> outlined here [1]. Is there some other way this connector is being setup? It
> would make much more sense using your access pattern as then each container
> wouldn't need to have it's own core-site.xml.
>
> Thanks,
> Austin
>
> [1a]
> https://hadoop.apache.org/docs/current/hadoop-azure/index.html#Accessing_wasb_URLs
> [1b]
> https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-use-blob-storage
>
>
>
>
> On 03/28/2017 03:55 PM, Bryan Bende wrote:
>>
>> Austin,
>>
>> I believe the default FS is only used when you write to a path that
>> doesn't specify the filesystem. Meaning, if you set the directory of
>> PutHDFS to /data then it will use the default FS, but if you specify
>> wasb://user@wasb2/data then it will go to /data in a different
>> filesystem.
>>
>> The problem here is that I don't see a way to specify different keys
>> for each WASB filesystem in the core-site.xml.
>>
>> Admittedly I have never tried to setup something like this with many
>> different filesystems.
>>
>> -Bryan
>>
>>
>> On Tue, Mar 28, 2017 at 3:50 PM, Austin Heyne <ah...@ccri.com> wrote:
>>>
>>> Hi Andre,
>>>
>>> Yes, I'm aware of that configuration property, it's what I have been
>>> using
>>> to set the core-site.xml and hdfs-site.xml. For testing this I didn't
>>> modify
>>> the core-site located in the HADOOP_CONF_DIR but rather copied and
>>> modified
>>> it and the pointed the processor to the copy. The problem with this is
>>> that
>>> we'll end up with a large number of core-site.xml copies that will all
>>> have
>>> to be maintained separately. Ideally we'd be able to specify the
>>> defaultFS
>>> in the processor config or have the processor behave like the hdfs
>>> command
>>> line tools. The command line tools don't require the defaultFS to be set
>>> to
>>> a wasb url in order to use wasb urls.
>>>
>>> The key idea here is long term maintainability and using Ambari to
>>> maintain
>>> the configuration. If we need to change any other setting in the
>>> core-site.xml we'd have to change it in a bunch of different files
>>> manually.
>>>
>>> Thanks,
>>> Austin
>>>
>>>
>>> On 03/28/2017 03:34 PM, Andre wrote:
>>>
>>> Austin,
>>>
>>> Perhaps that wasn't explicit but the settings don't need to be system
>>> wide,
>>> instead the defaultFS may be changed just for a particular processor,
>>> while
>>> the others may use configurations.
>>>
>>> The *HDFS processor documentation mentions it allows yout to set
>>> particular
>>> hadoop configurations:
>>>
>>> " A file or comma separated list of files which contains the Hadoop file
>>> system configuration. Without this, Hadoop will search the classpath for
>>> a
>>> 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default
>>> configuration"
>>>
>>> Have you tried using this field to point to a file as described by Bryan?
>>>
>>> Cheers
>>>
>>> On 29 Mar 2017 05:21, "Austin Heyne" <ah...@ccri.com> wrote:
>>>
>>> Thanks Bryan,
>>>
>>> Working with the configuration you sent what I needed to change was to
>>> set
>>> the fs.defaultFS to the wasb url that we're working from. Unfortunately
>>> this
>>> is a less than ideal solution since we'll be pulling files from multiple
>>> wasb urls and ingesting them into an Accumulo datastore. Changing the
>>> defaultFS I'm pretty certainly would mess with our local HDFS/Accumulo
>>> install. In addition we're trying to maintain all of this configuration
>>> with
>>> Ambari, which from what I can tell only supports one core-site
>>> configuration
>>> file.
>>>
>>> Is the only solution here to maintain multiple core-site.xml files or is
>>> there another way we configure this?
>>>
>>> Thanks,
>>>
>>> Austin
>>>
>>>
>>>
>>> On 03/28/2017 01:41 PM, Bryan Bende wrote:
>>>>
>>>> Austin,
>>>>
>>>> Can you provide the full error message and stacktrace for  the
>>>> IllegalArgumentException from nifi-app.log?
>>>>
>>>> When you start the processor it creates a FileSystem instance based on
>>>> the config files provided to the processor, which in turn causes all
>>>> of the corresponding classes to load.
>>>>
>>>> I'm not that familiar with Azure, but if "Azure blob store" is WASB,
>>>> then I have successfully done the following...
>>>>
>>>> In core-site.xml:
>>>>
>>>> <configuration>
>>>>
>>>>       <property>
>>>>         <name>fs.defaultFS</name>
>>>>         <value>wasb://YOUR_USER@YOUR_HOST/</value>
>>>>       </property>
>>>>
>>>>       <property>
>>>>         <name>fs.azure.account.key.nifi.blob.core.windows.net</name>
>>>>         <value>YOUR_KEY</value>
>>>>       </property>
>>>>
>>>>       <property>
>>>>         <name>fs.AbstractFileSystem.wasb.impl</name>
>>>>         <value>org.apache.hadoop.fs.azure.Wasb</value>
>>>>       </property>
>>>>
>>>>       <property>
>>>>         <name>fs.wasb.impl</name>
>>>>         <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
>>>>       </property>
>>>>
>>>>       <property>
>>>>         <name>fs.azure.skip.metrics</name>
>>>>         <value>true</value>
>>>>       </property>
>>>>
>>>> </configuration>
>>>>
>>>> In Additional Resources property of an HDFS processor, point to a
>>>> directory with:
>>>>
>>>> azure-storage-2.0.0.jar
>>>> commons-codec-1.6.jar
>>>> commons-lang3-3.3.2.jar
>>>> commons-logging-1.1.1.jar
>>>> guava-11.0.2.jar
>>>> hadoop-azure-2.7.3.jar
>>>> httpclient-4.2.5.jar
>>>> httpcore-4.2.4.jar
>>>> jackson-core-2.2.3.jar
>>>> jsr305-1.3.9.jar
>>>> slf4j-api-1.7.5.jar
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Bryan
>>>>
>>>>
>>>> On Tue, Mar 28, 2017 at 1:15 PM, Austin Heyne <ah...@ccri.com> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> Thanks for all the help you've given me so far. Today I'm trying to
>>>>> pull
>>>>> files from an Azure blob store. I've done some reading on this and from
>>>>> previous tickets [1] and guides [2] it seems the recommended approach
>>>>> is
>>>>> to
>>>>> place the required jars, to use the HDFS Azure protocol, in 'Additional
>>>>> Classpath Resoures' and the hadoop core-site and hdfs-site configs into
>>>>> the
>>>>> 'Hadoop Configuration Resources'. I have my local HDFS properly
>>>>> configured
>>>>> to access wasb urls. I'm able to ls, copy to and from, etc with out
>>>>> problem.
>>>>> Using the same HDFS config files and trying both all the jars in my
>>>>> hadoop-client/lib directory (hdp) and using the jars recommend in [1]
>>>>> I'm
>>>>> still seeing the "java.lang.IllegalArgumentException: Wrong FS: " error
>>>>> in
>>>>> my NiFi logs and am unable to pull files from Azure blob storage.
>>>>>
>>>>> Interestingly, it seems the processor is spinning up way to fast, the
>>>>> errors
>>>>> appear in the log as soon as I start the processor. I'm not sure how it
>>>>> could be loading all of those jars that quickly.
>>>>>
>>>>> Does anyone have any experience with this or recommendations to try?
>>>>>
>>>>> Thanks,
>>>>> Austin
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/NIFI-1922
>>>>> [2]
>>>>>
>>>>>
>>>>> https://community.hortonworks.com/articles/71916/connecting-to-azure-data-lake-from-a-nifi-dataflow.html
>>>>>
>>>>>
>>>
>>>
>

Re: GetHDFS from Azure Blob

Posted by Austin Heyne <ah...@ccri.com>.
Bryan,

So I initially didn't think much of it (assumed it was a typo, etc.) but 
you've said that the access url for wasb that you've been using is 
wasb://YOUR_USER@YOUR_HOST/. However, this has never worked for us and 
I'm wondering if we have a different configuration somewhere. What we 
have to use is 
wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path> 
which seems to be in line with the Azure blob storage GUI and is what is 
outlined here [1a][1b]. Is there some other way this connector is being 
setup? It would make much more sense using your access pattern as then 
each container wouldn't need to have its own core-site.xml.

Thanks,
Austin

[1a] 
https://hadoop.apache.org/docs/current/hadoop-azure/index.html#Accessing_wasb_URLs
[1b] 
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-use-blob-storage



On 03/28/2017 03:55 PM, Bryan Bende wrote:
> Austin,
>
> I believe the default FS is only used when you write to a path that
> doesn't specify the filesystem. Meaning, if you set the directory of
> PutHDFS to /data then it will use the default FS, but if you specify
> wasb://user@wasb2/data then it will go to /data in a different
> filesystem.
>
> The problem here is that I don't see a way to specify different keys
> for each WASB filesystem in the core-site.xml.
>
> Admittedly I have never tried to setup something like this with many
> different filesystems.
>
> -Bryan
>
>
> On Tue, Mar 28, 2017 at 3:50 PM, Austin Heyne <ah...@ccri.com> wrote:
>> Hi Andre,
>>
>> Yes, I'm aware of that configuration property, it's what I have been using
>> to set the core-site.xml and hdfs-site.xml. For testing this I didn't modify
>> the core-site located in the HADOOP_CONF_DIR but rather copied and modified
>> it and the pointed the processor to the copy. The problem with this is that
>> we'll end up with a large number of core-site.xml copies that will all have
>> to be maintained separately. Ideally we'd be able to specify the defaultFS
>> in the processor config or have the processor behave like the hdfs command
>> line tools. The command line tools don't require the defaultFS to be set to
>> a wasb url in order to use wasb urls.
>>
>> The key idea here is long term maintainability and using Ambari to maintain
>> the configuration. If we need to change any other setting in the
>> core-site.xml we'd have to change it in a bunch of different files manually.
>>
>> Thanks,
>> Austin
>>
>>
>> On 03/28/2017 03:34 PM, Andre wrote:
>>
>> Austin,
>>
>> Perhaps that wasn't explicit but the settings don't need to be system wide,
>> instead the defaultFS may be changed just for a particular processor, while
>> the others may use configurations.
>>
>> The *HDFS processor documentation mentions it allows yout to set particular
>> hadoop configurations:
>>
>> " A file or comma separated list of files which contains the Hadoop file
>> system configuration. Without this, Hadoop will search the classpath for a
>> 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default
>> configuration"
>>
>> Have you tried using this field to point to a file as described by Bryan?
>>
>> Cheers
>>
>> On 29 Mar 2017 05:21, "Austin Heyne" <ah...@ccri.com> wrote:
>>
>> Thanks Bryan,
>>
>> Working with the configuration you sent what I needed to change was to set
>> the fs.defaultFS to the wasb url that we're working from. Unfortunately this
>> is a less than ideal solution since we'll be pulling files from multiple
>> wasb urls and ingesting them into an Accumulo datastore. Changing the
>> defaultFS I'm pretty certainly would mess with our local HDFS/Accumulo
>> install. In addition we're trying to maintain all of this configuration with
>> Ambari, which from what I can tell only supports one core-site configuration
>> file.
>>
>> Is the only solution here to maintain multiple core-site.xml files or is
>> there another way we configure this?
>>
>> Thanks,
>>
>> Austin
>>
>>
>>
>> On 03/28/2017 01:41 PM, Bryan Bende wrote:
>>> Austin,
>>>
>>> Can you provide the full error message and stacktrace for  the
>>> IllegalArgumentException from nifi-app.log?
>>>
>>> When you start the processor it creates a FileSystem instance based on
>>> the config files provided to the processor, which in turn causes all
>>> of the corresponding classes to load.
>>>
>>> I'm not that familiar with Azure, but if "Azure blob store" is WASB,
>>> then I have successfully done the following...
>>>
>>> In core-site.xml:
>>>
>>> <configuration>
>>>
>>>       <property>
>>>         <name>fs.defaultFS</name>
>>>         <value>wasb://YOUR_USER@YOUR_HOST/</value>
>>>       </property>
>>>
>>>       <property>
>>>         <name>fs.azure.account.key.nifi.blob.core.windows.net</name>
>>>         <value>YOUR_KEY</value>
>>>       </property>
>>>
>>>       <property>
>>>         <name>fs.AbstractFileSystem.wasb.impl</name>
>>>         <value>org.apache.hadoop.fs.azure.Wasb</value>
>>>       </property>
>>>
>>>       <property>
>>>         <name>fs.wasb.impl</name>
>>>         <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
>>>       </property>
>>>
>>>       <property>
>>>         <name>fs.azure.skip.metrics</name>
>>>         <value>true</value>
>>>       </property>
>>>
>>> </configuration>
>>>
>>> In Additional Resources property of an HDFS processor, point to a
>>> directory with:
>>>
>>> azure-storage-2.0.0.jar
>>> commons-codec-1.6.jar
>>> commons-lang3-3.3.2.jar
>>> commons-logging-1.1.1.jar
>>> guava-11.0.2.jar
>>> hadoop-azure-2.7.3.jar
>>> httpclient-4.2.5.jar
>>> httpcore-4.2.4.jar
>>> jackson-core-2.2.3.jar
>>> jsr305-1.3.9.jar
>>> slf4j-api-1.7.5.jar
>>>
>>>
>>> Thanks,
>>>
>>> Bryan
>>>
>>>
>>> On Tue, Mar 28, 2017 at 1:15 PM, Austin Heyne <ah...@ccri.com> wrote:
>>>> Hi all,
>>>>
>>>> Thanks for all the help you've given me so far. Today I'm trying to pull
>>>> files from an Azure blob store. I've done some reading on this and from
>>>> previous tickets [1] and guides [2] it seems the recommended approach is
>>>> to
>>>> place the required jars, to use the HDFS Azure protocol, in 'Additional
>>>> Classpath Resoures' and the hadoop core-site and hdfs-site configs into
>>>> the
>>>> 'Hadoop Configuration Resources'. I have my local HDFS properly
>>>> configured
>>>> to access wasb urls. I'm able to ls, copy to and from, etc with out
>>>> problem.
>>>> Using the same HDFS config files and trying both all the jars in my
>>>> hadoop-client/lib directory (hdp) and using the jars recommend in [1] I'm
>>>> still seeing the "java.lang.IllegalArgumentException: Wrong FS: " error
>>>> in
>>>> my NiFi logs and am unable to pull files from Azure blob storage.
>>>>
>>>> Interestingly, it seems the processor is spinning up way to fast, the
>>>> errors
>>>> appear in the log as soon as I start the processor. I'm not sure how it
>>>> could be loading all of those jars that quickly.
>>>>
>>>> Does anyone have any experience with this or recommendations to try?
>>>>
>>>> Thanks,
>>>> Austin
>>>>
>>>> [1] https://issues.apache.org/jira/browse/NIFI-1922
>>>> [2]
>>>>
>>>> https://community.hortonworks.com/articles/71916/connecting-to-azure-data-lake-from-a-nifi-dataflow.html
>>>>
>>>>
>>
>>


Re: GetHDFS from Azure Blob

Posted by Bryan Bende <bb...@gmail.com>.
Austin,

I believe the default FS is only used when you write to a path that
doesn't specify the filesystem. Meaning, if you set the directory of
PutHDFS to /data then it will use the default FS, but if you specify
wasb://user@wasb2/data then it will go to /data in a different
filesystem.
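
As a quick illustration (values made up), with this in core-site.xml:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode:8020</value>
</property>

a Directory of /data resolves to hdfs://namenode:8020/data, while a Directory 
of wasb://container@account.blob.core.windows.net/data ignores the default FS 
and goes to the Azure filesystem.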

The problem here is that I don't see a way to specify different keys
for each WASB filesystem in the core-site.xml.

Admittedly I have never tried to set up something like this with many
different filesystems.

-Bryan


On Tue, Mar 28, 2017 at 3:50 PM, Austin Heyne <ah...@ccri.com> wrote:
> Hi Andre,
>
> Yes, I'm aware of that configuration property, it's what I have been using
> to set the core-site.xml and hdfs-site.xml. For testing this I didn't modify
> the core-site located in the HADOOP_CONF_DIR but rather copied and modified
> it and the pointed the processor to the copy. The problem with this is that
> we'll end up with a large number of core-site.xml copies that will all have
> to be maintained separately. Ideally we'd be able to specify the defaultFS
> in the processor config or have the processor behave like the hdfs command
> line tools. The command line tools don't require the defaultFS to be set to
> a wasb url in order to use wasb urls.
>
> The key idea here is long term maintainability and using Ambari to maintain
> the configuration. If we need to change any other setting in the
> core-site.xml we'd have to change it in a bunch of different files manually.
>
> Thanks,
> Austin
>
>
> On 03/28/2017 03:34 PM, Andre wrote:
>
> Austin,
>
> Perhaps that wasn't explicit but the settings don't need to be system wide,
> instead the defaultFS may be changed just for a particular processor, while
> the others may use configurations.
>
> The *HDFS processor documentation mentions it allows yout to set particular
> hadoop configurations:
>
> " A file or comma separated list of files which contains the Hadoop file
> system configuration. Without this, Hadoop will search the classpath for a
> 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default
> configuration"
>
> Have you tried using this field to point to a file as described by Bryan?
>
> Cheers
>
> On 29 Mar 2017 05:21, "Austin Heyne" <ah...@ccri.com> wrote:
>
> Thanks Bryan,
>
> Working with the configuration you sent what I needed to change was to set
> the fs.defaultFS to the wasb url that we're working from. Unfortunately this
> is a less than ideal solution since we'll be pulling files from multiple
> wasb urls and ingesting them into an Accumulo datastore. Changing the
> defaultFS I'm pretty certainly would mess with our local HDFS/Accumulo
> install. In addition we're trying to maintain all of this configuration with
> Ambari, which from what I can tell only supports one core-site configuration
> file.
>
> Is the only solution here to maintain multiple core-site.xml files or is
> there another way we configure this?
>
> Thanks,
>
> Austin
>
>
>
> On 03/28/2017 01:41 PM, Bryan Bende wrote:
>>
>> Austin,
>>
>> Can you provide the full error message and stacktrace for  the
>> IllegalArgumentException from nifi-app.log?
>>
>> When you start the processor it creates a FileSystem instance based on
>> the config files provided to the processor, which in turn causes all
>> of the corresponding classes to load.
>>
>> I'm not that familiar with Azure, but if "Azure blob store" is WASB,
>> then I have successfully done the following...
>>
>> In core-site.xml:
>>
>> <configuration>
>>
>>      <property>
>>        <name>fs.defaultFS</name>
>>        <value>wasb://YOUR_USER@YOUR_HOST/</value>
>>      </property>
>>
>>      <property>
>>        <name>fs.azure.account.key.nifi.blob.core.windows.net</name>
>>        <value>YOUR_KEY</value>
>>      </property>
>>
>>      <property>
>>        <name>fs.AbstractFileSystem.wasb.impl</name>
>>        <value>org.apache.hadoop.fs.azure.Wasb</value>
>>      </property>
>>
>>      <property>
>>        <name>fs.wasb.impl</name>
>>        <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
>>      </property>
>>
>>      <property>
>>        <name>fs.azure.skip.metrics</name>
>>        <value>true</value>
>>      </property>
>>
>> </configuration>
>>
>> In Additional Resources property of an HDFS processor, point to a
>> directory with:
>>
>> azure-storage-2.0.0.jar
>> commons-codec-1.6.jar
>> commons-lang3-3.3.2.jar
>> commons-logging-1.1.1.jar
>> guava-11.0.2.jar
>> hadoop-azure-2.7.3.jar
>> httpclient-4.2.5.jar
>> httpcore-4.2.4.jar
>> jackson-core-2.2.3.jar
>> jsr305-1.3.9.jar
>> slf4j-api-1.7.5.jar
>>
>>
>> Thanks,
>>
>> Bryan
>>
>>
>> On Tue, Mar 28, 2017 at 1:15 PM, Austin Heyne <ah...@ccri.com> wrote:
>>>
>>> Hi all,
>>>
>>> Thanks for all the help you've given me so far. Today I'm trying to pull
>>> files from an Azure blob store. I've done some reading on this and from
>>> previous tickets [1] and guides [2] it seems the recommended approach is
>>> to
>>> place the required jars, to use the HDFS Azure protocol, in 'Additional
>>> Classpath Resoures' and the hadoop core-site and hdfs-site configs into
>>> the
>>> 'Hadoop Configuration Resources'. I have my local HDFS properly
>>> configured
>>> to access wasb urls. I'm able to ls, copy to and from, etc with out
>>> problem.
>>> Using the same HDFS config files and trying both all the jars in my
>>> hadoop-client/lib directory (hdp) and using the jars recommend in [1] I'm
>>> still seeing the "java.lang.IllegalArgumentException: Wrong FS: " error
>>> in
>>> my NiFi logs and am unable to pull files from Azure blob storage.
>>>
>>> Interestingly, it seems the processor is spinning up way to fast, the
>>> errors
>>> appear in the log as soon as I start the processor. I'm not sure how it
>>> could be loading all of those jars that quickly.
>>>
>>> Does anyone have any experience with this or recommendations to try?
>>>
>>> Thanks,
>>> Austin
>>>
>>> [1] https://issues.apache.org/jira/browse/NIFI-1922
>>> [2]
>>>
>>> https://community.hortonworks.com/articles/71916/connecting-to-azure-data-lake-from-a-nifi-dataflow.html
>>>
>>>
>
>
>

Re: GetHDFS from Azure Blob

Posted by Austin Heyne <ah...@ccri.com>.
Hi Andre,

Yes, I'm aware of that configuration property, it's what I have been 
using to set the core-site.xml and hdfs-site.xml. For testing this I 
didn't modify the core-site located in the HADOOP_CONF_DIR but rather 
copied and modified it and then pointed the processor to the copy. The 
problem with this is that we'll end up with a large number of 
core-site.xml copies that will all have to be maintained separately. 
Ideally we'd be able to specify the defaultFS in the processor config or 
have the processor behave like the hdfs command line tools. The command 
line tools don't require the defaultFS to be set to a wasb url in order 
to use wasb urls.
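
For example, with the stock core-site.xml (defaultFS still pointing at our 
local HDFS), something like

hdfs dfs -ls wasb://csv@accountName.blob.core.windows.net/

works fine from the command line, so ideally the processor could do the same.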

The key idea here is long term maintainability and using Ambari to 
maintain the configuration. If we need to change any other setting in 
the core-site.xml we'd have to change it in a bunch of different files 
manually.

Thanks,
Austin


On 03/28/2017 03:34 PM, Andre wrote:
> Austin,
>
> Perhaps that wasn't explicit but the settings don't need to be system 
> wide, instead the defaultFS may be changed just for a particular 
> processor, while the others may use configurations.
>
> The *HDFS processor documentation mentions it allows yout to set 
> particular  hadoop configurations:
>
> " A file or comma separated list of files which contains the Hadoop 
> file system configuration. Without this, Hadoop will search the 
> classpath for a 'core-site.xml' and 'hdfs-site.xml' file or will 
> revert to a default configuration"
>
> Have you tried using this field to point to a file as described by Bryan?
>
> Cheers
>
> On 29 Mar 2017 05:21, "Austin Heyne" <aheyne@ccri.com 
> <ma...@ccri.com>> wrote:
>
>     Thanks Bryan,
>
>     Working with the configuration you sent what I needed to change
>     was to set the fs.defaultFS to the wasb url that we're working
>     from. Unfortunately this is a less than ideal solution since we'll
>     be pulling files from multiple wasb urls and ingesting them into
>     an Accumulo datastore. Changing the defaultFS I'm pretty certainly
>     would mess with our local HDFS/Accumulo install. In addition we're
>     trying to maintain all of this configuration with Ambari, which
>     from what I can tell only supports one core-site configuration file.
>
>     Is the only solution here to maintain multiple core-site.xml files
>     or is there another way we configure this?
>
>     Thanks,
>
>     Austin
>
>
>
>     On 03/28/2017 01:41 PM, Bryan Bende wrote:
>
>         Austin,
>
>         Can you provide the full error message and stacktrace for  the
>         IllegalArgumentException from nifi-app.log?
>
>         When you start the processor it creates a FileSystem instance
>         based on
>         the config files provided to the processor, which in turn
>         causes all
>         of the corresponding classes to load.
>
>         I'm not that familiar with Azure, but if "Azure blob store" is
>         WASB,
>         then I have successfully done the following...
>
>         In core-site.xml:
>
>         <configuration>
>
>              <property>
>                <name>fs.defaultFS</name>
>                <value>wasb://YOUR_USER@YOUR_HOST/</value>
>              </property>
>
>              <property>
>                <name>fs.azure.account.key.nifi.blob.core.windows.net
>         <http://fs.azure.account.key.nifi.blob.core.windows.net></name>
>                <value>YOUR_KEY</value>
>              </property>
>
>              <property>
>                <name>fs.AbstractFileSystem.wasb.impl</name>
>                <value>org.apache.hadoop.fs.azure.Wasb</value>
>              </property>
>
>              <property>
>                <name>fs.wasb.impl</name>
>              
>          <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
>              </property>
>
>              <property>
>                <name>fs.azure.skip.metrics</name>
>                <value>true</value>
>              </property>
>
>         </configuration>
>
>         In Additional Resources property of an HDFS processor, point to a
>         directory with:
>
>         azure-storage-2.0.0.jar
>         commons-codec-1.6.jar
>         commons-lang3-3.3.2.jar
>         commons-logging-1.1.1.jar
>         guava-11.0.2.jar
>         hadoop-azure-2.7.3.jar
>         httpclient-4.2.5.jar
>         httpcore-4.2.4.jar
>         jackson-core-2.2.3.jar
>         jsr305-1.3.9.jar
>         slf4j-api-1.7.5.jar
>
>
>         Thanks,
>
>         Bryan
>
>
>         On Tue, Mar 28, 2017 at 1:15 PM, Austin Heyne <aheyne@ccri.com
>         <ma...@ccri.com>> wrote:
>
>             Hi all,
>
>             Thanks for all the help you've given me so far. Today I'm
>             trying to pull
>             files from an Azure blob store. I've done some reading on
>             this and from
>             previous tickets [1] and guides [2] it seems the
>             recommended approach is to
>             place the required jars, to use the HDFS Azure protocol,
>             in 'Additional
>             Classpath Resoures' and the hadoop core-site and hdfs-site
>             configs into the
>             'Hadoop Configuration Resources'. I have my local HDFS
>             properly configured
>             to access wasb urls. I'm able to ls, copy to and from, etc
>             with out problem.
>             Using the same HDFS config files and trying both all the
>             jars in my
>             hadoop-client/lib directory (hdp) and using the jars
>             recommend in [1] I'm
>             still seeing the "java.lang.IllegalArgumentException:
>             Wrong FS: " error in
>             my NiFi logs and am unable to pull files from Azure blob
>             storage.
>
>             Interestingly, it seems the processor is spinning up way
>             to fast, the errors
>             appear in the log as soon as I start the processor. I'm
>             not sure how it
>             could be loading all of those jars that quickly.
>
>             Does anyone have any experience with this or
>             recommendations to try?
>
>             Thanks,
>             Austin
>
>             [1] https://issues.apache.org/jira/browse/NIFI-1922
>             <https://issues.apache.org/jira/browse/NIFI-1922>
>             [2]
>             https://community.hortonworks.com/articles/71916/connecting-to-azure-data-lake-from-a-nifi-dataflow.html
>             <https://community.hortonworks.com/articles/71916/connecting-to-azure-data-lake-from-a-nifi-dataflow.html>
>
>
>
>


Re: GetHDFS from Azure Blob

Posted by Andre <an...@fucs.org>.
Austin,

Perhaps that wasn't explicit, but the settings don't need to be system-wide;
instead the defaultFS may be changed just for a particular processor, while
the other processors may use different configurations.

The *HDFS processor documentation mentions it allows you to set particular
Hadoop configurations:

" A file or comma separated list of files which contains the Hadoop file
system configuration. Without this, Hadoop will search the classpath for a
'core-site.xml' and 'hdfs-site.xml' file or will revert to a default
configuration"

Have you tried using this field to point to a file as described by Bryan?

Cheers

On 29 Mar 2017 05:21, "Austin Heyne" <ah...@ccri.com> wrote:

Thanks Bryan,

Working with the configuration you sent what I needed to change was to set
the fs.defaultFS to the wasb url that we're working from. Unfortunately
this is a less than ideal solution since we'll be pulling files from
multiple wasb urls and ingesting them into an Accumulo datastore. Changing
the defaultFS I'm pretty certainly would mess with our local HDFS/Accumulo
install. In addition we're trying to maintain all of this configuration
with Ambari, which from what I can tell only supports one core-site
configuration file.

Is the only solution here to maintain multiple core-site.xml files or is
there another way we configure this?

Thanks,

Austin



On 03/28/2017 01:41 PM, Bryan Bende wrote:

> Austin,
>
> Can you provide the full error message and stacktrace for  the
> IllegalArgumentException from nifi-app.log?
>
> When you start the processor it creates a FileSystem instance based on
> the config files provided to the processor, which in turn causes all
> of the corresponding classes to load.
>
> I'm not that familiar with Azure, but if "Azure blob store" is WASB,
> then I have successfully done the following...
>
> In core-site.xml:
>
> <configuration>
>
>      <property>
>        <name>fs.defaultFS</name>
>        <value>wasb://YOUR_USER@YOUR_HOST/</value>
>      </property>
>
>      <property>
>        <name>fs.azure.account.key.nifi.blob.core.windows.net</name>
>        <value>YOUR_KEY</value>
>      </property>
>
>      <property>
>        <name>fs.AbstractFileSystem.wasb.impl</name>
>        <value>org.apache.hadoop.fs.azure.Wasb</value>
>      </property>
>
>      <property>
>        <name>fs.wasb.impl</name>
>        <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
>      </property>
>
>      <property>
>        <name>fs.azure.skip.metrics</name>
>        <value>true</value>
>      </property>
>
> </configuration>
>
> In Additional Resources property of an HDFS processor, point to a
> directory with:
>
> azure-storage-2.0.0.jar
> commons-codec-1.6.jar
> commons-lang3-3.3.2.jar
> commons-logging-1.1.1.jar
> guava-11.0.2.jar
> hadoop-azure-2.7.3.jar
> httpclient-4.2.5.jar
> httpcore-4.2.4.jar
> jackson-core-2.2.3.jar
> jsr305-1.3.9.jar
> slf4j-api-1.7.5.jar
>
>
> Thanks,
>
> Bryan
>
>
> On Tue, Mar 28, 2017 at 1:15 PM, Austin Heyne <ah...@ccri.com> wrote:
>
>> Hi all,
>>
>> Thanks for all the help you've given me so far. Today I'm trying to pull
>> files from an Azure blob store. I've done some reading on this and from
>> previous tickets [1] and guides [2] it seems the recommended approach is
>> to
>> place the required jars, to use the HDFS Azure protocol, in 'Additional
>> Classpath Resoures' and the hadoop core-site and hdfs-site configs into
>> the
>> 'Hadoop Configuration Resources'. I have my local HDFS properly configured
>> to access wasb urls. I'm able to ls, copy to and from, etc with out
>> problem.
>> Using the same HDFS config files and trying both all the jars in my
>> hadoop-client/lib directory (hdp) and using the jars recommend in [1] I'm
>> still seeing the "java.lang.IllegalArgumentException: Wrong FS: " error
>> in
>> my NiFi logs and am unable to pull files from Azure blob storage.
>>
>> Interestingly, it seems the processor is spinning up way to fast, the
>> errors
>> appear in the log as soon as I start the processor. I'm not sure how it
>> could be loading all of those jars that quickly.
>>
>> Does anyone have any experience with this or recommendations to try?
>>
>> Thanks,
>> Austin
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-1922
>> [2]
>> https://community.hortonworks.com/articles/71916/connecting-
>> to-azure-data-lake-from-a-nifi-dataflow.html
>>
>>
>>

Re: GetHDFS from Azure Blob

Posted by Austin Heyne <ah...@ccri.com>.
Thanks Bryan,

Working with the configuration you sent, what I needed to change was to 
set the fs.defaultFS to the wasb url that we're working from. 
Unfortunately this is a less than ideal solution since we'll be pulling 
files from multiple wasb urls and ingesting them into an Accumulo 
datastore. I'm pretty certain changing the defaultFS would mess with 
our local HDFS/Accumulo install. In addition we're trying to maintain 
all of this configuration with Ambari, which from what I can tell only 
supports one core-site configuration file.

Is the only solution here to maintain multiple core-site.xml files or is 
there another way we configure this?

Thanks,

Austin


On 03/28/2017 01:41 PM, Bryan Bende wrote:
> Austin,
>
> Can you provide the full error message and stacktrace for  the
> IllegalArgumentException from nifi-app.log?
>
> When you start the processor it creates a FileSystem instance based on
> the config files provided to the processor, which in turn causes all
> of the corresponding classes to load.
>
> I'm not that familiar with Azure, but if "Azure blob store" is WASB,
> then I have successfully done the following...
>
> In core-site.xml:
>
> <configuration>
>
>      <property>
>        <name>fs.defaultFS</name>
>        <value>wasb://YOUR_USER@YOUR_HOST/</value>
>      </property>
>
>      <property>
>        <name>fs.azure.account.key.nifi.blob.core.windows.net</name>
>        <value>YOUR_KEY</value>
>      </property>
>
>      <property>
>        <name>fs.AbstractFileSystem.wasb.impl</name>
>        <value>org.apache.hadoop.fs.azure.Wasb</value>
>      </property>
>
>      <property>
>        <name>fs.wasb.impl</name>
>        <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
>      </property>
>
>      <property>
>        <name>fs.azure.skip.metrics</name>
>        <value>true</value>
>      </property>
>
> </configuration>
>
> In Additional Resources property of an HDFS processor, point to a
> directory with:
>
> azure-storage-2.0.0.jar
> commons-codec-1.6.jar
> commons-lang3-3.3.2.jar
> commons-logging-1.1.1.jar
> guava-11.0.2.jar
> hadoop-azure-2.7.3.jar
> httpclient-4.2.5.jar
> httpcore-4.2.4.jar
> jackson-core-2.2.3.jar
> jsr305-1.3.9.jar
> slf4j-api-1.7.5.jar
>
>
> Thanks,
>
> Bryan
>
>
> On Tue, Mar 28, 2017 at 1:15 PM, Austin Heyne <ah...@ccri.com> wrote:
>> Hi all,
>>
>> Thanks for all the help you've given me so far. Today I'm trying to pull
>> files from an Azure blob store. I've done some reading on this and from
>> previous tickets [1] and guides [2] it seems the recommended approach is to
>> place the required jars, to use the HDFS Azure protocol, in 'Additional
>> Classpath Resoures' and the hadoop core-site and hdfs-site configs into the
>> 'Hadoop Configuration Resources'. I have my local HDFS properly configured
>> to access wasb urls. I'm able to ls, copy to and from, etc with out problem.
>> Using the same HDFS config files and trying both all the jars in my
>> hadoop-client/lib directory (hdp) and using the jars recommend in [1] I'm
>> still seeing the "java.lang.IllegalArgumentException: Wrong FS: " error in
>> my NiFi logs and am unable to pull files from Azure blob storage.
>>
>> Interestingly, it seems the processor is spinning up way to fast, the errors
>> appear in the log as soon as I start the processor. I'm not sure how it
>> could be loading all of those jars that quickly.
>>
>> Does anyone have any experience with this or recommendations to try?
>>
>> Thanks,
>> Austin
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-1922
>> [2]
>> https://community.hortonworks.com/articles/71916/connecting-to-azure-data-lake-from-a-nifi-dataflow.html
>>
>>


Re: GetHDFS from Azure Blob

Posted by Andrew Grande <ap...@gmail.com>.
Just wanted to mention there is an in-progress PR for native support for
Azure blobs, using the Azure storage Java SDK:
https://github.com/apache/nifi/pull/1636

It might be more straightforward in your case, as no additional library
configuration would be required.

Andrew

On Tue, Apr 4, 2017 at 11:40 AM Bryan Bende <bb...@gmail.com> wrote:

> Giovanni,
>
> In the pom.xml at the root of the NiFi source tree:
>
> <hadoop.version>2.7.3</hadoop.version>
>
> You can change that to 2.8.0 (if 2.8.0 is released) and then run a
> full build, assuming 2.8.0 doesn't break any code that NiFi is using.
>
> I don't really view this as the same issue as NIFI-1922... even if the
> solution to NIFI-1922 was to directly bundle the Azure/Wasb JARs in
> NiFi, we would still have to bundle the JARs that are compatible with
> the Hadoop client we are using, which is currently 2.7.3.
>
> In the future when we have an extension registry, we could presumably
> publish variations of the nifi-hadoop-nar + nifi-hadoop-libraries-nar
> built against different versions of the Hadoop client (2.7.x, 2.8.x,
> HDP, CDH, MapR, etc), and with the component versioning work currently
> going on in master, it would be easy for people to run as many of
> these in parallel as they want.
>
> For now I think the easiest thing to do is maintain your own build of
> nifi-hadoop-libraries-nar by changing the the version mentioned above.
>
> At some point the NiFi community will likely move to a newer Hadoop
> client as they come out (we fairly recently moved from 2.6.x to
> 2.7.x), but this a bigger decision that depends on how stable the
> client is and what (if any) ramifications it has for compatibility.
>
> Thanks,
>
> Bryan
>
>
>
> On Tue, Apr 4, 2017 at 11:20 AM, Giovanni Lanzani
> <gi...@godatadriven.com> wrote:
> > Hi Brian,
> >
> > Thanks for the reply.
> >
> > Is there a way to compile NiFi using the Hadoop 2.8.0 libraries?
> >
> > It's of course unfortunate, but the libraries you mentioned before works
> in their very specific version. Once you use a newer version (like
> azure-storage-2.2.0) then things seem to break.
> >
> > Maybe this jira [^1] could be reopened then? 😊
> >
> > Cheers,
> >
> > Giovanni
> >
> > [^1]: https://issues.apache.org/jira/browse/NIFI-1922
> >
> >> -----Original Message-----
> >> From: Bryan Bende [mailto:bbende@gmail.com]
> >> Sent: Tuesday, April 4, 2017 3:59 PM
> >> To: users@nifi.apache.org
> >> Subject: Re: GetHDFS from Azure Blob
> >>
> >> Giovanni,
> >>
> >> I'm not that familiar with using a key provider, but NiFi currently
> bundles the
> >> Hadoop 2.7.3 client, and looking at ProviderUtils from 2.7.3, there
> doesn't
> >> appear to be a method
> >> "excludeIncompatibleCredentialProviders":
> >>
> >> https://github.com/apache/hadoop/blob/release-2.7.3-RC2/hadoop-common-
> >> project/hadoop-
> >> common/src/main/java/org/apache/hadoop/security/ProviderUtils.java
> >>
> >> It looks like it is introduced in 2.8.0:
> >>
> >> https://github.com/apache/hadoop/blob/release-2.8.0-RC3/hadoop-common-
> >> project/hadoop-
> >> common/src/main/java/org/apache/hadoop/security/ProviderUtils.java
> >>
> >> Most likely some code that is present in one of the JARs specified
> through
> >> Additional Resources is dependent on Hadoop 2.8.0, and since NiFi is
> bundling
> >> 2.7.3, there are some things not lining up.
> >>
> >> -Bryan
> >>
> >>
> >> On Tue, Apr 4, 2017 at 9:50 AM, Giovanni Lanzani
> >> <gi...@godatadriven.com> wrote:
> >> > Bryan,
> >> >
> >> > Allow me to chime in (to ask for help).
> >> >
> >> > What about when I'm using an encrypted key?
> >> >
> >> > In my case I have (in core-site.xml)
> >> >
> >> >    <property>
> >> >
> >> <name>
> fs.azure.account.keyprovider.nsanalyticsstorage.blob.core.windows.ne
> >> t</name>
> >> >
>  <value>org.apache.hadoop.fs.azure.ShellDecryptionKeyProvider</value>
> >> >     </property>
> >> >
> >> > Everything works from the command line (hdfs dfs).
> >> >
> >> > But NiFi complains with:
> >> >
> >> > java.lang.NoSuchMethodError:
> >> > org.apache.hadoop.security.ProviderUtils.excludeIncompatibleCredential
> >> > Providers
> >> >
> >> > Any ideas? I've already linked hadoop-commons.jar as well (besides
> what you
> >> suggested below).
> >> >
> >> > Cheers,
> >> >
> >> > Giovanni
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: Bryan Bende [mailto:bbende@gmail.com]
> >> >> Sent: Tuesday, March 28, 2017 7:41 PM
> >> >> To: users@nifi.apache.org
> >> >> Subject: Re: GetHDFS from Azure Blob
> >> >>
> >> >> Austin,
> >> >>
> >> >> Can you provide the full error message and stacktrace for  the
> >> >> IllegalArgumentException from nifi-app.log?
> >> >>
> >> >> When you start the processor it creates a FileSystem instance based
> >> >> on the config files provided to the processor, which in turn causes
> >> >> all of the corresponding classes to load.
> >> >>
> >> >> I'm not that familiar with Azure, but if "Azure blob store" is WASB,
> >> >> then I have successfully done the following...
> >> >>
> >> >> In core-site.xml:
> >> >>
> >> >> <configuration>
> >> >>
> >> >>     <property>
> >> >>       <name>fs.defaultFS</name>
> >> >>       <value>wasb://YOUR_USER@YOUR_HOST/</value>
> >> >>     </property>
> >> >>
> >> >>     <property>
> >> >>       <name>fs.azure.account.key.nifi.blob.core.windows.net</name>
> >> >>       <value>YOUR_KEY</value>
> >> >>     </property>
> >> >>
> >> >>     <property>
> >> >>       <name>fs.AbstractFileSystem.wasb.impl</name>
> >> >>       <value>org.apache.hadoop.fs.azure.Wasb</value>
> >> >>     </property>
> >> >>
> >> >>     <property>
> >> >>       <name>fs.wasb.impl</name>
> >> >>       <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
> >> >>     </property>
> >> >>
> >> >>     <property>
> >> >>       <name>fs.azure.skip.metrics</name>
> >> >>       <value>true</value>
> >> >>     </property>
> >> >>
> >> >> </configuration>
> >> >>
> >> >> In Additional Resources property of an HDFS processor, point to a
> >> >> directory
> >> >> with:
> >> >>
> >> >> azure-storage-2.0.0.jar
> >> >> commons-codec-1.6.jar
> >> >> commons-lang3-3.3.2.jar
> >> >> commons-logging-1.1.1.jar
> >> >> guava-11.0.2.jar
> >> >> hadoop-azure-2.7.3.jar
> >> >> httpclient-4.2.5.jar
> >> >> httpcore-4.2.4.jar
> >> >> jackson-core-2.2.3.jar
> >> >> jsr305-1.3.9.jar
> >> >> slf4j-api-1.7.5.jar
> >> >>
> >> >>
> >> >> Thanks,
> >> >>
> >> >> Bryan
> >> >>
> >> >>
> >> >> On Tue, Mar 28, 2017 at 1:15 PM, Austin Heyne <ah...@ccri.com>
> wrote:
> >> >> > Hi all,
> >> >> >
> >> >> > Thanks for all the help you've given me so far. Today I'm trying to
> >> >> > pull files from an Azure blob store. I've done some reading on this
> >> >> > and from previous tickets [1] and guides [2] it seems the
> >> >> > recommended approach is to place the required jars, to use the HDFS
> >> >> > Azure protocol, in 'Additional Classpath Resoures' and the hadoop
> >> >> > core-site and hdfs-site configs into the 'Hadoop Configuration
> >> >> > Resources'. I have my local HDFS properly configured to access wasb
> >> >> > urls. I'm able to ls,
> >> >> copy to and from, etc with out problem.
> >> >> > Using the same HDFS config files and trying both all the jars in my
> >> >> > hadoop-client/lib directory (hdp) and using the jars recommend in
> >> >> > [1] I'm still seeing the "java.lang.IllegalArgumentException:
> Wrong FS: "
> >> >> > error in my NiFi logs and am unable to pull files from Azure blob
> storage.
> >> >> >
> >> >> > Interestingly, it seems the processor is spinning up way to fast,
> >> >> > the errors appear in the log as soon as I start the processor. I'm
> >> >> > not sure how it could be loading all of those jars that quickly.
> >> >> >
> >> >> > Does anyone have any experience with this or recommendations to
> try?
> >> >> >
> >> >> > Thanks,
> >> >> > Austin
> >> >> >
> >> >> > [1] https://issues.apache.org/jira/browse/NIFI-1922
> >> >> > [2]
> >> >> >
> https://community.hortonworks.com/articles/71916/connecting-to-azur
> >> >> > e-d
> >> >> > ata-lake-from-a-nifi-dataflow.html
> >> >> >
> >> >> >
>

Re: GetHDFS from Azure Blob

Posted by Bryan Bende <bb...@gmail.com>.
Giovanni,

In the pom.xml at the root of the NiFi source tree:

<hadoop.version>2.7.3</hadoop.version>

You can change that to 2.8.0 (if 2.8.0 is released) and then run a
full build, assuming 2.8.0 doesn't break any code that NiFi is using.
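
For example, the relevant edit would look roughly like this (a sketch
only; it assumes the property is still defined in the root pom of the
version of the source tree you are building):

    <!-- in the <properties> section of the root pom.xml -->
    <hadoop.version>2.8.0</hadoop.version>

After that, a full rebuild should pull the 2.8.0 client jars into
nifi-hadoop-libraries-nar.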

I don't really view this as the same issue as NIFI-1922... even if the
solution to NIFI-1922 was to directly bundle the Azure/Wasb JARs in
NiFi, we would still have to bundle the JARs that are compatible with
the Hadoop client we are using, which is currently 2.7.3.

In the future when we have an extension registry, we could presumably
publish variations of the nifi-hadoop-nar + nifi-hadoop-libraries-nar
built against different versions of the Hadoop client (2.7.x, 2.8.x,
HDP, CDH, MapR, etc), and with the component versioning work currently
going on in master, it would be easy for people to run as many of
these in parallel as they want.

For now I think the easiest thing to do is maintain your own build of
nifi-hadoop-libraries-nar by changing the version mentioned above.

At some point the NiFi community will likely move to a newer Hadoop
client as they come out (we fairly recently moved from 2.6.x to
2.7.x), but this is a bigger decision that depends on how stable the
client is and what (if any) ramifications it has for compatibility.

Thanks,

Bryan



On Tue, Apr 4, 2017 at 11:20 AM, Giovanni Lanzani
<gi...@godatadriven.com> wrote:
> Hi Brian,
>
> Thanks for the reply.
>
> Is there a way to compile NiFi using the Hadoop 2.8.0 libraries?
>
> It's of course unfortunate, but the libraries you mentioned before works in their very specific version. Once you use a newer version (like azure-storage-2.2.0) then things seem to break.
>
> Maybe this jira [^1] could be reopened then? 😊
>
> Cheers,
>
> Giovanni
>
> [^1]: https://issues.apache.org/jira/browse/NIFI-1922
>
>> -----Original Message-----
>> From: Bryan Bende [mailto:bbende@gmail.com]
>> Sent: Tuesday, April 4, 2017 3:59 PM
>> To: users@nifi.apache.org
>> Subject: Re: GetHDFS from Azure Blob
>>
>> Giovanni,
>>
>> I'm not that familiar with using a key provider, but NiFi currently bundles the
>> Hadoop 2.7.3 client, and looking at ProviderUtils from 2.7.3, there doesn't
>> appear to be a method
>> "excludeIncompatibleCredentialProviders":
>>
>> https://github.com/apache/hadoop/blob/release-2.7.3-RC2/hadoop-common-
>> project/hadoop-
>> common/src/main/java/org/apache/hadoop/security/ProviderUtils.java
>>
>> It looks like it is introduced in 2.8.0:
>>
>> https://github.com/apache/hadoop/blob/release-2.8.0-RC3/hadoop-common-
>> project/hadoop-
>> common/src/main/java/org/apache/hadoop/security/ProviderUtils.java
>>
>> Most likely some code that is present in one of the JARs specified through
>> Additional Resources is dependent on Hadoop 2.8.0, and since NiFi is bundling
>> 2.7.3, there are some things not lining up.
>>
>> -Bryan
>>
>>
>> On Tue, Apr 4, 2017 at 9:50 AM, Giovanni Lanzani
>> <gi...@godatadriven.com> wrote:
>> > Bryan,
>> >
>> > Allow me to chime in (to ask for help).
>> >
>> > What about when I'm using an encrypted key?
>> >
>> > In my case I have (in core-site.xml)
>> >
>> >    <property>
>> >
>> <name>fs.azure.account.keyprovider.nsanalyticsstorage.blob.core.windows.ne
>> t</name>
>> >       <value>org.apache.hadoop.fs.azure.ShellDecryptionKeyProvider</value>
>> >     </property>
>> >
>> > Everything works from the command line (hdfs dfs).
>> >
>> > But NiFi complains with:
>> >
>> > java.lang.NoSuchMethodError:
>> > org.apache.hadoop.security.ProviderUtils.excludeIncompatibleCredential
>> > Providers
>> >
>> > Any ideas? I've already linked hadoop-commons.jar as well (besides what you
>> suggested below).
>> >
>> > Cheers,
>> >
>> > Giovanni
>> >
>> >
>> >> -----Original Message-----
>> >> From: Bryan Bende [mailto:bbende@gmail.com]
>> >> Sent: Tuesday, March 28, 2017 7:41 PM
>> >> To: users@nifi.apache.org
>> >> Subject: Re: GetHDFS from Azure Blob
>> >>
>> >> Austin,
>> >>
>> >> Can you provide the full error message and stacktrace for  the
>> >> IllegalArgumentException from nifi-app.log?
>> >>
>> >> When you start the processor it creates a FileSystem instance based
>> >> on the config files provided to the processor, which in turn causes
>> >> all of the corresponding classes to load.
>> >>
>> >> I'm not that familiar with Azure, but if "Azure blob store" is WASB,
>> >> then I have successfully done the following...
>> >>
>> >> In core-site.xml:
>> >>
>> >> <configuration>
>> >>
>> >>     <property>
>> >>       <name>fs.defaultFS</name>
>> >>       <value>wasb://YOUR_USER@YOUR_HOST/</value>
>> >>     </property>
>> >>
>> >>     <property>
>> >>       <name>fs.azure.account.key.nifi.blob.core.windows.net</name>
>> >>       <value>YOUR_KEY</value>
>> >>     </property>
>> >>
>> >>     <property>
>> >>       <name>fs.AbstractFileSystem.wasb.impl</name>
>> >>       <value>org.apache.hadoop.fs.azure.Wasb</value>
>> >>     </property>
>> >>
>> >>     <property>
>> >>       <name>fs.wasb.impl</name>
>> >>       <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
>> >>     </property>
>> >>
>> >>     <property>
>> >>       <name>fs.azure.skip.metrics</name>
>> >>       <value>true</value>
>> >>     </property>
>> >>
>> >> </configuration>
>> >>
>> >> In Additional Resources property of an HDFS processor, point to a
>> >> directory
>> >> with:
>> >>
>> >> azure-storage-2.0.0.jar
>> >> commons-codec-1.6.jar
>> >> commons-lang3-3.3.2.jar
>> >> commons-logging-1.1.1.jar
>> >> guava-11.0.2.jar
>> >> hadoop-azure-2.7.3.jar
>> >> httpclient-4.2.5.jar
>> >> httpcore-4.2.4.jar
>> >> jackson-core-2.2.3.jar
>> >> jsr305-1.3.9.jar
>> >> slf4j-api-1.7.5.jar
>> >>
>> >>
>> >> Thanks,
>> >>
>> >> Bryan
>> >>
>> >>
>> >> On Tue, Mar 28, 2017 at 1:15 PM, Austin Heyne <ah...@ccri.com> wrote:
>> >> > Hi all,
>> >> >
>> >> > Thanks for all the help you've given me so far. Today I'm trying to
>> >> > pull files from an Azure blob store. I've done some reading on this
>> >> > and from previous tickets [1] and guides [2] it seems the
>> >> > recommended approach is to place the required jars, to use the HDFS
>> >> > Azure protocol, in 'Additional Classpath Resoures' and the hadoop
>> >> > core-site and hdfs-site configs into the 'Hadoop Configuration
>> >> > Resources'. I have my local HDFS properly configured to access wasb
>> >> > urls. I'm able to ls,
>> >> copy to and from, etc with out problem.
>> >> > Using the same HDFS config files and trying both all the jars in my
>> >> > hadoop-client/lib directory (hdp) and using the jars recommend in
>> >> > [1] I'm still seeing the "java.lang.IllegalArgumentException: Wrong FS: "
>> >> > error in my NiFi logs and am unable to pull files from Azure blob storage.
>> >> >
>> >> > Interestingly, it seems the processor is spinning up way to fast,
>> >> > the errors appear in the log as soon as I start the processor. I'm
>> >> > not sure how it could be loading all of those jars that quickly.
>> >> >
>> >> > Does anyone have any experience with this or recommendations to try?
>> >> >
>> >> > Thanks,
>> >> > Austin
>> >> >
>> >> > [1] https://issues.apache.org/jira/browse/NIFI-1922
>> >> > [2]
>> >> > https://community.hortonworks.com/articles/71916/connecting-to-azur
>> >> > e-d
>> >> > ata-lake-from-a-nifi-dataflow.html
>> >> >
>> >> >

RE: GetHDFS from Azure Blob

Posted by Giovanni Lanzani <gi...@godatadriven.com>.
Hi Bryan,

Thanks for the reply. 

Is there a way to compile NiFi using the Hadoop 2.8.0 libraries?

It's of course unfortunate, but the libraries you mentioned before only work in those very specific versions. Once you use a newer version (like azure-storage-2.2.0), things seem to break.

Maybe this jira [^1] could be reopened then? 😊

Cheers,

Giovanni

[^1]: https://issues.apache.org/jira/browse/NIFI-1922

> -----Original Message-----
> From: Bryan Bende [mailto:bbende@gmail.com]
> Sent: Tuesday, April 4, 2017 3:59 PM
> To: users@nifi.apache.org
> Subject: Re: GetHDFS from Azure Blob
> 
> Giovanni,
> 
> I'm not that familiar with using a key provider, but NiFi currently bundles the
> Hadoop 2.7.3 client, and looking at ProviderUtils from 2.7.3, there doesn't
> appear to be a method
> "excludeIncompatibleCredentialProviders":
> 
> https://github.com/apache/hadoop/blob/release-2.7.3-RC2/hadoop-common-
> project/hadoop-
> common/src/main/java/org/apache/hadoop/security/ProviderUtils.java
> 
> It looks like it is introduced in 2.8.0:
> 
> https://github.com/apache/hadoop/blob/release-2.8.0-RC3/hadoop-common-
> project/hadoop-
> common/src/main/java/org/apache/hadoop/security/ProviderUtils.java
> 
> Most likely some code that is present in one of the JARs specified through
> Additional Resources is dependent on Hadoop 2.8.0, and since NiFi is bundling
> 2.7.3, there are some things not lining up.
> 
> -Bryan
> 
> 
> On Tue, Apr 4, 2017 at 9:50 AM, Giovanni Lanzani
> <gi...@godatadriven.com> wrote:
> > Bryan,
> >
> > Allow me to chime in (to ask for help).
> >
> > What about when I'm using an encrypted key?
> >
> > In my case I have (in core-site.xml)
> >
> >    <property>
> >
> <name>fs.azure.account.keyprovider.nsanalyticsstorage.blob.core.windows.ne
> t</name>
> >       <value>org.apache.hadoop.fs.azure.ShellDecryptionKeyProvider</value>
> >     </property>
> >
> > Everything works from the command line (hdfs dfs).
> >
> > But NiFi complains with:
> >
> > java.lang.NoSuchMethodError:
> > org.apache.hadoop.security.ProviderUtils.excludeIncompatibleCredential
> > Providers
> >
> > Any ideas? I've already linked hadoop-commons.jar as well (besides what you
> suggested below).
> >
> > Cheers,
> >
> > Giovanni
> >
> >
> >> -----Original Message-----
> >> From: Bryan Bende [mailto:bbende@gmail.com]
> >> Sent: Tuesday, March 28, 2017 7:41 PM
> >> To: users@nifi.apache.org
> >> Subject: Re: GetHDFS from Azure Blob
> >>
> >> Austin,
> >>
> >> Can you provide the full error message and stacktrace for  the
> >> IllegalArgumentException from nifi-app.log?
> >>
> >> When you start the processor it creates a FileSystem instance based
> >> on the config files provided to the processor, which in turn causes
> >> all of the corresponding classes to load.
> >>
> >> I'm not that familiar with Azure, but if "Azure blob store" is WASB,
> >> then I have successfully done the following...
> >>
> >> In core-site.xml:
> >>
> >> <configuration>
> >>
> >>     <property>
> >>       <name>fs.defaultFS</name>
> >>       <value>wasb://YOUR_USER@YOUR_HOST/</value>
> >>     </property>
> >>
> >>     <property>
> >>       <name>fs.azure.account.key.nifi.blob.core.windows.net</name>
> >>       <value>YOUR_KEY</value>
> >>     </property>
> >>
> >>     <property>
> >>       <name>fs.AbstractFileSystem.wasb.impl</name>
> >>       <value>org.apache.hadoop.fs.azure.Wasb</value>
> >>     </property>
> >>
> >>     <property>
> >>       <name>fs.wasb.impl</name>
> >>       <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
> >>     </property>
> >>
> >>     <property>
> >>       <name>fs.azure.skip.metrics</name>
> >>       <value>true</value>
> >>     </property>
> >>
> >> </configuration>
> >>
> >> In Additional Resources property of an HDFS processor, point to a
> >> directory
> >> with:
> >>
> >> azure-storage-2.0.0.jar
> >> commons-codec-1.6.jar
> >> commons-lang3-3.3.2.jar
> >> commons-logging-1.1.1.jar
> >> guava-11.0.2.jar
> >> hadoop-azure-2.7.3.jar
> >> httpclient-4.2.5.jar
> >> httpcore-4.2.4.jar
> >> jackson-core-2.2.3.jar
> >> jsr305-1.3.9.jar
> >> slf4j-api-1.7.5.jar
> >>
> >>
> >> Thanks,
> >>
> >> Bryan
> >>
> >>
> >> On Tue, Mar 28, 2017 at 1:15 PM, Austin Heyne <ah...@ccri.com> wrote:
> >> > Hi all,
> >> >
> >> > Thanks for all the help you've given me so far. Today I'm trying to
> >> > pull files from an Azure blob store. I've done some reading on this
> >> > and from previous tickets [1] and guides [2] it seems the
> >> > recommended approach is to place the required jars, to use the HDFS
> >> > Azure protocol, in 'Additional Classpath Resoures' and the hadoop
> >> > core-site and hdfs-site configs into the 'Hadoop Configuration
> >> > Resources'. I have my local HDFS properly configured to access wasb
> >> > urls. I'm able to ls,
> >> copy to and from, etc with out problem.
> >> > Using the same HDFS config files and trying both all the jars in my
> >> > hadoop-client/lib directory (hdp) and using the jars recommend in
> >> > [1] I'm still seeing the "java.lang.IllegalArgumentException: Wrong FS: "
> >> > error in my NiFi logs and am unable to pull files from Azure blob storage.
> >> >
> >> > Interestingly, it seems the processor is spinning up way to fast,
> >> > the errors appear in the log as soon as I start the processor. I'm
> >> > not sure how it could be loading all of those jars that quickly.
> >> >
> >> > Does anyone have any experience with this or recommendations to try?
> >> >
> >> > Thanks,
> >> > Austin
> >> >
> >> > [1] https://issues.apache.org/jira/browse/NIFI-1922
> >> > [2]
> >> > https://community.hortonworks.com/articles/71916/connecting-to-azur
> >> > e-d
> >> > ata-lake-from-a-nifi-dataflow.html
> >> >
> >> >

Re: GetHDFS from Azure Blob

Posted by Bryan Bende <bb...@gmail.com>.
Giovanni,

I'm not that familiar with using a key provider, but NiFi currently
bundles the Hadoop 2.7.3 client, and looking at ProviderUtils from
2.7.3, there doesn't appear to be a method
"excludeIncompatibleCredentialProviders":

https://github.com/apache/hadoop/blob/release-2.7.3-RC2/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ProviderUtils.java

It looks like it is introduced in 2.8.0:

https://github.com/apache/hadoop/blob/release-2.8.0-RC3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ProviderUtils.java

Most likely some code that is present in one of the JARs specified
through Additional Resources is dependent on Hadoop 2.8.0, and since
NiFi is bundling 2.7.3, there are some things not lining up.

-Bryan


On Tue, Apr 4, 2017 at 9:50 AM, Giovanni Lanzani
<gi...@godatadriven.com> wrote:
> Bryan,
>
> Allow me to chime in (to ask for help).
>
> What about when I'm using an encrypted key?
>
> In my case I have (in core-site.xml)
>
>    <property>
>       <name>fs.azure.account.keyprovider.nsanalyticsstorage.blob.core.windows.net</name>
>       <value>org.apache.hadoop.fs.azure.ShellDecryptionKeyProvider</value>
>     </property>
>
> Everything works from the command line (hdfs dfs).
>
> But NiFi complains with:
>
> java.lang.NoSuchMethodError: org.apache.hadoop.security.ProviderUtils.excludeIncompatibleCredentialProviders
>
> Any ideas? I've already linked hadoop-commons.jar as well (besides what you suggested below).
>
> Cheers,
>
> Giovanni
>
>
>> -----Original Message-----
>> From: Bryan Bende [mailto:bbende@gmail.com]
>> Sent: Tuesday, March 28, 2017 7:41 PM
>> To: users@nifi.apache.org
>> Subject: Re: GetHDFS from Azure Blob
>>
>> Austin,
>>
>> Can you provide the full error message and stacktrace for  the
>> IllegalArgumentException from nifi-app.log?
>>
>> When you start the processor it creates a FileSystem instance based on the
>> config files provided to the processor, which in turn causes all of the
>> corresponding classes to load.
>>
>> I'm not that familiar with Azure, but if "Azure blob store" is WASB, then I have
>> successfully done the following...
>>
>> In core-site.xml:
>>
>> <configuration>
>>
>>     <property>
>>       <name>fs.defaultFS</name>
>>       <value>wasb://YOUR_USER@YOUR_HOST/</value>
>>     </property>
>>
>>     <property>
>>       <name>fs.azure.account.key.nifi.blob.core.windows.net</name>
>>       <value>YOUR_KEY</value>
>>     </property>
>>
>>     <property>
>>       <name>fs.AbstractFileSystem.wasb.impl</name>
>>       <value>org.apache.hadoop.fs.azure.Wasb</value>
>>     </property>
>>
>>     <property>
>>       <name>fs.wasb.impl</name>
>>       <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
>>     </property>
>>
>>     <property>
>>       <name>fs.azure.skip.metrics</name>
>>       <value>true</value>
>>     </property>
>>
>> </configuration>
>>
>> In Additional Resources property of an HDFS processor, point to a directory
>> with:
>>
>> azure-storage-2.0.0.jar
>> commons-codec-1.6.jar
>> commons-lang3-3.3.2.jar
>> commons-logging-1.1.1.jar
>> guava-11.0.2.jar
>> hadoop-azure-2.7.3.jar
>> httpclient-4.2.5.jar
>> httpcore-4.2.4.jar
>> jackson-core-2.2.3.jar
>> jsr305-1.3.9.jar
>> slf4j-api-1.7.5.jar
>>
>>
>> Thanks,
>>
>> Bryan
>>
>>
>> On Tue, Mar 28, 2017 at 1:15 PM, Austin Heyne <ah...@ccri.com> wrote:
>> > Hi all,
>> >
>> > Thanks for all the help you've given me so far. Today I'm trying to
>> > pull files from an Azure blob store. I've done some reading on this
>> > and from previous tickets [1] and guides [2] it seems the recommended
>> > approach is to place the required jars, to use the HDFS Azure
>> > protocol, in 'Additional Classpath Resoures' and the hadoop core-site
>> > and hdfs-site configs into the 'Hadoop Configuration Resources'. I
>> > have my local HDFS properly configured to access wasb urls. I'm able to ls,
>> copy to and from, etc with out problem.
>> > Using the same HDFS config files and trying both all the jars in my
>> > hadoop-client/lib directory (hdp) and using the jars recommend in [1]
>> > I'm still seeing the "java.lang.IllegalArgumentException: Wrong FS: "
>> > error in my NiFi logs and am unable to pull files from Azure blob storage.
>> >
>> > Interestingly, it seems the processor is spinning up way to fast, the
>> > errors appear in the log as soon as I start the processor. I'm not
>> > sure how it could be loading all of those jars that quickly.
>> >
>> > Does anyone have any experience with this or recommendations to try?
>> >
>> > Thanks,
>> > Austin
>> >
>> > [1] https://issues.apache.org/jira/browse/NIFI-1922
>> > [2]
>> > https://community.hortonworks.com/articles/71916/connecting-to-azure-d
>> > ata-lake-from-a-nifi-dataflow.html
>> >
>> >

RE: GetHDFS from Azure Blob

Posted by Giovanni Lanzani <gi...@godatadriven.com>.
Bryan, 

Allow me to chime in (to ask for help). 

What about when I'm using an encrypted key?

In my case I have (in core-site.xml)

   <property>
      <name>fs.azure.account.keyprovider.nsanalyticsstorage.blob.core.windows.net</name>
      <value>org.apache.hadoop.fs.azure.ShellDecryptionKeyProvider</value>
    </property>
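
For context, ShellDecryptionKeyProvider usually needs two more entries
alongside that one: the encrypted key itself and the script used to
decrypt it. A sketch, with the account name, key and script path below
as placeholders only:

   <property>
      <name>fs.azure.account.key.ACCOUNTNAME.blob.core.windows.net</name>
      <value>ENCRYPTED_KEY</value>
   </property>

   <property>
      <name>fs.azure.shellkeyprovider.script</name>
      <value>/path/to/decrypt-key.sh</value>
   </property>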

Everything works from the command line (hdfs dfs).

But NiFi complains with:

java.lang.NoSuchMethodError: org.apache.hadoop.security.ProviderUtils.excludeIncompatibleCredentialProviders

Any ideas? I've already linked hadoop-commons.jar as well (besides what you suggested below).

Cheers,

Giovanni


> -----Original Message-----
> From: Bryan Bende [mailto:bbende@gmail.com]
> Sent: Tuesday, March 28, 2017 7:41 PM
> To: users@nifi.apache.org
> Subject: Re: GetHDFS from Azure Blob
> 
> Austin,
> 
> Can you provide the full error message and stacktrace for  the
> IllegalArgumentException from nifi-app.log?
> 
> When you start the processor it creates a FileSystem instance based on the
> config files provided to the processor, which in turn causes all of the
> corresponding classes to load.
> 
> I'm not that familiar with Azure, but if "Azure blob store" is WASB, then I have
> successfully done the following...
> 
> In core-site.xml:
> 
> <configuration>
> 
>     <property>
>       <name>fs.defaultFS</name>
>       <value>wasb://YOUR_USER@YOUR_HOST/</value>
>     </property>
> 
>     <property>
>       <name>fs.azure.account.key.nifi.blob.core.windows.net</name>
>       <value>YOUR_KEY</value>
>     </property>
> 
>     <property>
>       <name>fs.AbstractFileSystem.wasb.impl</name>
>       <value>org.apache.hadoop.fs.azure.Wasb</value>
>     </property>
> 
>     <property>
>       <name>fs.wasb.impl</name>
>       <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
>     </property>
> 
>     <property>
>       <name>fs.azure.skip.metrics</name>
>       <value>true</value>
>     </property>
> 
> </configuration>
> 
> In Additional Resources property of an HDFS processor, point to a directory
> with:
> 
> azure-storage-2.0.0.jar
> commons-codec-1.6.jar
> commons-lang3-3.3.2.jar
> commons-logging-1.1.1.jar
> guava-11.0.2.jar
> hadoop-azure-2.7.3.jar
> httpclient-4.2.5.jar
> httpcore-4.2.4.jar
> jackson-core-2.2.3.jar
> jsr305-1.3.9.jar
> slf4j-api-1.7.5.jar
> 
> 
> Thanks,
> 
> Bryan
> 
> 
> On Tue, Mar 28, 2017 at 1:15 PM, Austin Heyne <ah...@ccri.com> wrote:
> > Hi all,
> >
> > Thanks for all the help you've given me so far. Today I'm trying to
> > pull files from an Azure blob store. I've done some reading on this
> > and from previous tickets [1] and guides [2] it seems the recommended
> > approach is to place the required jars, to use the HDFS Azure
> > protocol, in 'Additional Classpath Resoures' and the hadoop core-site
> > and hdfs-site configs into the 'Hadoop Configuration Resources'. I
> > have my local HDFS properly configured to access wasb urls. I'm able to ls,
> copy to and from, etc with out problem.
> > Using the same HDFS config files and trying both all the jars in my
> > hadoop-client/lib directory (hdp) and using the jars recommend in [1]
> > I'm still seeing the "java.lang.IllegalArgumentException: Wrong FS: "
> > error in my NiFi logs and am unable to pull files from Azure blob storage.
> >
> > Interestingly, it seems the processor is spinning up way to fast, the
> > errors appear in the log as soon as I start the processor. I'm not
> > sure how it could be loading all of those jars that quickly.
> >
> > Does anyone have any experience with this or recommendations to try?
> >
> > Thanks,
> > Austin
> >
> > [1] https://issues.apache.org/jira/browse/NIFI-1922
> > [2]
> > https://community.hortonworks.com/articles/71916/connecting-to-azure-d
> > ata-lake-from-a-nifi-dataflow.html
> >
> >

Re: GetHDFS from Azure Blob

Posted by Bryan Bende <bb...@gmail.com>.
Austin,

Can you provide the full error message and stacktrace for the
IllegalArgumentException from nifi-app.log?

When you start the processor it creates a FileSystem instance based on
the config files provided to the processor, which in turn causes all
of the corresponding classes to load.

I'm not that familiar with Azure, but if "Azure blob store" is WASB,
then I have successfully done the following...

In core-site.xml:

<configuration>

    <property>
      <name>fs.defaultFS</name>
      <value>wasb://YOUR_USER@YOUR_HOST/</value>
    </property>

    <property>
      <name>fs.azure.account.key.nifi.blob.core.windows.net</name>
      <value>YOUR_KEY</value>
    </property>

    <property>
      <name>fs.AbstractFileSystem.wasb.impl</name>
      <value>org.apache.hadoop.fs.azure.Wasb</value>
    </property>

    <property>
      <name>fs.wasb.impl</name>
      <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
    </property>

    <property>
      <name>fs.azure.skip.metrics</name>
      <value>true</value>
    </property>

</configuration>

In Additional Resources property of an HDFS processor, point to a
directory with:

azure-storage-2.0.0.jar
commons-codec-1.6.jar
commons-lang3-3.3.2.jar
commons-logging-1.1.1.jar
guava-11.0.2.jar
hadoop-azure-2.7.3.jar
httpclient-4.2.5.jar
httpcore-4.2.4.jar
jackson-core-2.2.3.jar
jsr305-1.3.9.jar
slf4j-api-1.7.5.jar


Thanks,

Bryan


On Tue, Mar 28, 2017 at 1:15 PM, Austin Heyne <ah...@ccri.com> wrote:
> Hi all,
>
> Thanks for all the help you've given me so far. Today I'm trying to pull
> files from an Azure blob store. I've done some reading on this and from
> previous tickets [1] and guides [2] it seems the recommended approach is to
> place the required jars, to use the HDFS Azure protocol, in 'Additional
> Classpath Resoures' and the hadoop core-site and hdfs-site configs into the
> 'Hadoop Configuration Resources'. I have my local HDFS properly configured
> to access wasb urls. I'm able to ls, copy to and from, etc with out problem.
> Using the same HDFS config files and trying both all the jars in my
> hadoop-client/lib directory (hdp) and using the jars recommend in [1] I'm
> still seeing the "java.lang.IllegalArgumentException: Wrong FS: " error in
> my NiFi logs and am unable to pull files from Azure blob storage.
>
> Interestingly, it seems the processor is spinning up way to fast, the errors
> appear in the log as soon as I start the processor. I'm not sure how it
> could be loading all of those jars that quickly.
>
> Does anyone have any experience with this or recommendations to try?
>
> Thanks,
> Austin
>
> [1] https://issues.apache.org/jira/browse/NIFI-1922
> [2]
> https://community.hortonworks.com/articles/71916/connecting-to-azure-data-lake-from-a-nifi-dataflow.html
>
>