Posted to dev@nifi.apache.org by "Martini, Adam" <Ad...@nike.com> on 2018/02/09 23:21:24 UTC

NiFi 1.5.0 HBase_1_1_2_ClientService performance bug

Hello NiFi Dev Community,

This commit hash (part of the NiFi 1.5.0 release) created serious performance issues for HBase Put operations: "116c8463428c1fb51bfb7a8adfcf23c32fded964".

The override of the “toTransitUri” method makes a call to “connection.getAdmin().getClusterStatus().getMaster().getHostAndPort()” upon every flow file transfer, which essentially doubles the traffic through the HBase connector.  The performance of our PutHBaseJSON processor dropped to 1/3 after deploying NiFi 1.5.0.

Please let us know the timeline for a fix.  In the interim we are building and testing our own tarball with a fix, and we are happy to contribute our code back to the project if you would like.

All the best and thank you.

Adam Martini
Senior Developer, Nike Digital



Re: Re: NiFi 1.5.0 HBase_1_1_2_ClientService performance bug

Posted by "Martini, Adam" <Ad...@nike.com>.
Hi Koji and all others who responded,

Thanks for getting this PR out so quickly.  Your original response chain was sent to the CC address (Maxwell Eng), so I only received this email now.

I made one comment on your PR, but otherwise it is the same fix I made to the AMI we are now testing, with great performance.

Thanks again!

Adam

On 2/12/18, 2:16 PM, "Eng, Maxwell" <Ma...@nike.com> wrote:

    
    
    On 2/9/18, 7:29 PM, "Koji Kawamura" <ij...@gmail.com> wrote:
    
        Hi,
        
        The PR is ready for review. I confirmed that the performance issue is addressed.
        https://github.com/apache/nifi/pull/2464
        
        I also tested whether
        nifi-hbase_1_1_2-client-service-nar-1.6.0-SNAPSHOT.nar can be used in
        a NiFi 1.5.0 environment. Unfortunately, it does not seem that we can
        use it as is.
        A validation error occurs saying, 'HBase_1_1_2_ClientService
        -1.6.0-SNAPSHOT from org.apache.nifi -
        nifi-hbase_1_1_2-client-service-nar is not compatible with
        HBaseClientService -1.5.0 from org.apache.nifi -
        nifi-standard-services-api-nar'.
        It looks like nifi-standard-services needs to be updated too, but I
        think that's a bit risky because it may affect other services.
        
        So, I've written a Gist to work around this, with
        nifi-hbase_1_1_2-client-service-nar-1.5.0_nifi-4866.nar built from
        the 1.5.0 release commit with the performance fix cherry-picked.
        https://gist.github.com/ijokarumawak/85db60ca71f1825f543c18c62bf7c3fd
        
        Thanks,
        Koji
        
        
        
        
    
    


Re: NiFi 1.5.0 HBase_1_1_2_ClientService performance bug

Posted by Koji Kawamura <ij...@gmail.com>.
Hi,

The PR is ready for review. I confirmed that the performance issue is addressed.
https://github.com/apache/nifi/pull/2464

I also tested whether
nifi-hbase_1_1_2-client-service-nar-1.6.0-SNAPSHOT.nar can be used in
a NiFi 1.5.0 environment. Unfortunately, it does not seem that we can
use it as is.
A validation error occurs saying, 'HBase_1_1_2_ClientService
-1.6.0-SNAPSHOT from org.apache.nifi -
nifi-hbase_1_1_2-client-service-nar is not compatible with
HBaseClientService -1.5.0 from org.apache.nifi -
nifi-standard-services-api-nar'.
It looks like nifi-standard-services needs to be updated too, but I
think that's a bit risky because it may affect other services.

So, I've written a Gist to work around this, with
nifi-hbase_1_1_2-client-service-nar-1.5.0_nifi-4866.nar built from
the 1.5.0 release commit with the performance fix cherry-picked.
https://gist.github.com/ijokarumawak/85db60ca71f1825f543c18c62bf7c3fd

Thanks,
Koji



On Sat, Feb 10, 2018 at 10:37 AM, Koji Kawamura <ij...@gmail.com> wrote:
> Hi Adam,
>
> Thank you very much for reporting the performance issue.
> I created NIFI-4866 and started fixing the issue by moving the
> problematic code block to createConnection.
> After confirming that this addresses the performance issue, I will send a PR to
> get it merged.
>
> Koji
>
>

Re: NiFi 1.5.0 HBase_1_1_2_ClientService performance bug

Posted by Koji Kawamura <ij...@gmail.com>.
Hi Adam,

Thank you very much for reporting the performance issue.
I created NIFI-4866 and started fixing the issue by moving the
problematic code block to createConnection.
After confirming that this addresses the performance issue, I will send a PR to
get it merged.

Koji
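
For readers following the thread, here is a minimal Java sketch of the idea behind the fix: resolve the master address once when the connection is set up instead of on every toTransitUri call. It is not the code from the PR. The MasterLocator interface, the class names, the example address, and the exact URI layout are all assumptions made for illustration, and the remote lookup is simulated with a counter so the example runs without an HBase cluster.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: none of these types are NiFi or HBase classes.
class TransitUriDemo {

    /** Hypothetical stand-in for the remote "who is the master?" lookup. */
    interface MasterLocator {
        String lookupMasterAddress();
    }

    /** Returns a made-up address and counts how often it is asked. */
    static final class CountingLocator implements MasterLocator {
        final AtomicInteger lookups = new AtomicInteger();

        @Override
        public String lookupMasterAddress() {
            lookups.incrementAndGet(); // each call models one extra round trip
            return "hbase-master.example.com:16000";
        }
    }

    /** Slow variant: one lookup per transit URI, i.e. per flow file. */
    static final class PerCallTransitUri {
        private final MasterLocator locator;

        PerCallTransitUri(MasterLocator locator) {
            this.locator = locator;
        }

        String toTransitUri(String table, String rowKey) {
            return "hbase://" + locator.lookupMasterAddress() + "/" + table + "/" + rowKey;
        }
    }

    /** Cached variant: a single lookup when the service is created. */
    static final class CachedTransitUri {
        private final String masterAddress;

        CachedTransitUri(MasterLocator locator) {
            this.masterAddress = locator.lookupMasterAddress();
        }

        String toTransitUri(String table, String rowKey) {
            return "hbase://" + masterAddress + "/" + table + "/" + rowKey;
        }
    }

    /** Number of simulated lookups when building one URI per flow file. */
    static int lookupsPerCall(int flowFiles) {
        CountingLocator locator = new CountingLocator();
        PerCallTransitUri service = new PerCallTransitUri(locator);
        for (int i = 0; i < flowFiles; i++) {
            service.toTransitUri("events", "row-" + i);
        }
        return locator.lookups.get();
    }

    /** Same workload, but with the address resolved once up front. */
    static int lookupsCached(int flowFiles) {
        CountingLocator locator = new CountingLocator();
        CachedTransitUri service = new CachedTransitUri(locator);
        for (int i = 0; i < flowFiles; i++) {
            service.toTransitUri("events", "row-" + i);
        }
        return locator.lookups.get();
    }

    public static void main(String[] args) {
        System.out.println("per-call lookups: " + lookupsPerCall(1000));
        System.out.println("cached lookups:   " + lookupsCached(1000));
    }
}
```

One trade-off of caching in a sketch like this is that the stored address could go stale if the master changes while the connection stays open, so a real service may need to refresh it when the connection is re-created.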


On Sat, Feb 10, 2018 at 9:25 AM, Joe Witt <jo...@gmail.com> wrote:
> adam
>
> you should also be able to put the old hbase nar in and switch to that
> version.
>
> we now support multiple versions of the same component.
>
> thanks
>

Re: NiFi 1.5.0 HBase_1_1_2_ClientService performance bug

Posted by Joe Witt <jo...@gmail.com>.
adam

you should also be able to put the old hbase nar in and switch to that
version.

we now support multiple versions of the same component.

thanks

On Feb 9, 2018 7:10 PM, "Mike Thomsen" <mi...@gmail.com> wrote:

> Adam,
>
> If you're doing bulk ingestion of JSON, I would recommend using
> PutHBaseRecord. I wrote it/contributed it when my team ran into similar
> limitations doing genomic data ingestion (several 10s of billions of Puts
> from the 1000 genomes project). If you run into problems with it, just post
> them and poke me.
>
> Mike
>

Re: NiFi 1.5.0 HBase_1_1_2_ClientService performance bug

Posted by Mike Thomsen <mi...@gmail.com>.
Adam,

If you're doing bulk ingestion of JSON, I would recommend using
PutHBaseRecord. I wrote it/contributed it when my team ran into similar
limitations doing genomic data ingestion (several 10s of billions of Puts
from the 1000 genomes project). If you run into problems with it, just post
them and poke me.

Mike

On Fri, Feb 9, 2018 at 6:56 PM, Joe Witt <jo...@gmail.com> wrote:

> adam
>
> thanks for reporting and if you can do a contrib that would be great!
>
> thanks
> joe
>

Re: NiFi 1.5.0 HBase_1_1_2_ClientService performance bug

Posted by Joe Witt <jo...@gmail.com>.
adam

thanks for reporting and if you can do a contrib that would be great!

thanks
joe

On Feb 9, 2018 6:56 PM, "Martini, Adam" <Ad...@nike.com> wrote:

> Hello NiFi Dev Community,
>
> This commit hash (part of the NiFi 1.5.0 release) created serious
> performance issues for HBase Put operations: "
> 116c8463428c1fb51bfb7a8adfcf23c32fded964".
>
> The override of the “toTransitUri” method makes a call to
> “connection.getAdmin().getClusterStatus().getMaster().getHostAndPort()”
> upon every flow file transfer, which essentially doubles the traffic
> through the HBase connector.  The performance of our PutHBaseJSON processor
> dropped to 1/3 after deploying NiFi 1.5.0.
>
> Please let us know a timeline for a fix.  We are building and testing our
> own tar ball in the interim to fix the issue and are happy to contribute
> our code back to the project if you would like.
>
> All the best and thank you.
>
> Adam Martini
> Senior Developer, Nike Digital
>
>
>