Posted to dev@nifi.apache.org by Milan Das <md...@interset.com> on 2018/03/16 14:39:30 UTC

Finding Performance bottleneck issue

I have a custom processor that works as expected, but I think its performance needs to be measured. I see that my processor is actually queuing up records at the source.

Is there a way to run a load test and measure the performance of a custom processor?

 

Regards,

Milan Das


Re: Finding Performance bottleneck issue

Posted by Milan Das <md...@interset.com>.
Thanks for your help, Mike Thomsen and Joe Witt.
The performance problem was in the way I was reading each line from the flow input. My logic was: io.InputStream -> io.BufferedReader -> while reader.readLine() loop.
Now I am using the Guava library.
I have changed the code to use: String inputString = CharStreams.toString(new InputStreamReader(in, "UTF-8"));
This made it 20 times faster.
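
For readers comparing the two read strategies mentioned above, here is a minimal sketch that uses only the JDK instead of Guava, so that it stays self-contained. The class and method names (ReadStrategies, readLineByLine, readAll) are hypothetical and not taken from the processor. How much faster the bulk read is will depend on what else happens inside the line loop, so treat the 20x figure as specific to that processor.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ReadStrategies {

    // Line-oriented approach: BufferedReader plus a readLine() loop.
    // readLine() strips line terminators, so each line is re-joined with "\n".
    // A final line without a terminator therefore gains one here.
    static String readLineByLine(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Bulk approach: decode the stream in 8192-char chunks and keep the text
    // exactly as it arrived. This is a JDK stand-in for the Guava call
    // CharStreams.toString(new InputStreamReader(in, "UTF-8")).
    static String readAll(InputStream in) {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[8192];
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] event = "Logon Type: 3\nLogon ID: 0x1\n".getBytes(StandardCharsets.UTF_8);
        String bulk = readAll(new ByteArrayInputStream(event));
        String lines = readLineByLine(new ByteArrayInputStream(event));
        System.out.println("same content: " + bulk.equals(lines));
    }
}
```

Both methods hold the whole event in memory, which is reasonable for small text events like the one in this thread but not for very large flowfiles.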


Regards,
Milan Das








Re: Finding Performance bottleneck issue

Posted by Mike Thomsen <mi...@gmail.com>.
That seems like a very reasonable use case. You said:

> I see that my processor is actually queuing up  records at source.

Are you saying that the processor isn't able to process them quickly
enough, so you're seeing a big backlog in the input queue?


Re: Finding Performance bottleneck issue

Posted by Milan Das <md...@interset.com>.
Hi Mike,
My processor processes Windows text events like the one below and creates JSON from each of them.
I also apply a simple JoltTransformer (just Shift and Default) to convert that to a different, flat JSON (no hierarchy).

The output has the following:
1. Original text
2. Converted JSON
3. JOLT transformed JSON
4. Failure


Steps in the program:
1. Convert the event to a Java Map (using regex: "([^:=]*)[:=]([^:=]*)")
2. Convert the Map to JSON (using Gson)
3. JOLT transformation
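
Step 1 above could look roughly like the sketch below. It is only an illustration under stated assumptions: the regex is the one quoted in the message, but applying it line by line with matches(), trimming the groups, and skipping section headers such as "Subject:" are guesses about the processor's behaviour, and the class name EventParser is hypothetical. Steps 2 and 3 (Gson and JOLT) are left out so the example needs nothing beyond the JDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EventParser {

    // Regex quoted in the message: key, then ':' or '=', then value.
    // Compiled once and reused, since recompiling per event is wasted work.
    private static final Pattern KEY_VALUE = Pattern.compile("([^:=]*)[:=]([^:=]*)");

    // Assumption: each line is matched as a whole, so lines with no separator
    // or with more than one separator are ignored.
    static Map<String, String> parse(String event) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : event.split("\\R")) {
            Matcher m = KEY_VALUE.matcher(line);
            if (!m.matches()) {
                continue;
            }
            String key = m.group(1).trim();
            String value = m.group(2).trim();
            if (key.isEmpty() || value.isEmpty()) {
                continue; // e.g. a section header such as "Subject:"
            }
            // A flat map means a repeated key (such as "Security ID" under both
            // "Subject" and "New Logon") keeps only the last value seen.
            fields.put(key, value);
        }
        return fields;
    }

    public static void main(String[] args) {
        String event = "Subject:\n"
                + "    Security ID:     %1\n"
                + "    Account Name:    %2\n"
                + "Logon Type:          %9\n";
        System.out.println(parse(event));
    }
}
```

Because repeated keys overwrite each other in this flat form, a real implementation would probably need to prefix keys with their section name.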



Example event:

Examples of 4626
User / Device claims information.

Subject:
    Security ID:     %1
    Account Name:    %2
    Account Domain:  %3
    Logon ID:        %4

Logon Type:          %9

New Logon:
    Security ID:     %5
    Account Name:    %6
    Account Domain:  %7
    Logon ID:        %8

Event in sequence:   %10 of %11

User Claims:         %12

Device Claims:       %13

The subject fields indicate the account on the local system which requested the logon. This is most commonly a service such as the Server service, or a local process such as Winlogon.exe or Services.exe.

The logon type field indicates the kind of logon that occurred. The most common types are 2 (interactive) and 3 (network).

The New Logon fields indicate the account for whom the new logon was created, i.e. the account that was logged on.

This event is generated when the Audit User/Device claims subcategory is configured and the user’s logon token contains user/device claims information. The Logon ID field can be used to correlate this event with the corresponding user logon event as well as to any other security audit events generated during this logon session.



Regards,
Milan Das





Re: Finding Performance bottleneck issue

Posted by Mike Thomsen <mi...@gmail.com>.
Milan,

Can you share some details about where you are running into problems? Like
a basic description of what it's trying to do?


Re: Finding Performance bottleneck issue

Posted by Joe Witt <jo...@gmail.com>.
Milan,

We don't offer any tooling specific to load testing.  You can, though,
write integration tests which set up a more complete system.  The most
common approach I see people take for true performance/bottleneck
analysis is to set up a running NiFi instance with a flow through your
custom processor and attach a profiler at runtime.

This is easy to do by editing conf/bootstrap.conf

#java.arg.debug=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000

change that line to (remove the leading #)

java.arg.debug=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000

Then, with NiFi running, you can attach something like JVisualVM
from the JDK, which is a really great tool.  You could look at more
powerful options like YourKit as well.

These let you really learn where the bottlenecks are.

You might also want to look at the code of the existing library of
200+ processors and see if any have similar characteristics to yours.

Design things:
1) Does the way your processor works lend itself to idempotent (safe to
do over and over) operation?  If yes, you can enable session batching.
This means that instead of one flowfile at a time we can do hundreds or
thousands and merge them into a single flowfile repo transaction.
2) Are you using a lot of heap/memory?  This can happen if you're
reading content fully into memory or, worse, doing array copies, etc.
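
To illustrate point 2, here is a minimal sketch of moving content through a fixed-size buffer instead of loading it fully into memory. It is plain JDK code, not NiFi API: the class and method names (BoundedCopy, copy) are hypothetical, and inside a processor the same loop would sit wherever the processor is handed the flowfile's input and output streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class BoundedCopy {

    // Copies everything from in to out through one reusable 8 KiB buffer and
    // returns the number of bytes copied. Heap use stays constant regardless
    // of content size, and no intermediate array copies are made.
    static long copy(InputStream in, OutputStream out) {
        byte[] buffer = new byte[8192];
        long total = 0;
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] content = "Logon Type: 3\n".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(content), out);
        System.out.println(copied + " bytes copied");
    }
}
```

Whether streaming or reading fully is the better choice depends on content size: for small text events, holding one event in memory is usually fine, while large content benefits from the bounded buffer.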

If you need more help and pointers, please share more details on what
the processor is doing and how it is designed, as there are many folks on
this list who can give general pointers.

Thanks
