Posted to dev@metron.apache.org by "Kumar, Deeptaanshu" <De...@capitalone.com> on 2016/05/17 15:14:18 UTC

[METRON-161] AD Integration Test Question

Hi Metron Team,

I am working on the Active Directory parser, and I have a question about the integration tests. Active Directory logs are multi-line, but the Metron integration tests are currently configured to handle single-line logs, so they fail for Active Directory. How would you recommend that I proceed with the integration tests for Active Directory logs? Should I modify the code in ParserIntegrationTest.java to accommodate multi-line logs?

Sincerely,

Deeptaanshu Kumar
EDS - ISRM
Data Engineer
Deeptaanshu.Kumar@CapitalOne.com
________________________________________________________

The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.

Re: [METRON-161] AD Integration Test Question

Posted by "Kumar, Deeptaanshu" <De...@capitalone.com>.
Hi Metron Team,

Thanks for the suggestion, Ryan! I copied the readSampleData() method from
TestUtils.java to ParserIntegrationTest.java, and then made my
ActiveDirectoryIntegration class override it. After incorporating this
suggestion, I was able to pass the integration tests for Active Directory.
I also made the same changes for the Windows Syslog data source (also a
multi-line data source), and it passed the integration tests as well.

Sincerely,

Deeptaanshu Kumar
EDS - ISRM
Data Engineer
Deeptaanshu.Kumar@CapitalOne.com

On 5/17/16, 1:31 PM, "Ryan Merriman" <rm...@hortonworks.com> wrote:

>Here is my suggestion:
>
>1. Create a method in ParserIntegrationTest:  public List<byte[]>
>readSampleData()
>
>2. Move TestUtils.readSampleData into that method
>
>3. Override that method in the AD integration test to return the proper AD
>sample data
>
>On 5/17/16, 9:36 AM, "Kumar, Deeptaanshu"
><De...@capitalone.com> wrote:
>
>>Hi Metron Team,
>>
>>I misspoke earlier when I said the AD logs span multiple Kafka records. I
>>meant to say that the way the Metron integration tests are currently
>>set up, each line in the AD log is being treated as a separate Kafka
>>record. I took a look at the code again and the readSampleData() method
>>in TestUtils.java is reading each line in the AD log as a separate log.
>>From here, the writeMessages() method in KafkaWithZKComponent.java is
>>writing each line of the AD log to a different Kafka producer. If we
>>could add code in either of these classes to handle multi-line logs, we
>>would be able to fix this issue.
>>
>>I can join the AD records into a single line in my test logs; however, I
>>would then need to change the AD parser to handle one-line AD logs. Once
>>I do that, the parser will pass the integration tests but will fail in
>>production, where the logs are multi-line, not single-line. Jonathon
>>Striley is correct: NiFi is configured to pass the entire multi-line AD
>>log as one record to Kafka, which is why this parser is currently working
>>in production.
>>
>>I just saw Ryan Merriman’s email, so should I continue this conversation
>>with him outside of this dev list, or should I continue providing updates
>>on this email thread?
>>
>>Sincerely,
>>
>>Deeptaanshu Kumar
>>EDS - ISRM
>>Data Engineer
>>Deeptaanshu.Kumar@CapitalOne.com
>>
>>
>>
>>
>>
>>On 5/17/16, 11:42 AM, "Casey Stella" <ce...@gmail.com> wrote:
>>
>>>Well, the problem is that those different Kafka records that make up the
>>>full AD line may end up on different workers (imagine a situation where
>>>line 1 is on partition 1 and line 2 is on partition 2 and different
>>>Storm spout workers handle those partitions).  I'd recommend joining the
>>>AD records prior to putting them into Kafka.
>>>
>>>On Tue, May 17, 2016 at 11:40 AM, Kumar, Deeptaanshu <
>>>Deeptaanshu.Kumar@capitalone.com> wrote:
>>>
>>>> Hi Metron Team,
>>>>
>>>> The Active Directory records span multiple Kafka records. The Active
>>>> Directory logs come in multi-line format directly from the servers. If
>>>> I remove the newlines from the test data, and alter the parser to pass
>>>> the integration tests, the parser will fail when it tries to parse
>>>> actual Active Directory logs. I think we may need to slightly alter the
>>>> Metron code that handles the integration tests to deal with multi-line
>>>> records. Please let me know how you want me to handle this issue.
>>>>
>>>> Sincerely,
>>>>
>>>> Deeptaanshu Kumar
>>>> EDS - ISRM
>>>> Data Engineer
>>>> Deeptaanshu.Kumar@CapitalOne.com
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 5/17/16, 11:23 AM, "Casey Stella" <ce...@gmail.com> wrote:
>>>>
>>>> >Is a record spanning multiple Kafka records (one record per line) or
>>>> >is it just that your test data is multi-line?  If it's the former,
>>>> >then I think you may have a problem.  If it's just the latter, could
>>>> >you just remove the newlines from your test data?


Re: [METRON-161] AD Integration Test Question

Posted by Ryan Merriman <rm...@hortonworks.com>.
Here is my suggestion:

1. Create a method in ParserIntegrationTest:  public List<byte[]>
readSampleData()

2. Move TestUtils.readSampleData into that method

3. Override that method in the AD integration test to return the proper AD
sample data
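
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class names ending in "Sketch", the sampleText() helper, and the in-memory sample strings are all hypothetical stand-ins, and the real ParserIntegrationTest and TestUtils classes in Metron will look different.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ReadSampleDataDemo {
    public static void main(String[] args) {
        System.out.println("single-line source records: "
                + new SingleLineIntegrationTestSketch().readSampleData().size());
        System.out.println("multi-line source records: "
                + new ActiveDirectoryIntegrationTestSketch().readSampleData().size());
    }
}

// Hypothetical stand-in for ParserIntegrationTest (not the real Metron class).
abstract class ParserIntegrationTestSketch {
    // Hypothetical helper: a real test would load this text from a sample file.
    abstract String sampleText();

    // Steps 1 and 2: the line-by-line reading lives in an overridable method.
    public List<byte[]> readSampleData() {
        List<byte[]> records = new ArrayList<>();
        for (String line : sampleText().split("\n")) {
            if (!line.isEmpty()) {
                records.add(line.getBytes(StandardCharsets.UTF_8));
            }
        }
        return records;
    }
}

// A single-line source keeps the default behaviour: one record per line.
class SingleLineIntegrationTestSketch extends ParserIntegrationTestSketch {
    @Override
    String sampleText() {
        return "event one\nevent two\n";
    }
}

// Step 3: the multi-line source overrides the method so that the whole
// multi-line text becomes a single record. The sample text is invented.
class ActiveDirectoryIntegrationTestSketch extends ParserIntegrationTestSketch {
    @Override
    String sampleText() {
        return "record line 1\nrecord line 2\nrecord line 3\n";
    }

    @Override
    public List<byte[]> readSampleData() {
        List<byte[]> records = new ArrayList<>();
        records.add(sampleText().getBytes(StandardCharsets.UTF_8));
        return records;
    }
}
```

Keeping the default in the base class means existing single-line tests need no changes; only the multi-line sources override the method.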



Re: [METRON-161] AD Integration Test Question

Posted by Casey Stella <ce...@gmail.com>.
So I definitely agree that we should have the integration tests handle
multi-line inputs, and binary inputs for that matter. For pcap, we did this
by using sequence files as the storage format, but there are many
options.
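
As one illustration of those "many options", here is a minimal sketch of a length-prefixed record file. To be clear, this is not Hadoop's SequenceFile format and not what Metron does for pcap; it is a swapped-in, plain-JDK technique, and the class and method names are hypothetical. It only shows that a storage format with explicit record boundaries can hold multi-line or binary records without relying on newlines.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LengthPrefixedRecordsDemo {
    public static void main(String[] args) {
        List<byte[]> records = Arrays.asList(
                "line 1\nline 2".getBytes(StandardCharsets.UTF_8), // multi-line text
                new byte[] {0, 10, 13, (byte) 255});               // arbitrary binary
        List<byte[]> restored = read(write(records));
        System.out.println("records restored: " + restored.size());
    }

    // Writes a record count, then each record as (length, bytes).
    static byte[] write(List<byte[]> records) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(records.size());
            for (byte[] record : records) {
                out.writeInt(record.length);
                out.write(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the records back using the stored lengths, never the content.
    static List<byte[]> read(byte[] data) {
        List<byte[]> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                byte[] record = new byte[in.readInt()];
                in.readFully(record);
                records.add(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```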


Re: [METRON-161] AD Integration Test Question

Posted by "Kumar, Deeptaanshu" <De...@capitalone.com>.
Hi Metron Team,

I misspoke earlier when I said the AD logs span multiple Kafka records. I
meant to say that the way the Metron integration tests are currently
set up, each line in the AD log is being treated as a separate Kafka
record. I took a look at the code again and the readSampleData() method in
TestUtils.java is reading each line in the AD log as a separate log. From
here, the writeMessages() method in KafkaWithZKComponent.java is writing
each line of the AD log to a different Kafka producer. If we could add
code in either of these classes to handle multi-line logs, we would be
able to fix this issue.
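
To make the difference concrete, here is a small sketch contrasting per-line splitting with per-record splitting of the same sample text. It does not reproduce the actual TestUtils.readSampleData() code. The method names are hypothetical, and the blank-line record separator is purely an assumption for illustration, since the thread does not say how AD records are delimited in the sample file.

```java
import java.util.ArrayList;
import java.util.List;

class SampleDataSplitDemo {
    public static void main(String[] args) {
        // Invented sample: two records of two lines each, separated by a blank line.
        String sample = "record A line 1\nrecord A line 2\n\n"
                + "record B line 1\nrecord B line 2\n";
        System.out.println("per-line messages: " + splitPerLine(sample).size());
        System.out.println("per-record messages: " + splitPerRecord(sample).size());
    }

    // Treats every non-empty line as its own message (the behaviour described above).
    static List<String> splitPerLine(String text) {
        List<String> messages = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (!line.isEmpty()) {
                messages.add(line);
            }
        }
        return messages;
    }

    // Assumption: records are separated by a blank line. Each record keeps
    // its internal newlines and becomes a single message.
    static List<String> splitPerRecord(String text) {
        List<String> messages = new ArrayList<>();
        for (String block : text.split("\n\\s*\n")) {
            String trimmed = block.trim();
            if (!trimmed.isEmpty()) {
                messages.add(trimmed);
            }
        }
        return messages;
    }
}
```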

I can join the AD records into a single line in my test logs; however, I
would then need to change the AD parser to handle one-line AD logs. Once I
do that, the parser will pass the integration tests but will fail in
production, where the logs are multi-line, not single-line. Jonathon
Striley is correct: NiFi is configured to pass the entire multi-line AD
log as one record to Kafka, which is why this parser is currently working
in production.

I just saw Ryan Merriman’s email, so should I continue this conversation
with him outside of this dev list, or should I continue providing updates
on this email thread?

Sincerely,

Deeptaanshu Kumar
EDS - ISRM
Data Engineer
Deeptaanshu.Kumar@CapitalOne.com



Re: [METRON-161] AD Integration Test Question

Posted by Ryan Merriman <rm...@hortonworks.com>.
Why does an Active Directory record have to span multiple Kafka messages?
What Kafka producer are you using?  A Kafka message is a byte array and
doesn’t care about newlines.  Casey is correct: the AD record has to be
contained in a single Kafka message; otherwise it becomes much harder, and
that is not something that is supported right now.  As for the integration
test, ParserIntegrationTest is an abstract class, so abstracting out the
function of reading in sample records should be straightforward.  I’m happy
to work with you on that.
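
The point that a message payload is just bytes can be shown with a short sketch. It uses only the JDK and no Kafka client, so it says nothing about any particular producer; the class name, method names, and sample record are hypothetical. It simply shows that a multi-line record survives a round trip through a byte array with its newlines intact.

```java
import java.nio.charset.StandardCharsets;

class MultiLinePayloadDemo {
    public static void main(String[] args) {
        // Invented multi-line record standing in for one AD log entry.
        String record = "first line\nsecond line\nthird line";
        byte[] payload = toPayload(record);
        String restored = fromPayload(payload);
        System.out.println("round trip preserved record: " + restored.equals(record));
        System.out.println("lines in restored record: " + restored.split("\n").length);
    }

    // Encodes the whole record, newlines included, as one byte array.
    static byte[] toPayload(String record) {
        return record.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes the byte array back into the original text.
    static String fromPayload(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_8);
    }
}
```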

Ryan Merriman



Re: [METRON-161] AD Integration Test Question

Posted by Casey Stella <ce...@gmail.com>.
Well, the problem is that the separate Kafka records that make up one full
AD record may end up on different workers (imagine line 1 landing on
partition 1 and line 2 on partition 2, with different Storm spout workers
handling those partitions). I'd recommend joining the lines of each AD
record before putting them into Kafka.
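
A minimal sketch of that pre-join step, in Java. It assumes that a blank
line separates one AD record from the next, which this thread does not
confirm, and it joins the lines of a record with a single space.
AdRecordJoiner is a hypothetical helper, not Metron code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Metron): collapses multi-line records. */
final class AdRecordJoiner {

    private AdRecordJoiner() {}

    /**
     * Joins the lines of each record into one single-line string.
     * ASSUMPTION: records are separated by one or more blank lines.
     */
    static List<String> join(List<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                // A blank line closes the record collected so far.
                flush(current, records);
            } else {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(line.trim());
            }
        }
        // Emit the last record if the input did not end with a blank line.
        flush(current, records);
        return records;
    }

    private static void flush(StringBuilder current, List<String> records) {
        if (current.length() > 0) {
            records.add(current.toString());
            current.setLength(0);
        }
    }
}
```

Each joined string could then be sent as one Kafka message, so a record
never straddles partitions. The parser would have to expect the same
separator that the join step uses.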


Re: [METRON-161] AD Integration Test Question

Posted by "Kumar, Deeptaanshu" <De...@capitalone.com>.
Hi Metron Team,

The Active Directory records span multiple Kafka records. The Active
Directory logs come in multi-line format directly from the servers. If I
remove the newlines from the test data, and alter the parser to pass the
integration tests, the parser will fail when it tries to parse actual
Active Directory logs. I think we may need to slightly alter the Metron
code that handles the integration tests to deal with multi-line records.
Please let me know how you want me to handle this issue.

Sincerely,

Deeptaanshu Kumar
EDS - ISRM
Data Engineer
Deeptaanshu.Kumar@CapitalOne.com
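
One way the test side could cope with multi-line records is to read the
sample file record by record instead of line by line. The sketch below
assumes blank-line-separated records and keeps the newlines inside each
record intact. MultiLineSampleReader and its read method are hypothetical
names; this is not the code in ParserIntegrationTest.java.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical test helper (not part of Metron). */
final class MultiLineSampleReader {

    private MultiLineSampleReader() {}

    /**
     * Reads sample data and returns one byte[] per multi-line record.
     * ASSUMPTION: records are separated by one or more blank lines.
     */
    static List<byte[]> read(BufferedReader reader) {
        List<byte[]> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    // A blank line ends the current record.
                    addRecord(current, records);
                } else {
                    if (current.length() > 0) {
                        current.append('\n');
                    }
                    current.append(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Emit the trailing record, if any.
        addRecord(current, records);
        return records;
    }

    private static void addRecord(StringBuilder current, List<byte[]> records) {
        if (current.length() > 0) {
            records.add(current.toString().getBytes(StandardCharsets.UTF_8));
            current.setLength(0);
        }
    }
}
```

A test could pass each returned byte[] to the parser as a single message,
which only mirrors production if the real feed also delivers one whole
record per Kafka message.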








Re: [METRON-161] AD Integration Test Question

Posted by Casey Stella <ce...@gmail.com>.
Does a single AD record span multiple Kafka records (one Kafka record per
line), or is it just that your test data is multi-line? If it's the
former, then I think you may have a problem. If it's just the latter,
could you just remove the newlines from your test data?
