Posted to users@apex.apache.org by Munagala Ramanath <ra...@datatorrent.com> on 2016/06/06 02:24:25 UTC

Re: Multiple directories

Some sample code to monitor multiple directories is now available at:
https://github.com/DataTorrent/examples/tree/master/tutorials/fileIO-multiDir

It shows how to use a custom implementation of definePartitions() to create
multiple partitions of the file input operator and group them
into "slices" where each slice monitors a single directory.

Ram

On Wed, May 25, 2016 at 9:55 AM, Munagala Ramanath <ra...@datatorrent.com>
wrote:

> I'm hoping to have a sample sometime next week.
>
> Ram
>
> On Wed, May 25, 2016 at 9:30 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
>> Thank you so much, Ram, for your advice. Option (a) would be ideal for my
>> requirement.
>>
>> Do you have sample usage of partitioning with an individual configuration
>> set up for different partitions?
>>
>>
>>
>> Regards,
>>
>> Surya Vamshi
>>
>>
>>
>> From: Munagala Ramanath [mailto:ram@datatorrent.com]
>> Sent: 2016, May, 25 12:11 PM
>> To: users@apex.apache.org
>> Subject: Re: Multiple directories
>>
>>
>>
>> You have 2 options: (a) AbstractFileInputOperator (b) FileSplitter/BlockReader
>>
>> For (a), each partition (i.e. replica of the operator) can scan only a
>> single directory, so if you have 100 directories, you can simply start
>> with 100 partitions; since each partition is scanning its own directory,
>> you don't need to worry about which files the lines came from. This
>> approach, however, needs a custom definePartitions() implementation in
>> your subclass to assign the appropriate directory and XML parsing config
>> file to each partition; it also needs adequate cluster resources to be
>> able to spin up the required number of partitions.
>>
>> For (b), there is some documentation in the Operators section at
>> http://docs.datatorrent.com/ including sample code. These operators
>> support scanning multiple directories out of the box but have more
>> elaborate configuration options. Check it out and see if it works for
>> your use case.
>>
>> Ram
>>
>>
>>
>> On Wed, May 25, 2016 at 8:17 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
>> suryavamshivardhan.mukkamula@rbc.com> wrote:
>>
>> Hello Ram/Team,
>>
>>
>>
>> My requirement is to read input feeds from different locations on HDFS
>> and parse those files by reading XML configuration files (each input feed
>> has a configuration file which defines the fields inside the feed).
>>
>> My approach: I would like to define a mapping file which contains an
>> individual feed identifier, the feed location, and the configuration file
>> location. I would like to read this mapping file at initial load within
>> the setup() method and define my DirectoryScan.acceptFiles. My challenge
>> is that when I read the files, I must parse each line using the
>> corresponding configuration file. How do I know which file a line came
>> from? If I know this, I can read the corresponding configuration file
>> before parsing the line.
>>
>> Please let me know how to handle this.
>>
>>
>>
>> Regards,
>>
>> Surya Vamshi
>>
>>
>>
>> From: Munagala Ramanath [mailto:ram@datatorrent.com]
>> Sent: 2016, May, 24 5:49 PM
>> To: Mukkamula, Suryavamshivardhan (CWM-NR)
>> Subject: Multiple directories
>>
>>
>>
>> One way of addressing the issue is to use some sort of external tool
>> (like a script) to copy all the input files to a common directory (making
>> sure that the file names are unique to prevent one file from overwriting
>> another) before the Apex application starts.
>>
>> The Apex application then starts and processes files from this directory.
>>
>> If you set the partition count of the file input operator to N, it will
>> create N partitions and the files will be automatically distributed among
>> the partitions. The partitions will work in parallel.
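>>
>> (For example, via a properties file; the property name partitionCount is
>> how AbstractFileInputOperator exposes this, and the app name FileIO and
>> operator name read are assumed:)
>>
>>   <property>
>>     <name>dt.application.FileIO.operator.read.prop.partitionCount</name>
>>     <value>4</value>
>>   </property>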
>>
>> Ram

RE: Multiple directories

Posted by "Mukkamula, Suryavamshivardhan (CWM-NR)" <su...@rbc.com>.
Hi Ram,

Assuming that properties are set as key/value pairs, I have used the properties as below and I can read multiple directories in parallel. Thank you.

<!-- Source: source_123 properties -->
<property>
  <name>dt.application.FileIO.operator.read.prop.inputDirectory(source_123)</name>
  <value>tmp/fileIO/source_123</value>
</property>
<property>
  <name>dt.application.FileIO.operator.read.prop.inputConfigFile(source_123)</name>
  <value>tmp/fileIO/config/source_123/source_123_input_config.xml</value>
</property>
<property>
  <name>dt.application.FileIO.operator.read.prop.partCount(source_123)</name>
  <value>1</value>
</property>
<property>
  <name>dt.application.FileIO.operator.read.prop.outputDirectory(source_123)</name>
  <value>tmp/fileIO/source_123</value>
</property>
<property>
  <name>dt.application.FileIO.operator.read.prop.outputConfigFile(source_123)</name>
  <value>tmp/fileIO/config/source_123/source_123_output_config.xml</value>
</property>
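
(For readers wondering how the parenthesized keys above reach the operator,
here is a sketch, not the poster's actual code: the name(key) form is
commons-beanutils mapped-property syntax, which, if the operator exposes a
two-argument setter in that style, lets each per-source value be stored in a
map keyed by the source id:)

import java.util.HashMap;
import java.util.Map;

public class MultiDirReaderSketch {
  // one entry per source id, e.g. "source_123" -> "tmp/fileIO/source_123"
  private final Map<String, String> inputDirectory = new HashMap<>();

  // invoked for ...prop.inputDirectory(source_123)
  public void setInputDirectory(String sourceId, String dir) {
    inputDirectory.put(sourceId, dir);
  }

  public String getInputDirectory(String sourceId) {
    return inputDirectory.get(sourceId);
  }
}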

Regards,
Surya Vamshi


RE: Multiple directories

Posted by "Mukkamula, Suryavamshivardhan (CWM-NR)" <su...@rbc.com>.
Hi Ram,

Thank you.

I would like to define the below class elements as a list in properties.xml. I tried creating it as below but had no luck. Can you please help with how to correctly define a list of the below elements?

public class InputValues<SOURCEID, DIRECTORY, CONFIGFILE, PARTITIONCOUNT> {

    public SOURCEID sourceId;
    public DIRECTORY directory;
    public CONFIGFILE configFile;
    public PARTITIONCOUNT partitionCount;

    public InputValues() {
    }

    public InputValues(SOURCEID sourceId, DIRECTORY directory, CONFIGFILE configFile, PARTITIONCOUNT partitionCount) {
        this.sourceId = sourceId;
        this.directory = directory;
        this.configFile = configFile;
        this.partitionCount = partitionCount;
    }
}

Properties:

<property>
  <name>dt.application.FileIO.operator.read.prop.inputValues(source_123)</name>
  <value>tmp/fileIO/source_123</value>
  <value>tmp/fileIO/config/source_123_config.xml</value>
  <value>1</value>
</property>

<property>
  <name>dt.application.FileIO.operator.read.prop.inputValues(source_124)</name>
  <value>tmp/fileIO/source_124</value>
  <value>tmp/fileIO/config/source_124_config.xml</value>
  <value>1</value>
</property>
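
(A note and a sketch, not from the thread: Hadoop's Configuration format
keeps a single value per property name, so multiple <value> elements under
one <property> will not survive loading. One workaround is to pack the
fields into one delimited value, e.g.
<value>tmp/fileIO/source_123,tmp/fileIO/config/source_123_config.xml,1</value>,
and split it in a mapped-property setter:)

import java.util.HashMap;
import java.util.Map;

public class InputValuesConfigSketch {
  private final Map<String, InputValues<String, String, String, Integer>> bySource =
      new HashMap<>();

  // invoked for ...prop.inputValues(source_123); the packed value is
  // directory,configFile,partitionCount
  public void setInputValues(String sourceId, String packed) {
    String[] f = packed.split(",");
    bySource.put(sourceId,
        new InputValues<String, String, String, Integer>(sourceId, f[0], f[1],
            Integer.parseInt(f[2])));
  }
}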

Regards,
Surya Vamshi


RE: Multiple directories

Posted by "Mukkamula, Suryavamshivardhan (CWM-NR)" <su...@rbc.com>.
Thank you for the inputs.

Regards,
Surya Vamshi
From: Thomas Weise [mailto:thomas.weise@gmail.com]
Sent: 2016, June, 15 5:08 PM
To: users@apex.apache.org
Subject: Re: Multiple directories


On Wed, Jun 15, 2016 at 1:55 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi Ram/Team,

I could create an operator which reads multiple directories, parses each file against its individual configuration file, and writes output files to different directories.

However I have some questions regarding the design.


==> We have 120 directories to scan on HDFS. If we use parallel partitioning with operator memory around 250MB, that might be around 30GB of RAM for this operator. Are these figures going to create any problem in production?

You can benchmark this with a single partition. If the downstream operators can keep up with the rate at which the file reader emits, then the memory consumption should be minimal. Keep in mind though that the container memory is not just heap space for the operator, but also memory the JVM requires to run and the memory that the buffer server consumes. You see the allocated memory in the UI if you use the DT community edition (container list in the physical plan).


==> Should I use a scheduler for running the batch job, or define the next scan time and keep the DT job running continuously? If I run the DT job continuously, I assume memory will be continuously utilized by the DT job and not be available to other resources on the cluster; please clarify.

It is possible to set this up elastically as well, so that when there is no input available, the number of reader partitions is reduced and the memory given back (Apex supports dynamic scaling).
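
(As a concrete illustration of capping operator memory, a sketch using the
per-operator MEMORY_MB attribute; the app name FileIO and operator name read
are assumed from earlier in this thread:)

<property>
  <name>dt.application.FileIO.operator.read.attr.MEMORY_MB</name>
  <value>250</value>
</property>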


Regards,
Surya Vamshi


Re: Multiple directories

Posted by Thomas Weise <th...@gmail.com>.
I see some streams are configured as CONTAINER_LOCAL. Are these connected
to the operator in question? The total memory for a container includes all
operators.

Can you post your populateDAG code?
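
(For context, a stream is made container-local in populateDAG() as below;
the operator and stream names here are illustrative, not taken from the
poster's application:)

dag.addStream("lines", reader.output, writer.input)
   .setLocality(DAG.Locality.CONTAINER_LOCAL);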



On Mon, Jun 20, 2016 at 7:04 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
suryavamshivardhan.mukkamula@rbc.com> wrote:

> Hi Thomas,
>
>
>
> Though I requested 500MB, the master log shows that it is requesting a 4GB
> container. My properties are being ignored. Please suggest.
>
>
>
> 2016-06-17 15:43:19,174 INFO  security.StramWSFilter (StramWSFilter.java:doFilter(157)) - 10.60.39.21: proxy access to URI /ws/v2/stram/info by user dr.who, no cookie created
> 2016-06-17 15:43:19,176 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp005.saifg.rbc.com:45454,numContainers=7,capability=<memory:245760, vCores:40>used=<memory:73728, vCores:11>state=RUNNING
> 2016-06-17 15:43:19,176 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp008.saifg.rbc.com:45454,numContainers=6,capability=<memory:245760, vCores:40>used=<memory:81920, vCores:11>state=RUNNING
> 2016-06-17 15:43:19,177 WARN  stram.StreamingContainerManager (StreamingContainerManager.java:getAppDataSources(593)) - DEBUG: looking at operator write 130
> 2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp006.saifg.rbc.com:45454,numContainers=5,capability=<memory:245760, vCores:40>used=<memory:57344, vCores:8>state=RUNNING
> 2016-06-17 15:43:19,177 WARN  stram.StreamingContainerManager (StreamingContainerManager.java:getAppDataSources(593)) - DEBUG: looking at operator read 130
> 2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp007.saifg.rbc.com:45454,numContainers=5,capability=<memory:245760, vCores:40>used=<memory:86016, vCores:10>state=RUNNING
> 2016-06-17 15:43:19,177 WARN  stram.StreamingContainerManager (StreamingContainerManager.java:getAppDataSources(620)) - DEBUG: looking at port output 130
> 2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp004.saifg.rbc.com:45454,numContainers=3,capability=<memory:245760, vCores:40>used=<memory:49152, vCores:6>state=RUNNING
> 2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp012.saifg.rbc.com:45454,numContainers=7,capability=<memory:204800, vCores:39>used=<memory:65536, vCores:10>state=RUNNING
> 2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp011.saifg.rbc.com:45454,numContainers=5,capability=<memory:204800, vCores:39>used=<memory:77824, vCores:9>state=RUNNING
> 2016-06-17 15:43:19,178 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp010.saifg.rbc.com:45454,numContainers=3,capability=<memory:245760, vCores:40>used=<memory:36864, vCores:5>state=RUNNING
> 2016-06-17 15:43:19,178 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp009.saifg.rbc.com:45454,numContainers=5,capability=<memory:245760, vCores:40>used=<memory:77824, vCores:9>state=RUNNING
> 2016-06-17 15:43:20,182 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1048)) - Asking RM for containers: [Capability[<memory:4096, vCores:1>]Priority[0], Capability[<memory:4096, vCores:1>]Priority[1], Capability[<memory:4096, vCores:1>]Priority[2], Capability[<memory:4096, vCores:1>]Priority[3]]
> 2016-06-17 15:43:20,182 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1050)) - Requested container: Capability[<memory:4096, vCores:1>]Priority[0]
> 2016-06-17 15:43:20,185 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1050)) - Requested container: Capability[<memory:4096, vCores:1>]Priority[1]
> 2016-06-17 15:43:20,185 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1050)) - Requested container: Capability[<memory:4096, vCores:1>]Priority[2]
> 2016-06-17 15:43:20,186 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1050)) - Requested container: Capability[<memory:4096, vCores:1>]Priority[3]
> 2016-06-17 15:43:21,213 INFO  impl.AMRMClientImpl (AMRMClientImpl.java:populateNMTokens(360)) - Received new token for : guedlpdhdp006.saifg.rbc.com:45454
> 2016-06-17 15:43:21,213 INFO  impl.AMRMClientImpl (AMRMClientImpl.java:populateNMTokens(360)) - Received new token for : guedlpdhdp004.saifg.rbc.com:45454
> 2016-06-17 15:43:21,213 INFO  impl.AMRMClientImpl (AMRMClientImpl.java:populateNMTokens(360)) - Received new token for : guedlpdhdp012.saifg.rbc.com:45454
> 2016-06-17 15:43:21,213 INFO  impl.AMRMClientImpl (AMRMClientImpl.java:populateNMTokens(360)) - Received new token for : guedlpdhdp009.saifg.rbc.com:45454
> 2016-06-17 15:43:21,215 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:execute(822)) - Got new container., containerId=container_e35_1465495186350_2632_01_000002, containerNode=guedlpdhdp006.saifg.rbc.com:45454, containerNodeURI=guedlpdhdp006.saifg.rbc.com:8042, containerResourceMemory4096, priority0
> 2016-06-17 15:43:21,226 INFO  delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:createPassword(385)) - Creating password for identifier: owner=mukkamula, renewer=, realUser=, issueDate=1466192601226, maxDate=4611687484619989129, sequenceNumber=1, masterKeyId=2, currentKey: 2
> 2016-06-17 15:43:21,228 INFO  stram.LaunchContainerRunnable (LaunchContainerRunnable.java:run(143)) - Setting up container launch context for containerid=container_e35_1465495186350_2632_01_000002
> 2016-06-17 15:43:21,233 INFO  stram.LaunchContainerRunnable (LaunchContainerRunnable.java:setClasspath(118)) - CLASSPATH: ./*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:.
> 2016-06-17 15:43:21,323 INFO  util.BasicContainerOptConfigurator (BasicContainerOptConfigurator.java:getJVMOptions(65)) - property map for operator {Generic=null, -Xmx=768m}
> 2016-06-17 15:43:21,323 INFO  stram.LaunchContainerRunnable (LaunchContainerRunnable.java:getChildVMCommand(243)) - Jvm opts -Xmx939524096  for container container_e35_1465495186350_2632_01_000002
> 2016
>
> Regards,
>
> Surya Vamshi
>
>
>
> From: Thomas Weise [mailto:thomas.weise@gmail.com]
> Sent: 2016, June, 18 12:03 AM
> To: users@apex.apache.org
> Subject: Re: Multiple directories
>
>
>
> Please check in the Apex application master log (container 1) how much
> memory it is requesting. If that's the correct figure and you still end up
> with a larger container, the problem could be the minimum container size in
> the YARN scheduler configuration.
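>
> (For reference, a sketch of that setting: the floor is
> yarn.scheduler.minimum-allocation-mb in yarn-site.xml, and any smaller
> request is rounded up to it. The value below is only illustrative:)
>
>   <property>
>     <name>yarn.scheduler.minimum-allocation-mb</name>
>     <value>1024</value>
>   </property>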
>
>
>
>
>
> On Fri, Jun 17, 2016 at 12:58 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi Ram,
>
>
>
> I tried the option of adding the memory properties under site/conf and
> selecting them during launch, but no luck. The same works with my local
> sandbox setup.
>
>
>
> Is there any other way that I can understand the reason?
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> From: Munagala Ramanath [mailto:ram@datatorrent.com]
> Sent: 2016, June, 17 3:06 PM
> To: users@apex.apache.org
> Subject: Re: Multiple directories
>
>
>
> Please take a look at the section entitled "Properties source precedence"
> at http://docs.datatorrent.com/application_packages/
>
>
>
> It looks like the setting in dt-site.xml on the cluster is overriding your
> application-defined values. If you add the properties to a file under
> site/conf in your application and then select it during launch, those
> values should take effect.
>
> For signalling EOF, another option is to use a separate control port to
> send the EOF, which could just be the string "EOF", for example.
>
>
>
>
>
> Ram
>
>
>
> On Fri, Jun 17, 2016 at 11:48 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi,
>
>
>
> I found a way for the 2nd question.
>
>
>
> 1) I tried setting the operator memory to 500MB, but the operator still
> takes 4GB by default; I did not understand why. (In my sandbox setup the
> memory was set correctly to 500MB, but not on the enterprise dev cluster.)
>
> 2) I am trying to send a null object from the file reader when EOF is
> reached so that the file writer can call the requestFinalize() method, but
> somehow I could not figure out how to send the EOF; I tried as below but
> had no luck. (Solution: it seems that if readEntity() returns null, the
> emit() method is not called, so I managed to emit the null object from
> readEntity() itself.)
>
>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> From: Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:suryavamshivardhan.mukkamula@rbc.com]
> Sent: 2016, June, 17 12:20 PM
> To: users@apex.apache.org
> Subject: RE: Multiple directories
>
>
>
> Hi,
>
>
>
> Can you please help me understand the below issues.
>
>
>
> 1) I tried setting the operator memory to 500MB, but the operator still
> takes 4GB by default; I did not understand why. (In my sandbox setup the
> memory was set correctly to 500MB, but not on the enterprise dev cluster.)
>
> 2) I am trying to send a null object from the file reader when EOF is
> reached so that the file writer can call the requestFinalize() method, but
> somehow I could not figure out how to send the EOF; I tried as below but
> had no luck.
>
>
>
> ################### File Reader ###################
>
> @Override
> protected String readEntity() throws IOException {
>     // try to read a line
>     final String line = br.readLine();
>     if (null != line) { // normal case
>         LOG.debug("readEntity: line = {}", line);
>         return line;
>     }
>
>     // end-of-file (control tuple sent in closeFile())
>     LOG.info("readEntity: EOF for {}", filePath);
>     return null;
> }
>
> @Override
> protected void emit(String line) {
>     // parsing logic here: parse the line as per the input configuration and
>     // create the output line as per the output configuration
>     if (line == null) {
>         output.emit(new KeyValue<String, String>(getFileName(), null));
>         return; // without this, a null line would fall through to parseTuple()
>     }
>     KeyValue<String, String> tuple = new KeyValue<String, String>();
>     tuple.key = getFileName();
>     tuple.value = line;
>     KeyValue<String, String> newTuple = parseTuple(tuple);
>     output.emit(newTuple);
> }
>
> ###################### File Writer ######################
>
> public class FileOutputOperator extends AbstractFileOutputOperator<KeyValue<String, String>> {
>     private static final Logger LOG = LoggerFactory.getLogger(FileOutputOperator.class);
>     private List<String> filesToFinalize = new ArrayList<>();
>
>     @Override
>     public void setup(Context.OperatorContext context) {
>         super.setup(context);
>         finalizeFiles();
>     }
>
>     @Override
>     protected byte[] getBytesForTuple(KeyValue<String, String> tuple) {
>         if (tuple.value == null) {
>             LOG.info("File to finalize {}", tuple.key);
>             filesToFinalize.add(tuple.key);
>             return new byte[0];
>         } else {
>             return tuple.value.getBytes();
>         }
>     }
>
>     @Override
>     protected String getFileName(KeyValue<String, String> tuple) {
>         return tuple.key;
>     }
>
>     @Override
>     public void endWindow() {
>         super.endWindow();
>         finalizeFiles();
>     }
>
>     private void finalizeFiles() {
>         LOG.info("Files to finalize {}", filesToFinalize.toArray());
>         Iterator<String> fileIt = filesToFinalize.iterator();
>         while (fileIt.hasNext()) {
>             requestFinalize(fileIt.next());
>             fileIt.remove();
>         }
>     }
> }
>
> ##################################################################################################
>
>
>
>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Mukkamula, Suryavamshivardhan (CWM-NR) [
> mailto:suryavamshivardhan.mukkamula@rbc.com
> <su...@rbc.com>]
> *Sent:* 2016, June, 17 9:11 AM
> *To:* users@apex.apache.org
> *Subject:* RE: Multiple directories
>
>
>
> Hi Ram/Raja,
>
>
>
> The HBase dependency was pulling older Hadoop jars into my classpath. I
> removed the HBase dependency, which I don't need for now, and the issue was
> resolved.
>
>
>
> Thank you for your help.
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> From: Raja.Aravapalli [mailto:Raja.Aravapalli@target.com]
> Sent: 2016, June, 17 7:06 AM
> To: users@apex.apache.org
> Subject: Re: Multiple directories
>
>
>
>
>
> I also faced a similar problem with Hadoop jars when I used HBase jars in
> pom.xml! It could be because the version of the Hadoop jars your .apa holds
> is different from the ones in the cluster!
>
>
>
>
>
> What I did to solve this:
>
> I set the scope to "provided" in the Maven pom.xml for the HBase jars, and
> then supplied the HBase jars to the application package during submission
> using "-libjars" with the launch command, which solved my Invalid Container
> Id problem!
>
> You can type "launch help" for usage details.
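>
> (An illustrative invocation, with hypothetical jar paths and package name:)
>
>   launch -libjars /opt/libs/hbase-client-1.1.2.jar,/opt/libs/hbase-common-1.1.2.jar myapp.apa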
>
>
>
>
>
> Regards,
>
> Raja.
>
>
>
> From: Munagala Ramanath <ra...@datatorrent.com>
> Reply-To: "users@apex.apache.org" <us...@apex.apache.org>
> Date: Thursday, June 16, 2016 at 4:10 PM
> To: "users@apex.apache.org" <us...@apex.apache.org>
> Subject: Re: Multiple directories
>
>
>
> Those 6 hadoop jars are definitely a problem.
>
>
>
> I didn't see the output of "mvn dependency:tree"; could you post that?
> It will show you why these hadoop jars are being pulled in.
>
>
>
> Also, please refer to the section "Hadoop dependencies conflicts" in the
> troubleshooting guide:
>
> http://docs.datatorrent.com/troubleshooting/
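>
> (A common shape for such a fix, sketched here rather than quoted from the
> guide: keep the HBase dependency but exclude its transitive Hadoop
> artifacts so the cluster's own jars are used; wildcard exclusions need
> Maven 3.2.1+.)
>
>   <dependency>
>     <groupId>org.apache.hbase</groupId>
>     <artifactId>hbase-client</artifactId>
>     <version>1.1.2</version>
>     <exclusions>
>       <exclusion>
>         <groupId>org.apache.hadoop</groupId>
>         <artifactId>*</artifactId>
>       </exclusion>
>     </exclusions>
>   </dependency>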
>
>
>
> Ram
>
>
>
> On Thu, Jun 16, 2016 at 1:56 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi Ram,
>
>
>
> Below are the details.
>
>
>
>
>
>        0 Wed Jun 15 16:34:24 EDT 2016 META-INF/
>      358 Wed Jun 15 16:34:22 EDT 2016 META-INF/MANIFEST.MF
>        0 Wed Jun 15 16:34:24 EDT 2016 app/
>    52967 Wed Jun 15 16:34:22 EDT 2016 app/countrynamescan-1.0-SNAPSHOT.jar
>        0 Wed Jun 15 16:34:22 EDT 2016 lib/
>    62983 Wed Jun 15 16:34:22 EDT 2016 lib/activation-1.1.jar
>  1143233 Wed Jun 15 16:34:22 EDT 2016 lib/activemq-client-5.8.0.jar
>     4467 Wed Jun 15 16:34:22 EDT 2016 lib/aopalliance-1.0.jar
>    44925 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-i18n-2.0.0-M15.jar
>   691479 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-kerberos-codec-2.0.0-M15.jar
>    16560 Wed Jun 15 16:34:22 EDT 2016 lib/api-asn1-api-1.0.0-M20.jar
>    79912 Wed Jun 15 16:34:22 EDT 2016 lib/api-util-1.0.0-M20.jar
>    43033 Wed Jun 15 16:34:22 EDT 2016 lib/asm-3.1.jar
>   303139 Wed Jun 15 16:34:22 EDT 2016 lib/avro-1.7.4.jar
>   232019 Wed Jun 15 16:34:22 EDT 2016 lib/commons-beanutils-1.8.3.jar
>    41123 Wed Jun 15 16:34:22 EDT 2016 lib/commons-cli-1.2.jar
>   284184 Wed Jun 15 16:34:22 EDT 2016 lib/commons-codec-1.10.jar
>   575389 Wed Jun 15 16:34:22 EDT 2016 lib/commons-collections-3.2.1.jar
>    30595 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compiler-2.7.8.jar
>   241367 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compress-1.4.1.jar
>   298829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-configuration-1.6.jar
>   143602 Wed Jun 15 16:34:22 EDT 2016 lib/commons-digester-1.8.jar
>   112341 Wed Jun 15 16:34:22 EDT 2016 lib/commons-el-1.0.jar
>   305001 Wed Jun 15 16:34:22 EDT 2016 lib/commons-httpclient-3.1.jar
>   185140 Wed Jun 15 16:34:22 EDT 2016 lib/commons-io-2.4.jar
>   284220 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang-2.6.jar
>   315805 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang3-3.1.jar
>    61829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-logging-1.2.jar
>  1599627 Wed Jun 15 16:34:22 EDT 2016 lib/commons-math3-3.1.1.jar
>   273370 Wed Jun 15 16:34:22 EDT 2016 lib/commons-net-3.1.jar
>  3608597 Wed Jun 15 16:34:22 EDT 2016 lib/db2jcc-123.jar
>   313898 Wed Jun 15 16:34:22 EDT 2016 lib/dom4j-1.6.1.jar
> 17138265 Wed Jun 15 16:34:22 EDT 2016 lib/fastutil-6.6.4.jar
>    15322 Wed Jun 15 16:34:22 EDT 2016 lib/findbugs-annotations-1.3.9-1.jar
>    84946 Wed Jun 15 16:34:22 EDT 2016 lib/flatpack-3.4.2.jar
>    20220 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-j2ee-management_1.1_spec-1.0.1.jar
>    32359 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-jms_1.1_spec-1.1.1.jar
>    21817 Wed Jun 15 16:34:22 EDT 2016 lib/gmbal-api-only-3.0.0-b023.jar
>   690573 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-framework-2.1.2.jar
>   253086 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-2.1.2.jar
>   198255 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-server-2.1.2.jar
>   336904 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-servlet-2.1.2.jar
>     8114 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-rcm-2.1.2.jar
>  1795932 Wed Jun 15 16:34:22 EDT 2016 lib/guava-12.0.1.jar
>   710492 Wed Jun 15 16:34:22 EDT 2016 lib/guice-3.0.jar
>    65012 Wed Jun 15 16:34:22 EDT 2016 lib/guice-servlet-3.0.jar
>    16778 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-annotations-2.2.0.jar
>    52449 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-auth-2.5.1.jar
>  2962685 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-common-2.5.1.jar
>  1498368 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-mapreduce-client-core-2.5.1.jar
>  1158936 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-api-2.2.0.jar
>  1301627 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-common-2.2.0.jar
>    50139 Wed Jun 15 16:34:22 EDT 2016 lib/hawtbuf-1.9.jar
>    20780 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-annotations-1.1.2.jar
>  1249004 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-client-1.1.2.jar
>   530078 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-common-1.1.2.jar
>  4201685 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-protocol-1.1.2.jar
>  1475955 Wed Jun 15 16:34:22 EDT 2016 lib/htrace-core-3.1.0-incubating.jar
>   590533 Wed Jun 15 16:34:22 EDT 2016 lib/httpclient-4.3.5.jar
>   282269 Wed Jun 15 16:34:22 EDT 2016 lib/httpcore-4.3.2.jar
>   228286 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-core-asl-1.9.2.jar
>   765648 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-mapper-asl-1.9.2.jar
>     2497 Wed Jun 15 16:34:22 EDT 2016 lib/javax.inject-1.jar
>   521991 Wed Jun 15 16:34:22 EDT 2016 lib/javax.mail-1.5.0.jar
>    83945 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-3.1.jar
>    85353 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-api-3.0.1.jar
>   105134 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-api-2.2.2.jar
>   890168 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-impl-2.2.3-1.jar
>  1291164 Wed Jun 15 16:34:22 EDT 2016 lib/jcodings-1.0.8.jar
>   151304 Wed Jun 15 16:34:22 EDT 2016 lib/jdom-1.1.3.jar
>   130458 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-client-1.9.jar
>   458739 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-core-1.9.jar
>    17542 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-grizzly2-1.9.jar
>    14786 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-guice-1.9.jar
>   147952 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-json-1.9.jar
>   713089 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-server-1.9.jar
>    28100 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-core-1.9.jar
>    12976 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-grizzly2-1.9.jar
>    67758 Wed Jun 15 16:34:22 EDT 2016 lib/jettison-1.1.jar
>    21144 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-continuation-8.1.10.v20130312.jar
>    95709 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-http-8.1.10.v20130312.jar
>   103622 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-io-8.1.10.v20130312.jar
>    89691 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-security-8.1.10.v20130312.jar
>   347020 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-server-8.1.10.v20130312.jar
>   101052 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-servlet-8.1.10.v20130312.jar
>   177131 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-6.1.26.jar
>   284903 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-8.1.10.v20130312.jar
>   125928 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-websocket-8.1.10.v20130312.jar
>    39117 Wed Jun 15 16:34:22 EDT 2016 lib/jms-api-1.1-rev-1.jar
>   187292 Wed Jun 15 16:34:22 EDT 2016 lib/joni-2.1.2.jar
>   280205 Wed Jun 15 16:34:22 EDT 2016 lib/jsch-0.1.53.jar
>    33015 Wed Jun 15 16:34:22 EDT 2016 lib/jsr305-1.3.9.jar
>   489884 Wed Jun 15 16:34:22 EDT 2016 lib/log4j-1.2.17.jar
>   565387 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-contrib-3.2.0-incubating.jar
>  1062000 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-library-3.1.1.jar
>    42212 Wed Jun 15 16:34:22 EDT 2016 lib/management-api-3.0.0-b012.jar
>     8798 Wed Jun 15 16:34:22 EDT 2016 lib/named-regexp-0.2.3.jar
>  1206119 Wed Jun 15 16:34:22 EDT 2016 lib/netty-3.6.6.Final.jar
>  1779991 Wed Jun 15 16:34:22 EDT 2016 lib/netty-all-4.0.23.Final.jar
>    29555 Wed Jun 15 16:34:22 EDT 2016 lib/paranamer-2.3.jar
>  1869113 Wed Jun 15 16:34:22 EDT 2016 lib/poi-3.9.jar
>   936648 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-3.9.jar
>  4802621 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-schemas-3.9.jar
>   533455 Wed Jun 15 16:34:22 EDT 2016 lib/protobuf-java-2.5.0.jar
>    26084 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-api-1.7.5.jar
>     9943 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-log4j12-1.7.19.jar
>   995968 Wed Jun 15 16:34:22 EDT 2016 lib/snappy-java-1.0.4.1.jar
>    23346 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0-2.jar
>    26514 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0.1.jar
>     2455 Wed Jun 15 16:34:22 EDT 2016 lib/tdgssconfig-14.00.00.21.jar
>   991265 Wed Jun 15 16:34:22 EDT 2016 lib/terajdbc4-14.00.00.21.jar
>   758309 Wed Jun 15 16:34:22 EDT 2016 lib/vertica-jdbc-7.2.1-0.jar
>   109318 Wed Jun 15 16:34:22 EDT 2016 lib/xml-apis-1.0.b2.jar
>  2666695 Wed Jun 15 16:34:22 EDT 2016 lib/xmlbeans-2.3.0.jar
>    15010 Wed Jun 15 16:34:22 EDT 2016 lib/xmlenc-0.52.jar
>    94672 Wed Jun 15 16:34:22 EDT 2016 lib/xz-1.0.jar
>   792964 Wed Jun 15 16:34:22 EDT 2016 lib/zookeeper-3.4.6.jar
>        0 Wed Jun 15 16:34:28 EDT 2016 conf/
>      334 Mon Apr 04 11:18:00 EDT 2016 conf/my-app-conf1.xml
>     3432 Wed Jun 15 16:22:20 EDT 2016 META-INF/properties.xml
>
>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <project xmlns="http://maven.apache.org/POM/4.0.0"
>          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>          xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
>   <modelVersion>4.0.0</modelVersion>
>   <groupId>com.rbc.aml.cnscan</groupId>
>   <version>1.0-SNAPSHOT</version>
>   <artifactId>countrynamescan</artifactId>
>   <packaging>jar</packaging>
>
>   <!-- change these to the appropriate values -->
>   <name>countrynamescan</name>
>   <description>Country and Name Scan project</description>
>
>   <properties>
>     <!-- change this if you desire to use a different version of DataTorrent -->
>     <datatorrent.version>3.1.1</datatorrent.version>
>     <datatorrent.apppackage.classpath>lib/*.jar</datatorrent.apppackage.classpath>
>   </properties>
>
>   <!-- repository to provide the DataTorrent artifacts -->
>   <!-- <repositories>
>     <repository>
>       <snapshots>
>         <enabled>false</enabled>
>       </snapshots>
>       <id>Datatorrent-Releases</id>
>       <name>DataTorrent Release Repository</name>
>       <url>https://www.datatorrent.com/maven/content/repositories/releases/</url>
>     </repository>
>     <repository>
>       <releases>
>         <enabled>false</enabled>
>       </releases>
>       <id>DataTorrent-Snapshots</id>
>       <name>DataTorrent Early Access Program Snapshot Repository</name>
>       <url>https://www.datatorrent.com/maven/content/repositories/snapshots/</url>
>     </repository>
>   </repositories> -->
>
>   <build>
>     <plugins>
>       <plugin>
>         <groupId>org.apache.maven.plugins</groupId>
>         <artifactId>maven-eclipse-plugin</artifactId>
>         <version>2.9</version>
>         <configuration>
>           <downloadSources>true</downloadSources>
>         </configuration>
>       </plugin>
>       <plugin>
>         <artifactId>maven-compiler-plugin</artifactId>
>         <version>3.3</version>
>         <configuration>
>           <encoding>UTF-8</encoding>
>           <source>1.7</source>
>           <target>1.7</target>
>           <debug>true</debug>
>           <optimize>false</optimize>
>           <showDeprecation>true</showDeprecation>
>           <showWarnings>true</showWarnings>
>         </configuration>
>       </plugin>
>       <plugin>
>         <artifactId>maven-dependency-plugin</artifactId>
>         <version>2.8</version>
>         <executions>
>           <execution>
>             <id>copy-dependencies</id>
>             <phase>prepare-package</phase>
>             <goals>
>               <goal>copy-dependencies</goal>
>             </goals>
>             <configuration>
>               <outputDirectory>target/deps</outputDirectory>
>               <includeScope>runtime</includeScope>
>             </configuration>
>           </execution>
>         </executions>
>       </plugin>
>       <plugin>
>         <artifactId>maven-assembly-plugin</artifactId>
>         <executions>
>           <execution>
>             <id>app-package-assembly</id>
>             <phase>package</phase>
>             <goals>
>               <goal>single</goal>
>             </goals>
>             <configuration>
>               <finalName>${project.artifactId}-${project.version}-apexapp</finalName>
>               <appendAssemblyId>false</appendAssemblyId>
>               <descriptors>
>                 <descriptor>src/assemble/appPackage.xml</descriptor>
>               </descriptors>
>               <archiverConfig>
>                 <defaultDirectoryMode>0755</defaultDirectoryMode>
>               </archiverConfig>
>               <archive>
>                 <manifestEntries>
>                   <Class-Path>${datatorrent.apppackage.classpath}</Class-Path>
>                   <DT-Engine-Version>${datatorrent.version}</DT-Engine-Version>
>                   <DT-App-Package-Name>${project.artifactId}</DT-App-Package-Name>
>                   <DT-App-Package-Version>${project.version}</DT-App-Package-Version>
>                   <DT-App-Package-Display-Name>${project.name}</DT-App-Package-Display-Name>
>                   <DT-App-Package-Description>${project.description}</DT-App-Package-Description>
>                 </manifestEntries>
>               </archive>
>             </configuration>
>           </execution>
>         </executions>
>       </plugin>
>       <plugin>
>         <artifactId>maven-antrun-plugin</artifactId>
>         <version>1.7</version>
>         <executions>
>           <execution>
>             <phase>package</phase>
>             <configuration>
>               <target>
>
>                                                        <move
>
>                                                               file=
> *"${project.build.directory}/${project.artifactId}-${project.version}-apexapp.jar"*
>
>                                                               tofile=
> *"${project.build.directory}/${project.artifactId}-${project.version}.apa"*
> />
>
>                                                 </target>
>
>                                          </configuration>
>
>                                          <goals>
>
>                                                 <goal>run</goal>
>
>                                          </goals>
>
>                                   </execution>
>
>                                   <execution>
>
>                                          <!-- create resource directory
> for *xml* *javadoc* -->
>
>                                          <id>createJavadocDirectory</id>
>
>                                          <phase>generate-resources</phase>
>
>                                          <configuration>
>
>                                                 <tasks>
>
>                                                        <delete
>
>                                                               dir=
> *"${project.build.directory}/generated-resources/xml-javadoc"*/>
>
>                                                        <mkdir
>
>                                                               dir=
> *"${project.build.directory}/generated-resources/xml-javadoc"*/>
>
>                                                 </tasks>
>
>                                          </configuration>
>
>                                          <goals>
>
>                                                 <goal>run</goal>
>
>                                          </goals>
>
>                                   </execution>
>
>                            </executions>
>
>                      </plugin>
>
>
>
>                      <plugin>
>
>                            <groupId>org.codehaus.mojo</groupId>
>
>                            <artifactId>build-helper-*maven*-*plugin*</
> artifactId>
>
>                            <version>1.9.1</version>
>
>                            <executions>
>
>                                   <execution>
>
>                                          <id>attach-artifacts</id>
>
>                                          <phase>package</phase>
>
>                                          <goals>
>
>                                                 <goal>attach-artifact</
> goal>
>
>                                          </goals>
>
>                                          <configuration>
>
>                                                 <artifacts>
>
>                                                        <artifact>
>
>                                                               <file>
> target/${project.artifactId}-${project.version}.apa</file>
>
>                                                               <type>apa</
> type>
>
>                                                        </artifact>
>
>                                                 </artifacts>
>
>                                                 <skipAttach>false</
> skipAttach>
>
>                                          </configuration>
>
>                                   </execution>
>
>                            </executions>
>
>                      </plugin>
>
>
>
>                      <!-- generate *javdoc* -->
>
>                      <plugin>
>
>                            <groupId>org.apache.maven.plugins</groupId>
>
>                            <artifactId>*maven*-*javadoc*-*plugin*</
> artifactId>
>
>                            <executions>
>
>                                   <!-- generate *xml* *javadoc* -->
>
>                                   <execution>
>
>                                          <id>*xml*-*doclet*</id>
>
>                                          <phase>generate-resources</phase>
>
>                                          <goals>
>
>                                                 <goal>*javadoc*</goal>
>
>                                          </goals>
>
>                                          <configuration>
>
>                                                 <doclet>
> com.github.markusbernhardt.xmldoclet.XmlDoclet</doclet>
>
>                                                 <additionalparam>-d
>
>
> ${project.build.directory}/generated-resources/xml-javadoc
>
>                                                        -filename
> ${project.artifactId}-${project.version}-javadoc.xml</additionalparam>
>
>                                                 <useStandardDocletOptions>
> false</useStandardDocletOptions>
>
>                                                 <docletArtifact>
>
>                                                        <groupId>
> com.github.markusbernhardt</groupId>
>
>                                                        <artifactId>
> xml-doclet</artifactId>
>
>                                                        <version>1.0.4</
> version>
>
>                                                 </docletArtifact>
>
>                                          </configuration>
>
>                                   </execution>
>
>                            </executions>
>
>                      </plugin>
>
>                      <!-- Transform *xml* *javadoc* to stripped down
> version containing only class/interface
>
>                            comments and tags -->
>
>                      <plugin>
>
>                            <groupId>org.codehaus.mojo</groupId>
>
>                            <artifactId>*xml*-*maven*-*plugin*</artifactId>
>
>                            <version>1.0</version>
>
>                            <executions>
>
>                                   <execution>
>
>                                          <id>transform-*xmljavadoc*</id>
>
>                                          <phase>generate-resources</phase>
>
>                                          <goals>
>
>                                                 <goal>transform</goal>
>
>                                          </goals>
>
>                                   </execution>
>
>                            </executions>
>
>                            <configuration>
>
>                                   <transformationSets>
>
>                                          <transformationSet>
>
>                                                 <dir>
> ${project.build.directory}/generated-resources/xml-javadoc</dir>
>
>                                                 <includes>
>
>                                                        <include>
> ${project.artifactId}-${project.version}-javadoc.xml</include>
>
>                                                 </includes>
>
>                                                 <stylesheet>
> XmlJavadocCommentsExtractor.xsl</stylesheet>
>
>                                                 <outputDir>
> ${project.build.directory}/generated-resources/xml-javadoc</outputDir>
>
>                                          </transformationSet>
>
>                                   </transformationSets>
>
>                            </configuration>
>
>                      </plugin>
>
>                      <!-- copy *xml* *javadoc* to class jar -->
>
>                      <plugin>
>
>                            <artifactId>*maven*-resources-*plugin*</
> artifactId>
>
>                            <version>2.6</version>
>
>                            <executions>
>
>                                   <execution>
>
>                                          <id>copy-resources</id>
>
>                                          <phase>process-resources</phase>
>
>                                          <goals>
>
>                                                 <goal>copy-resources</goal
> >
>
>                                          </goals>
>
>                                          <configuration>
>
>                                                 <outputDirectory>
> ${basedir}/target/classes</outputDirectory>
>
>                                                 <resources>
>
>                                                        <resource>
>
>                                                               <directory>
> ${project.build.directory}/generated-resources/xml-javadoc</directory>
>
>                                                               <includes>
>
>                                                                      <
> include>${project.artifactId}-${project.version}-javadoc.xml</include>
>
>                                                               </includes>
>
>                                                               <filtering>
> true</filtering>
>
>                                                        </resource>
>
>                                                 </resources>
>
>                                          </configuration>
>
>                                   </execution>
>
>                            </executions>
>
>                      </plugin>
>
>
>
>               </plugins>
>
>
>
>               <pluginManagement>
>
>                      <plugins>
>
>                            <!--This plugin's configuration is used to
> store Eclipse m2e settings
>
>                                   only. It has no influence on the *Maven*
> build itself. -->
>
>                            <plugin>
>
>                                   <groupId>org.eclipse.m2e</groupId>
>
>                                   <artifactId>*lifecycle*-mapping</
> artifactId>
>
>                                   <version>1.0.0</version>
>
>                                   <configuration>
>
>                                          <lifecycleMappingMetadata>
>
>                                                 <pluginExecutions>
>
>                                                        <pluginExecution>
>
>                                                               <
> pluginExecutionFilter>
>
>                                                                      <
> groupId>org.codehaus.mojo</groupId>
>
>                                                                      <
> artifactId>
>
>
> xml-maven-plugin
>
>                                                                      </
> artifactId>
>
>                                                                      <
> versionRange>[1.0,)</versionRange>
>
>                                                                      <
> goals>
>
>
> <goal>transform</goal>
>
>                                                                      </
> goals>
>
>                                                               </
> pluginExecutionFilter>
>
>                                                               <action>
>
>                                                                      <
> ignore></ignore>
>
>                                                               </action>
>
>                                                        </pluginExecution>
>
>                                                        <pluginExecution>
>
>                                                               <
> pluginExecutionFilter>
>
>                                                                      <
> groupId></groupId>
>
>                                                                      <
> artifactId></artifactId>
>
>                                                                      <
> versionRange>[,)</versionRange>
>
>                                                                      <
> goals>
>
>
> <goal></goal>
>
>                                                                      </
> goals>
>
>                                                               </
> pluginExecutionFilter>
>
>                                                               <action>
>
>                                                                      <
> ignore></ignore>
>
>                                                               </action>
>
>                                                        </pluginExecution>
>
>                                                 </pluginExecutions>
>
>                                          </lifecycleMappingMetadata>
>
>                                   </configuration>
>
>                            </plugin>
>
>                      </plugins>
>
>               </pluginManagement>
>
>        </build>
>
>
>
>        <dependencies>
>
>               <!-- add your dependencies here -->
>
>               <dependency>
>
>                      <groupId>com.datatorrent</groupId>
>
>                      <artifactId>*malhar*-library</artifactId>
>
>                      <version>${datatorrent.version}</version>
>
>                      <!-- If you know that your application does not need
> transitive dependencies
>
>                            pulled in by *malhar*-library, uncomment the
> following to reduce the size of
>
>                            your *app* package. -->
>
>                      <!-- <exclusions> <exclusion> <groupId>*</groupId>
> <artifactId>*</artifactId>
>
>                            </exclusion> </exclusions> -->
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>com.datatorrent</groupId>
>
>                      <artifactId>*dt*-common</artifactId>
>
>                      <version>${datatorrent.version}</version>
>
>                      <scope>provided</scope>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>com.datatorrent</groupId>
>
>                      <artifactId>*malhar*-library</artifactId>
>
>                      <version>${datatorrent.version}</version>
>
>                      <!-- If you know that your application does not need
> transitive dependencies
>
>                            pulled in by *malhar*-library, uncomment the
> following to reduce the size of
>
>                            your *app* package. -->
>
>                      <!-- <exclusions> <exclusion> <groupId>*</groupId>
> <artifactId>*</artifactId>
>
>                            </exclusion> </exclusions> -->
>
>               </dependency>
>
>               <dependency>
>
>        <groupId>com.datatorrent</groupId>
>
>        <artifactId>*dt*-engine</artifactId>
>
>        <version>${datatorrent.version}</version>
>
>        <scope>test</scope>
>
>               </dependency>
>
>               <dependency>
>
>               <groupId>com.datatorrent</groupId>
>
>               <artifactId>*dt*-common</artifactId>
>
>               <version>${datatorrent.version}</version>
>
>               <scope>provided</scope>
>
>               </dependency>
>
>               <dependency>
>
>               <groupId>com.teradata.jdbc</groupId>
>
>               <artifactId>terajdbc4</artifactId>
>
>               <version>14.00.00.21</version>
>
>               </dependency>
>
>               <dependency>
>
>               <groupId>com.teradata.jdbc</groupId>
>
>               <artifactId>*tdgssconfig*</artifactId>
>
>               <version>14.00.00.21</version>
>
>               </dependency>
>
>               <dependency>
>
>               <groupId>com.ibm.db2</groupId>
>
>               <artifactId>db2jcc</artifactId>
>
>               <version>123</version>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>jdk.tools</groupId>
>
>                      <artifactId>jdk.tools</artifactId>
>
>                      <version>1.7</version>
>
>                      <scope>system</scope>
>
>                      <systemPath>C:/Program Files/Java/jdk1.7.0_79/*lib*
> /tools.jar</systemPath>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>org.apache.apex</groupId>
>
>                      <artifactId>*malhar*-*contrib*</artifactId>
>
>                      <version>3.2.0-incubating</version>
>
>                      <!--<scope>provided</scope> -->
>
>                      <exclusions>
>
>                            <exclusion>
>
>                                   <groupId>*</groupId>
>
>                                   <artifactId>*</artifactId>
>
>                            </exclusion>
>
>                      </exclusions>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>*junit*</groupId>
>
>                      <artifactId>*junit*</artifactId>
>
>                      <version>4.10</version>
>
>                      <scope>test</scope>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>com.vertica</groupId>
>
>                      <artifactId>*vertica*-*jdbc*</artifactId>
>
>                      <version>7.2.1-0</version>
>
>               </dependency>
>
>               <dependency>
>
>               <groupId>org.apache.hbase</groupId>
>
>               <artifactId>*hbase*-client</artifactId>
>
>               <version>1.1.2</version>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>org.slf4j</groupId>
>
>                      <artifactId>slf4j-log4j12</artifactId>
>
>                      <version>1.7.19</version>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>com.datatorrent</groupId>
>
>                      <artifactId>*dt*-engine</artifactId>
>
>                      <version>${datatorrent.version}</version>
>
>                      <scope>test</scope>
>
>               </dependency>
>
>
>
>               <dependency>
>
>                      <groupId>net.sf.flatpack</groupId>
>
>                      <artifactId>*flatpack*</artifactId>
>
>                      <version>3.4.2</version>
>
>               </dependency>
>
>
>
>               <dependency>
>
>                      <groupId>org.jdom</groupId>
>
>                      <artifactId>*jdom*</artifactId>
>
>                      <version>1.1.3</version>
>
>               </dependency>
>
>
>
>               <dependency>
>
>                      <groupId>org.apache.poi</groupId>
>
>                      <artifactId>*poi*-*ooxml*</artifactId>
>
>                      <version>3.9</version>
>
>               </dependency>
>
>
>
>               <dependency>
>
>                      <groupId>org.apache.xmlbeans</groupId>
>
>                      <artifactId>*xmlbeans*</artifactId>
>
>                      <version>2.3.0</version>
>
>               </dependency>
>
>
>
>               <dependency>
>
>                      <groupId>dom4j</groupId>
>
>                      <artifactId>dom4j</artifactId>
>
>                      <version>1.6.1</version>
>
>               </dependency>
>
>
>
>               <dependency>
>
>                      <groupId>javax.xml.stream</groupId>
>
>                      <artifactId>*stax*-*api*</artifactId>
>
>                      <version>1.0-2</version>
>
>               </dependency>
>
>
>
>               <dependency>
>
>                      <groupId>org.apache.poi</groupId>
>
>                      <artifactId>*poi*</artifactId>
>
>                      <version>3.9</version>
>
>               </dependency>
>
>
>
>               <dependency>
>
>                      <groupId>org.apache.poi</groupId>
>
>                      <artifactId>*poi*-*ooxml*-schemas</artifactId>
>
>                      <version>3.9</version>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>com.jcraft</groupId>
>
>                      <artifactId>*jsch*</artifactId>
>
>                      <version>0.1.53</version>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>com.jcraft</groupId>
>
>                      <artifactId>*jsch*</artifactId>
>
>                      <version>0.1.53</version>
>
>               </dependency>
>
>        </dependencies>
>
>
>
> </project>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, June, 16 4:37 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> It looks like you may be including old Hadoop jars in your apa package
> since the stack trace
>
> shows *ConverterUtils.toContainerId* calling
> *ConverterUtils.toApplicationAttemptId* but recent versions
>
> don't have that call sequence. In 2.7.1 (which is what your cluster has)
> the function looks like this:
>
>   public static ContainerId toContainerId(String containerIdStr) {
>     return ContainerId.fromString(containerIdStr);
>   }
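>
> For illustration, a minimal standalone probe (class name and printout are
> arbitrary) that only succeeds with the newer parsing path:
>
> import org.apache.hadoop.yarn.api.records.ContainerId;
>
> public class IdCheck {
>   public static void main(String[] args) {
>     // pre-2.6 ConverterUtils fails on the "e35" epoch segment of this id
>     ContainerId id = ContainerId.fromString("container_e35_1465495186350_2224_01_000001");
>     System.out.println(id.getApplicationAttemptId());
>   }
> }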
>
>
>
> Could you post the output of "*jar tvf {your-apa-file}*" as well as "*mvn
> dependency:tree*"?
>
>
>
> Ram
>
>
>
> On Thu, Jun 16, 2016 at 12:38 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi Ram,
>
>
>
> Below is the information.
>
>
>
>
>
>
> {
>     "clusterInfo": {
>         "haState": "ACTIVE",
>         "haZooKeeperConnectionState": "CONNECTED",
>         "hadoopBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bccaeaf36cb193c1c by jenkins source checksum 69a3bf8c667267c2c252a54fbbf23d",
>         "hadoopVersion": "2.7.1.2.3.2.0-2950",
>         "hadoopVersionBuiltOn": "2015-09-30T18:08Z",
>         "id": 1465495186350,
>         "resourceManagerBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bccaeaf36cb193c1c by jenkins source checksum 48db4b572827c2e9c2da66982d147626",
>         "resourceManagerVersion": "2.7.1.2.3.2.0-2950",
>         "resourceManagerVersionBuiltOn": "2015-09-30T18:20Z",
>         "rmStateStoreName": "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",
>         "startedOn": 1465495186350,
>         "state": "STARTED"
>     }
> }
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, June, 16 2:57 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> Can you ssh to one of the cluster nodes? If so, can you run this command
> and show the output
>
> (where *{rm}* is the *host:port* running the resource manager, aka YARN):
>
>
>
> *curl http://{rm}/ws/v1/cluster | python -mjson.tool*
>
>
>
> Ram
>
> ps. You can determine the node running YARN with:
>
>
>
> *hdfs getconf -confKey yarn.resourcemanager.webapp.address*
>
> *hdfs getconf -confKey yarn.resourcemanager.webapp.https.address*
>
>
>
>
>
>
>
> On Thu, Jun 16, 2016 at 11:15 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi,
>
>
>
> I am facing a weird issue and the logs are not clear to me.
>
>
>
> I have created an apa file which works fine within my local sandbox, but I am
> facing problems when I upload it to the enterprise Hadoop cluster using the
> DT Console.
>
>
>
> Below is the error message from yarn logs. Please help in understanding
> the issue.
>
>
>
> ###################### Error Logs ########################################################
>
>
>
> Log Type: AppMaster.stderr
>
> Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
>
> Log Length: 1259
>
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/grid/06/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/filecache/36/slf4j-log4j12-1.7.19.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> Exception in thread "main" java.lang.IllegalArgumentException: Invalid ContainerId: container_e35_1465495186350_2224_01_000001
>         at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182)
>         at com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:90)
> Caused by: java.lang.NumberFormatException: For input string: "e35"
>         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>         at java.lang.Long.parseLong(Long.java:441)
>         at java.lang.Long.parseLong(Long.java:483)
>         at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
>         at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
>         ... 1 more
>
>
>
> Log Type: AppMaster.stdout
>
> Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
>
> Log Length: 0
>
>
>
> Log Type: dt.log
>
> Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
>
> Log Length: 29715
>
> Showing 4096 bytes of 29715 total. Click here <http://guedlpdhdp001.saifg.rbc.com:19888/jobhistory/logs/guedlpdhdp012.saifg.rbc.com:45454/container_e35_1465495186350_2224_01_000001/container_e35_1465495186350_2224_01_000001/mukkamula/dt.log/?start=0> for the full log.
>
> 56m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms4096m -Xmx4096m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT
>
> SHLVL=3
>
> HADOOP_SSH_OPTS=-o ConnectTimeout=5 -o SendEnv=HADOOP_CONF_DIR
>
> HADOOP_USER_NAME=datatorrent/gueulvahal003.saifg.rbc.com@SAIFG.RBC.COM
>
> HADOOP_NAMENODE_OPTS=-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1
>
> HADOOP_IDENT_STRING=yarn
>
> HADOOP_MAPRED_LOG_DIR=/var/log/hadoop-mapreduce/yarn
>
> NM_HOST=guedlpdhdp012.saifg.rbc.com
>
> XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
>
> HADOOP_SECURE_DN_LOG_DIR=/var/log/hadoop/hdfs
>
> YARN_HISTORYSERVER_HEAPSIZE=1024
>
> JVM_PID=2638
>
> YARN_PID_DIR=/var/run/hadoop-yarn/yarn
>
> HADOOP_HOME_WARN_SUPPRESS=1
>
> NM_PORT=45454
>
> LOGNAME=mukkamula
>
> YARN_CONF_DIR=/usr/hdp/current/hadoop-client/conf
>
> HADOOP_YARN_USER=yarn
>
> QTDIR=/usr/lib64/qt-3.3
>
> _=/usr/lib/jvm/java-1.7.0/bin/java
>
> MSM_PRODUCT=MSM
>
> HADOOP_HOME=/usr/hdp/2.3.2.0-2950/hadoop
>
> MALLOC_ARENA_MAX=4
>
> HADOOP_OPTS=-Dhdp.version=2.3.2.0-2950 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.3.2.0-2950 -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
>
> SHELL=/bin/bash
>
> YARN_ROOT_LOGGER=INFO,EWMA,RFA
>
>
> HADOOP_TOKEN_FILE_LOCATION=/grid/11/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/container_e35_1465495186350_2224_01_000001/container_tokens
>
>
> CLASSPATH=./*:/usr/hdp/current/hadoop-client/conf:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*
>
> HADOOP_MAPRED_PID_DIR=/var/run/hadoop-mapreduce/yarn
>
> YARN_NODEMANAGER_HEAPSIZE=1024
>
> QTINC=/usr/lib64/qt-3.3/include
>
> USER=mukkamula
>
> HADOOP_CLIENT_OPTS=-Xmx2048m -XX:MaxPermSize=512m -Xmx2048m -XX:MaxPermSize=512m
>
> CONTAINER_ID=container_e35_1465495186350_2224_01_000001
>
> HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop/hdfs
>
> HISTCONTROL=ignoredups
>
> HOME=/home/
>
> HADOOP_NAMENODE_INIT_HEAPSIZE=-Xms8192m
>
> MSM_HOME=/usr/local/MegaRAID Storage Manager
>
> LESSOPEN=||/usr/bin/lesspipe.sh %s
>
> LANG=en_US.UTF-8
>
> YARN_NICENESS=0
>
> YARN_IDENT_STRING=yarn
>
> HADOOP_MAPRED_HOME=/usr/hdp/2.3.2.0-2950/hadoop-mapreduce
>
>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Mukkamula, Suryavamshivardhan (CWM-NR)
> *Sent:* 2016, June, 16 8:58 AM
> *To:* users@apex.apache.org
> *Subject:* RE: Multiple directories
>
>
>
> Thank you for the inputs.
>
>
>
> Regards,
>
> Surya Vamshi
>
> *From:* Thomas Weise [mailto:thomas.weise@gmail.com]
> *Sent:* 2016, June, 15 5:08 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
>
>
> On Wed, Jun 15, 2016 at 1:55 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi Ram/Team,
>
>
>
> I could create an operator which reads multiple directories, parses each
> file with respect to an individual configuration file, and generates
> output files to different directories.
>
>
>
> However I have some questions regarding the design.
>
>
>
> → We have 120 directories to scan on HDFS. If we use parallel partitioning
> with operator memory around 250MB, it might take around 30GB of RAM for this
> operator. Are these figures going to create any problem in production?
>
>
>
> You can benchmark this with a single partition. If the downstream
> operators can keep up with the rate at which the file reader emits, then
> the memory consumption should be minimal. Keep in mind though that the
> container memory is not just heap space for the operator, but also memory
> the JVM requires to run and the memory that the buffer server consumes. You
> see the allocated memory in the UI if you use the DT community edition
> (container list in the physical plan).
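>
> As a minimal sketch (the 500 and the variable name "reader" are assumed
> values, not from your application), the per-operator budget can be set in
> populateDAG():
>
> import com.datatorrent.api.Context.OperatorContext;
>
> // inside populateDAG(), after dag.addOperator(...):
> dag.setAttribute(reader, OperatorContext.MEMORY_MB, 500); // MB requested for this operator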
>
>
>
> → Should I use a scheduler for running the batch job, or define the next scan
> time and keep the DT job running continuously? If I run the DT job
> continuously, I assume its memory stays allocated to the DT job and is not
> available to other applications on the cluster; please clarify.
>
> It is possible to set this up elastically as well, so that when there is no
> input available, the number of reader partitions is reduced and the memory
> is given back (Apex supports dynamic scaling).
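>
> As a rough illustration of the elastic option (the idle test is an assumed
> threshold, not a tested policy), a StatsListener can request repartitioning,
> after which the operator's Partitioner decides the new partition count:
>
> import java.io.Serializable;
> import com.datatorrent.api.StatsListener;
>
> public class IdleRepartitioner implements StatsListener, Serializable {
>   @Override
>   public Response processStats(BatchedOperatorStats stats) {
>     Response res = new Response();
>     // ask the platform to invoke definePartitions() when nothing is flowing
>     res.repartitionRequired = stats.getTuplesEmittedPSMA() == 0;
>     return res;
>   }
> }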
>
>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, June, 05 10:24 PM
>
>
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> Some sample code to monitor multiple directories is now available at:
>
>
> https://github.com/DataTorrent/examples/tree/master/tutorials/fileIO-multiDir
>
>
>
> It shows how to use a custom implementation of definePartitions() to create
>
> multiple partitions of the file input operator and group them
>
> into "slices" where each slice monitors a single directory.
>
>
>
> Ram
>
>
>
> On Wed, May 25, 2016 at 9:55 AM, Munagala Ramanath <ra...@datatorrent.com>
> wrote:
>
> I'm hoping to have a sample sometime next week.
>
>
>
> Ram
>
>
>
> On Wed, May 25, 2016 at 9:30 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Thank you so much ram, for your advice , Option (a) would be ideal for my
> requirement.
>
>
>
> Do you have sample usage for partitioning with individual configuration
> set ups different partitions?
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, May, 25 12:11 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> You have 2 options: (a) AbstractFileInputOperator (b)
> FileSplitter/BlockReader
>
>
>
> For (a), each partition (i.e. replica of the operator) can scan only a
> single directory, so if you have 100
>
> directories, you can simply start with 100 partitions; since each
> partition is scanning its own directory
>
> you don't need to worry about which files the lines came from. This
> approach however needs a custom
>
> definePartitions() implementation in your subclass to assign the
> appropriate directory and XML parsing
>
> config file to each partition; it also needs adequate cluster resources to
> be able to spin up the required
>
> number of partitions.
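>
> For illustration, a minimal sketch of such a definePartitions() in a reader
> subclass (the directory list and the FileReader name are placeholders):
>
> import java.util.ArrayList;
> import java.util.Collection;
> import java.util.List;
> import com.datatorrent.api.DefaultPartition;
> import com.datatorrent.api.Partitioner.Partition;
> import com.datatorrent.api.Partitioner.PartitioningContext;
> import com.datatorrent.lib.io.fs.AbstractFileInputOperator;
>
> @Override
> public Collection<Partition<AbstractFileInputOperator<String>>> definePartitions(
>     Collection<Partition<AbstractFileInputOperator<String>>> partitions,
>     PartitioningContext context)
> {
>   String[] dirs = {"/data/feed1", "/data/feed2"};  // placeholders: one directory per partition
>   List<Partition<AbstractFileInputOperator<String>>> result = new ArrayList<>();
>   for (String dir : dirs) {
>     FileReader reader = new FileReader();          // your subclass; assign its XML config here too
>     reader.setDirectory(dir);                      // each replica scans only this directory
>     result.add(new DefaultPartition<AbstractFileInputOperator<String>>(reader));
>   }
>   return result;
> }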
>
>
>
> For (b), there is some documentation in the Operators section at
> http://docs.datatorrent.com/ including
>
> sample code. These operators support scanning multiple directories out of
> the box but have more
>
> elaborate configuration options. Check this out and see if it works in
> your use case.
>
>
>
> Ram
>
>
>
> On Wed, May 25, 2016 at 8:17 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hello Ram/Team,
>
>
>
> My requirement is to read input feeds from different locations on HDFS and
> parse those files by reading XML configuration files (each input feed has
> configuration file which defines the fields inside the input feeds).
>
>
>
> My approach : I would like to define a mapping file which contains
> individual feed identifier, feed location , configuration file location. I
> would like to read this mapping file at initial load within setup() method
> and define my DirectoryScan.acceptFiles. Here my challenge is when I read
> the files, I should parse the lines by reading the individual
> configuration files. How do I know which file a line came from? If I
> know this, I can read the corresponding configuration file before parsing
> the line.
>
>
>
> Please let me know how do I handle this.
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, May, 24 5:49 PM
> *To:* Mukkamula, Suryavamshivardhan (CWM-NR)
> *Subject:* Multiple directories
>
>
>
> One way of addressing the issue is to use some sort of external tool (like
> a script) to
>
> copy all the input files to a common directory (making sure that the file
> names are
>
> unique to prevent one file from overwriting another) before the Apex
> application starts.
>
>
>
> The Apex application then starts and processes files from this directory.
>
>
>
> If you set the partition count of the file input operator to N, it will
> create N partitions and
>
> the files will be automatically distributed among the partitions. The
> partitions will work
>
> in parallel.
>
>
>
> Ram
>

Re: Multiple directories

Posted by Thomas Weise <th...@gmail.com>.
I don't see the streams named "data" and "control" in your populateDAG.

There are only 2 operators here but in the log I see 4 container requests.

How are these operators partitioned?

On Mon, Jun 20, 2016 at 7:33 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
suryavamshivardhan.mukkamula@rbc.com> wrote:

> Hi Thomas,
>
>
>
> Below is my populate DAG code.
>
>
>
> @ApplicationAnnotation(name="ClientGffCreation")
> public class ClientGffApplication implements StreamingApplication
> {
>   @Override
>   public void populateDAG(DAG dag, Configuration conf)
>   {
>     // create operators
>     FileReader reader = dag.addOperator("read", FileReader.class);
>     FileOutputOperator writer = dag.addOperator("write", FileOutputOperator.class);
>
>     reader.setScanner(new FileReaderMultiDir.SlicedDirectoryScanner());
>
>     // create DAG
>     dag.addStream("File-Writer", reader.output, writer.input);
>   }
> }
>
>
>
> Regards,
>
> Surya Vamshi
>
> *From:* Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:
> suryavamshivardhan.mukkamula@rbc.com]
> *Sent:* 2016, June, 20 10:04 AM
> *To:* users@apex.apache.org
> *Subject:* RE: Multiple directories
>
>
>
> Hi Thomas,
>
>
>
> Though I requested 500MB, the master log shows that it is requesting a 4GB
> container; my properties are being ignored. Please suggest.
>
>
>
> 2016-06-17 15:43:19,174 INFO  security.StramWSFilter (StramWSFilter.java:doFilter(157)) - 10.60.39.21: proxy access to URI /ws/v2/stram/info by user dr.who, no cookie created
> 2016-06-17 15:43:19,176 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp005.saifg.rbc.com:45454,numContainers=7,capability=<memory:245760, vCores:40>used=<memory:73728, vCores:11>state=RUNNING
> 2016-06-17 15:43:19,176 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp008.saifg.rbc.com:45454,numContainers=6,capability=<memory:245760, vCores:40>used=<memory:81920, vCores:11>state=RUNNING
> 2016-06-17 15:43:19,177 WARN  stram.StreamingContainerManager (StreamingContainerManager.java:getAppDataSources(593)) - DEBUG: looking at operator write 130
> 2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp006.saifg.rbc.com:45454,numContainers=5,capability=<memory:245760, vCores:40>used=<memory:57344, vCores:8>state=RUNNING
> 2016-06-17 15:43:19,177 WARN  stram.StreamingContainerManager (StreamingContainerManager.java:getAppDataSources(593)) - DEBUG: looking at operator read 130
> 2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp007.saifg.rbc.com:45454,numContainers=5,capability=<memory:245760, vCores:40>used=<memory:86016, vCores:10>state=RUNNING
> 2016-06-17 15:43:19,177 WARN  stram.StreamingContainerManager (StreamingContainerManager.java:getAppDataSources(620)) - DEBUG: looking at port output 130
> 2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp004.saifg.rbc.com:45454,numContainers=3,capability=<memory:245760, vCores:40>used=<memory:49152, vCores:6>state=RUNNING
> 2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp012.saifg.rbc.com:45454,numContainers=7,capability=<memory:204800, vCores:39>used=<memory:65536, vCores:10>state=RUNNING
> 2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp011.saifg.rbc.com:45454,numContainers=5,capability=<memory:204800, vCores:39>used=<memory:77824, vCores:9>state=RUNNING
> 2016-06-17 15:43:19,178 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp010.saifg.rbc.com:45454,numContainers=3,capability=<memory:245760, vCores:40>used=<memory:36864, vCores:5>state=RUNNING
> 2016-06-17 15:43:19,178 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp009.saifg.rbc.com:45454,numContainers=5,capability=<memory:245760, vCores:40>used=<memory:77824, vCores:9>state=RUNNING
> 2016-06-17 15:43:20,182 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1048)) - Asking RM for containers: [Capability[<memory:4096, vCores:1>]Priority[0], Capability[<memory:4096, vCores:1>]Priority[1], Capability[<memory:4096, vCores:1>]Priority[2], Capability[<memory:4096, vCores:1>]Priority[3]]
> 2016-06-17 15:43:20,182 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1050)) - Requested container: Capability[<memory:4096, vCores:1>]Priority[0]
> 2016-06-17 15:43:20,185 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1050)) - Requested container: Capability[<memory:4096, vCores:1>]Priority[1]
> 2016-06-17 15:43:20,185 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1050)) - Requested container: Capability[<memory:4096, vCores:1>]Priority[2]
> 2016-06-17 15:43:20,186 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1050)) - Requested container: Capability[<memory:4096, vCores:1>]Priority[3]
> 2016-06-17 15:43:21,213 INFO  impl.AMRMClientImpl (AMRMClientImpl.java:populateNMTokens(360)) - Received new token for : guedlpdhdp006.saifg.rbc.com:45454
> 2016-06-17 15:43:21,213 INFO  impl.AMRMClientImpl (AMRMClientImpl.java:populateNMTokens(360)) - Received new token for : guedlpdhdp004.saifg.rbc.com:45454
> 2016-06-17 15:43:21,213 INFO  impl.AMRMClientImpl (AMRMClientImpl.java:populateNMTokens(360)) - Received new token for : guedlpdhdp012.saifg.rbc.com:45454
> 2016-06-17 15:43:21,213 INFO  impl.AMRMClientImpl (AMRMClientImpl.java:populateNMTokens(360)) - Received new token for : guedlpdhdp009.saifg.rbc.com:45454
> 2016-06-17 15:43:21,215 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:execute(822)) - Got new container., containerId=container_e35_1465495186350_2632_01_000002, containerNode=guedlpdhdp006.saifg.rbc.com:45454, containerNodeURI=guedlpdhdp006.saifg.rbc.com:8042, containerResourceMemory4096, priority0
> 2016-06-17 15:43:21,226 INFO  delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:createPassword(385)) - Creating password for identifier: owner=mukkamula, renewer=, realUser=, issueDate=1466192601226, maxDate=4611687484619989129, sequenceNumber=1, masterKeyId=2, currentKey: 2
> 2016-06-17 15:43:21,228 INFO  stram.LaunchContainerRunnable (LaunchContainerRunnable.java:run(143)) - Setting up container launch context for containerid=container_e35_1465495186350_2632_01_000002
> 2016-06-17 15:43:21,233 INFO  stram.LaunchContainerRunnable (LaunchContainerRunnable.java:setClasspath(118)) - CLASSPATH: ./*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:.
> 2016-06-17 15:43:21,323 INFO  util.BasicContainerOptConfigurator (BasicContainerOptConfigurator.java:getJVMOptions(65)) - property map for operator {Generic=null, -Xmx=768m}
> 2016-06-17 15:43:21,323 INFO  stram.LaunchContainerRunnable (LaunchContainerRunnable.java:getChildVMCommand(243)) - Jvm opts -Xmx939524096  for container container_e35_1465495186350_2632_01_000002
> 2016
>
>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Thomas Weise [mailto:thomas.weise@gmail.com]
> *Sent:* 2016, June, 18 12:03 AM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> Please check in the Apex application master log (container 1) how much
> memory it is requesting. If that's the correct figure and you still end up
> with a larger container, the problem could be the minimum container size in
> the YARN scheduler configuration.
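>
> A quick probe of that setting (illustrative; the stock YARN default is
> 1024 MB) can be run on a node with the Hadoop jars on the classpath:
>
> import org.apache.hadoop.yarn.conf.YarnConfiguration;
>
> public class MinAlloc {
>   public static void main(String[] args) {
>     YarnConfiguration conf = new YarnConfiguration();
>     // yarn.scheduler.minimum-allocation-mb rounds every container request up
>     System.out.println(conf.getInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB,
>         YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB));
>   }
> }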
>
>
>
>
>
> On Fri, Jun 17, 2016 at 12:58 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi Ram,
>
>
>
> I tried the option of adding the memory properties under site/conf and
> selecting that file during launch, but no luck. The same works with my local
> sandbox setup.
>
>
>
> Is there any other way that I can understand the reason?
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, June, 17 3:06 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> Please take a look at the section entitled "Properties source precedence"
> at
>
> http://docs.datatorrent.com/application_packages/
>
>
>
> It looks like the setting in dt-site.xml on the cluster is overriding your
> application-defined values.
>
> If you add the properties to a file under site/conf in your application and
> then select it during launch, those values should take effect.
>
>
>
> For signalling EOF, another option is to use a separate control port to
> send the EOF which could
>
> just be the string "EOF" for example.
>
>
>
>
>
> Ram
>
>
>
> On Fri, Jun 17, 2016 at 11:48 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi,
>
>
>
> I found a way for the 2nd question.
>
>
>
> 1) I tried setting the operator memory to 500MB but the operator still takes
> 4GB by default; I don't understand why. (In my sandbox setup the memory was
> set correctly to 500MB, but not on the enterprise dev cluster.)
>
>
>
> 2) I am trying to send a null object from the file reader when EOF is
> reached so that the file writer can call the requestFinalize() method, but
> I could not figure out how to send the EOF; I tried the code below with no
> luck. (Solution: it seems that when readEntity() returns null, emitTuple()
> is not called, so I managed to emit the null object from readEntity() itself.)
>
>
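> For reference, a sketch of that workaround at the end of readEntity(), using the same
> output port as the code below:
>
>     // end-of-file: emit the marker directly, because emit() is skipped for null entities
>     output.emit(new KeyValue<String, String>(getFileName(), null));
>     return null;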
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:
> suryavamshivardhan.mukkamula@rbc.com]
> *Sent:* 2016, June, 17 12:20 PM
> *To:* users@apex.apache.org
> *Subject:* RE: Multiple directories
>
>
>
> Hi,
>
>
>
> Can you please help me understand the below issues.
>
>
>
> 1)      I tried setting the operator memory to 500MB but the operator still takes 4GB
> by default; I don't understand why. (In my sandbox setup the memory was set correctly
> to 500MB, but not on the enterprise dev cluster.)
>
> 2)      I am trying to send a null object from the file reader when EOF is reached so
> that the file writer can call the requestFinalize() method. But somehow I could not
> figure out how to send the EOF; I tried the code below but had no luck.
>
>
>
> ################### File Reader ####################################################
>
> @Override
> protected String readEntity() throws IOException {
>     // try to read a line
>     final String line = br.readLine();
>     if (null != line) {    // normal case
>         LOG.debug("readEntity: line = {}", line);
>         return line;
>     }
>
>     // end-of-file (control tuple sent in closeFile())
>     LOG.info("readEntity: EOF for {}", filePath);
>     return null;
> }
>
> @Override
> protected void emit(String line) {
>     // parsing logic here: parse the line as per the input configuration and
>     // create the output line as per the output configuration
>     if (line == null) {
>         // EOF marker: emit a tuple with a null value so the writer can finalize.
>         // The return is needed so the null line is not also parsed as data; note,
>         // though, that emit() is never invoked for a null entity (see the solution
>         // above), which is why this attempt had no luck.
>         output.emit(new KeyValue<String, String>(getFileName(), null));
>         return;
>     }
>     KeyValue<String, String> tuple = new KeyValue<String, String>();
>     tuple.key = getFileName();
>     tuple.value = line;
>     KeyValue<String, String> newTuple = parseTuple(tuple);
>     output.emit(newTuple);
> }
>
> ###################### File Writer ######################################################
>
> public class FileOutputOperator extends AbstractFileOutputOperator<KeyValue<String, String>> {
>     private static final Logger LOG = LoggerFactory.getLogger(FileOutputOperator.class);
>     private List<String> filesToFinalize = new ArrayList<>();
>
>     @Override
>     public void setup(Context.OperatorContext context) {
>         super.setup(context);
>         // finalize any files still pending in checkpointed state after a restart
>         finalizeFiles();
>     }
>
>     @Override
>     protected byte[] getBytesForTuple(KeyValue<String, String> tuple) {
>         if (tuple.value == null) {    // EOF marker from the reader
>             LOG.info("File to finalize {}", tuple.key);
>             filesToFinalize.add(tuple.key);
>             return new byte[0];
>         } else {
>             return tuple.value.getBytes();
>         }
>     }
>
>     @Override
>     protected String getFileName(KeyValue<String, String> tuple) {
>         return tuple.key;
>     }
>
>     @Override
>     public void endWindow() {
>         super.endWindow();
>         finalizeFiles();
>     }
>
>     private void finalizeFiles() {
>         // note: log the list itself, not toArray(), so the contents are printed
>         LOG.info("Files to finalize {}", filesToFinalize);
>         Iterator<String> fileIt = filesToFinalize.iterator();
>         while (fileIt.hasNext()) {
>             requestFinalize(fileIt.next());
>             fileIt.remove();
>         }
>     }
> }
>
>
> ##################################################################################################
>
>
>
>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Mukkamula, Suryavamshivardhan (CWM-NR) [
> mailto:suryavamshivardhan.mukkamula@rbc.com
> <su...@rbc.com>]
> *Sent:* 2016, June, 17 9:11 AM
> *To:* users@apex.apache.org
> *Subject:* RE: Multiple directories
>
>
>
> Hi Ram/Raja,
>
>
>
> The HBase dependency was pulling older Hadoop jars into my classpath. I removed the
> HBase dependency, which I don't need for now, and the issue was resolved.
>
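> If the HBase dependency is needed again later, an alternative sketch is to keep it but
> exclude its Hadoop artifacts, using the same wildcard-exclusion style this pom already
> uses for malhar-contrib:
>
>   <dependency>
>     <groupId>org.apache.hbase</groupId>
>     <artifactId>hbase-client</artifactId>
>     <version>1.1.2</version>
>     <exclusions>
>       <exclusion>
>         <groupId>org.apache.hadoop</groupId>
>         <artifactId>*</artifactId>
>       </exclusion>
>     </exclusions>
>   </dependency>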
>
>
> Thank you for your help.
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Raja.Aravapalli [mailto:Raja.Aravapalli@target.com
> <Ra...@target.com>]
> *Sent:* 2016, June, 17 7:06 AM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
>
>
> I also faced a similar problem with Hadoop jars when I used HBase jars in pom.xml!! It
> could be because the version of the hadoop jars your apa is holding is different from
> the ones in the cluster!!
>
>
>
>
>
> What I did to solve this is,
>
>
>
> I gave the hbase jars "provided" scope in the Maven pom.xml, and then supplied the
> hbase jars to the application package during submission using "-libjars" with the
> launch command, which solved my Invalid ContainerId problem!!
>
>
>
> You can type “launch help” to see usage details.
>
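> A sketch of those two pieces (the jar path and apa name are illustrative, and the
> "dt>" prompt assumes the dtcli shell):
>
>   <dependency>
>     <groupId>org.apache.hbase</groupId>
>     <artifactId>hbase-client</artifactId>
>     <version>1.1.2</version>
>     <scope>provided</scope>
>   </dependency>
>
>   dt> launch -libjars /path/to/hbase-client-1.1.2.jar target/myapp-1.0-SNAPSHOT.apa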
>
>
>
>
> Regards,
>
> Raja.
>
>
>
> *From: *Munagala Ramanath <ra...@datatorrent.com>
> *Reply-To: *"users@apex.apache.org" <us...@apex.apache.org>
> *Date: *Thursday, June 16, 2016 at 4:10 PM
> *To: *"users@apex.apache.org" <us...@apex.apache.org>
> *Subject: *Re: Multiple directories
>
>
>
> Those 6 hadoop jars are definitely a problem.
>
>
>
> I didn't see the output of "*mvn dependency:tree*"; could you post that ?
>
> It will show you why these hadoop jars are being pulled in.
>
>
>
> Also, please refer to the section "Hadoop dependencies conflicts" in the
> troubleshooting guide:
>
> http://docs.datatorrent.com/troubleshooting/
>
>
>
> Ram
>
>
>
> On Thu, Jun 16, 2016 at 1:56 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi Ram,
>
>
>
> Below are the details.
>
>
>
>
>
> 0 Wed Jun 15 16:34:24 EDT 2016 META-INF/
>
>    358 Wed Jun 15 16:34:22 EDT 2016 META-INF/MANIFEST.MF
>
>      0 Wed Jun 15 16:34:24 EDT 2016 app/
>
> 52967 Wed Jun 15 16:34:22 EDT 2016 app/countrynamescan-1.0-SNAPSHOT.jar
>
>      0 Wed Jun 15 16:34:22 EDT 2016 lib/
>
> 62983 Wed Jun 15 16:34:22 EDT 2016 lib/activation-1.1.jar
>
> 1143233 Wed Jun 15 16:34:22 EDT 2016 lib/activemq-client-5.8.0.jar
>
>   4467 Wed Jun 15 16:34:22 EDT 2016 lib/aopalliance-1.0.jar
>
> 44925 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-i18n-2.0.0-M15.jar
>
> 691479 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-kerberos-codec-2.0.0-M15.jar
>
> 16560 Wed Jun 15 16:34:22 EDT 2016 lib/api-asn1-api-1.0.0-M20.jar
>
> 79912 Wed Jun 15 16:34:22 EDT 2016 lib/api-util-1.0.0-M20.jar
>
> 43033 Wed Jun 15 16:34:22 EDT 2016 lib/asm-3.1.jar
>
> 303139 Wed Jun 15 16:34:22 EDT 2016 lib/avro-1.7.4.jar
>
> 232019 Wed Jun 15 16:34:22 EDT 2016 lib/commons-beanutils-1.8.3.jar
>
> 41123 Wed Jun 15 16:34:22 EDT 2016 lib/commons-cli-1.2.jar
>
> 284184 Wed Jun 15 16:34:22 EDT 2016 lib/commons-codec-1.10.jar
>
> 575389 Wed Jun 15 16:34:22 EDT 2016 lib/commons-collections-3.2.1.jar
>
> 30595 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compiler-2.7.8.jar
>
> 241367 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compress-1.4.1.jar
>
> 298829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-configuration-1.6.jar
>
> 143602 Wed Jun 15 16:34:22 EDT 2016 lib/commons-digester-1.8.jar
>
> 112341 Wed Jun 15 16:34:22 EDT 2016 lib/commons-el-1.0.jar
>
> 305001 Wed Jun 15 16:34:22 EDT 2016 lib/commons-httpclient-3.1.jar
>
> 185140 Wed Jun 15 16:34:22 EDT 2016 lib/commons-io-2.4.jar
>
> 284220 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang-2.6.jar
>
> 315805 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang3-3.1.jar
>
> 61829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-logging-1.2.jar
>
> 1599627 Wed Jun 15 16:34:22 EDT 2016 lib/commons-math3-3.1.1.jar
>
> 273370 Wed Jun 15 16:34:22 EDT 2016 lib/commons-net-3.1.jar
>
> 3608597 Wed Jun 15 16:34:22 EDT 2016 lib/db2jcc-123.jar
>
> 313898 Wed Jun 15 16:34:22 EDT 2016 lib/dom4j-1.6.1.jar
>
> 17138265 Wed Jun 15 16:34:22 EDT 2016 lib/fastutil-6.6.4.jar
>
> 15322 Wed Jun 15 16:34:22 EDT 2016 lib/findbugs-annotations-1.3.9-1.jar
>
> 84946 Wed Jun 15 16:34:22 EDT 2016 lib/flatpack-3.4.2.jar
>
> 20220 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-j2ee-management_1.1_spec-1.0.1.jar
>
> 32359 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-jms_1.1_spec-1.1.1.jar
>
> 21817 Wed Jun 15 16:34:22 EDT 2016 lib/gmbal-api-only-3.0.0-b023.jar
>
> 690573 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-framework-2.1.2.jar
>
> 253086 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-2.1.2.jar
>
> 198255 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-server-2.1.2.jar
>
> 336904 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-servlet-2.1.2.jar
>
>   8114 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-rcm-2.1.2.jar
>
> 1795932 Wed Jun 15 16:34:22 EDT 2016 lib/guava-12.0.1.jar
>
> 710492 Wed Jun 15 16:34:22 EDT 2016 lib/guice-3.0.jar
>
> 65012 Wed Jun 15 16:34:22 EDT 2016 lib/guice-servlet-3.0.jar
>
> 16778 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-annotations-2.2.0.jar
>
> 52449 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-auth-2.5.1.jar
>
> 2962685 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-common-2.5.1.jar
>
> 1498368 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-mapreduce-client-core-2.5.1.jar
>
> 1158936 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-api-2.2.0.jar
>
> 1301627 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-common-2.2.0.jar
>
> 50139 Wed Jun 15 16:34:22 EDT 2016 lib/hawtbuf-1.9.jar
>
> 20780 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-annotations-1.1.2.jar
>
> 1249004 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-client-1.1.2.jar
>
> 530078 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-common-1.1.2.jar
>
> 4201685 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-protocol-1.1.2.jar
>
> 1475955 Wed Jun 15 16:34:22 EDT 2016 lib/htrace-core-3.1.0-incubating.jar
>
> 590533 Wed Jun 15 16:34:22 EDT 2016 lib/httpclient-4.3.5.jar
>
> 282269 Wed Jun 15 16:34:22 EDT 2016 lib/httpcore-4.3.2.jar
>
> 228286 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-core-asl-1.9.2.jar
>
> 765648 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-mapper-asl-1.9.2.jar
>
>   2497 Wed Jun 15 16:34:22 EDT 2016 lib/javax.inject-1.jar
>
> 521991 Wed Jun 15 16:34:22 EDT 2016 lib/javax.mail-1.5.0.jar
>
> 83945 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-3.1.jar
>
> 85353 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-api-3.0.1.jar
>
> 105134 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-api-2.2.2.jar
>
> 890168 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-impl-2.2.3-1.jar
>
> 1291164 Wed Jun 15 16:34:22 EDT 2016 lib/jcodings-1.0.8.jar
>
> 151304 Wed Jun 15 16:34:22 EDT 2016 lib/jdom-1.1.3.jar
>
> 130458 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-client-1.9.jar
>
> 458739 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-core-1.9.jar
>
> 17542 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-grizzly2-1.9.jar
>
> 14786 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-guice-1.9.jar
>
> 147952 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-json-1.9.jar
>
> 713089 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-server-1.9.jar
>
> 28100 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-core-1.9.jar
>
> 12976 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-grizzly2-1.9.jar
>
> 67758 Wed Jun 15 16:34:22 EDT 2016 lib/jettison-1.1.jar
>
> 21144 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-continuation-8.1.10.v20130312.jar
>
> 95709 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-http-8.1.10.v20130312.jar
>
> 103622 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-io-8.1.10.v20130312.jar
>
> 89691 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-security-8.1.10.v20130312.jar
>
> 347020 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-server-8.1.10.v20130312.jar
>
> 101052 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-servlet-8.1.10.v20130312.jar
>
> 177131 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-6.1.26.jar
>
> 284903 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-8.1.10.v20130312.jar
>
> 125928 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-websocket-8.1.10.v20130312.jar
>
> 39117 Wed Jun 15 16:34:22 EDT 2016 lib/jms-api-1.1-rev-1.jar
>
> 187292 Wed Jun 15 16:34:22 EDT 2016 lib/joni-2.1.2.jar
>
> 280205 Wed Jun 15 16:34:22 EDT 2016 lib/jsch-0.1.53.jar
>
> 33015 Wed Jun 15 16:34:22 EDT 2016 lib/jsr305-1.3.9.jar
>
> 489884 Wed Jun 15 16:34:22 EDT 2016 lib/log4j-1.2.17.jar
>
> 565387 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-contrib-3.2.0-incubating.jar
>
> 1062000 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-library-3.1.1.jar
>
> 42212 Wed Jun 15 16:34:22 EDT 2016 lib/management-api-3.0.0-b012.jar
>
>   8798 Wed Jun 15 16:34:22 EDT 2016 lib/named-regexp-0.2.3.jar
>
> 1206119 Wed Jun 15 16:34:22 EDT 2016 lib/netty-3.6.6.Final.jar
>
> 1779991 Wed Jun 15 16:34:22 EDT 2016 lib/netty-all-4.0.23.Final.jar
>
> 29555 Wed Jun 15 16:34:22 EDT 2016 lib/paranamer-2.3.jar
>
> 1869113 Wed Jun 15 16:34:22 EDT 2016 lib/poi-3.9.jar
>
> 936648 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-3.9.jar
>
> 4802621 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-schemas-3.9.jar
>
> 533455 Wed Jun 15 16:34:22 EDT 2016 lib/protobuf-java-2.5.0.jar
>
> 26084 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-api-1.7.5.jar
>
>   9943 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-log4j12-1.7.19.jar
>
> 995968 Wed Jun 15 16:34:22 EDT 2016 lib/snappy-java-1.0.4.1.jar
>
> 23346 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0-2.jar
>
> 26514 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0.1.jar
>
>   2455 Wed Jun 15 16:34:22 EDT 2016 lib/tdgssconfig-14.00.00.21.jar
>
> 991265 Wed Jun 15 16:34:22 EDT 2016 lib/terajdbc4-14.00.00.21.jar
>
> 758309 Wed Jun 15 16:34:22 EDT 2016 lib/vertica-jdbc-7.2.1-0.jar
>
> 109318 Wed Jun 15 16:34:22 EDT 2016 lib/xml-apis-1.0.b2.jar
>
> 2666695 Wed Jun 15 16:34:22 EDT 2016 lib/xmlbeans-2.3.0.jar
>
> 15010 Wed Jun 15 16:34:22 EDT 2016 lib/xmlenc-0.52.jar
>
> 94672 Wed Jun 15 16:34:22 EDT 2016 lib/xz-1.0.jar
>
> 792964 Wed Jun 15 16:34:22 EDT 2016 lib/zookeeper-3.4.6.jar
>
>      0 Wed Jun 15 16:34:28 EDT 2016 conf/
>
>    334 Mon Apr 04 11:18:00 EDT 2016 conf/my-app-conf1.xml
>
>   3432 Wed Jun 15 16:22:20 EDT 2016 META-INF/properties.xml
>
>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>   xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
>   <modelVersion>4.0.0</modelVersion>
>   <groupId>com.rbc.aml.cnscan</groupId>
>   <version>1.0-SNAPSHOT</version>
>   <artifactId>countrynamescan</artifactId>
>   <packaging>jar</packaging>
>
>   <!-- change these to the appropriate values -->
>   <name>countrynamescan</name>
>   <description>Country and Name Scan project</description>
>
>   <properties>
>     <!-- change this if you desire to use a different version of DataTorrent -->
>     <datatorrent.version>3.1.1</datatorrent.version>
>     <datatorrent.apppackage.classpath>lib/*.jar</datatorrent.apppackage.classpath>
>   </properties>
>
>   <!-- repository to provide the DataTorrent artifacts -->
>   <!-- <repositories>
>     <repository>
>       <snapshots>
>         <enabled>false</enabled>
>       </snapshots>
>       <id>Datatorrent-Releases</id>
>       <name>DataTorrent Release Repository</name>
>       <url>https://www.datatorrent.com/maven/content/repositories/releases/</url>
>     </repository>
>     <repository>
>       <releases>
>         <enabled>false</enabled>
>       </releases>
>       <id>DataTorrent-Snapshots</id>
>       <name>DataTorrent Early Access Program Snapshot Repository</name>
>       <url>https://www.datatorrent.com/maven/content/repositories/snapshots/</url>
>     </repository>
>   </repositories> -->
>
>   <build>
>     <plugins>
>       <plugin>
>         <groupId>org.apache.maven.plugins</groupId>
>         <artifactId>maven-eclipse-plugin</artifactId>
>         <version>2.9</version>
>         <configuration>
>           <downloadSources>true</downloadSources>
>         </configuration>
>       </plugin>
>       <plugin>
>         <artifactId>maven-compiler-plugin</artifactId>
>         <version>3.3</version>
>         <configuration>
>           <encoding>UTF-8</encoding>
>           <source>1.7</source>
>           <target>1.7</target>
>           <debug>true</debug>
>           <optimize>false</optimize>
>           <showDeprecation>true</showDeprecation>
>           <showWarnings>true</showWarnings>
>         </configuration>
>       </plugin>
>       <plugin>
>         <artifactId>maven-dependency-plugin</artifactId>
>         <version>2.8</version>
>         <executions>
>           <execution>
>             <id>copy-dependencies</id>
>             <phase>prepare-package</phase>
>             <goals>
>               <goal>copy-dependencies</goal>
>             </goals>
>             <configuration>
>               <outputDirectory>target/deps</outputDirectory>
>               <includeScope>runtime</includeScope>
>             </configuration>
>           </execution>
>         </executions>
>       </plugin>
>
>       <plugin>
>         <artifactId>maven-assembly-plugin</artifactId>
>         <executions>
>           <execution>
>             <id>app-package-assembly</id>
>             <phase>package</phase>
>             <goals>
>               <goal>single</goal>
>             </goals>
>             <configuration>
>               <finalName>${project.artifactId}-${project.version}-apexapp</finalName>
>               <appendAssemblyId>false</appendAssemblyId>
>               <descriptors>
>                 <descriptor>src/assemble/appPackage.xml</descriptor>
>               </descriptors>
>               <archiverConfig>
>                 <defaultDirectoryMode>0755</defaultDirectoryMode>
>               </archiverConfig>
>               <archive>
>                 <manifestEntries>
>                   <Class-Path>${datatorrent.apppackage.classpath}</Class-Path>
>                   <DT-Engine-Version>${datatorrent.version}</DT-Engine-Version>
>                   <DT-App-Package-Name>${project.artifactId}</DT-App-Package-Name>
>                   <DT-App-Package-Version>${project.version}</DT-App-Package-Version>
>                   <DT-App-Package-Display-Name>${project.name}</DT-App-Package-Display-Name>
>                   <DT-App-Package-Description>${project.description}</DT-App-Package-Description>
>                 </manifestEntries>
>               </archive>
>             </configuration>
>           </execution>
>         </executions>
>       </plugin>
>
>       <plugin>
>         <artifactId>maven-antrun-plugin</artifactId>
>         <version>1.7</version>
>         <executions>
>           <execution>
>             <phase>package</phase>
>             <configuration>
>               <target>
>                 <move file="${project.build.directory}/${project.artifactId}-${project.version}-apexapp.jar"
>                       tofile="${project.build.directory}/${project.artifactId}-${project.version}.apa" />
>               </target>
>             </configuration>
>             <goals>
>               <goal>run</goal>
>             </goals>
>           </execution>
>           <execution>
>             <!-- create resource directory for xml javadoc -->
>             <id>createJavadocDirectory</id>
>             <phase>generate-resources</phase>
>             <configuration>
>               <tasks>
>                 <delete dir="${project.build.directory}/generated-resources/xml-javadoc"/>
>                 <mkdir dir="${project.build.directory}/generated-resources/xml-javadoc"/>
>               </tasks>
>             </configuration>
>             <goals>
>               <goal>run</goal>
>             </goals>
>           </execution>
>         </executions>
>       </plugin>
>
>       <plugin>
>         <groupId>org.codehaus.mojo</groupId>
>         <artifactId>build-helper-maven-plugin</artifactId>
>         <version>1.9.1</version>
>         <executions>
>           <execution>
>             <id>attach-artifacts</id>
>             <phase>package</phase>
>             <goals>
>               <goal>attach-artifact</goal>
>             </goals>
>             <configuration>
>               <artifacts>
>                 <artifact>
>                   <file>target/${project.artifactId}-${project.version}.apa</file>
>                   <type>apa</type>
>                 </artifact>
>               </artifacts>
>               <skipAttach>false</skipAttach>
>             </configuration>
>           </execution>
>         </executions>
>       </plugin>
>
>       <!-- generate javadoc -->
>       <plugin>
>         <groupId>org.apache.maven.plugins</groupId>
>         <artifactId>maven-javadoc-plugin</artifactId>
>         <executions>
>           <!-- generate xml javadoc -->
>           <execution>
>             <id>xml-doclet</id>
>             <phase>generate-resources</phase>
>             <goals>
>               <goal>javadoc</goal>
>             </goals>
>             <configuration>
>               <doclet>com.github.markusbernhardt.xmldoclet.XmlDoclet</doclet>
>               <additionalparam>-d ${project.build.directory}/generated-resources/xml-javadoc
>                 -filename ${project.artifactId}-${project.version}-javadoc.xml</additionalparam>
>               <useStandardDocletOptions>false</useStandardDocletOptions>
>               <docletArtifact>
>                 <groupId>com.github.markusbernhardt</groupId>
>                 <artifactId>xml-doclet</artifactId>
>                 <version>1.0.4</version>
>               </docletArtifact>
>             </configuration>
>           </execution>
>         </executions>
>       </plugin>
>
>       <!-- Transform xml javadoc to stripped down version containing only class/interface
>            comments and tags -->
>       <plugin>
>         <groupId>org.codehaus.mojo</groupId>
>         <artifactId>xml-maven-plugin</artifactId>
>         <version>1.0</version>
>         <executions>
>           <execution>
>             <id>transform-xmljavadoc</id>
>             <phase>generate-resources</phase>
>             <goals>
>               <goal>transform</goal>
>             </goals>
>           </execution>
>         </executions>
>         <configuration>
>           <transformationSets>
>             <transformationSet>
>               <dir>${project.build.directory}/generated-resources/xml-javadoc</dir>
>               <includes>
>                 <include>${project.artifactId}-${project.version}-javadoc.xml</include>
>               </includes>
>               <stylesheet>XmlJavadocCommentsExtractor.xsl</stylesheet>
>               <outputDir>${project.build.directory}/generated-resources/xml-javadoc</outputDir>
>             </transformationSet>
>           </transformationSets>
>         </configuration>
>       </plugin>
>
>       <!-- copy xml javadoc to class jar -->
>       <plugin>
>         <artifactId>maven-resources-plugin</artifactId>
>         <version>2.6</version>
>         <executions>
>           <execution>
>             <id>copy-resources</id>
>             <phase>process-resources</phase>
>             <goals>
>               <goal>copy-resources</goal>
>             </goals>
>             <configuration>
>               <outputDirectory>${basedir}/target/classes</outputDirectory>
>               <resources>
>                 <resource>
>                   <directory>${project.build.directory}/generated-resources/xml-javadoc</directory>
>                   <includes>
>                     <include>${project.artifactId}-${project.version}-javadoc.xml</include>
>                   </includes>
>                   <filtering>true</filtering>
>                 </resource>
>               </resources>
>             </configuration>
>           </execution>
>         </executions>
>       </plugin>
>     </plugins>
>
>     <pluginManagement>
>       <plugins>
>         <!-- This plugin's configuration is used to store Eclipse m2e settings
>              only. It has no influence on the Maven build itself. -->
>         <plugin>
>           <groupId>org.eclipse.m2e</groupId>
>           <artifactId>lifecycle-mapping</artifactId>
>           <version>1.0.0</version>
>           <configuration>
>             <lifecycleMappingMetadata>
>               <pluginExecutions>
>                 <pluginExecution>
>                   <pluginExecutionFilter>
>                     <groupId>org.codehaus.mojo</groupId>
>                     <artifactId>xml-maven-plugin</artifactId>
>                     <versionRange>[1.0,)</versionRange>
>                     <goals>
>                       <goal>transform</goal>
>                     </goals>
>                   </pluginExecutionFilter>
>                   <action>
>                     <ignore></ignore>
>                   </action>
>                 </pluginExecution>
>                 <pluginExecution>
>                   <pluginExecutionFilter>
>                     <groupId></groupId>
>                     <artifactId></artifactId>
>                     <versionRange>[,)</versionRange>
>                     <goals>
>                       <goal></goal>
>                     </goals>
>                   </pluginExecutionFilter>
>                   <action>
>                     <ignore></ignore>
>                   </action>
>                 </pluginExecution>
>               </pluginExecutions>
>             </lifecycleMappingMetadata>
>           </configuration>
>         </plugin>
>       </plugins>
>     </pluginManagement>
>   </build>
>
>   <dependencies>
>     <!-- add your dependencies here -->
>     <dependency>
>       <groupId>com.datatorrent</groupId>
>       <artifactId>malhar-library</artifactId>
>       <version>${datatorrent.version}</version>
>       <!-- If you know that your application does not need transitive dependencies
>            pulled in by malhar-library, uncomment the following to reduce the size of
>            your app package. -->
>       <!-- <exclusions> <exclusion> <groupId>*</groupId> <artifactId>*</artifactId>
>            </exclusion> </exclusions> -->
>     </dependency>
>     <dependency>
>       <groupId>com.datatorrent</groupId>
>       <artifactId>dt-common</artifactId>
>       <version>${datatorrent.version}</version>
>       <scope>provided</scope>
>     </dependency>
>     <dependency>
>       <groupId>com.datatorrent</groupId>
>       <artifactId>malhar-library</artifactId>
>       <version>${datatorrent.version}</version>
>       <!-- If you know that your application does not need transitive dependencies
>            pulled in by malhar-library, uncomment the following to reduce the size of
>            your app package. -->
>       <!-- <exclusions> <exclusion> <groupId>*</groupId> <artifactId>*</artifactId>
>            </exclusion> </exclusions> -->
>     </dependency>
>     <dependency>
>       <groupId>com.datatorrent</groupId>
>       <artifactId>dt-engine</artifactId>
>       <version>${datatorrent.version}</version>
>       <scope>test</scope>
>     </dependency>
>     <dependency>
>       <groupId>com.datatorrent</groupId>
>       <artifactId>dt-common</artifactId>
>       <version>${datatorrent.version}</version>
>       <scope>provided</scope>
>     </dependency>
>     <dependency>
>       <groupId>com.teradata.jdbc</groupId>
>       <artifactId>terajdbc4</artifactId>
>       <version>14.00.00.21</version>
>     </dependency>
>     <dependency>
>       <groupId>com.teradata.jdbc</groupId>
>       <artifactId>tdgssconfig</artifactId>
>       <version>14.00.00.21</version>
>     </dependency>
>     <dependency>
>       <groupId>com.ibm.db2</groupId>
>       <artifactId>db2jcc</artifactId>
>       <version>123</version>
>     </dependency>
>     <dependency>
>       <groupId>jdk.tools</groupId>
>       <artifactId>jdk.tools</artifactId>
>       <version>1.7</version>
>       <scope>system</scope>
>       <systemPath>C:/Program Files/Java/jdk1.7.0_79/lib/tools.jar</systemPath>
>     </dependency>
>     <dependency>
>       <groupId>org.apache.apex</groupId>
>       <artifactId>malhar-contrib</artifactId>
>       <version>3.2.0-incubating</version>
>       <!--<scope>provided</scope> -->
>       <exclusions>
>         <exclusion>
>           <groupId>*</groupId>
>           <artifactId>*</artifactId>
>         </exclusion>
>       </exclusions>
>     </dependency>
>     <dependency>
>       <groupId>junit</groupId>
>       <artifactId>junit</artifactId>
>       <version>4.10</version>
>       <scope>test</scope>
>     </dependency>
>     <dependency>
>       <groupId>com.vertica</groupId>
>       <artifactId>vertica-jdbc</artifactId>
>       <version>7.2.1-0</version>
>     </dependency>
>     <dependency>
>       <groupId>org.apache.hbase</groupId>
>       <artifactId>hbase-client</artifactId>
>       <version>1.1.2</version>
>     </dependency>
>     <dependency>
>       <groupId>org.slf4j</groupId>
>       <artifactId>slf4j-log4j12</artifactId>
>       <version>1.7.19</version>
>     </dependency>
>     <dependency>
>       <groupId>com.datatorrent</groupId>
>       <artifactId>dt-engine</artifactId>
>       <version>${datatorrent.version}</version>
>       <scope>test</scope>
>     </dependency>
>     <dependency>
>       <groupId>net.sf.flatpack</groupId>
>       <artifactId>flatpack</artifactId>
>       <version>3.4.2</version>
>     </dependency>
>     <dependency>
>       <groupId>org.jdom</groupId>
>       <artifactId>jdom</artifactId>
>       <version>1.1.3</version>
>     </dependency>
>     <dependency>
>       <groupId>org.apache.poi</groupId>
>       <artifactId>poi-ooxml</artifactId>
>       <version>3.9</version>
>     </dependency>
>     <dependency>
>       <groupId>org.apache.xmlbeans</groupId>
>       <artifactId>xmlbeans</artifactId>
>       <version>2.3.0</version>
>     </dependency>
>     <dependency>
>       <groupId>dom4j</groupId>
>       <artifactId>dom4j</artifactId>
>       <version>1.6.1</version>
>     </dependency>
>     <dependency>
>       <groupId>javax.xml.stream</groupId>
>       <artifactId>stax-api</artifactId>
>       <version>1.0-2</version>
>     </dependency>
>     <dependency>
>       <groupId>org.apache.poi</groupId>
>       <artifactId>poi</artifactId>
>       <version>3.9</version>
>     </dependency>
>     <dependency>
>       <groupId>org.apache.poi</groupId>
>       <artifactId>poi-ooxml-schemas</artifactId>
>       <version>3.9</version>
>     </dependency>
>     <dependency>
>       <groupId>com.jcraft</groupId>
>       <artifactId>jsch</artifactId>
>       <version>0.1.53</version>
>     </dependency>
>     <dependency>
>       <groupId>com.jcraft</groupId>
>       <artifactId>jsch</artifactId>
>       <version>0.1.53</version>
>     </dependency>
>   </dependencies>
>
> </project>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, June, 16 4:37 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> It looks like you may be including old Hadoop jars in your apa package
> since the stack trace
>
> shows *ConverterUtils.toContainerId* calling
> *ConverterUtils.toApplicationAttemptId* but recent versions
>
> don't have that call sequence. In 2.7.1 (which is what your cluster has)
> the function looks like this:
>
>   public static ContainerId toContainerId(String containerIdStr) {
>     return ContainerId.fromString(containerIdStr);
>   }
>
>
>
> Could you post the output of "*jar tvf {your-apa-file}*" as well as "*mvn
> dependency:tree*"?
>
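> (To spot the offenders quickly, something like
>
>   mvn dependency:tree | grep -i -C3 hadoop
>
> prints each hadoop artifact with a few lines of surrounding context, which usually
> includes the dependency that pulls it in.)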
>
>
> Ram
>
>
>
> On Thu, Jun 16, 2016 at 12:38 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi Ram,
>
>
>
> Below is the information.
>
>
>
>
>
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time
> Current
>
>                                  Dload  Upload   Total   Spent    Left
> Speed
>
> 100   712    0   712    0     0   3807      0 --:--:-- --:--:-- --:--:--
> 3807
>
> {
>     "clusterInfo": {
>         "haState": "ACTIVE",
>         "haZooKeeperConnectionState": "CONNECTED",
>         "hadoopBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bccaeaf36cb193c1c by jenkins source checksum 69a3bf8c667267c2c252a54fbbf23d",
>         "hadoopVersion": "2.7.1.2.3.2.0-2950",
>         "hadoopVersionBuiltOn": "2015-09-30T18:08Z",
>         "id": 1465495186350,
>         "resourceManagerBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bccaeaf36cb193c1c by jenkins source checksum 48db4b572827c2e9c2da66982d147626",
>         "resourceManagerVersion": "2.7.1.2.3.2.0-2950",
>         "resourceManagerVersionBuiltOn": "2015-09-30T18:20Z",
>         "rmStateStoreName": "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",
>         "startedOn": 1465495186350,
>         "state": "STARTED"
>     }
> }
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, June, 16 2:57 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> Can you ssh to one of the cluster nodes ? If so, can you run this command
> and show the output
>
> (where *{rm} *is the *host:port* running the resource manager, aka YARN):
>
>
>
> *curl http://{rm}/ws/v1/cluster | python -mjson.tool*
>
>
>
> Ram
>
> ps. You can determine the node running YARN with:
>
>
>
> *hdfs getconf -confKey yarn.resourcemanager.webapp.address*
>
> *hdfs getconf -confKey yarn.resourcemanager.webapp.https.address*
>
>
>
>
>
>
>
> On Thu, Jun 16, 2016 at 11:15 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi,
>
>
>
> I am facing a weird issue and the logs are not clear to me!
>
>
>
> I have created an apa file which works fine in my local sandbox, but I am facing
> problems when I upload it to the enterprise Hadoop cluster using the DT Console.
>
>
>
> Below is the error message from yarn logs. Please help in understanding
> the issue.
>
>
>
> ###################### Error Logs
> ########################################################
>
>
>
> Log Type: AppMaster.stderr
>
> Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
>
> Log Length: 1259
>
> SLF4J: Class path contains multiple SLF4J bindings.
>
> SLF4J: Found binding in
> [jar:file:/grid/06/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/filecache/36/slf4j-log4j12-1.7.19.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>
> SLF4J: Found binding in
> [jar:file:/usr/hdp/2.3.2.0-2950/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
>
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>
> Exception in thread "main" java.lang.IllegalArgumentException: Invalid
> ContainerId: container_e35_1465495186350_2224_01_000001
>
>         at
> org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182)
>
>         at
> com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:90)
>
> Caused by: java.lang.NumberFormatException: For input string: "e35"
>
>         at
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>
>         at java.lang.Long.parseLong(Long.java:441)
>
>         at java.lang.Long.parseLong(Long.java:483)
>
>         at
> org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
>
>         at
> org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
>
>         ... 1 more
>
>
>
> Log Type: AppMaster.stdout
>
> Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
>
> Log Length: 0
>
>
>
> Log Type: dt.log
>
> Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
>
> Log Length: 29715
>
> Showing 4096 bytes of 29715 total.
>
> 56m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms4096m
> -Xmx4096m -Dhadoop.security.logger=INFO,DRFAS
> -Dhdfs.audit.logger=INFO,DRFAAUDIT
>
> SHLVL=3
>
> HADOOP_SSH_OPTS=-o ConnectTimeout=5 -o SendEnv=HADOOP_CONF_DIR
>
> HADOOP_USER_NAME=datatorrent/gueulvahal003.saifg.rbc.com@SAIFG.RBC.COM
>
> HADOOP_NAMENODE_OPTS=-server -XX:ParallelGCThreads=8
> -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log
> -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m
> -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m
> -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS
> -Dhdfs.audit.logger=INFO,DRFAAUDIT
> -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node"
> -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server
> -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC
> -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m
> -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m
> -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m
> -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS
> -Dhdfs.audit.logger=INFO,DRFAAUDIT
> -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node"
> -Dorg.mortbay.jetty.Request.maxFormContentSize=-1
>
> HADOOP_IDENT_STRING=yarn
>
> HADOOP_MAPRED_LOG_DIR=/var/log/hadoop-mapreduce/yarn
>
> NM_HOST=guedlpdhdp012.saifg.rbc.com
>
> XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
>
> HADOOP_SECURE_DN_LOG_DIR=/var/log/hadoop/hdfs
>
> YARN_HISTORYSERVER_HEAPSIZE=1024
>
> JVM_PID=2638
>
> YARN_PID_DIR=/var/run/hadoop-yarn/yarn
>
> HADOOP_HOME_WARN_SUPPRESS=1
>
> NM_PORT=45454
>
> LOGNAME=mukkamula
>
> YARN_CONF_DIR=/usr/hdp/current/hadoop-client/conf
>
> HADOOP_YARN_USER=yarn
>
> QTDIR=/usr/lib64/qt-3.3
>
> _=/usr/lib/jvm/java-1.7.0/bin/java
>
> MSM_PRODUCT=MSM
>
> HADOOP_HOME=/usr/hdp/2.3.2.0-2950/hadoop
>
> MALLOC_ARENA_MAX=4
>
> HADOOP_OPTS=-Dhdp.version=2.3.2.0-2950 -Djava.net.preferIPv4Stack=true
> -Dhdp.version= -Djava.net.preferIPv4Stack=true
> -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log
> -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn
> -Dhadoop.root.logger=INFO,console
> -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native
> -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
> -Dhdp.version=2.3.2.0-2950 -Dhadoop.log.dir=/var/log/hadoop/yarn
> -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop
> -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console
> -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native
> -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
>
> SHELL=/bin/bash
>
> YARN_ROOT_LOGGER=INFO,EWMA,RFA
>
>
> HADOOP_TOKEN_FILE_LOCATION=/grid/11/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/container_e35_1465495186350_2224_01_000001/container_tokens
>
>
> CLASSPATH=./*:/usr/hdp/current/hadoop-client/conf:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*
>
> HADOOP_MAPRED_PID_DIR=/var/run/hadoop-mapreduce/yarn
>
> YARN_NODEMANAGER_HEAPSIZE=1024
>
> QTINC=/usr/lib64/qt-3.3/include
>
> USER=mukkamula
>
> HADOOP_CLIENT_OPTS=-Xmx2048m -XX:MaxPermSize=512m -Xmx2048m
> -XX:MaxPermSize=512m
>
> CONTAINER_ID=container_e35_1465495186350_2224_01_000001
>
> HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop/hdfs
>
> HISTCONTROL=ignoredups
>
> HOME=/home/
>
> HADOOP_NAMENODE_INIT_HEAPSIZE=-Xms8192m
>
> MSM_HOME=/usr/local/MegaRAID Storage Manager
>
> LESSOPEN=||/usr/bin/lesspipe.sh %s
>
> LANG=en_US.UTF-8
>
> YARN_NICENESS=0
>
> YARN_IDENT_STRING=yarn
>
> HADOOP_MAPRED_HOME=/usr/hdp/2.3.2.0-2950/hadoop-mapreduce
>
>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Mukkamula, Suryavamshivardhan (CWM-NR)
> *Sent:* 2016, June, 16 8:58 AM
> *To:* users@apex.apache.org
> *Subject:* RE: Multiple directories
>
>
>
> Thank you for the inputs.
>
>
>
> Regards,
>
> Surya Vamshi
>
> *From:* Thomas Weise [mailto:thomas.weise@gmail.com
> <th...@gmail.com>]
> *Sent:* 2016, June, 15 5:08 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
>
>
> On Wed, Jun 15, 2016 at 1:55 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi Ram/Team,
>
>
>
> I could create an operator which reads multiple directories and parses each file with
> respect to an individual configuration file, generating output files to different
> directories.
>
>
>
> However I have some questions regarding the design.
>
>
>
> -> We have 120 directories to scan on HDFS. If we use parallel partitioning with
> operator memory around 250MB, that is around 30GB of RAM for this operator; are these
> figures going to create any problem in production?
>
>
>
> You can benchmark this with a single partition. If the downstream
> operators can keep up with the rate at which the file reader emits, then
> the memory consumption should be minimal. Keep in mind though that the
> container memory is not just heap space for the operator, but also memory
> the JVM requires to run and the memory that the buffer server consumes. You
> see the allocated memory in the UI if you use the DT community edition
> (container list in the physical plan).
>
>
>
> -> Should I use a scheduler to run the batch job, or define the next scan time and keep
> the DT job running continuously? If I run the DT job continuously, I assume memory will
> be continuously utilized by the DT job and unavailable to other applications on the
> cluster; please clarify.
>
> It is possible to set this up elastically as well, so that when there is no input
> available, the number of reader partitions is reduced and the memory given back (Apex
> supports dynamic scaling).
>
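> For the static case, a fixed partition count can also be set declaratively; a sketch,
> assuming the reader operator is named "fileReader" in the DAG (the per-directory
> assignment discussed in this thread still needs the custom definePartitions()):
>
>   <property>
>     <name>dt.operator.fileReader.attr.PARTITIONER</name>
>     <value>com.datatorrent.common.partitioner.StatelessPartitioner:120</value>
>   </property>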
>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, June, 05 10:24 PM
>
>
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> Some sample code to monitor multiple directories is now available at:
>
>
> https://github.com/DataTorrent/examples/tree/master/tutorials/fileIO-multiDir
>
>
>
> It shows how to use a custom implementation of definePartitions() to create
>
> multiple partitions of the file input operator and group them
>
> into "slices" where each slice monitors a single directory.
>
>
>
> Ram
>
>
>
> On Wed, May 25, 2016 at 9:55 AM, Munagala Ramanath <ra...@datatorrent.com>
> wrote:
>
> I'm hoping to have a sample sometime next week.
>
>
>
> Ram
>
>
>
> On Wed, May 25, 2016 at 9:30 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Thank you so much ram, for your advice , Option (a) would be ideal for my
> requirement.
>
>
>
> Do you have sample usage for partitioning with individual configuration
> set ups different partitions?
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, May, 25 12:11 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> You have 2 options: (a) AbstractFileInputOperator (b)
> FileSplitter/BlockReader
>
>
>
> For (a), each partition (i.e. replica or the operator) can scan only a
> single directory, so if you have 100
>
> directories, you can simply start with 100 partitions; since each
> partition is scanning its own directory
>
> you don't need to worry about which files the lines came from. This
> approach however needs a custom
>
> definePartition() implementation in your subclass to assign the
> appropriate directory and XML parsing
>
> config file to each partition; it also needs adequate cluster resources to
> be able to spin up the required
>
> number of partitions.
>
>
>
> For (b), there is some documentation in the Operators section at
> http://docs.datatorrent.com/ including
>
> sample code. There operators support scanning multiple directories out of
> the box but have more
>
> elaborate configuration options. Check this out and see if it works in
> your use case.
>
>
>
> Ram
>
>
>
> On Wed, May 25, 2016 at 8:17 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hello Ram/Team,
>
>
>
> My requirement is to read input feeds from different locations on HDFS and
> parse those files by reading XML configuration files (each input feed has
> configuration file which defines the fields inside the input feeds).
>
>
>
> My approach: I would like to define a mapping file which contains an
> individual feed identifier, feed location, and configuration file location. I
> would like to read this mapping file at initial load within the setup() method
> and define my DirectoryScan.acceptFiles. My challenge is that when I read the
> files, I should parse the lines by reading the individual configuration files.
> How do I know which file a line came from? If I know this, I can read the
> corresponding configuration file before parsing the line.
>
>
>
> Please let me know how do I handle this.
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, May, 24 5:49 PM
> *To:* Mukkamula, Suryavamshivardhan (CWM-NR)
> *Subject:* Multiple directories
>
>
>
> One way of addressing the issue is to use some sort of external tool (like
> a script) to
>
> copy all the input files to a common directory (making sure that the file
> names are
>
> unique to prevent one file from overwriting another) before the Apex
> application starts.
>
>
>
> The Apex application then starts and processes files from this directory.
>
>
>
> If you set the partition count of the file input operator to N, it will
> create N partitions and
>
> the files will be automatically distributed among the partitions. The
> partitions will work
>
> in parallel.
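>
> For example, assuming the operator is named "read" in the DAG and exposes
> the partitionCount property (as AbstractFileInputOperator does; <AppName> is
> a placeholder), the properties file entry would look like:
>
>   <property>
>     <name>dt.application.<AppName>.operator.read.prop.partitionCount</name>
>     <value>4</value>
>   </property>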
>
>
>
> Ram
>

RE: Multiple directories

Posted by "Mukkamula, Suryavamshivardhan (CWM-NR)" <su...@rbc.com>.
Hi Thomas,

My bad, I was using a different stream name from the previous code; I have corrected it. Now it is 2 containers, each taking 4GB RAM.


<?xml version="1.0"?>
<configuration>

  <property>
   <name>dt.application.ClientGffCreation.operator.read.attr.MEMORY_MB</name>
   <value>500</value>
  </property>

  <property>
   <name>dt.application.ClientGffCreation.operator.write.attr.MEMORY_MB</name>
   <value>500</value>
  </property>

  <property>
    <name>dt.application.ClientGffCreation.operator.*.port.*.attr.BUFFER_MEMORY_MB</name>
    <value>128</value>
  </property>

  <property>
   <name>dt.application.ClientGffCreation.stream.File-Writer.locality</name>
   <value>CONTAINER_LOCAL</value>
  </property>

  <property>
    <name>dt.loggers.level</name>
    <value>com.datatorrent.*:INFO,org.apache.*:INFO</value>
  </property>

  <!--
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.directory</name>
    <value>/tmp/fileIO/output</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.write.prop.filePath</name>
    <value>/tmp/fileIO/output</value>
  </property>
   -->

  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.directory</name>
    <value>/user/mukkamula/cnscan/output</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.write.prop.filePath</name>
    <value>/user/mukkamula/cnscan/output</value>
  </property>

  <!-- Source : 804 properties
   <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.inputDirectory(804)</name>
    <value>tmp/fileIO/804/input</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.inputConfigFile(804)</name>
    <value>tmp/fileIO/804/config/804_input_config.xml</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.partCount(804)</name>
    <value>1</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.outputDirectory(804)</name>
    <value>tmp/fileIO/804/output</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.outputConfigFile(804)</name>
    <value>tmp/fileIO/804/config/804_output_config.xml</value>
  </property>
  -->

   <!-- Source : 805 properties
   <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.inputDirectory(805)</name>
    <value>tmp/fileIO/805/input</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.inputConfigFile(805)</name>
    <value>tmp/fileIO/805/config/805_input_config.xml</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.partCount(805)</name>
    <value>1</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.outputDirectory(805)</name>
    <value>tmp/fileIO/805/output</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.outputConfigFile(805)</name>
    <value>tmp/fileIO/805/config/805_output_config.xml</value>
  </property>
  -->

  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.inputDirectory(804)</name>
    <value>/dev/up20/data/dm/804/2016/05/16</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.inputConfigFile(804)</name>
    <value>/user/mukkamula/cnscan/config/804/804_input_config.xml</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.partCount(804)</name>
    <value>1</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.outputDirectory(804)</name>
    <value>/user/mukkamula/cnscan/output</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.outputConfigFile(804)</name>
    <value>/user/mukkamula/cnscan/config/804/804_output_config.xml</value>
  </property>

  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.inputDirectory(600)</name>
    <value>/dev/up20/data/dm/600/2016/05/20</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.inputConfigFile(600)</name>
    <value>/user/mukkamula/cnscan/config/600/600_input_config.xml</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.partCount(600)</name>
    <value>1</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.outputDirectory(600)</name>
    <value>/user/mukkamula/cnscan/output</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.outputConfigFile(600)</name>
    <value>/user/mukkamula/cnscan/config/600/600_output_config.xml</value>
  </property>

  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.fileNamePattern</name>
    <value>$[source]_$[yyyy]_$[MM]_$[dd].dat</value>
  </property>
</configuration>


Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:suryavamshivardhan.mukkamula@rbc.com]
Sent: 2016, June, 20 12:18 PM
To: users@apex.apache.org
Subject: RE: Multiple directories

Hi Ram/Thomas,

What I am trying to do is,

We have around 120 different source directories (totaling approx. 5 million records) from which we have to read files; each input file is described by an input configuration file. Starting from the multiple-directory read code (provided by Ram), I enhanced it to read the input directories in parallel by giving each partition its own input directory and input configuration file, and I could parse and generate the output files in separate directories.

When I set the operator memory to around 500MB, that might end up around 60GB of RAM when the batch runs, which might have been OK. But now I cannot limit the operator memory, and I am not sure how to proceed.

Also please suggest if there is a better way to design.

Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:suryavamshivardhan.mukkamula@rbc.com]
Sent: 2016, June, 20 10:34 AM
To: users@apex.apache.org
Subject: RE: Multiple directories

Hi Thomas,

Below is my populate DAG code.

@ApplicationAnnotation(name="ClientGffCreation")
public class ClientGffApplication implements StreamingApplication
{

  @Override
  public void populateDAG(DAG dag, Configuration conf)
  {
    // create operators
    FileReader reader = dag.addOperator("read",  FileReader.class);
    FileOutputOperator writer = dag.addOperator("write", FileOutputOperator.class);

    reader.setScanner(new FileReaderMultiDir.SlicedDirectoryScanner());
    // create DAG
    dag.addStream("File-Writer", reader.output, writer.input);
  }
}
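
(If the XML properties keep being overridden, the same memory attribute can also be pinned in code; a minimal sketch, assuming an extra import of com.datatorrent.api.Context.OperatorContext:)

    // inside populateDAG(), after the operators are created:
    dag.setAttribute(reader, OperatorContext.MEMORY_MB, 500);
    dag.setAttribute(writer, OperatorContext.MEMORY_MB, 500);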

Regards,
Surya Vamshi
From: Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:suryavamshivardhan.mukkamula@rbc.com]
Sent: 2016, June, 20 10:04 AM
To: users@apex.apache.org
Subject: RE: Multiple directories

Hi Thomas,

Though I requested 500MB, the master log shows that it is requesting a 4GB container. My properties are being ignored. Please suggest.

2016-06-17 15:43:19,174 INFO  security.StramWSFilter (StramWSFilter.java:doFilter(157)) - 10.60.39.21: proxy access to URI /ws/v2/stram/info by user dr.who, no cookie created
2016-06-17 15:43:19,176 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp005.saifg.rbc.com:45454,numContainers=7,capability=<memory:245760, vCores:40>used=<memory:73728, vCores:11>state=RUNNING
2016-06-17 15:43:19,176 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp008.saifg.rbc.com:45454,numContainers=6,capability=<memory:245760, vCores:40>used=<memory:81920, vCores:11>state=RUNNING
2016-06-17 15:43:19,177 WARN  stram.StreamingContainerManager (StreamingContainerManager.java:getAppDataSources(593)) - DEBUG: looking at operator write 130
2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp006.saifg.rbc.com:45454,numContainers=5,capability=<memory:245760, vCores:40>used=<memory:57344, vCores:8>state=RUNNING
2016-06-17 15:43:19,177 WARN  stram.StreamingContainerManager (StreamingContainerManager.java:getAppDataSources(593)) - DEBUG: looking at operator read 130
2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp007.saifg.rbc.com:45454,numContainers=5,capability=<memory:245760, vCores:40>used=<memory:86016, vCores:10>state=RUNNING
2016-06-17 15:43:19,177 WARN  stram.StreamingContainerManager (StreamingContainerManager.java:getAppDataSources(620)) - DEBUG: looking at port output 130
2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp004.saifg.rbc.com:45454,numContainers=3,capability=<memory:245760, vCores:40>used=<memory:49152, vCores:6>state=RUNNING
2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp012.saifg.rbc.com:45454,numContainers=7,capability=<memory:204800, vCores:39>used=<memory:65536, vCores:10>state=RUNNING
2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp011.saifg.rbc.com:45454,numContainers=5,capability=<memory:204800, vCores:39>used=<memory:77824, vCores:9>state=RUNNING
2016-06-17 15:43:19,178 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp010.saifg.rbc.com:45454,numContainers=3,capability=<memory:245760, vCores:40>used=<memory:36864, vCores:5>state=RUNNING
2016-06-17 15:43:19,178 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp009.saifg.rbc.com:45454,numContainers=5,capability=<memory:245760, vCores:40>used=<memory:77824, vCores:9>state=RUNNING
2016-06-17 15:43:20,182 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1048)) - Asking RM for containers: [Capability[<memory:4096, vCores:1>]Priority[0], Capability[<memory:4096, vCores:1>]Priority[1], Capability[<memory:4096, vCores:1>]Priority[2], Capability[<memory:4096, vCores:1>]Priority[3]]
2016-06-17 15:43:20,182 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1050)) - Requested container: Capability[<memory:4096, vCores:1>]Priority[0]
2016-06-17 15:43:20,185 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1050)) - Requested container: Capability[<memory:4096, vCores:1>]Priority[1]
2016-06-17 15:43:20,185 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1050)) - Requested container: Capability[<memory:4096, vCores:1>]Priority[2]
2016-06-17 15:43:20,186 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1050)) - Requested container: Capability[<memory:4096, vCores:1>]Priority[3]
2016-06-17 15:43:21,213 INFO  impl.AMRMClientImpl (AMRMClientImpl.java:populateNMTokens(360)) - Received new token for : guedlpdhdp006.saifg.rbc.com:45454
2016-06-17 15:43:21,213 INFO  impl.AMRMClientImpl (AMRMClientImpl.java:populateNMTokens(360)) - Received new token for : guedlpdhdp004.saifg.rbc.com:45454
2016-06-17 15:43:21,213 INFO  impl.AMRMClientImpl (AMRMClientImpl.java:populateNMTokens(360)) - Received new token for : guedlpdhdp012.saifg.rbc.com:45454
2016-06-17 15:43:21,213 INFO  impl.AMRMClientImpl (AMRMClientImpl.java:populateNMTokens(360)) - Received new token for : guedlpdhdp009.saifg.rbc.com:45454
2016-06-17 15:43:21,215 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:execute(822)) - Got new container., containerId=container_e35_1465495186350_2632_01_000002, containerNode=guedlpdhdp006.saifg.rbc.com:45454, containerNodeURI=guedlpdhdp006.saifg.rbc.com:8042, containerResourceMemory4096, priority0
2016-06-17 15:43:21,226 INFO  delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:createPassword(385)) - Creating password for identifier: owner=mukkamula, renewer=, realUser=, issueDate=1466192601226, maxDate=4611687484619989129, sequenceNumber=1, masterKeyId=2, currentKey: 2
2016-06-17 15:43:21,228 INFO  stram.LaunchContainerRunnable (LaunchContainerRunnable.java:run(143)) - Setting up container launch context for containerid=container_e35_1465495186350_2632_01_000002
2016-06-17 15:43:21,233 INFO  stram.LaunchContainerRunnable (LaunchContainerRunnable.java:setClasspath(118)) - CLASSPATH: ./*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:.
2016-06-17 15:43:21,323 INFO  util.BasicContainerOptConfigurator (BasicContainerOptConfigurator.java:getJVMOptions(65)) - property map for operator {Generic=null, -Xmx=768m}
2016-06-17 15:43:21,323 INFO  stram.LaunchContainerRunnable (LaunchContainerRunnable.java:getChildVMCommand(243)) - Jvm opts  -Xmx939524096  for container container_e35_1465495186350_2632_01_000002
2016


Regards,
Surya Vamshi

From: Thomas Weise [mailto:thomas.weise@gmail.com]
Sent: 2016, June, 18 12:03 AM
To: users@apex.apache.org
Subject: Re: Multiple directories

Please check in the Apex application master log (container 1) how much memory it is requesting. If that's the correct figure and you still end up with a larger container, the problem could be the minimum container size in the YARN scheduler configuration.
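
That minimum lives in yarn-site.xml on the cluster; for illustration only (the value is an example, not a recommendation):

  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>512</value>
  </property>

If the minimum is 4096, any smaller request gets rounded up to a 4GB container.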


On Fri, Jun 17, 2016 at 12:58 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Hi Ram,

I tried that option of adding the memory properties under site/conf and selecting them during launch, but no luck. The same works with my local sandbox setup.

Is there any other way that I can understand the reason?

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, June, 17 3:06 PM
To: users@apex.apache.org
Subject: Re: Multiple directories

Please take a look at the section entitled "Properties source precedence" at
http://docs.datatorrent.com/application_packages/

It looks like the setting in dt-site.xml on the cluster is overriding your application-defined values.
If you add the properties to a file under site/conf in your application and then
select it during launch, those values should take effect.
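
For example, the launch would look something like this (names are placeholders for your own files):

  launch -conf <your-conf>.xml <your-app>.apa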

For signalling EOF, another option is to use a separate control port to send the EOF which could
just be the string "EOF" for example.
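
A hedged sketch of that control-port idea on the reader side (the port name and marker format are illustrative; closeFile() and currentFile come from AbstractFileInputOperator):

  public final transient DefaultOutputPort<String> control = new DefaultOutputPort<String>();

  @Override
  protected void closeFile(InputStream is) throws IOException
  {
    super.closeFile(is);
    control.emit("EOF:" + currentFile); // downstream writer finalizes on this signal
  }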


Ram

On Fri, Jun 17, 2016 at 11:48 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Hi,

I found a way for the 2nd question.


1)      I tried setting the operator memory to 500MB, but the operator still takes 4GB by default; I do not understand why. (In my sandbox setup the memory was set correctly to 500MB, but not on the enterprise dev cluster.)



2)      I am trying to send a null object from the file reader when EOF is reached so that the file writer can call the requestFinalize() method. Somehow I could not figure out how to send the EOF; I tried as below but no luck. (Solution: it seems that when readEntity() returns null, the emit() method is not called, so I have managed to emit the null object from readEntity() itself.)


Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:suryavamshivardhan.mukkamula@rbc.com]
Sent: 2016, June, 17 12:20 PM
To: users@apex.apache.org
Subject: RE: Multiple directories

Hi,

Can you please help me understand the below issues.


1)      I tried setting the operator memory to 500MB, but the operator still takes 4GB by default; I do not understand why. (In my sandbox setup the memory was set correctly to 500MB, but not on the enterprise dev cluster.)

2)      I am trying to send a null object from the file reader when EOF is reached so that the file writer can call the requestFinalize() method. Somehow I could not figure out how to send the EOF; I tried as below but no luck.

################### File Reader ####################################################

@Override
                protected String readEntity() throws IOException {
                                // try to read a line
                                final String line = br.readLine();
                                if (null != line) { // normal case
                                                LOG.debug("readEntity: line = {}", line);
                                                return line;
                                }

                                // end-of-file (control tuple sent in closeFile())
                                LOG.info("readEntity: EOF for {}", filePath);
                                return null;
                }

                @Override
                protected void emit(String line) {
                                // parsing logic here: parse the line as per the input configuration and
                                // create the output line as per the output configuration
                                if (line == null) { // EOF marker from readEntity()
                                                output.emit(new KeyValue<String, String>(getFileName(), null));
                                                return; // don't fall through and parse a null line
                                }
                                KeyValue<String, String> tuple = new KeyValue<String, String>();
                                tuple.key = getFileName();
                                tuple.value = line;
                                KeyValue<String, String> newTuple = parseTuple(tuple);
                                output.emit(newTuple);
                }
######################File Writer######################################################
public class FileOutputOperator extends AbstractFileOutputOperator<KeyValue<String, String>> {
    private static final Logger LOG = LoggerFactory.getLogger(FileOutputOperator.class);
    private List<String> filesToFinalize = new ArrayList<>();

    @Override
    public void setup(Context.OperatorContext context) {
        super.setup(context);
        finalizeFiles();
    }

    @Override
    protected byte[] getBytesForTuple(KeyValue<String, String> tuple) {
        if (tuple.value == null) {
                LOG.info("File to finalize {}",tuple.key);
            filesToFinalize.add(tuple.key);
            return new byte[0];
        }
        else {
            return tuple.value.getBytes();
        }
    }

    @Override
    protected String getFileName(KeyValue<String, String> tuple) {
        return tuple.key;
    }

    @Override
    public void endWindow() {
        super.endWindow();
        finalizeFiles();
    }

    private void finalizeFiles() {
                LOG.info("Files to finalize {}",filesToFinalize.toArray());
        Iterator<String> fileIt = filesToFinalize.iterator();
        while(fileIt.hasNext()) {
            requestFinalize(fileIt.next());
            fileIt.remove();
        }
    }
}
##################################################################################################



Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:suryavamshivardhan.mukkamula@rbc.com]
Sent: 2016, June, 17 9:11 AM
To: users@apex.apache.org
Subject: RE: Multiple directories

Hi Ram/Raja,

The HBase dependency was pulling older Hadoop jars into my classpath. I removed the HBase dependency, which I don't need for now, and the issue got resolved.

Thank you for your help.

Regards,
Surya Vamshi

From: Raja.Aravapalli [mailto:Raja.Aravapalli@target.com]
Sent: 2016, June, 17 7:06 AM
To: users@apex.apache.org
Subject: Re: Multiple directories


I also faced a similar problem with Hadoop jars when I used HBase jars in pom.xml!! It could be because the version of the Hadoop jars your .apa is holding differs from the ones in the cluster!!


What I did to solve this:

I set scope "provided" in the Maven pom.xml for the HBase jars, and then supplied the HBase jars to the application package during submission using "-libjars" with the launch command, which solved my Invalid Container Id problem!!

You can type "launch help" to learn the usage details.
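
The submission then looks something like this (jar paths are placeholders):

  launch -libjars /path/to/hbase-client-1.1.2.jar,/path/to/hbase-common-1.1.2.jar <your-app>.apa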


Regards,
Raja.

From: Munagala Ramanath <ra...@datatorrent.com>
Reply-To: "users@apex.apache.org" <us...@apex.apache.org>
Date: Thursday, June 16, 2016 at 4:10 PM
To: "users@apex.apache.org" <us...@apex.apache.org>
Subject: Re: Multiple directories

Those 6 hadoop jars are definitely a problem.

I didn't see the output of "mvn dependency:tree"; could you post that?
It will show you why these hadoop jars are being pulled in.

Also, please refer to the section "Hadoop dependencies conflicts" in the troubleshooting guide:
http://docs.datatorrent.com/troubleshooting/

Ram

On Thu, Jun 16, 2016 at 1:56 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Hi Ram,

Below are the details.


0 Wed Jun 15 16:34:24 EDT 2016 META-INF/
   358 Wed Jun 15 16:34:22 EDT 2016 META-INF/MANIFEST.MF
     0 Wed Jun 15 16:34:24 EDT 2016 app/
52967 Wed Jun 15 16:34:22 EDT 2016 app/countrynamescan-1.0-SNAPSHOT.jar
     0 Wed Jun 15 16:34:22 EDT 2016 lib/
62983 Wed Jun 15 16:34:22 EDT 2016 lib/activation-1.1.jar
1143233 Wed Jun 15 16:34:22 EDT 2016 lib/activemq-client-5.8.0.jar
  4467 Wed Jun 15 16:34:22 EDT 2016 lib/aopalliance-1.0.jar
44925 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-i18n-2.0.0-M15.jar
691479 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-kerberos-codec-2.0.0-M15.jar
16560 Wed Jun 15 16:34:22 EDT 2016 lib/api-asn1-api-1.0.0-M20.jar
79912 Wed Jun 15 16:34:22 EDT 2016 lib/api-util-1.0.0-M20.jar
43033 Wed Jun 15 16:34:22 EDT 2016 lib/asm-3.1.jar
303139 Wed Jun 15 16:34:22 EDT 2016 lib/avro-1.7.4.jar
232019 Wed Jun 15 16:34:22 EDT 2016 lib/commons-beanutils-1.8.3.jar
41123 Wed Jun 15 16:34:22 EDT 2016 lib/commons-cli-1.2.jar
284184 Wed Jun 15 16:34:22 EDT 2016 lib/commons-codec-1.10.jar
575389 Wed Jun 15 16:34:22 EDT 2016 lib/commons-collections-3.2.1.jar
30595 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compiler-2.7.8.jar
241367 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compress-1.4.1.jar
298829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-configuration-1.6.jar
143602 Wed Jun 15 16:34:22 EDT 2016 lib/commons-digester-1.8.jar
112341 Wed Jun 15 16:34:22 EDT 2016 lib/commons-el-1.0.jar
305001 Wed Jun 15 16:34:22 EDT 2016 lib/commons-httpclient-3.1.jar
185140 Wed Jun 15 16:34:22 EDT 2016 lib/commons-io-2.4.jar
284220 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang-2.6.jar
315805 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang3-3.1.jar
61829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-logging-1.2.jar
1599627 Wed Jun 15 16:34:22 EDT 2016 lib/commons-math3-3.1.1.jar
273370 Wed Jun 15 16:34:22 EDT 2016 lib/commons-net-3.1.jar
3608597 Wed Jun 15 16:34:22 EDT 2016 lib/db2jcc-123.jar
313898 Wed Jun 15 16:34:22 EDT 2016 lib/dom4j-1.6.1.jar
17138265 Wed Jun 15 16:34:22 EDT 2016 lib/fastutil-6.6.4.jar
15322 Wed Jun 15 16:34:22 EDT 2016 lib/findbugs-annotations-1.3.9-1.jar
84946 Wed Jun 15 16:34:22 EDT 2016 lib/flatpack-3.4.2.jar
20220 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-j2ee-management_1.1_spec-1.0.1.jar
32359 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-jms_1.1_spec-1.1.1.jar
21817 Wed Jun 15 16:34:22 EDT 2016 lib/gmbal-api-only-3.0.0-b023.jar
690573 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-framework-2.1.2.jar
253086 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-2.1.2.jar
198255 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-server-2.1.2.jar
336904 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-servlet-2.1.2.jar
  8114 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-rcm-2.1.2.jar
1795932 Wed Jun 15 16:34:22 EDT 2016 lib/guava-12.0.1.jar
710492 Wed Jun 15 16:34:22 EDT 2016 lib/guice-3.0.jar
65012 Wed Jun 15 16:34:22 EDT 2016 lib/guice-servlet-3.0.jar
16778 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-annotations-2.2.0.jar
52449 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-auth-2.5.1.jar
2962685 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-common-2.5.1.jar
1498368 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-mapreduce-client-core-2.5.1.jar
1158936 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-api-2.2.0.jar
1301627 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-common-2.2.0.jar
50139 Wed Jun 15 16:34:22 EDT 2016 lib/hawtbuf-1.9.jar
20780 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-annotations-1.1.2.jar
1249004 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-client-1.1.2.jar
530078 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-common-1.1.2.jar
4201685 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-protocol-1.1.2.jar
1475955 Wed Jun 15 16:34:22 EDT 2016 lib/htrace-core-3.1.0-incubating.jar
590533 Wed Jun 15 16:34:22 EDT 2016 lib/httpclient-4.3.5.jar
282269 Wed Jun 15 16:34:22 EDT 2016 lib/httpcore-4.3.2.jar
228286 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-core-asl-1.9.2.jar
765648 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-mapper-asl-1.9.2.jar
  2497 Wed Jun 15 16:34:22 EDT 2016 lib/javax.inject-1.jar
521991 Wed Jun 15 16:34:22 EDT 2016 lib/javax.mail-1.5.0.jar
83945 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-3.1.jar
85353 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-api-3.0.1.jar
105134 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-api-2.2.2.jar
890168 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-impl-2.2.3-1.jar
1291164 Wed Jun 15 16:34:22 EDT 2016 lib/jcodings-1.0.8.jar
151304 Wed Jun 15 16:34:22 EDT 2016 lib/jdom-1.1.3.jar
130458 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-client-1.9.jar
458739 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-core-1.9.jar
17542 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-grizzly2-1.9.jar
14786 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-guice-1.9.jar
147952 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-json-1.9.jar
713089 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-server-1.9.jar
28100 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-core-1.9.jar
12976 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-grizzly2-1.9.jar
67758 Wed Jun 15 16:34:22 EDT 2016 lib/jettison-1.1.jar
21144 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-continuation-8.1.10.v20130312.jar
95709 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-http-8.1.10.v20130312.jar
103622 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-io-8.1.10.v20130312.jar
89691 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-security-8.1.10.v20130312.jar
347020 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-server-8.1.10.v20130312.jar
101052 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-servlet-8.1.10.v20130312.jar
177131 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-6.1.26.jar
284903 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-8.1.10.v20130312.jar
125928 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-websocket-8.1.10.v20130312.jar
39117 Wed Jun 15 16:34:22 EDT 2016 lib/jms-api-1.1-rev-1.jar
187292 Wed Jun 15 16:34:22 EDT 2016 lib/joni-2.1.2.jar
280205 Wed Jun 15 16:34:22 EDT 2016 lib/jsch-0.1.53.jar
33015 Wed Jun 15 16:34:22 EDT 2016 lib/jsr305-1.3.9.jar
489884 Wed Jun 15 16:34:22 EDT 2016 lib/log4j-1.2.17.jar
565387 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-contrib-3.2.0-incubating.jar
1062000 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-library-3.1.1.jar
42212 Wed Jun 15 16:34:22 EDT 2016 lib/management-api-3.0.0-b012.jar
  8798 Wed Jun 15 16:34:22 EDT 2016 lib/named-regexp-0.2.3.jar
1206119 Wed Jun 15 16:34:22 EDT 2016 lib/netty-3.6.6.Final.jar
1779991 Wed Jun 15 16:34:22 EDT 2016 lib/netty-all-4.0.23.Final.jar
29555 Wed Jun 15 16:34:22 EDT 2016 lib/paranamer-2.3.jar
1869113 Wed Jun 15 16:34:22 EDT 2016 lib/poi-3.9.jar
936648 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-3.9.jar
4802621 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-schemas-3.9.jar
533455 Wed Jun 15 16:34:22 EDT 2016 lib/protobuf-java-2.5.0.jar
26084 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-api-1.7.5.jar
  9943 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-log4j12-1.7.19.jar
995968 Wed Jun 15 16:34:22 EDT 2016 lib/snappy-java-1.0.4.1.jar
23346 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0-2.jar
26514 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0.1.jar
  2455 Wed Jun 15 16:34:22 EDT 2016 lib/tdgssconfig-14.00.00.21.jar
991265 Wed Jun 15 16:34:22 EDT 2016 lib/terajdbc4-14.00.00.21.jar
758309 Wed Jun 15 16:34:22 EDT 2016 lib/vertica-jdbc-7.2.1-0.jar
109318 Wed Jun 15 16:34:22 EDT 2016 lib/xml-apis-1.0.b2.jar
2666695 Wed Jun 15 16:34:22 EDT 2016 lib/xmlbeans-2.3.0.jar
15010 Wed Jun 15 16:34:22 EDT 2016 lib/xmlenc-0.52.jar
94672 Wed Jun 15 16:34:22 EDT 2016 lib/xz-1.0.jar
792964 Wed Jun 15 16:34:22 EDT 2016 lib/zookeeper-3.4.6.jar
     0 Wed Jun 15 16:34:28 EDT 2016 conf/
   334 Mon Apr 04 11:18:00 EDT 2016 conf/my-app-conf1.xml
  3432 Wed Jun 15 16:22:20 EDT 2016 META-INF/properties.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
       <modelVersion>4.0.0</modelVersion>
       <groupId>com.rbc.aml.cnscan</groupId>
       <version>1.0-SNAPSHOT</version>
       <artifactId>countrynamescan</artifactId>
       <packaging>jar</packaging>

       <!-- change these to the appropriate values -->
       <name>countrynamescan</name>
       <description>Country and Name Scan project</description>

       <properties>
              <!-- change this if you desire to use a different version of DataTorrent -->
              <datatorrent.version>3.1.1</datatorrent.version>
              <datatorrent.apppackage.classpath>lib/*.jar</datatorrent.apppackage.classpath>
       </properties>

       <!-- repository to provide the DataTorrent artifacts -->
       <!-- <repositories>
              <repository>
                     <snapshots>
                           <enabled>false</enabled>
                     </snapshots>
                     <id>Datatorrent-Releases</id>
                     <name>DataTorrent Release Repository</name>
                     <url>https://www.datatorrent.com/maven/content/repositories/releases/</url>
              </repository>
              <repository>
                     <releases>
                           <enabled>false</enabled>
                     </releases>
                     <id>DataTorrent-Snapshots</id>
                     <name>DataTorrent Early Access Program Snapshot Repository</name>
                     <url>https://www.datatorrent.com/maven/content/repositories/snapshots/</url>
              </repository>
       </repositories> -->


       <build>
              <plugins>
                     <plugin>
                           <groupId>org.apache.maven.plugins</groupId>
                           <artifactId>maven-eclipse-plugin</artifactId>
                           <version>2.9</version>
                           <configuration>
                                  <downloadSources>true</downloadSources>
                           </configuration>
                     </plugin>
                     <plugin>
                           <artifactId>maven-compiler-plugin</artifactId>
                           <version>3.3</version>
                           <configuration>
                                  <encoding>UTF-8</encoding>
                                  <source>1.7</source>
                                  <target>1.7</target>
                                  <debug>true</debug>
                                  <optimize>false</optimize>
                                  <showDeprecation>true</showDeprecation>
                                  <showWarnings>true</showWarnings>
                           </configuration>
                     </plugin>
                     <plugin>
                           <artifactId>maven-dependency-plugin</artifactId>
                           <version>2.8</version>
                           <executions>
                                  <execution>
                                         <id>copy-dependencies</id>
                                         <phase>prepare-package</phase>
                                         <goals>
                                                <goal>copy-dependencies</goal>
                                         </goals>
                                         <configuration>
                                                <outputDirectory>target/deps</outputDirectory>
                                                <includeScope>runtime</includeScope>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <artifactId>maven-assembly-plugin</artifactId>
                           <executions>
                                  <execution>
                                         <id>app-package-assembly</id>
                                         <phase>package</phase>
                                         <goals>
                                                <goal>single</goal>
                                         </goals>
                                         <configuration>
                                                <finalName>${project.artifactId}-${project.version}-apexapp</finalName>
                                                <appendAssemblyId>false</appendAssemblyId>
                                                <descriptors>
                                                       <descriptor>src/assemble/appPackage.xml</descriptor>
                                                </descriptors>
                                                <archiverConfig>
                                                       <defaultDirectoryMode>0755</defaultDirectoryMode>
                                                </archiverConfig>
                                                <archive>
                                                       <manifestEntries>
                                                              <Class-Path>${datatorrent.apppackage.classpath}</Class-Path>
                                                              <DT-Engine-Version>${datatorrent.version}</DT-Engine-Version>
                                                              <DT-App-Package-Name>${project.artifactId}</DT-App-Package-Name>
                                                              <DT-App-Package-Version>${project.version}</DT-App-Package-Version>
                                                              <DT-App-Package-Display-Name>${project.name}</DT-App-Package-Display-Name>
                                                              <DT-App-Package-Description>${project.description}</DT-App-Package-Description>
                                                      </manifestEntries>
                                                </archive>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <artifactId>maven-antrun-plugin</artifactId>
                           <version>1.7</version>
                           <executions>
                                  <execution>
                                         <phase>package</phase>
                                         <configuration>
                                                <target>
                                                       <move
                                                              file="${project.build.directory}/${project.artifactId}-${project.version}-apexapp.jar"
                                                              tofile="${project.build.directory}/${project.artifactId}-${project.version}.apa"/>
                                                </target>
                                         </configuration>
                                         <goals>
                                                <goal>run</goal>
                                         </goals>
                                  </execution>
                                  <execution>
                                         <!-- create resource directory for xml javadoc -->
                                         <id>createJavadocDirectory</id>
                                         <phase>generate-resources</phase>
                                         <configuration>
                                                <tasks>
                                                       <delete
                                                              dir="${project.build.directory}/generated-resources/xml-javadoc"/>
                                                       <mkdir
                                                              dir="${project.build.directory}/generated-resources/xml-javadoc"/>
                                                </tasks>
                                         </configuration>
                                         <goals>
                                                <goal>run</goal>
                                         </goals>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <groupId>org.codehaus.mojo</groupId>
                           <artifactId>build-helper-maven-plugin</artifactId>
                           <version>1.9.1</version>
                           <executions>
                                  <execution>
                                         <id>attach-artifacts</id>
                                         <phase>package</phase>
                                         <goals>
                                                <goal>attach-artifact</goal>
                                         </goals>
                                         <configuration>
                                                <artifacts>
                                                       <artifact>
                                                              <file>target/${project.artifactId}-${project.version}.apa</file>
                                                              <type>apa</type>
                                                       </artifact>
                                                </artifacts>
                                                <skipAttach>false</skipAttach>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <!-- generate javdoc -->
                     <plugin>
                           <groupId>org.apache.maven.plugins</groupId>
                           <artifactId>maven-javadoc-plugin</artifactId>
                           <executions>
                                  <!-- generate xml javadoc -->
                                  <execution>
                                         <id>xml-doclet</id>
                                         <phase>generate-resources</phase>
                                         <goals>
                                                <goal>javadoc</goal>
                                         </goals>
                                         <configuration>
                                                <doclet>com.github.markusbernhardt.xmldoclet.XmlDoclet</doclet>
                                                <additionalparam>-d
                                                       ${project.build.directory}/generated-resources/xml-javadoc
                                                       -filename ${project.artifactId}-${project.version}-javadoc.xml</additionalparam>
                                                <useStandardDocletOptions>false</useStandardDocletOptions>
                                                <docletArtifact>
                                                       <groupId>com.github.markusbernhardt</groupId>
                                                       <artifactId>xml-doclet</artifactId>
                                                       <version>1.0.4</version>
                                                </docletArtifact>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>
                     <!-- Transform xml javadoc to stripped down version containing only class/interface
                           comments and tags -->
                     <plugin>
                           <groupId>org.codehaus.mojo</groupId>
                           <artifactId>xml-maven-plugin</artifactId>
                           <version>1.0</version>
                           <executions>
                                  <execution>
                                         <id>transform-xmljavadoc</id>
                                         <phase>generate-resources</phase>
                                         <goals>
                                                <goal>transform</goal>
                                         </goals>
                                  </execution>
                           </executions>
                           <configuration>
                                  <transformationSets>
                                         <transformationSet>
                                                <dir>${project.build.directory}/generated-resources/xml-javadoc</dir>
                                                <includes>
                                                       <include>${project.artifactId}-${project.version}-javadoc.xml</include>
                                                </includes>
                                                <stylesheet>XmlJavadocCommentsExtractor.xsl</stylesheet>
                                                <outputDir>${project.build.directory}/generated-resources/xml-javadoc</outputDir>
                                         </transformationSet>
                                  </transformationSets>
                           </configuration>
                     </plugin>
                     <!-- copy xml javadoc to class jar -->
                     <plugin>
                           <artifactId>maven-resources-plugin</artifactId>
                           <version>2.6</version>
                           <executions>
                                  <execution>
                                         <id>copy-resources</id>
                                         <phase>process-resources</phase>
                                         <goals>
                                                <goal>copy-resources</goal>
                                         </goals>
                                         <configuration>
                                                <outputDirectory>${basedir}/target/classes</outputDirectory>
                                                <resources>
                                                       <resource>
                                                              <directory>${project.build.directory}/generated-resources/xml-javadoc</directory>
                                                              <includes>
                                                                     <include>${project.artifactId}-${project.version}-javadoc.xml</include>
                                                              </includes>
                                                              <filtering>true</filtering>
                                                       </resource>
                                                </resources>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

              </plugins>

              <pluginManagement>
                     <plugins>
                           <!--This plugin's configuration is used to store Eclipse m2e settings
                                  only. It has no influence on the Maven build itself. -->
                           <plugin>
                                  <groupId>org.eclipse.m2e</groupId>
                                  <artifactId>lifecycle-mapping</artifactId>
                                  <version>1.0.0</version>
                                  <configuration>
                                         <lifecycleMappingMetadata>
                                                <pluginExecutions>
                                                       <pluginExecution>
                                                              <pluginExecutionFilter>
                                                                     <groupId>org.codehaus.mojo</groupId>
                                                                     <artifactId>
                                                                           xml-maven-plugin
                                                                     </artifactId>
                                                                     <versionRange>[1.0,)</versionRange>
                                                                     <goals>
                                                                           <goal>transform</goal>
                                                                     </goals>
                                                              </pluginExecutionFilter>
                                                              <action>
                                                                     <ignore></ignore>
                                                              </action>
                                                       </pluginExecution>
                                                       <pluginExecution>
                                                              <pluginExecutionFilter>
                                                                     <groupId></groupId>
                                                                     <artifactId></artifactId>
                                                                     <versionRange>[,)</versionRange>
                                                                     <goals>
                                                                           <goal></goal>
                                                                     </goals>
                                                              </pluginExecutionFilter>
                                                              <action>
                                                                     <ignore></ignore>
                                                              </action>
                                                       </pluginExecution>
                                                </pluginExecutions>
                                         </lifecycleMappingMetadata>
                                  </configuration>
                           </plugin>
                     </plugins>
              </pluginManagement>
       </build>

       <dependencies>
              <!-- add your dependencies here -->
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>malhar-library</artifactId>
                     <version>${datatorrent.version}</version>
                     <!-- If you know that your application does not need transitive dependencies
                           pulled in by malhar-library, uncomment the following to reduce the size of
                           your app package. -->
                     <!-- <exclusions> <exclusion> <groupId>*</groupId> <artifactId>*</artifactId>
                           </exclusion> </exclusions> -->
              </dependency>
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>dt-common</artifactId>
                     <version>${datatorrent.version}</version>
                     <scope>provided</scope>
              </dependency>
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>dt-engine</artifactId>
                     <version>${datatorrent.version}</version>
                     <scope>test</scope>
              </dependency>
              <dependency>
                     <groupId>com.teradata.jdbc</groupId>
                     <artifactId>terajdbc4</artifactId>
                     <version>14.00.00.21</version>
              </dependency>
              <dependency>
                     <groupId>com.teradata.jdbc</groupId>
                     <artifactId>tdgssconfig</artifactId>
                     <version>14.00.00.21</version>
              </dependency>
              <dependency>
                     <groupId>com.ibm.db2</groupId>
                     <artifactId>db2jcc</artifactId>
                     <version>123</version>
              </dependency>
              <dependency>
                     <groupId>jdk.tools</groupId>
                     <artifactId>jdk.tools</artifactId>
                     <version>1.7</version>
                     <scope>system</scope>
                     <systemPath>C:/Program Files/Java/jdk1.7.0_79/lib/tools.jar</systemPath>
              </dependency>
              <dependency>
                     <groupId>org.apache.apex</groupId>
                     <artifactId>malhar-contrib</artifactId>
                     <version>3.2.0-incubating</version>
                     <!--<scope>provided</scope> -->
                     <exclusions>
                           <exclusion>
                                  <groupId>*</groupId>
                                  <artifactId>*</artifactId>
                           </exclusion>
                     </exclusions>
              </dependency>
              <dependency>
                     <groupId>junit</groupId>
                     <artifactId>junit</artifactId>
                     <version>4.10</version>
                     <scope>test</scope>
              </dependency>
              <dependency>
                     <groupId>com.vertica</groupId>
                     <artifactId>vertica-jdbc</artifactId>
                     <version>7.2.1-0</version>
              </dependency>
              <dependency>
                     <groupId>org.apache.hbase</groupId>
                     <artifactId>hbase-client</artifactId>
                     <version>1.1.2</version>
              </dependency>
              <dependency>
                     <groupId>org.slf4j</groupId>
                     <artifactId>slf4j-log4j12</artifactId>
                     <version>1.7.19</version>
              </dependency>

              <dependency>
                     <groupId>net.sf.flatpack</groupId>
                     <artifactId>flatpack</artifactId>
                     <version>3.4.2</version>
              </dependency>

              <dependency>
                     <groupId>org.jdom</groupId>
                     <artifactId>jdom</artifactId>
                     <version>1.1.3</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi-ooxml</artifactId>
                     <version>3.9</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.xmlbeans</groupId>
                     <artifactId>xmlbeans</artifactId>
                     <version>2.3.0</version>
              </dependency>

              <dependency>
                     <groupId>dom4j</groupId>
                     <artifactId>dom4j</artifactId>
                     <version>1.6.1</version>
              </dependency>

              <dependency>
                     <groupId>javax.xml.stream</groupId>
                     <artifactId>stax-api</artifactId>
                     <version>1.0-2</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi</artifactId>
                     <version>3.9</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi-ooxml-schemas</artifactId>
                     <version>3.9</version>
              </dependency>
              <dependency>
                     <groupId>com.jcraft</groupId>
                     <artifactId>jsch</artifactId>
                     <version>0.1.53</version>
              </dependency>
       </dependencies>

</project>

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, June, 16 4:37 PM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories

It looks like you may be including old Hadoop jars in your apa package since the stack trace
shows ConverterUtils.toContainerId calling ConverterUtils.toApplicationAttemptId but recent versions
don't have that call sequence. In 2.7.1 (which is what your cluster has) the function looks like this:
  public static ContainerId toContainerId(String containerIdStr) {
    return ContainerId.fromString(containerIdStr);
  }

Could you post the output of "jar tvf {your-apa-file}" as well as: "mvn dependency:tree"

Ram

On Thu, Jun 16, 2016 at 12:38 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi Ram,

Below is the information.


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   712    0   712    0     0   3807      0 --:--:-- --:--:-- --:--:--  3807
{
    "clusterInfo": {
        "haState": "ACTIVE",
        "haZooKeeperConnectionState": "CONNECTED",
        "hadoopBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bc
caeaf36cb193c1c by jenkins source checksum 69a3bf8c667267c2c252a54fbbf23d",
        "hadoopVersion": "2.7.1.2.3.2.0-2950",
        "hadoopVersionBuiltOn": "2015-09-30T18:08Z",
        "id": 1465495186350,
        "resourceManagerBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa9
8205f18bccaeaf36cb193c1c by jenkins source checksum 48db4b572827c2e9c2da66982d14
7626",
        "resourceManagerVersion": "2.7.1.2.3.2.0-2950",
       "resourceManagerVersionBuiltOn": "2015-09-30T18:20Z",
        "rmStateStoreName": "org.apache.hadoop.yarn.server.resourcemanager.recov
ery.ZKRMStateStore",
        "startedOn": 1465495186350,
        "state": "STARTED"
    }
}

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, June, 16 2:57 PM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories

Can you ssh to one of the cluster nodes ? If so, can you run this command and show the output
(where {rm} is the host:port running the resource manager, aka YARN):

curl http://{rm}/ws/v1/cluster | python -mjson.tool

Ram
ps. You can determine the node running YARN with:

hdfs getconf -confKey yarn.resourcemanager.webapp.address
hdfs getconf -confKey yarn.resourcemanager.webapp.https.address



On Thu, Jun 16, 2016 at 11:15 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi,

I am facing a weird issue and the logs are not clear to me.

I have created an apa file which works fine within my local sandbox, but I am facing problems when I upload it to the enterprise Hadoop cluster using the DT Console.

Below is the error message from the YARN logs. Please help me understand the issue.

###################### Error Logs ########################################################

Log Type: AppMaster.stderr
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 1259
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/grid/06/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/filecache/36/slf4j-log4j12-1.7.19.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.IllegalArgumentException: Invalid ContainerId: container_e35_1465495186350_2224_01_000001
        at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182)
        at com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:90)
Caused by: java.lang.NumberFormatException: For input string: "e35"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:441)
        at java.lang.Long.parseLong(Long.java:483)
        at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
        at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
        ... 1 more

Log Type: AppMaster.stdout
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 0

Log Type: dt.log
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 29715
Showing 4096 bytes of 29715 total. Click here<http://guedlpdhdp001.saifg.rbc.com:19888/jobhistory/logs/guedlpdhdp012.saifg.rbc.com:45454/container_e35_1465495186350_2224_01_000001/container_e35_1465495186350_2224_01_000001/mukkamula/dt.log/?start=0> for the full log.
56m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms4096m -Xmx4096m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT
SHLVL=3
HADOOP_SSH_OPTS=-o ConnectTimeout=5 -o SendEnv=HADOOP_CONF_DIR
HADOOP_USER_NAME=datatorrent/gueulvahal003.saifg.rbc.com@SAIFG.RBC.COM<ma...@SAIFG.RBC.COM>
HADOOP_NAMENODE_OPTS=-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1
HADOOP_IDENT_STRING=yarn
HADOOP_MAPRED_LOG_DIR=/var/log/hadoop-mapreduce/yarn
NM_HOST=guedlpdhdp012.saifg.rbc.com<http://guedlpdhdp012.saifg.rbc.com>
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
HADOOP_SECURE_DN_LOG_DIR=/var/log/hadoop/hdfs
YARN_HISTORYSERVER_HEAPSIZE=1024
JVM_PID=2638
YARN_PID_DIR=/var/run/hadoop-yarn/yarn
HADOOP_HOME_WARN_SUPPRESS=1
NM_PORT=45454
LOGNAME=mukkamula
YARN_CONF_DIR=/usr/hdp/current/hadoop-client/conf
HADOOP_YARN_USER=yarn
QTDIR=/usr/lib64/qt-3.3
_=/usr/lib/jvm/java-1.7.0/bin/java
MSM_PRODUCT=MSM
HADOOP_HOME=/usr/hdp/2.3.2.0-2950/hadoop
MALLOC_ARENA_MAX=4
HADOOP_OPTS=-Dhdp.version=2.3.2.0-2950 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true  -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.3.2.0-2950 -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
SHELL=/bin/bash
YARN_ROOT_LOGGER=INFO,EWMA,RFA
HADOOP_TOKEN_FILE_LOCATION=/grid/11/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/container_e35_1465495186350_2224_01_000001/container_tokens
CLASSPATH=./*:/usr/hdp/current/hadoop-client/conf:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*
HADOOP_MAPRED_PID_DIR=/var/run/hadoop-mapreduce/yarn
YARN_NODEMANAGER_HEAPSIZE=1024
QTINC=/usr/lib64/qt-3.3/include
USER=mukkamula
HADOOP_CLIENT_OPTS=-Xmx2048m -XX:MaxPermSize=512m -Xmx2048m -XX:MaxPermSize=512m
CONTAINER_ID=container_e35_1465495186350_2224_01_000001
HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop/hdfs
HISTCONTROL=ignoredups
HOME=/home/
HADOOP_NAMENODE_INIT_HEAPSIZE=-Xms8192m
MSM_HOME=/usr/local/MegaRAID Storage Manager
LESSOPEN=||/usr/bin/lesspipe.sh %s
LANG=en_US.UTF-8
YARN_NICENESS=0
YARN_IDENT_STRING=yarn
HADOOP_MAPRED_HOME=/usr/hdp/2.3.2.0-2950/hadoop-mapreduce


Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR)
Sent: 2016, June, 16 8:58 AM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: RE: Multiple directories

Thank you for the inputs.

Regards,
Surya Vamshi
From: Thomas Weise [mailto:thomas.weise@gmail.com]
Sent: 2016, June, 15 5:08 PM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories


On Wed, Jun 15, 2016 at 1:55 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi Ram/Team,

I could create an operator which reads multiple directories and parses the each file with respect to an individual configuration file and generates output file to different directories.

However I have some questions regarding the design.


==>We have 120 directories to scan on HDFS; if we use parallel partitioning with operator memory around 250MB, that adds up to around 30GB of RAM for this operator. Are these figures going to create any problem in production?

You can benchmark this with a single partition. If the downstream operators can keep up with the rate at which the file reader emits, then the memory consumption should be minimal. Keep in mind though that the container memory is not just heap space for the operator, but also memory the JVM requires to run and the memory that the buffer server consumes. You see the allocated memory in the UI if you use the DT community edition (container list in the physical plan).


==>Should I use a scheduler to run the batch job, or define the next scan time and keep the DT job running continuously? If I run the DT job continuously, I assume its memory stays allocated to the DT job and is not available to other applications on the cluster; please clarify.
It is possible to set this up elastically also, so that when there is no input available, the number of reader partitions is reduced and the memory given back (Apex supports dynamic scaling); a sketch follows.
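
For reference, a minimal sketch of that elastic idea, assuming the reader operator implements com.datatorrent.api.StatsListener (isIdle() is a hypothetical helper that inspects the batched stats, e.g. no tuples emitted for several windows):

  @Override
  public Response processStats(BatchedOperatorStats stats)
  {
    Response response = new Response();
    // when true, the platform invokes definePartitions() again, where the
    // partition count can be lowered while no input is available
    response.repartitionRequired = isIdle(stats);
    return response;
  }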


Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, June, 05 10:24 PM

To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories

Some sample code to monitor multiple directories is now available at:
https://github.com/DataTorrent/examples/tree/master/tutorials/fileIO-multiDir

It shows how to use a custom implementation of definePartitions() to create
multiple partitions of the file input operator and group them
into "slices" where each slice monitors a single directory.

Ram

On Wed, May 25, 2016 at 9:55 AM, Munagala Ramanath <ra...@datatorrent.com>> wrote:
I'm hoping to have a sample sometime next week.

Ram

On Wed, May 25, 2016 at 9:30 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Thank you so much ram, for your advice , Option (a) would be ideal for my requirement.

Do you have sample usage for partitioning with individual configuration set ups different partitions?

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, May, 25 12:11 PM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories

You have 2 options: (a) AbstractFileInputOperator (b) FileSplitter/BlockReader

For (a), each partition (i.e. replica of the operator) can scan only a single directory, so if you have 100
directories, you can simply start with 100 partitions; since each partition is scanning its own directory
you don't need to worry about which files the lines came from. This approach however needs a custom
definePartitions() implementation in your subclass to assign the appropriate directory and XML parsing
config file to each partition; it also needs adequate cluster resources to be able to spin up the required
number of partitions. A sketch of such an implementation follows.
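
A minimal sketch of such a definePartitions(), assuming the subclass (here called FileReader) implements Partitioner<FileReader>; DirSpec, dirs, and setConfigFile() are hypothetical names for the per-feed settings built from the mapping file:

  import java.util.ArrayList;
  import java.util.Collection;
  import java.util.List;
  import com.datatorrent.api.DefaultPartition;

  @Override
  public Collection<Partition<FileReader>> definePartitions(
      Collection<Partition<FileReader>> partitions, PartitioningContext context)
  {
    List<Partition<FileReader>> result = new ArrayList<>();
    for (DirSpec spec : dirs) {            // one partition ("slice") per directory
      FileReader op = new FileReader();
      op.setDirectory(spec.directory);     // the single directory this replica scans
      op.setConfigFile(spec.configFile);   // the XML parsing config for that feed
      result.add(new DefaultPartition<FileReader>(op));
    }
    return result;
  }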

For (b), there is some documentation in the Operators section at http://docs.datatorrent.com/ including
sample code. These operators support scanning multiple directories out of the box but have more
elaborate configuration options. Check this out and see if it works in your use case.

Ram

On Wed, May 25, 2016 at 8:17 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hello Ram/Team,

My requirement is to read input feeds from different locations on HDFS and parse those files by reading XML configuration files (each input feed has configuration file which defines the fields inside the input feeds).

My approach: I would like to define a mapping file which contains the individual feed identifier, feed location, and configuration file location. I would like to read this mapping file at initial load within the setup() method and define my DirectoryScan.acceptFiles. My challenge is that when I read the files, I should parse the lines according to the individual configuration files. How do I know which file a given line came from? If I know this, I can read the corresponding configuration file before parsing the line.

Please let me know how do I handle this.

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, May, 24 5:49 PM
To: Mukkamula, Suryavamshivardhan (CWM-NR)
Subject: Multiple directories

One way of addressing the issue is to use some sort of external tool (like a script) to
copy all the input files to a common directory (making sure that the file names are
unique to prevent one file from overwriting another) before the Apex application starts.

The Apex application then starts and processes files from this directory.

If you set the partition count of the file input operator to N, it will create N partitions and
the files will be automatically distributed among the partitions. The partitions will work
in parallel.
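
As a minimal sketch (FileReader stands in for your AbstractFileInputOperator subclass), the count is just a property on the operator:

  FileReader reader = dag.addOperator("read", FileReader.class);
  reader.setPartitionCount(4);  // the platform creates 4 replicas and distributes files among them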

Ram

_______________________________________________________________________

This [email] may be privileged and/or confidential, and the sender does not waive any related rights and obligations. Any distribution, use or copying of this [email] or the information it contains by other than an intended recipient is unauthorized. If you received this [email] in error, please advise the sender (by return [email] or otherwise) immediately. You have consented to receive the attached electronically at the above-noted address; please retain a copy of this confirmation for future reference.



RE: Multiple directories

Posted by "Mukkamula, Suryavamshivardhan (CWM-NR)" <su...@rbc.com>.
Hi Ram/Thomas,

What I am trying to do is,

We have around 120 different source directories (totaling approx. 5 million records) from which we have to read files; each input feed is described by an input configuration file. With the help of the multiple-directory read code (provided by Ram), I have enhanced it to read the input directories in parallel, assigning a separate input directory and input configuration file to each partition, and I could parse the files and generate the output files in separate directories.

When I set the operator memory to around 500MB, the batch run might end up using around 60GB of RAM in total. That might have been acceptable, but now I cannot limit the operator memory, and I am not sure how to proceed.

Also please suggest if there is a better way to design this.

Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:suryavamshivardhan.mukkamula@rbc.com]
Sent: 2016, June, 20 10:34 AM
To: users@apex.apache.org
Subject: RE: Multiple directories

Hi Thomas,

Below is my populateDAG code.

@ApplicationAnnotation(name="ClientGffCreation")
public class ClientGffApplication implements StreamingApplication
{

  @Override
  public void populateDAG(DAG dag, Configuration conf)
  {
    // create operators
    FileReader reader = dag.addOperator("read",  FileReader.class);
    FileOutputOperator writer = dag.addOperator("write", FileOutputOperator.class);

    reader.setScanner(new FileReaderMultiDir.SlicedDirectoryScanner());
    // create DAG
    dag.addStream("File-Writer", reader.output, writer.input);
  }
}
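
If the intent is to cap each operator's memory, one option (a sketch; the 512 is illustrative, and the same attribute can also be set in properties.xml as dt.operator.read.attr.MEMORY_MB) is to set it directly in populateDAG():

    import com.datatorrent.api.Context.OperatorContext;

    // inside populateDAG(), after the addOperator() calls:
    dag.setAttribute(reader, OperatorContext.MEMORY_MB, 512);  // memory requested per reader partition
    dag.setAttribute(writer, OperatorContext.MEMORY_MB, 512);  // likewise for the writer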

Regards,
Surya Vamshi
From: Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:suryavamshivardhan.mukkamula@rbc.com]
Sent: 2016, June, 20 10:04 AM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: RE: Multiple directories

Hi Thomas,

Though I requested 500MB, the master log shows that it is requesting a 4GB container, so my properties are being ignored. Please suggest.

2016-06-17 15:43:19,174 INFO  security.StramWSFilter (StramWSFilter.java:doFilter(157)) - 10.60.39.21: proxy access to URI /ws/v2/stram/info by user dr.who, no cookie created
2016-06-17 15:43:19,176 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp005.saifg.rbc.com:45454,numContainers=7,capability=<memory:245760, vCores:40>used=<memory:73728, vCores:11>state=RUNNING
2016-06-17 15:43:19,176 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp008.saifg.rbc.com:45454,numContainers=6,capability=<memory:245760, vCores:40>used=<memory:81920, vCores:11>state=RUNNING
2016-06-17 15:43:19,177 WARN  stram.StreamingContainerManager (StreamingContainerManager.java:getAppDataSources(593)) - DEBUG: looking at operator write 130
2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp006.saifg.rbc.com:45454,numContainers=5,capability=<memory:245760, vCores:40>used=<memory:57344, vCores:8>state=RUNNING
2016-06-17 15:43:19,177 WARN  stram.StreamingContainerManager (StreamingContainerManager.java:getAppDataSources(593)) - DEBUG: looking at operator read 130
2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp007.saifg.rbc.com:45454,numContainers=5,capability=<memory:245760, vCores:40>used=<memory:86016, vCores:10>state=RUNNING
2016-06-17 15:43:19,177 WARN  stram.StreamingContainerManager (StreamingContainerManager.java:getAppDataSources(620)) - DEBUG: looking at port output 130
2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp004.saifg.rbc.com:45454,numContainers=3,capability=<memory:245760, vCores:40>used=<memory:49152, vCores:6>state=RUNNING
2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp012.saifg.rbc.com:45454,numContainers=7,capability=<memory:204800, vCores:39>used=<memory:65536, vCores:10>state=RUNNING
2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp011.saifg.rbc.com:45454,numContainers=5,capability=<memory:204800, vCores:39>used=<memory:77824, vCores:9>state=RUNNING
2016-06-17 15:43:19,178 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp010.saifg.rbc.com:45454,numContainers=3,capability=<memory:245760, vCores:40>used=<memory:36864, vCores:5>state=RUNNING
2016-06-17 15:43:19,178 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp009.saifg.rbc.com:45454,numContainers=5,capability=<memory:245760, vCores:40>used=<memory:77824, vCores:9>state=RUNNING
2016-06-17 15:43:20,182 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1048)) - Asking RM for containers: [Capability[<memory:4096, vCores:1>]Priority[0], Capability[<memory:4096, vCores:1>]Priority[1], Capability[<memory:4096, vCores:1>]Priority[2], Capability[<memory:4096, vCores:1>]Priority[3]]
2016-06-17 15:43:20,182 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1050)) - Requested container: Capability[<memory:4096, vCores:1>]Priority[0]
2016-06-17 15:43:20,185 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1050)) - Requested container: Capability[<memory:4096, vCores:1>]Priority[1]
2016-06-17 15:43:20,185 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1050)) - Requested container: Capability[<memory:4096, vCores:1>]Priority[2]
2016-06-17 15:43:20,186 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1050)) - Requested container: Capability[<memory:4096, vCores:1>]Priority[3]
2016-06-17 15:43:21,213 INFO  impl.AMRMClientImpl (AMRMClientImpl.java:populateNMTokens(360)) - Received new token for : guedlpdhdp006.saifg.rbc.com:45454
2016-06-17 15:43:21,213 INFO  impl.AMRMClientImpl (AMRMClientImpl.java:populateNMTokens(360)) - Received new token for : guedlpdhdp004.saifg.rbc.com:45454
2016-06-17 15:43:21,213 INFO  impl.AMRMClientImpl (AMRMClientImpl.java:populateNMTokens(360)) - Received new token for : guedlpdhdp012.saifg.rbc.com:45454
2016-06-17 15:43:21,213 INFO  impl.AMRMClientImpl (AMRMClientImpl.java:populateNMTokens(360)) - Received new token for : guedlpdhdp009.saifg.rbc.com:45454
2016-06-17 15:43:21,215 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:execute(822)) - Got new container., containerId=container_e35_1465495186350_2632_01_000002, containerNode=guedlpdhdp006.saifg.rbc.com:45454, containerNodeURI=guedlpdhdp006.saifg.rbc.com:8042, containerResourceMemory4096, priority0
2016-06-17 15:43:21,226 INFO  delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:createPassword(385)) - Creating password for identifier: owner=mukkamula, renewer=, realUser=, issueDate=1466192601226, maxDate=4611687484619989129, sequenceNumber=1, masterKeyId=2, currentKey: 2
2016-06-17 15:43:21,228 INFO  stram.LaunchContainerRunnable (LaunchContainerRunnable.java:run(143)) - Setting up container launch context for containerid=container_e35_1465495186350_2632_01_000002
2016-06-17 15:43:21,233 INFO  stram.LaunchContainerRunnable (LaunchContainerRunnable.java:setClasspath(118)) - CLASSPATH: ./*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:.
2016-06-17 15:43:21,323 INFO  util.BasicContainerOptConfigurator (BasicContainerOptConfigurator.java:getJVMOptions(65)) - property map for operator {Generic=null, -Xmx=768m}
2016-06-17 15:43:21,323 INFO  stram.LaunchContainerRunnable (LaunchContainerRunnable.java:getChildVMCommand(243)) - Jvm opts  -Xmx939524096  for container container_e35_1465495186350_2632_01_000002
2016


Regards,
Surya Vamshi

From: Thomas Weise [mailto:thomas.weise@gmail.com]
Sent: 2016, June, 18 12:03 AM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories

Please check in the Apex application master log (container 1), how much memory it is requesting. If that's the correct figure and still you end up with a larger container, the problem could be the minimum container size in the YARN scheduler configuration.
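
For reference, the scheduler floor can be read with the same getconf pattern used earlier in this thread (yarn.scheduler.minimum-allocation-mb is the standard property; requests below it get rounded up):

hdfs getconf -confKey yarn.scheduler.minimum-allocation-mb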


On Fri, Jun 17, 2016 at 12:58 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi Ram,

I tried that option of adding the memory properties to a file under site/conf and selecting it during launch, but no luck. The same works with my local sandbox setup.

Is there any other way that I can figure out the reason?

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, June, 17 3:06 PM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories

Please take a look at the section entitled "Properties source precedence" at
http://docs.datatorrent.com/application_packages/

It looks like the setting in dt-site.xml on the cluster is overriding your application defined values.
If you add the properties to file under site/conf in your application and then
select it during launch, those values should take effect.

For signalling EOF, another option is to use a separate control port to send the EOF marker, which could
just be the string "EOF", for example; a sketch follows.
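
A minimal sketch of that control-port idea, assuming the reader extends AbstractFileInputOperator (the port name and EOF payload are illustrative; currentFile is the parent class's field tracking the open file):

  // inside your AbstractFileInputOperator subclass
  public final transient DefaultOutputPort<String> control = new DefaultOutputPort<String>();

  @Override
  protected void closeFile(java.io.InputStream is) throws java.io.IOException
  {
    String finished = currentFile;    // capture before the parent clears its state
    super.closeFile(is);
    control.emit("EOF:" + finished);  // the writer finalizes this file on receipt
  }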


Ram

On Fri, Jun 17, 2016 at 11:48 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi,

I found a way for the 2nd question.


1)      I tried setting the operator memory to 500MB, but the operator still takes 4GB by default; I did not understand why. (In my sandbox setup the memory was set correctly to 500MB, but not on the enterprise dev cluster.)



2)      I am trying to send a null object from the file reader when EOF is reached so that the file writer can call the requestFinalize() method, but somehow I could not figure out how to send the EOF; I tried as below but no luck. (Solution: it seems that if the readEntity() method returns null, the emit() method is not called; I have managed to emit the null object from readEntity() itself.)


Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:suryavamshivardhan.mukkamula@rbc.com<ma...@rbc.com>]
Sent: 2016, June, 17 12:20 PM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: RE: Multiple directories

Hi,

Can you please help me understand the below issues.


1)      I tried setting the operator memory to 500MB, but the operator still takes 4GB by default; I did not understand why. (In my sandbox setup the memory was set correctly to 500MB, but not on the enterprise dev cluster.)

2)      I am trying to send a null object from the file reader when EOF is reached so that the file writer can call the requestFinalize() method. But somehow I could not figure out how to send the EOF; I tried as below but no luck.

################### File Reader ####################################################

@Override
protected String readEntity() throws IOException {
    // try to read a line
    final String line = br.readLine();
    if (null != line) { // normal case
        LOG.debug("readEntity: line = {}", line);
        return line;
    }

    // end-of-file: returning null signals EOF; the marker is forwarded to the writer by emit()
    LOG.info("readEntity: EOF for {}", filePath);
    return null;
}

@Override
protected void emit(String line) {
    // EOF marker: emit a null-valued tuple so the writer can finalize this file
    if (line == null) {
        output.emit(new KeyValue<String, String>(getFileName(), null));
        return; // don't try to parse the marker
    }
    // parsing logic here: parse the line as per the input configuration and
    // create the output line as per the output configuration
    KeyValue<String, String> tuple = new KeyValue<String, String>();
    tuple.key = getFileName();
    tuple.value = line;
    KeyValue<String, String> newTuple = parseTuple(tuple);
    output.emit(newTuple);
}
######################File Writer######################################################
public class FileOutputOperator extends AbstractFileOutputOperator<KeyValue<String, String>> {
    private static final Logger LOG = LoggerFactory.getLogger(FileOutputOperator.class);
    private List<String> filesToFinalize = new ArrayList<>();

    @Override
    public void setup(Context.OperatorContext context) {
        super.setup(context);
        // finalize any files recorded before a restart (filesToFinalize is checkpointed state)
        finalizeFiles();
    }

    @Override
    protected byte[] getBytesForTuple(KeyValue<String, String> tuple) {
        if (tuple.value == null) {
                LOG.info("File to finalize {}",tuple.key);
            filesToFinalize.add(tuple.key);
            return new byte[0];
        }
        else {
            return tuple.value.getBytes();
        }
    }

    @Override
    protected String getFileName(KeyValue<String, String> tuple) {
        return tuple.key;
    }

    @Override
    public void endWindow() {
        super.endWindow();
        finalizeFiles();
    }

    private void finalizeFiles() {
                LOG.info("Files to finalize {}",filesToFinalize.toArray());
        Iterator<String> fileIt = filesToFinalize.iterator();
        while(fileIt.hasNext()) {
            requestFinalize(fileIt.next());
            fileIt.remove();
        }
    }
}
##################################################################################################



Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:suryavamshivardhan.mukkamula@rbc.com]
Sent: 2016, June, 17 9:11 AM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: RE: Multiple directories

Hi Ram/Raja,

The HBase dependency was pulling older Hadoop jars into my classpath. I removed the HBase dependency, which I don't need for now, and the issue got resolved.

Thank you for your help.

Regards,
Surya Vamshi

From: Raja.Aravapalli [mailto:Raja.Aravapalli@target.com]
Sent: 2016, June, 17 7:06 AM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories


I also faced a similar problem with Hadoop jars when I used HBase jars in pom.xml! It could be because the version of the Hadoop jars your apa is holding is different from the ones in the cluster!


What I did to solve this:

Set the scope to provided in the Maven pom.xml for the HBase jars, and then supplied the HBase jars to the application package during submission using "-libjars" with the launch command, which solved my Invalid ContainerId problem! (See the sketch below.)
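
For example, a sketch against the pom shown earlier in this thread (version as already used there):

  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.1.2</version>
    <!-- provided: compile against it, but keep it out of the .apa -->
    <scope>provided</scope>
  </dependency>

and then at launch time, something like:

  launch -libjars hbase-client-1.1.2.jar <your-app>.apa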

You can type "launch help" to learn the usage details.


Regards,
Raja.

From: Munagala Ramanath <ra...@datatorrent.com>>
Reply-To: "users@apex.apache.org<ma...@apex.apache.org>" <us...@apex.apache.org>>
Date: Thursday, June 16, 2016 at 4:10 PM
To: "users@apex.apache.org<ma...@apex.apache.org>" <us...@apex.apache.org>>
Subject: Re: Multiple directories

Those 6 hadoop jars are definitely a problem.

I didn't see the output of "mvn dependency:tree"; could you post that?
It will show you why these hadoop jars are being pulled in.

Also, please refer to the section "Hadoop dependencies conflicts" in the troubleshooting guide:
http://docs.datatorrent.com/troubleshooting/

Ram

On Thu, Jun 16, 2016 at 1:56 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi Ram,

Below are the details.


0 Wed Jun 15 16:34:24 EDT 2016 META-INF/
   358 Wed Jun 15 16:34:22 EDT 2016 META-INF/MANIFEST.MF
     0 Wed Jun 15 16:34:24 EDT 2016 app/
52967 Wed Jun 15 16:34:22 EDT 2016 app/countrynamescan-1.0-SNAPSHOT.jar
     0 Wed Jun 15 16:34:22 EDT 2016 lib/
62983 Wed Jun 15 16:34:22 EDT 2016 lib/activation-1.1.jar
1143233 Wed Jun 15 16:34:22 EDT 2016 lib/activemq-client-5.8.0.jar
  4467 Wed Jun 15 16:34:22 EDT 2016 lib/aopalliance-1.0.jar
44925 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-i18n-2.0.0-M15.jar
691479 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-kerberos-codec-2.0.0-M15.jar
16560 Wed Jun 15 16:34:22 EDT 2016 lib/api-asn1-api-1.0.0-M20.jar
79912 Wed Jun 15 16:34:22 EDT 2016 lib/api-util-1.0.0-M20.jar
43033 Wed Jun 15 16:34:22 EDT 2016 lib/asm-3.1.jar
303139 Wed Jun 15 16:34:22 EDT 2016 lib/avro-1.7.4.jar
232019 Wed Jun 15 16:34:22 EDT 2016 lib/commons-beanutils-1.8.3.jar
41123 Wed Jun 15 16:34:22 EDT 2016 lib/commons-cli-1.2.jar
284184 Wed Jun 15 16:34:22 EDT 2016 lib/commons-codec-1.10.jar
575389 Wed Jun 15 16:34:22 EDT 2016 lib/commons-collections-3.2.1.jar
30595 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compiler-2.7.8.jar
241367 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compress-1.4.1.jar
298829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-configuration-1.6.jar
143602 Wed Jun 15 16:34:22 EDT 2016 lib/commons-digester-1.8.jar
112341 Wed Jun 15 16:34:22 EDT 2016 lib/commons-el-1.0.jar
305001 Wed Jun 15 16:34:22 EDT 2016 lib/commons-httpclient-3.1.jar
185140 Wed Jun 15 16:34:22 EDT 2016 lib/commons-io-2.4.jar
284220 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang-2.6.jar
315805 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang3-3.1.jar
61829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-logging-1.2.jar
1599627 Wed Jun 15 16:34:22 EDT 2016 lib/commons-math3-3.1.1.jar
273370 Wed Jun 15 16:34:22 EDT 2016 lib/commons-net-3.1.jar
3608597 Wed Jun 15 16:34:22 EDT 2016 lib/db2jcc-123.jar
313898 Wed Jun 15 16:34:22 EDT 2016 lib/dom4j-1.6.1.jar
17138265 Wed Jun 15 16:34:22 EDT 2016 lib/fastutil-6.6.4.jar
15322 Wed Jun 15 16:34:22 EDT 2016 lib/findbugs-annotations-1.3.9-1.jar
84946 Wed Jun 15 16:34:22 EDT 2016 lib/flatpack-3.4.2.jar
20220 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-j2ee-management_1.1_spec-1.0.1.jar
32359 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-jms_1.1_spec-1.1.1.jar
21817 Wed Jun 15 16:34:22 EDT 2016 lib/gmbal-api-only-3.0.0-b023.jar
690573 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-framework-2.1.2.jar
253086 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-2.1.2.jar
198255 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-server-2.1.2.jar
336904 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-servlet-2.1.2.jar
  8114 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-rcm-2.1.2.jar
1795932 Wed Jun 15 16:34:22 EDT 2016 lib/guava-12.0.1.jar
710492 Wed Jun 15 16:34:22 EDT 2016 lib/guice-3.0.jar
65012 Wed Jun 15 16:34:22 EDT 2016 lib/guice-servlet-3.0.jar
16778 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-annotations-2.2.0.jar
52449 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-auth-2.5.1.jar
2962685 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-common-2.5.1.jar
1498368 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-mapreduce-client-core-2.5.1.jar
1158936 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-api-2.2.0.jar
1301627 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-common-2.2.0.jar
50139 Wed Jun 15 16:34:22 EDT 2016 lib/hawtbuf-1.9.jar
20780 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-annotations-1.1.2.jar
1249004 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-client-1.1.2.jar
530078 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-common-1.1.2.jar
4201685 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-protocol-1.1.2.jar
1475955 Wed Jun 15 16:34:22 EDT 2016 lib/htrace-core-3.1.0-incubating.jar
590533 Wed Jun 15 16:34:22 EDT 2016 lib/httpclient-4.3.5.jar
282269 Wed Jun 15 16:34:22 EDT 2016 lib/httpcore-4.3.2.jar
228286 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-core-asl-1.9.2.jar
765648 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-mapper-asl-1.9.2.jar
  2497 Wed Jun 15 16:34:22 EDT 2016 lib/javax.inject-1.jar
521991 Wed Jun 15 16:34:22 EDT 2016 lib/javax.mail-1.5.0.jar
83945 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-3.1.jar
85353 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-api-3.0.1.jar
105134 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-api-2.2.2.jar
890168 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-impl-2.2.3-1.jar
1291164 Wed Jun 15 16:34:22 EDT 2016 lib/jcodings-1.0.8.jar
151304 Wed Jun 15 16:34:22 EDT 2016 lib/jdom-1.1.3.jar
130458 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-client-1.9.jar
458739 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-core-1.9.jar
17542 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-grizzly2-1.9.jar
14786 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-guice-1.9.jar
147952 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-json-1.9.jar
713089 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-server-1.9.jar
28100 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-core-1.9.jar
12976 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-grizzly2-1.9.jar
67758 Wed Jun 15 16:34:22 EDT 2016 lib/jettison-1.1.jar
21144 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-continuation-8.1.10.v20130312.jar
95709 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-http-8.1.10.v20130312.jar
103622 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-io-8.1.10.v20130312.jar
89691 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-security-8.1.10.v20130312.jar
347020 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-server-8.1.10.v20130312.jar
101052 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-servlet-8.1.10.v20130312.jar
177131 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-6.1.26.jar
284903 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-8.1.10.v20130312.jar
125928 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-websocket-8.1.10.v20130312.jar
39117 Wed Jun 15 16:34:22 EDT 2016 lib/jms-api-1.1-rev-1.jar
187292 Wed Jun 15 16:34:22 EDT 2016 lib/joni-2.1.2.jar
280205 Wed Jun 15 16:34:22 EDT 2016 lib/jsch-0.1.53.jar
33015 Wed Jun 15 16:34:22 EDT 2016 lib/jsr305-1.3.9.jar
489884 Wed Jun 15 16:34:22 EDT 2016 lib/log4j-1.2.17.jar
565387 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-contrib-3.2.0-incubating.jar
1062000 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-library-3.1.1.jar
42212 Wed Jun 15 16:34:22 EDT 2016 lib/management-api-3.0.0-b012.jar
  8798 Wed Jun 15 16:34:22 EDT 2016 lib/named-regexp-0.2.3.jar
1206119 Wed Jun 15 16:34:22 EDT 2016 lib/netty-3.6.6.Final.jar
1779991 Wed Jun 15 16:34:22 EDT 2016 lib/netty-all-4.0.23.Final.jar
29555 Wed Jun 15 16:34:22 EDT 2016 lib/paranamer-2.3.jar
1869113 Wed Jun 15 16:34:22 EDT 2016 lib/poi-3.9.jar
936648 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-3.9.jar
4802621 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-schemas-3.9.jar
533455 Wed Jun 15 16:34:22 EDT 2016 lib/protobuf-java-2.5.0.jar
26084 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-api-1.7.5.jar
  9943 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-log4j12-1.7.19.jar
995968 Wed Jun 15 16:34:22 EDT 2016 lib/snappy-java-1.0.4.1.jar
23346 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0-2.jar
26514 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0.1.jar
  2455 Wed Jun 15 16:34:22 EDT 2016 lib/tdgssconfig-14.00.00.21.jar
991265 Wed Jun 15 16:34:22 EDT 2016 lib/terajdbc4-14.00.00.21.jar
758309 Wed Jun 15 16:34:22 EDT 2016 lib/vertica-jdbc-7.2.1-0.jar
109318 Wed Jun 15 16:34:22 EDT 2016 lib/xml-apis-1.0.b2.jar
2666695 Wed Jun 15 16:34:22 EDT 2016 lib/xmlbeans-2.3.0.jar
15010 Wed Jun 15 16:34:22 EDT 2016 lib/xmlenc-0.52.jar
94672 Wed Jun 15 16:34:22 EDT 2016 lib/xz-1.0.jar
792964 Wed Jun 15 16:34:22 EDT 2016 lib/zookeeper-3.4.6.jar
     0 Wed Jun 15 16:34:28 EDT 2016 conf/
   334 Mon Apr 04 11:18:00 EDT 2016 conf/my-app-conf1.xml
  3432 Wed Jun 15 16:22:20 EDT 2016 META-INF/properties.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
       <modelVersion>4.0.0</modelVersion>
       <groupId>com.rbc.aml.cnscan</groupId>
       <version>1.0-SNAPSHOT</version>
       <artifactId>countrynamescan</artifactId>
       <packaging>jar</packaging>

       <!-- change these to the appropriate values -->
       <name>countrynamescan</name>
       <description>Country and Name Scan project</description>

       <properties>
              <!-- change this if you desire to use a different version of DataTorrent -->
              <datatorrent.version>3.1.1</datatorrent.version>
              <datatorrent.apppackage.classpath>lib/*.jar</datatorrent.apppackage.classpath>
       </properties>

       <!-- repository to provide the DataTorrent artifacts -->
       <!-- <repositories>
              <repository>
                     <snapshots>
                           <enabled>false</enabled>
                     </snapshots>
                     <id>Datatorrent-Releases</id>
                     <name>DataTorrent Release Repository</name>
                     <url>https://www.datatorrent.com/maven/content/repositories/releases/</url>
              </repository>
              <repository>
                     <releases>
                           <enabled>false</enabled>
                     </releases>
                     <id>DataTorrent-Snapshots</id>
                     <name>DataTorrent Early Access Program Snapshot Repository</name>
                     <url>https://www.datatorrent.com/maven/content/repositories/snapshots/</url>
              </repository>
       </repositories> -->


       <build>
              <plugins>
                     <plugin>
                           <groupId>org.apache.maven.plugins</groupId>
                           <artifactId>maven-eclipse-plugin</artifactId>
                           <version>2.9</version>
                           <configuration>
                                  <downloadSources>true</downloadSources>
                           </configuration>
                     </plugin>
                     <plugin>
                           <artifactId>maven-compiler-plugin</artifactId>
                           <version>3.3</version>
                           <configuration>
                                  <encoding>UTF-8</encoding>
                                  <source>1.7</source>
                                  <target>1.7</target>
                                  <debug>true</debug>
                                  <optimize>false</optimize>
                                  <showDeprecation>true</showDeprecation>
                                  <showWarnings>true</showWarnings>
                           </configuration>
                     </plugin>
                     <plugin>
                           <artifactId>maven-dependency-plugin</artifactId>
                           <version>2.8</version>
                           <executions>
                                  <execution>
                                         <id>copy-dependencies</id>
                                         <phase>prepare-package</phase>
                                         <goals>
                                                <goal>copy-dependencies</goal>
                                         </goals>
                                         <configuration>
                                                <outputDirectory>target/deps</outputDirectory>
                                                <includeScope>runtime</includeScope>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <artifactId>maven-assembly-plugin</artifactId>
                           <executions>
                                  <execution>
                                         <id>app-package-assembly</id>
                                         <phase>package</phase>
                                         <goals>
                                                <goal>single</goal>
                                         </goals>
                                         <configuration>
                                                <finalName>${project.artifactId}-${project.version}-apexapp</finalName>
                                                <appendAssemblyId>false</appendAssemblyId>
                                                <descriptors>
                                                       <descriptor>src/assemble/appPackage.xml</descriptor>
                                                </descriptors>
                                                <archiverConfig>
                                                       <defaultDirectoryMode>0755</defaultDirectoryMode>
                                                </archiverConfig>
                                                <archive>
                                                       <manifestEntries>
                                                              <Class-Path>${datatorrent.apppackage.classpath}</Class-Path>
                                                              <DT-Engine-Version>${datatorrent.version}</DT-Engine-Version>
                                                              <DT-App-Package-Name>${project.artifactId}</DT-App-Package-Name>
                                                              <DT-App-Package-Version>${project.version}</DT-App-Package-Version>
                                                               <DT-App-Package-Display-Name>${project.name}</DT-App-Package-Display-Name>
                                                              <DT-App-Package-Description>${project.description}</DT-App-Package-Description>
                                                      </manifestEntries>
                                                </archive>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <artifactId>maven-antrun-plugin</artifactId>
                           <version>1.7</version>
                           <executions>
                                  <execution>
                                         <phase>package</phase>
                                         <configuration>
                                                <target>
                                                       <move
                                                              file="${project.build.directory}/${project.artifactId}-${project.version}-apexapp.jar"
                                                              tofile="${project.build.directory}/${project.artifactId}-${project.version}.apa"/>
                                                </target>
                                         </configuration>
                                         <goals>
                                                <goal>run</goal>
                                         </goals>
                                  </execution>
                                  <execution>
                                         <!-- create resource directory for xml javadoc -->
                                         <id>createJavadocDirectory</id>
                                         <phase>generate-resources</phase>
                                         <configuration>
                                                <tasks>
                                                       <delete
                                                              dir="${project.build.directory}/generated-resources/xml-javadoc"/>
                                                       <mkdir
                                                              dir="${project.build.directory}/generated-resources/xml-javadoc"/>
                                                </tasks>
                                         </configuration>
                                         <goals>
                                                <goal>run</goal>
                                         </goals>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <groupId>org.codehaus.mojo</groupId>
                           <artifactId>build-helper-maven-plugin</artifactId>
                           <version>1.9.1</version>
                           <executions>
                                  <execution>
                                         <id>attach-artifacts</id>
                                         <phase>package</phase>
                                         <goals>
                                                <goal>attach-artifact</goal>
                                         </goals>
                                         <configuration>
                                                <artifacts>
                                                       <artifact>
                                                              <file>target/${project.artifactId}-${project.version}.apa</file>
                                                              <type>apa</type>
                                                       </artifact>
                                                </artifacts>
                                                <skipAttach>false</skipAttach>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <!-- generate javdoc -->
                     <plugin>
                           <groupId>org.apache.maven.plugins</groupId>
                           <artifactId>maven-javadoc-plugin</artifactId>
                           <executions>
                                  <!-- generate xml javadoc -->
                                  <execution>
                                         <id>xml-doclet</id>
                                         <phase>generate-resources</phase>
                                         <goals>
                                                <goal>javadoc</goal>
                                         </goals>
                                         <configuration>
                                                <doclet>com.github.markusbernhardt.xmldoclet.XmlDoclet</doclet>
                                                <additionalparam>-d
                                                       ${project.build.directory}/generated-resources/xml-javadoc
                                                       -filename ${project.artifactId}-${project.version}-javadoc.xml</additionalparam>
                                                <useStandardDocletOptions>false</useStandardDocletOptions>
                                                <docletArtifact>
                                                       <groupId>com.github.markusbernhardt</groupId>
                                                       <artifactId>xml-doclet</artifactId>
                                                       <version>1.0.4</version>
                                                </docletArtifact>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>
                     <!-- Transform xml javadoc to stripped down version containing only class/interface
                           comments and tags -->
                     <plugin>
                           <groupId>org.codehaus.mojo</groupId>
                           <artifactId>xml-maven-plugin</artifactId>
                           <version>1.0</version>
                           <executions>
                                  <execution>
                                         <id>transform-xmljavadoc</id>
                                         <phase>generate-resources</phase>
                                         <goals>
                                                <goal>transform</goal>
                                         </goals>
                                  </execution>
                           </executions>
                           <configuration>
                                  <transformationSets>
                                         <transformationSet>
                                                <dir>${project.build.directory}/generated-resources/xml-javadoc</dir>
                                                <includes>
                                                       <include>${project.artifactId}-${project.version}-javadoc.xml</include>
                                                </includes>
                                                <stylesheet>XmlJavadocCommentsExtractor.xsl</stylesheet>
                                                <outputDir>${project.build.directory}/generated-resources/xml-javadoc</outputDir>
                                         </transformationSet>
                                  </transformationSets>
                           </configuration>
                     </plugin>
                     <!-- copy xml javadoc to class jar -->
                     <plugin>
                           <artifactId>maven-resources-plugin</artifactId>
                           <version>2.6</version>
                           <executions>
                                  <execution>
                                         <id>copy-resources</id>
                                         <phase>process-resources</phase>
                                         <goals>
                                                <goal>copy-resources</goal>
                                         </goals>
                                         <configuration>
                                                <outputDirectory>${basedir}/target/classes</outputDirectory>
                                                <resources>
                                                       <resource>
                                                              <directory>${project.build.directory}/generated-resources/xml-javadoc</directory>
                                                              <includes>
                                                                     <include>${project.artifactId}-${project.version}-javadoc.xml</include>
                                                              </includes>
                                                              <filtering>true</filtering>
                                                       </resource>
                                                </resources>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

              </plugins>

              <pluginManagement>
                     <plugins>
                           <!--This plugin's configuration is used to store Eclipse m2e settings
                                  only. It has no influence on the Maven build itself. -->
                           <plugin>
                                  <groupId>org.eclipse.m2e</groupId>
                                  <artifactId>lifecycle-mapping</artifactId>
                                  <version>1.0.0</version>
                                  <configuration>
                                         <lifecycleMappingMetadata>
                                                <pluginExecutions>
                                                       <pluginExecution>
                                                              <pluginExecutionFilter>
                                                                     <groupId>org.codehaus.mojo</groupId>
                                                                     <artifactId>
                                                                           xml-maven-plugin
                                                                     </artifactId>
                                                                     <versionRange>[1.0,)</versionRange>
                                                                     <goals>
                                                                           <goal>transform</goal>
                                                                     </goals>
                                                              </pluginExecutionFilter>
                                                              <action>
                                                                     <ignore></ignore>
                                                              </action>
                                                       </pluginExecution>
                                                       <pluginExecution>
                                                              <pluginExecutionFilter>
                                                                     <groupId></groupId>
                                                                     <artifactId></artifactId>
                                                                     <versionRange>[,)</versionRange>
                                                                     <goals>
                                                                           <goal></goal>
                                                                     </goals>
                                                              </pluginExecutionFilter>
                                                              <action>
                                                                     <ignore></ignore>
                                                              </action>
                                                       </pluginExecution>
                                                </pluginExecutions>
                                         </lifecycleMappingMetadata>
                                  </configuration>
                           </plugin>
                     </plugins>
              </pluginManagement>
       </build>

       <dependencies>
              <!-- add your dependencies here -->
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>malhar-library</artifactId>
                     <version>${datatorrent.version}</version>
                     <!-- If you know that your application does not need transitive dependencies
                           pulled in by malhar-library, uncomment the following to reduce the size of
                           your app package. -->
                     <!-- <exclusions> <exclusion> <groupId>*</groupId> <artifactId>*</artifactId>
                           </exclusion> </exclusions> -->
              </dependency>
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>dt-common</artifactId>
                     <version>${datatorrent.version}</version>
                     <scope>provided</scope>
              </dependency>
               <dependency>
                      <groupId>com.teradata.jdbc</groupId>
                      <artifactId>terajdbc4</artifactId>
                      <version>14.00.00.21</version>
               </dependency>
               <dependency>
                      <groupId>com.teradata.jdbc</groupId>
                      <artifactId>tdgssconfig</artifactId>
                      <version>14.00.00.21</version>
               </dependency>
               <dependency>
                      <groupId>com.ibm.db2</groupId>
                      <artifactId>db2jcc</artifactId>
                      <version>123</version>
               </dependency>
              <dependency>
                     <groupId>jdk.tools</groupId>
                     <artifactId>jdk.tools</artifactId>
                     <version>1.7</version>
                     <scope>system</scope>
                     <systemPath>C:/Program Files/Java/jdk1.7.0_79/lib/tools.jar</systemPath>
              </dependency>
              <dependency>
                     <groupId>org.apache.apex</groupId>
                     <artifactId>malhar-contrib</artifactId>
                     <version>3.2.0-incubating</version>
                     <!--<scope>provided</scope> -->
                     <exclusions>
                           <exclusion>
                                  <groupId>*</groupId>
                                  <artifactId>*</artifactId>
                           </exclusion>
                     </exclusions>
              </dependency>
              <dependency>
                     <groupId>junit</groupId>
                     <artifactId>junit</artifactId>
                     <version>4.10</version>
                     <scope>test</scope>
              </dependency>
              <dependency>
                     <groupId>com.vertica</groupId>
                     <artifactId>vertica-jdbc</artifactId>
                     <version>7.2.1-0</version>
              </dependency>
               <dependency>
                      <groupId>org.apache.hbase</groupId>
                      <artifactId>hbase-client</artifactId>
                      <version>1.1.2</version>
               </dependency>
              <dependency>
                     <groupId>org.slf4j</groupId>
                     <artifactId>slf4j-log4j12</artifactId>
                     <version>1.7.19</version>
              </dependency>
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>dt-engine</artifactId>
                     <version>${datatorrent.version}</version>
                     <scope>test</scope>
              </dependency>

              <dependency>
                     <groupId>net.sf.flatpack</groupId>
                     <artifactId>flatpack</artifactId>
                     <version>3.4.2</version>
              </dependency>

              <dependency>
                     <groupId>org.jdom</groupId>
                     <artifactId>jdom</artifactId>
                     <version>1.1.3</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi-ooxml</artifactId>
                     <version>3.9</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.xmlbeans</groupId>
                     <artifactId>xmlbeans</artifactId>
                     <version>2.3.0</version>
              </dependency>

              <dependency>
                     <groupId>dom4j</groupId>
                     <artifactId>dom4j</artifactId>
                     <version>1.6.1</version>
              </dependency>

              <dependency>
                     <groupId>javax.xml.stream</groupId>
                     <artifactId>stax-api</artifactId>
                     <version>1.0-2</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi</artifactId>
                     <version>3.9</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi-ooxml-schemas</artifactId>
                     <version>3.9</version>
              </dependency>
              <dependency>
                     <groupId>com.jcraft</groupId>
                     <artifactId>jsch</artifactId>
                     <version>0.1.53</version>
              </dependency>
       </dependencies>

</project>

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, June, 16 4:37 PM
To: users@apex.apache.org
Subject: Re: Multiple directories

It looks like you may be including old Hadoop jars in your apa package since the stack trace
shows ConverterUtils.toContainerId calling ConverterUtils.toApplicationAttemptId, but recent versions
don't have that call sequence. In 2.7.1 (which is what your cluster has) the function looks like this:
  public static ContainerId toContainerId(String containerIdStr) {
    return ContainerId.fromString(containerIdStr);
  }
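
As a quick sanity check, something like the following (an illustrative, untested snippet, not part of any Malhar library) logs the Hadoop build that a given classpath actually resolves; anything other than the cluster's 2.7.1 build points to stale jars:

  import org.apache.hadoop.util.VersionInfo;

  public class HadoopVersionCheck {
    public static void main(String[] args) {
      // VersionInfo reports the Hadoop build baked into the jars on the classpath
      System.out.println("Hadoop version: " + VersionInfo.getVersion());
      System.out.println("Built on " + VersionInfo.getDate() + ", rev " + VersionInfo.getRevision());
    }
  }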

Could you post the output of "jar tvf {your-apa-file}" as well as: "mvn dependency:tree"

Ram

On Thu, Jun 16, 2016 at 12:38 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Hi Ram,

Below is the information.


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   712    0   712    0     0   3807      0 --:--:-- --:--:-- --:--:--  3807
{
    "clusterInfo": {
        "haState": "ACTIVE",
        "haZooKeeperConnectionState": "CONNECTED",
        "hadoopBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bc
caeaf36cb193c1c by jenkins source checksum 69a3bf8c667267c2c252a54fbbf23d",
        "hadoopVersion": "2.7.1.2.3.2.0-2950",
        "hadoopVersionBuiltOn": "2015-09-30T18:08Z",
        "id": 1465495186350,
        "resourceManagerBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa9
8205f18bccaeaf36cb193c1c by jenkins source checksum 48db4b572827c2e9c2da66982d14
7626",
        "resourceManagerVersion": "2.7.1.2.3.2.0-2950",
       "resourceManagerVersionBuiltOn": "2015-09-30T18:20Z",
        "rmStateStoreName": "org.apache.hadoop.yarn.server.resourcemanager.recov
ery.ZKRMStateStore",
        "startedOn": 1465495186350,
        "state": "STARTED"
    }
}

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, June, 16 2:57 PM
To: users@apex.apache.org
Subject: Re: Multiple directories

Can you ssh to one of the cluster nodes ? If so, can you run this command and show the output
(where {rm} is the host:port running the resource manager, aka YARN):

curl http://{rm}/ws/v1/cluster | python -mjson.tool

Ram
ps. You can determine the node running YARN with:

hdfs getconf -confKey yarn.resourcemanager.webapp.address
hdfs getconf -confKey yarn.resourcemanager.webapp.https.address



On Thu, Jun 16, 2016 at 11:15 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Hi,

I am facing a weird issue and the logs are not clear to me!

I have created an apa file which works fine within my local sandbox, but I am facing problems when I upload it on the enterprise Hadoop cluster using the DT Console.

Below is the error message from the yarn logs. Please help in understanding the issue.

###################### Error Logs ########################################################

Log Type: AppMaster.stderr
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 1259
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/grid/06/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/filecache/36/slf4j-log4j12-1.7.19.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.IllegalArgumentException: Invalid ContainerId: container_e35_1465495186350_2224_01_000001
        at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182)
        at com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:90)
Caused by: java.lang.NumberFormatException: For input string: "e35"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:441)
        at java.lang.Long.parseLong(Long.java:483)
        at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
        at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
        ... 1 more

Log Type: AppMaster.stdout
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 0

Log Type: dt.log
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 29715
Showing 4096 bytes of 29715 total. Click here<http://guedlpdhdp001.saifg.rbc.com:19888/jobhistory/logs/guedlpdhdp012.saifg.rbc.com:45454/container_e35_1465495186350_2224_01_000001/container_e35_1465495186350_2224_01_000001/mukkamula/dt.log/?start=0> for the full log.
56m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms4096m -Xmx4096m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT
SHLVL=3
HADOOP_SSH_OPTS=-o ConnectTimeout=5 -o SendEnv=HADOOP_CONF_DIR
HADOOP_USER_NAME=datatorrent/gueulvahal003.saifg.rbc.com@SAIFG.RBC.COM
HADOOP_NAMENODE_OPTS=-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1
HADOOP_IDENT_STRING=yarn
HADOOP_MAPRED_LOG_DIR=/var/log/hadoop-mapreduce/yarn
NM_HOST=guedlpdhdp012.saifg.rbc.com
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
HADOOP_SECURE_DN_LOG_DIR=/var/log/hadoop/hdfs
YARN_HISTORYSERVER_HEAPSIZE=1024
JVM_PID=2638
YARN_PID_DIR=/var/run/hadoop-yarn/yarn
HADOOP_HOME_WARN_SUPPRESS=1
NM_PORT=45454
LOGNAME=mukkamula
YARN_CONF_DIR=/usr/hdp/current/hadoop-client/conf
HADOOP_YARN_USER=yarn
QTDIR=/usr/lib64/qt-3.3
_=/usr/lib/jvm/java-1.7.0/bin/java
MSM_PRODUCT=MSM
HADOOP_HOME=/usr/hdp/2.3.2.0-2950/hadoop
MALLOC_ARENA_MAX=4
HADOOP_OPTS=-Dhdp.version=2.3.2.0-2950 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true  -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.3.2.0-2950 -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
SHELL=/bin/bash
YARN_ROOT_LOGGER=INFO,EWMA,RFA
HADOOP_TOKEN_FILE_LOCATION=/grid/11/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/container_e35_1465495186350_2224_01_000001/container_tokens
CLASSPATH=./*:/usr/hdp/current/hadoop-client/conf:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*
HADOOP_MAPRED_PID_DIR=/var/run/hadoop-mapreduce/yarn
YARN_NODEMANAGER_HEAPSIZE=1024
QTINC=/usr/lib64/qt-3.3/include
USER=mukkamula
HADOOP_CLIENT_OPTS=-Xmx2048m -XX:MaxPermSize=512m -Xmx2048m -XX:MaxPermSize=512m
CONTAINER_ID=container_e35_1465495186350_2224_01_000001
HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop/hdfs
HISTCONTROL=ignoredups
HOME=/home/
HADOOP_NAMENODE_INIT_HEAPSIZE=-Xms8192m
MSM_HOME=/usr/local/MegaRAID Storage Manager
LESSOPEN=||/usr/bin/lesspipe.sh %s
LANG=en_US.UTF-8
YARN_NICENESS=0
YARN_IDENT_STRING=yarn
HADOOP_MAPRED_HOME=/usr/hdp/2.3.2.0-2950/hadoop-mapreduce


Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR)
Sent: 2016, June, 16 8:58 AM
To: users@apex.apache.org
Subject: RE: Multiple directories

Thank you for the inputs.

Regards,
Surya Vamshi
From: Thomas Weise [mailto:thomas.weise@gmail.com]
Sent: 2016, June, 15 5:08 PM
To: users@apex.apache.org
Subject: Re: Multiple directories


On Wed, Jun 15, 2016 at 1:55 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Hi Ram/Team,

I could create an operator which reads multiple directories, parses each file with respect to its individual configuration file, and generates output files to different directories.

However I have some questions regarding the design.


==> We have 120 directories to scan on HDFS; if we use parallel partitioning with operator memory around 250MB, that comes to roughly 30GB of RAM for this operator. Are these figures going to create any problem in production?

You can benchmark this with a single partition. If the downstream operators can keep up with the rate at which the file reader emits, then the memory consumption should be minimal. Keep in mind though that the container memory is not just heap space for the operator, but also memory the JVM requires to run and the memory that the buffer server consumes. You can see the allocated memory in the UI if you use the DT community edition (container list in the physical plan).
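
If memory does turn out to be the limiting factor, the per-operator allocation can also be capped from the DAG itself; a minimal sketch (the "reader" variable and the 512MB figure are only illustrations):

  import com.datatorrent.api.Context;

  // in populateDAG(), after dag.addOperator(...):
  dag.setAttribute(reader, Context.OperatorContext.MEMORY_MB, 512);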


==> Should I use a scheduler for running the batch job, or define the next scan time and keep the DT job running continuously? If I run the DT job continuously, I assume memory will be held by the DT job and unavailable to other applications on the cluster; please clarify.
It is possible to set this up elastically as well, so that when there is no input available, the number of reader partitions is reduced and the memory given back (Apex supports dynamic scaling).
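
In code, the one-partition-per-directory idea looks roughly like this (an untested sketch: MultiDirReader, the directories list, and the line-reading details are illustrative stand-ins; a production version, like the fileIO-multiDir example, should also clone checkpointed operator state, e.g. with Kryo, when repartitioning):

  import java.io.BufferedReader;
  import java.io.IOException;
  import java.io.InputStream;
  import java.io.InputStreamReader;
  import java.util.ArrayList;
  import java.util.Collection;
  import java.util.List;

  import org.apache.hadoop.fs.Path;

  import com.datatorrent.api.DefaultOutputPort;
  import com.datatorrent.api.DefaultPartition;
  import com.datatorrent.api.Partitioner.Partition;
  import com.datatorrent.api.Partitioner.PartitioningContext;
  import com.datatorrent.lib.io.fs.AbstractFileInputOperator;

  public class MultiDirReader extends AbstractFileInputOperator<String>
  {
    public final transient DefaultOutputPort<String> output = new DefaultOutputPort<String>();

    // one entry per feed, e.g. loaded from the mapping file (illustrative)
    private List<String> directories = new ArrayList<String>();

    private transient BufferedReader br;

    @Override
    public Collection<Partition<AbstractFileInputOperator<String>>> definePartitions(
        Collection<Partition<AbstractFileInputOperator<String>>> partitions, PartitioningContext context)
    {
      List<Partition<AbstractFileInputOperator<String>>> result =
          new ArrayList<Partition<AbstractFileInputOperator<String>>>();
      for (String dir : directories) {
        MultiDirReader reader = new MultiDirReader();
        reader.directories = this.directories;
        reader.setDirectory(dir);  // each partition scans exactly one directory
        result.add(new DefaultPartition<AbstractFileInputOperator<String>>(reader));
      }
      return result;
    }

    @Override
    protected InputStream openFile(Path path) throws IOException
    {
      InputStream is = super.openFile(path);
      br = new BufferedReader(new InputStreamReader(is));
      return is;
    }

    @Override
    protected void closeFile(InputStream is) throws IOException
    {
      super.closeFile(is);
      br = null;
    }

    @Override
    protected String readEntity() throws IOException
    {
      return br.readLine(); // null at end of file
    }

    @Override
    protected void emit(String line)
    {
      output.emit(line);
    }
  }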


Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, June, 05 10:24 PM

To: users@apex.apache.org
Subject: Re: Multiple directories

Some sample code to monitor multiple directories is now available at:
https://github.com/DataTorrent/examples/tree/master/tutorials/fileIO-multiDir

It shows how to use a custom implementation of definePartitions() to create
multiple partitions of the file input operator and group them
into "slices" where each slice monitors a single directory.

Ram

On Wed, May 25, 2016 at 9:55 AM, Munagala Ramanath <ra...@datatorrent.com>> wrote:
I'm hoping to have a sample sometime next week.

Ram

On Wed, May 25, 2016 at 9:30 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Thank you so much ram, for your advice , Option (a) would be ideal for my requirement.

Do you have sample usage for partitioning with individual configuration set ups different partitions?

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, May, 25 12:11 PM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories

You have 2 options: (a) AbstractFileInputOperator (b) FileSplitter/BlockReader

For (a), each partition (i.e. replica or the operator) can scan only a single directory, so if you have 100
directories, you can simply start with 100 partitions; since each partition is scanning its own directory
you don't need to worry about which files the lines came from. This approach however needs a custom
definePartition() implementation in your subclass to assign the appropriate directory and XML parsing
config file to each partition; it also needs adequate cluster resources to be able to spin up the required
number of partitions.

For (b), there is some documentation in the Operators section at http://docs.datatorrent.com/ including
sample code. There operators support scanning multiple directories out of the box but have more
elaborate configuration options. Check this out and see if it works in your use case.

Ram

On Wed, May 25, 2016 at 8:17 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hello Ram/Team,

My requirement is to read input feeds from different locations on HDFS and parse those files by reading XML configuration files (each input feed has configuration file which defines the fields inside the input feeds).

My approach : I would like to define a mapping file which contains individual feed identifier, feed location , configuration file location. I would like to read this mapping file at initial load within setup() method and define my DirectoryScan.acceptFiles. Here my challenge is when I read the files , I should parse the lines by reading the individual configuration files. How do I know the line is from particular file , if I know this I can read the corresponding configuration file before parsing the line.

Please let me know how do I handle this.

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, May, 24 5:49 PM
To: Mukkamula, Suryavamshivardhan (CWM-NR)
Subject: Multiple directories

One way of addressing the issue is to use some sort of external tool (like a script) to
copy all the input files to a common directory (making sure that the file names are
unique to prevent one file from overwriting another) before the Apex application starts.

The Apex application then starts and processes files from this directory.

If you set the partition count of the file input operator to N, it will create N partitions and
the files will be automatically distributed among the partitions. The partitions will work
in parallel.

Ram


RE: Multiple directories

Posted by "Mukkamula, Suryavamshivardhan (CWM-NR)" <su...@rbc.com>.
Hi Thomas,

Below is my populateDAG code.

import org.apache.hadoop.conf.Configuration;

import com.datatorrent.api.DAG;
import com.datatorrent.api.StreamingApplication;
import com.datatorrent.api.annotation.ApplicationAnnotation;

@ApplicationAnnotation(name="ClientGffCreation")
public class ClientGffApplication implements StreamingApplication
{
  @Override
  public void populateDAG(DAG dag, Configuration conf)
  {
    // create the operators
    FileReader reader = dag.addOperator("read", FileReader.class);
    FileOutputOperator writer = dag.addOperator("write", FileOutputOperator.class);

    // scanner that groups reader partitions into per-directory "slices"
    reader.setScanner(new FileReaderMultiDir.SlicedDirectoryScanner());

    // connect the reader to the writer
    dag.addStream("File-Writer", reader.output, writer.input);
  }
}

Regards,
Surya Vamshi
From: Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:suryavamshivardhan.mukkamula@rbc.com]
Sent: 2016, June, 20 10:04 AM
To: users@apex.apache.org
Subject: RE: Multiple directories

Hi Thomas,

Though I requested 500MB, the master log shows that it is requesting a 4GB container. My properties are being ignored. Please suggest.

2016-06-17 15:43:19,174 INFO  security.StramWSFilter (StramWSFilter.java:doFilter(157)) - 10.60.39.21: proxy access to URI /ws/v2/stram/info by user dr.who, no cookie created
2016-06-17 15:43:19,176 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp005.saifg.rbc.com:45454,numContainers=7,capability=<memory:245760, vCores:40>used=<memory:73728, vCores:11>state=RUNNING
2016-06-17 15:43:19,176 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp008.saifg.rbc.com:45454,numContainers=6,capability=<memory:245760, vCores:40>used=<memory:81920, vCores:11>state=RUNNING
2016-06-17 15:43:19,177 WARN  stram.StreamingContainerManager (StreamingContainerManager.java:getAppDataSources(593)) - DEBUG: looking at operator write 130
2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp006.saifg.rbc.com:45454,numContainers=5,capability=<memory:245760, vCores:40>used=<memory:57344, vCores:8>state=RUNNING
2016-06-17 15:43:19,177 WARN  stram.StreamingContainerManager (StreamingContainerManager.java:getAppDataSources(593)) - DEBUG: looking at operator read 130
2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp007.saifg.rbc.com:45454,numContainers=5,capability=<memory:245760, vCores:40>used=<memory:86016, vCores:10>state=RUNNING
2016-06-17 15:43:19,177 WARN  stram.StreamingContainerManager (StreamingContainerManager.java:getAppDataSources(620)) - DEBUG: looking at port output 130
2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp004.saifg.rbc.com:45454,numContainers=3,capability=<memory:245760, vCores:40>used=<memory:49152, vCores:6>state=RUNNING
2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp012.saifg.rbc.com:45454,numContainers=7,capability=<memory:204800, vCores:39>used=<memory:65536, vCores:10>state=RUNNING
2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp011.saifg.rbc.com:45454,numContainers=5,capability=<memory:204800, vCores:39>used=<memory:77824, vCores:9>state=RUNNING
2016-06-17 15:43:19,178 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp010.saifg.rbc.com:45454,numContainers=3,capability=<memory:245760, vCores:40>used=<memory:36864, vCores:5>state=RUNNING
2016-06-17 15:43:19,178 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp009.saifg.rbc.com:45454,numContainers=5,capability=<memory:245760, vCores:40>used=<memory:77824, vCores:9>state=RUNNING
2016-06-17 15:43:20,182 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1048)) - Asking RM for containers: [Capability[<memory:4096, vCores:1>]Priority[0], Capability[<memory:4096, vCores:1>]Priority[1], Capability[<memory:4096, vCores:1>]Priority[2], Capability[<memory:4096, vCores:1>]Priority[3]]
2016-06-17 15:43:20,182 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1050)) - Requested container: Capability[<memory:4096, vCores:1>]Priority[0]
2016-06-17 15:43:20,185 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1050)) - Requested container: Capability[<memory:4096, vCores:1>]Priority[1]
2016-06-17 15:43:20,185 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1050)) - Requested container: Capability[<memory:4096, vCores:1>]Priority[2]
2016-06-17 15:43:20,186 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1050)) - Requested container: Capability[<memory:4096, vCores:1>]Priority[3]
2016-06-17 15:43:21,213 INFO  impl.AMRMClientImpl (AMRMClientImpl.java:populateNMTokens(360)) - Received new token for : guedlpdhdp006.saifg.rbc.com:45454
2016-06-17 15:43:21,213 INFO  impl.AMRMClientImpl (AMRMClientImpl.java:populateNMTokens(360)) - Received new token for : guedlpdhdp004.saifg.rbc.com:45454
2016-06-17 15:43:21,213 INFO  impl.AMRMClientImpl (AMRMClientImpl.java:populateNMTokens(360)) - Received new token for : guedlpdhdp012.saifg.rbc.com:45454
2016-06-17 15:43:21,213 INFO  impl.AMRMClientImpl (AMRMClientImpl.java:populateNMTokens(360)) - Received new token for : guedlpdhdp009.saifg.rbc.com:45454
2016-06-17 15:43:21,215 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:execute(822)) - Got new container., containerId=container_e35_1465495186350_2632_01_000002, containerNode=guedlpdhdp006.saifg.rbc.com:45454, containerNodeURI=guedlpdhdp006.saifg.rbc.com:8042, containerResourceMemory4096, priority0
2016-06-17 15:43:21,226 INFO  delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:createPassword(385)) - Creating password for identifier: owner=mukkamula, renewer=, realUser=, issueDate=1466192601226, maxDate=4611687484619989129, sequenceNumber=1, masterKeyId=2, currentKey: 2
2016-06-17 15:43:21,228 INFO  stram.LaunchContainerRunnable (LaunchContainerRunnable.java:run(143)) - Setting up container launch context for containerid=container_e35_1465495186350_2632_01_000002
2016-06-17 15:43:21,233 INFO  stram.LaunchContainerRunnable (LaunchContainerRunnable.java:setClasspath(118)) - CLASSPATH: ./*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:.
2016-06-17 15:43:21,323 INFO  util.BasicContainerOptConfigurator (BasicContainerOptConfigurator.java:getJVMOptions(65)) - property map for operator {Generic=null, -Xmx=768m}
2016-06-17 15:43:21,323 INFO  stram.LaunchContainerRunnable (LaunchContainerRunnable.java:getChildVMCommand(243)) - Jvm opts  -Xmx939524096  for container container_e35_1465495186350_2632_01_000002
2016


Regards,
Surya Vamshi

From: Thomas Weise [mailto:thomas.weise@gmail.com]
Sent: 2016, June, 18 12:03 AM
To: users@apex.apache.org
Subject: Re: Multiple directories

Please check in the Apex application master log (container 1) how much memory it is requesting. If that's the correct figure and you still end up with a larger container, the problem could be the minimum container size in the YARN scheduler configuration.
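
For reference, the cluster-side knob is in yarn-site.xml; container requests are rounded up to a multiple of this value (1024 is just an example):

  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>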


On Fri, Jun 17, 2016 at 12:58 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Hi Ram,

I tried that option of adding the memory properties in site/conf and selecting them during the launch, but no luck. The same is working with my local sandbox setup.

Is there any other way that I can understand the reason?

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, June, 17 3:06 PM
To: users@apex.apache.org
Subject: Re: Multiple directories

Please take a look at the section entitled "Properties source precedence" at
http://docs.datatorrent.com/application_packages/

It looks like the setting in dt-site.xml on the cluster is overriding your application defined values.
If you add the properties to a file under site/conf in your application and then
select it during launch, those values should take effect.
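
For example, a small file under site/conf (the file name is up to you; "read" matches the operator name used elsewhere in this thread):

  <!-- site/conf/my-cluster.xml -->
  <configuration>
    <property>
      <name>dt.operator.read.attr.MEMORY_MB</name>
      <value>512</value>
    </property>
  </configuration>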

For signalling EOF, another option is to use a separate control port to send the EOF which could
just be the string "EOF" for example.
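
A rough, untested sketch of that control port, inside a subclass of AbstractFileInputOperator:

  // second output port carrying end-of-file markers
  public final transient DefaultOutputPort<String> control = new DefaultOutputPort<String>();

  @Override
  protected void closeFile(InputStream is) throws IOException
  {
    super.closeFile(is);
    // signal downstream that one input file has been fully read
    control.emit("EOF");
  }

The writer would connect this port to a second input port and call requestFinalize() from there.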


Ram

On Fri, Jun 17, 2016 at 11:48 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Hi,

I found a way for the 2nd question.


1)      I tried setting operator memory to 500MB, but still the operator is taking 4GB by default; I did not understand why. (In my sandbox setup the memory was set correctly to 500MB, but not on the enterprise dev cluster)



2)      I am trying to send a null object from the file reader when EOF is reached so that the file writer can call the requestFinalize() method. But somehow I could not figure out how to send the EOF; I tried as below but no luck. (Solution: it seems that when readEntity() returns null, the emit() method is not called, so I managed to emit the null object from readEntity() itself)


Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:suryavamshivardhan.mukkamula@rbc.com]
Sent: 2016, June, 17 12:20 PM
To: users@apex.apache.org
Subject: RE: Multiple directories

Hi,

Can you please help me understand the below issues.


1)      I tried setting operator memory to 500MB, but still the operator is taking 4GB by default; I did not understand why. (In my sandbox setup the memory was set correctly to 500MB, but not on the enterprise dev cluster)

2)      I am trying to send a null object from the file reader when EOF is reached so that the file writer can call the requestFinalize() method. But somehow I could not figure out how to send the EOF; I tried as below but no luck.

################### File Reader ####################################################

@Override
protected String readEntity() throws IOException {
    // try to read a line
    final String line = br.readLine();
    if (null != line) { // normal case
        LOG.debug("readEntity: line = {}", line);
        return line;
    }

    // end-of-file (control tuple sent in closeFile())
    LOG.info("readEntity: EOF for {}", filePath);
    return null;
}

@Override
protected void emit(String line) {
    // parsing logic here: parse the line as per the input configuration and
    // create the output line as per the output configuration
    if (line == null) {
        // EOF marker: emit a tuple with a null value and stop here; without
        // this return, parseTuple() below would be invoked on the null line
        output.emit(new KeyValue<String, String>(getFileName(), null));
        return;
    }
    KeyValue<String, String> tuple = new KeyValue<String, String>();
    tuple.key = getFileName();
    tuple.value = line;
    KeyValue<String, String> newTuple = parseTuple(tuple);
    output.emit(newTuple);
}
######################File Writer######################################################
public class FileOutputOperator extends AbstractFileOutputOperator<KeyValue<String, String>> {
    private static final Logger LOG = LoggerFactory.getLogger(FileOutputOperator.class);
    private List<String> filesToFinalize = new ArrayList<>();

    @Override
    public void setup(Context.OperatorContext context) {
        super.setup(context);
        // finalize any files remembered from before a restart
        finalizeFiles();
    }

    @Override
    protected byte[] getBytesForTuple(KeyValue<String, String> tuple) {
        if (tuple.value == null) {
            // EOF marker from the reader: remember the file for finalization
            LOG.info("File to finalize {}", tuple.key);
            filesToFinalize.add(tuple.key);
            return new byte[0];
        }
        else {
            return tuple.value.getBytes();
        }
    }

    @Override
    protected String getFileName(KeyValue<String, String> tuple) {
        return tuple.key;
    }

    @Override
    public void endWindow() {
        super.endWindow();
        finalizeFiles();
    }

    private void finalizeFiles() {
        LOG.info("Files to finalize {}", filesToFinalize);
        Iterator<String> fileIt = filesToFinalize.iterator();
        while (fileIt.hasNext()) {
            requestFinalize(fileIt.next());
            fileIt.remove();
        }
    }
}
##################################################################################################



Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:suryavamshivardhan.mukkamula@rbc.com]
Sent: 2016, June, 17 9:11 AM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: RE: Multiple directories

Hi Ram/Raja,

The HBase dependency was pulling older Hadoop jars into my classpath. I removed the HBase dependency, which I don't need for now, and the issue got resolved.
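
If the HBase operators are needed again later, a possible alternative (an untested sketch) is to keep hbase-client but exclude its transitive Hadoop jars, so that the cluster's own Hadoop jars are used instead:

<dependency>
       <groupId>org.apache.hbase</groupId>
       <artifactId>hbase-client</artifactId>
       <version>1.1.2</version>
       <exclusions>
              <exclusion>
                     <groupId>org.apache.hadoop</groupId>
                     <artifactId>*</artifactId>
              </exclusion>
       </exclusions>
</dependency>

(The wildcard exclusion requires Maven 3.2.1 or later; with older Maven, list the hadoop-* artifacts individually.)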

Thank you for your help.

Regards,
Surya Vamshi

From: Raja.Aravapalli [mailto:Raja.Aravapalli@target.com]
Sent: 2016, June, 17 7:06 AM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories


I also faced a similar problem with Hadoop jars when I used HBase jars in pom.xml! It could be because the version of the Hadoop jars your apa is holding is different from the ones on the cluster!


What I did to solve this:

I gave the HBase jars provided scope in the Maven pom.xml, and then supplied them to the application package at submission time using "-libjars" with the launch command, which solved my Invalid ContainerId problem!

You can type “launch help” for details on usage.
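
A sketch of the two pieces described above (jar path and package name are placeholders; see “launch help” for the exact syntax):

<dependency>
       <groupId>org.apache.hbase</groupId>
       <artifactId>hbase-client</artifactId>
       <version>1.1.2</version>
       <scope>provided</scope>
</dependency>

launch -libjars /path/to/hbase-client-1.1.2.jar myapp-1.0-SNAPSHOT.apa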


Regards,
Raja.

From: Munagala Ramanath <ra...@datatorrent.com>>
Reply-To: "users@apex.apache.org<ma...@apex.apache.org>" <us...@apex.apache.org>>
Date: Thursday, June 16, 2016 at 4:10 PM
To: "users@apex.apache.org<ma...@apex.apache.org>" <us...@apex.apache.org>>
Subject: Re: Multiple directories

Those 6 hadoop jars are definitely a problem.

I didn't see the output of "mvn dependency:tree"; could you post that ?
It will show you why these hadoop jars are being pulled in.

Also, please refer to the section "Hadoop dependencies conflicts" in the troubleshooting guide:
http://docs.datatorrent.com/troubleshooting/

Ram

On Thu, Jun 16, 2016 at 1:56 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi Ram,

Below are the details.


0 Wed Jun 15 16:34:24 EDT 2016 META-INF/
   358 Wed Jun 15 16:34:22 EDT 2016 META-INF/MANIFEST.MF
     0 Wed Jun 15 16:34:24 EDT 2016 app/
52967 Wed Jun 15 16:34:22 EDT 2016 app/countrynamescan-1.0-SNAPSHOT.jar
     0 Wed Jun 15 16:34:22 EDT 2016 lib/
62983 Wed Jun 15 16:34:22 EDT 2016 lib/activation-1.1.jar
1143233 Wed Jun 15 16:34:22 EDT 2016 lib/activemq-client-5.8.0.jar
  4467 Wed Jun 15 16:34:22 EDT 2016 lib/aopalliance-1.0.jar
44925 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-i18n-2.0.0-M15.jar
691479 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-kerberos-codec-2.0.0-M15.jar
16560 Wed Jun 15 16:34:22 EDT 2016 lib/api-asn1-api-1.0.0-M20.jar
79912 Wed Jun 15 16:34:22 EDT 2016 lib/api-util-1.0.0-M20.jar
43033 Wed Jun 15 16:34:22 EDT 2016 lib/asm-3.1.jar
303139 Wed Jun 15 16:34:22 EDT 2016 lib/avro-1.7.4.jar
232019 Wed Jun 15 16:34:22 EDT 2016 lib/commons-beanutils-1.8.3.jar
41123 Wed Jun 15 16:34:22 EDT 2016 lib/commons-cli-1.2.jar
284184 Wed Jun 15 16:34:22 EDT 2016 lib/commons-codec-1.10.jar
575389 Wed Jun 15 16:34:22 EDT 2016 lib/commons-collections-3.2.1.jar
30595 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compiler-2.7.8.jar
241367 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compress-1.4.1.jar
298829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-configuration-1.6.jar
143602 Wed Jun 15 16:34:22 EDT 2016 lib/commons-digester-1.8.jar
112341 Wed Jun 15 16:34:22 EDT 2016 lib/commons-el-1.0.jar
305001 Wed Jun 15 16:34:22 EDT 2016 lib/commons-httpclient-3.1.jar
185140 Wed Jun 15 16:34:22 EDT 2016 lib/commons-io-2.4.jar
284220 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang-2.6.jar
315805 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang3-3.1.jar
61829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-logging-1.2.jar
1599627 Wed Jun 15 16:34:22 EDT 2016 lib/commons-math3-3.1.1.jar
273370 Wed Jun 15 16:34:22 EDT 2016 lib/commons-net-3.1.jar
3608597 Wed Jun 15 16:34:22 EDT 2016 lib/db2jcc-123.jar
313898 Wed Jun 15 16:34:22 EDT 2016 lib/dom4j-1.6.1.jar
17138265 Wed Jun 15 16:34:22 EDT 2016 lib/fastutil-6.6.4.jar
15322 Wed Jun 15 16:34:22 EDT 2016 lib/findbugs-annotations-1.3.9-1.jar
84946 Wed Jun 15 16:34:22 EDT 2016 lib/flatpack-3.4.2.jar
20220 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-j2ee-management_1.1_spec-1.0.1.jar
32359 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-jms_1.1_spec-1.1.1.jar
21817 Wed Jun 15 16:34:22 EDT 2016 lib/gmbal-api-only-3.0.0-b023.jar
690573 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-framework-2.1.2.jar
253086 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-2.1.2.jar
198255 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-server-2.1.2.jar
336904 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-servlet-2.1.2.jar
  8114 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-rcm-2.1.2.jar
1795932 Wed Jun 15 16:34:22 EDT 2016 lib/guava-12.0.1.jar
710492 Wed Jun 15 16:34:22 EDT 2016 lib/guice-3.0.jar
65012 Wed Jun 15 16:34:22 EDT 2016 lib/guice-servlet-3.0.jar
16778 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-annotations-2.2.0.jar
52449 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-auth-2.5.1.jar
2962685 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-common-2.5.1.jar
1498368 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-mapreduce-client-core-2.5.1.jar
1158936 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-api-2.2.0.jar
1301627 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-common-2.2.0.jar
50139 Wed Jun 15 16:34:22 EDT 2016 lib/hawtbuf-1.9.jar
20780 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-annotations-1.1.2.jar
1249004 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-client-1.1.2.jar
530078 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-common-1.1.2.jar
4201685 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-protocol-1.1.2.jar
1475955 Wed Jun 15 16:34:22 EDT 2016 lib/htrace-core-3.1.0-incubating.jar
590533 Wed Jun 15 16:34:22 EDT 2016 lib/httpclient-4.3.5.jar
282269 Wed Jun 15 16:34:22 EDT 2016 lib/httpcore-4.3.2.jar
228286 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-core-asl-1.9.2.jar
765648 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-mapper-asl-1.9.2.jar
  2497 Wed Jun 15 16:34:22 EDT 2016 lib/javax.inject-1.jar
521991 Wed Jun 15 16:34:22 EDT 2016 lib/javax.mail-1.5.0.jar
83945 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-3.1.jar
85353 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-api-3.0.1.jar
105134 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-api-2.2.2.jar
890168 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-impl-2.2.3-1.jar
1291164 Wed Jun 15 16:34:22 EDT 2016 lib/jcodings-1.0.8.jar
151304 Wed Jun 15 16:34:22 EDT 2016 lib/jdom-1.1.3.jar
130458 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-client-1.9.jar
458739 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-core-1.9.jar
17542 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-grizzly2-1.9.jar
14786 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-guice-1.9.jar
147952 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-json-1.9.jar
713089 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-server-1.9.jar
28100 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-core-1.9.jar
12976 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-grizzly2-1.9.jar
67758 Wed Jun 15 16:34:22 EDT 2016 lib/jettison-1.1.jar
21144 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-continuation-8.1.10.v20130312.jar
95709 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-http-8.1.10.v20130312.jar
103622 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-io-8.1.10.v20130312.jar
89691 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-security-8.1.10.v20130312.jar
347020 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-server-8.1.10.v20130312.jar
101052 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-servlet-8.1.10.v20130312.jar
177131 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-6.1.26.jar
284903 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-8.1.10.v20130312.jar
125928 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-websocket-8.1.10.v20130312.jar
39117 Wed Jun 15 16:34:22 EDT 2016 lib/jms-api-1.1-rev-1.jar
187292 Wed Jun 15 16:34:22 EDT 2016 lib/joni-2.1.2.jar
280205 Wed Jun 15 16:34:22 EDT 2016 lib/jsch-0.1.53.jar
33015 Wed Jun 15 16:34:22 EDT 2016 lib/jsr305-1.3.9.jar
489884 Wed Jun 15 16:34:22 EDT 2016 lib/log4j-1.2.17.jar
565387 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-contrib-3.2.0-incubating.jar
1062000 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-library-3.1.1.jar
42212 Wed Jun 15 16:34:22 EDT 2016 lib/management-api-3.0.0-b012.jar
  8798 Wed Jun 15 16:34:22 EDT 2016 lib/named-regexp-0.2.3.jar
1206119 Wed Jun 15 16:34:22 EDT 2016 lib/netty-3.6.6.Final.jar
1779991 Wed Jun 15 16:34:22 EDT 2016 lib/netty-all-4.0.23.Final.jar
29555 Wed Jun 15 16:34:22 EDT 2016 lib/paranamer-2.3.jar
1869113 Wed Jun 15 16:34:22 EDT 2016 lib/poi-3.9.jar
936648 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-3.9.jar
4802621 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-schemas-3.9.jar
533455 Wed Jun 15 16:34:22 EDT 2016 lib/protobuf-java-2.5.0.jar
26084 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-api-1.7.5.jar
  9943 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-log4j12-1.7.19.jar
995968 Wed Jun 15 16:34:22 EDT 2016 lib/snappy-java-1.0.4.1.jar
23346 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0-2.jar
26514 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0.1.jar
  2455 Wed Jun 15 16:34:22 EDT 2016 lib/tdgssconfig-14.00.00.21.jar
991265 Wed Jun 15 16:34:22 EDT 2016 lib/terajdbc4-14.00.00.21.jar
758309 Wed Jun 15 16:34:22 EDT 2016 lib/vertica-jdbc-7.2.1-0.jar
109318 Wed Jun 15 16:34:22 EDT 2016 lib/xml-apis-1.0.b2.jar
2666695 Wed Jun 15 16:34:22 EDT 2016 lib/xmlbeans-2.3.0.jar
15010 Wed Jun 15 16:34:22 EDT 2016 lib/xmlenc-0.52.jar
94672 Wed Jun 15 16:34:22 EDT 2016 lib/xz-1.0.jar
792964 Wed Jun 15 16:34:22 EDT 2016 lib/zookeeper-3.4.6.jar
     0 Wed Jun 15 16:34:28 EDT 2016 conf/
   334 Mon Apr 04 11:18:00 EDT 2016 conf/my-app-conf1.xml
  3432 Wed Jun 15 16:22:20 EDT 2016 META-INF/properties.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
       <modelVersion>4.0.0</modelVersion>
       <groupId>com.rbc.aml.cnscan</groupId>
       <version>1.0-SNAPSHOT</version>
       <artifactId>countrynamescan</artifactId>
       <packaging>jar</packaging>

       <!-- change these to the appropriate values -->
       <name>countrynamescan</name>
       <description>Country and Name Scan project</description>

       <properties>
              <!-- change this if you desire to use a different version of DataTorrent -->
              <datatorrent.version>3.1.1</datatorrent.version>
              <datatorrent.apppackage.classpath>lib/*.jar</datatorrent.apppackage.classpath>
       </properties>

       <!-- repository to provide the DataTorrent artifacts -->
       <!-- <repositories>
              <repository>
                     <snapshots>
                           <enabled>false</enabled>
                     </snapshots>
                     <id>Datatorrent-Releases</id>
                     <name>DataTorrent Release Repository</name>
                     <url>https://www.datatorrent.com/maven/content/repositories/releases/</url>
              </repository>
              <repository>
                     <releases>
                           <enabled>false</enabled>
                     </releases>
                     <id>DataTorrent-Snapshots</id>
                     <name>DataTorrent Early Access Program Snapshot Repository</name>
                     <url>https://www.datatorrent.com/maven/content/repositories/snapshots/</url>
              </repository>
       </repositories> -->


       <build>
              <plugins>
                     <plugin>
                           <groupId>org.apache.maven.plugins</groupId>
                           <artifactId>maven-eclipse-plugin</artifactId>
                           <version>2.9</version>
                           <configuration>
                                  <downloadSources>true</downloadSources>
                           </configuration>
                     </plugin>
                     <plugin>
                           <artifactId>maven-compiler-plugin</artifactId>
                           <version>3.3</version>
                           <configuration>
                                  <encoding>UTF-8</encoding>
                                  <source>1.7</source>
                                  <target>1.7</target>
                                  <debug>true</debug>
                                  <optimize>false</optimize>
                                  <showDeprecation>true</showDeprecation>
                                  <showWarnings>true</showWarnings>
                           </configuration>
                     </plugin>
                     <plugin>
                           <artifactId>maven-dependency-plugin</artifactId>
                           <version>2.8</version>
                           <executions>
                                  <execution>
                                         <id>copy-dependencies</id>
                                         <phase>prepare-package</phase>
                                         <goals>
                                                <goal>copy-dependencies</goal>
                                         </goals>
                                         <configuration>
                                                <outputDirectory>target/deps</outputDirectory>
                                                <includeScope>runtime</includeScope>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <artifactId>maven-assembly-plugin</artifactId>
                           <executions>
                                  <execution>
                                         <id>app-package-assembly</id>
                                         <phase>package</phase>
                                         <goals>
                                                <goal>single</goal>
                                         </goals>
                                         <configuration>
                                                <finalName>${project.artifactId}-${project.version}-apexapp</finalName>
                                                <appendAssemblyId>false</appendAssemblyId>
                                                <descriptors>
                                                       <descriptor>src/assemble/appPackage.xml</descriptor>
                                                </descriptors>
                                                <archiverConfig>
                                                       <defaultDirectoryMode>0755</defaultDirectoryMode>
                                                </archiverConfig>
                                                <archive>
                                                       <manifestEntries>
                                                              <Class-Path>${datatorrent.apppackage.classpath}</Class-Path>
                                                              <DT-Engine-Version>${datatorrent.version}</DT-Engine-Version>
                                                              <DT-App-Package-Name>${project.artifactId}</DT-App-Package-Name>
                                                              <DT-App-Package-Version>${project.version}</DT-App-Package-Version>
                                                              <DT-App-Package-Display-Name>${project.name}</DT-App-Package-Display-Name>
                                                              <DT-App-Package-Description>${project.description}</DT-App-Package-Description>
                                                      </manifestEntries>
                                                </archive>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <artifactId>maven-antrun-plugin</artifactId>
                           <version>1.7</version>
                           <executions>
                                  <execution>
                                         <phase>package</phase>
                                         <configuration>
                                                <target>
                                                       <move
                                                              file="${project.build.directory}/${project.artifactId}-${project.version}-apexapp.jar"
                                                              tofile="${project.build.directory}/${project.artifactId}-${project.version}.apa"/>
                                                </target>
                                         </configuration>
                                         <goals>
                                                <goal>run</goal>
                                         </goals>
                                  </execution>
                                  <execution>
                                         <!-- create resource directory for xml javadoc -->
                                         <id>createJavadocDirectory</id>
                                         <phase>generate-resources</phase>
                                         <configuration>
                                                <tasks>
                                                       <delete
                                                              dir="${project.build.directory}/generated-resources/xml-javadoc"/>
                                                       <mkdir
                                                              dir="${project.build.directory}/generated-resources/xml-javadoc"/>
                                                </tasks>
                                         </configuration>
                                         <goals>
                                                <goal>run</goal>
                                         </goals>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <groupId>org.codehaus.mojo</groupId>
                           <artifactId>build-helper-maven-plugin</artifactId>
                           <version>1.9.1</version>
                           <executions>
                                  <execution>
                                         <id>attach-artifacts</id>
                                         <phase>package</phase>
                                         <goals>
                                                <goal>attach-artifact</goal>
                                         </goals>
                                         <configuration>
                                                <artifacts>
                                                       <artifact>
                                                              <file>target/${project.artifactId}-${project.version}.apa</file>
                                                              <type>apa</type>
                                                       </artifact>
                                                </artifacts>
                                                <skipAttach>false</skipAttach>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <!-- generate javdoc -->
                     <plugin>
                           <groupId>org.apache.maven.plugins</groupId>
                           <artifactId>maven-javadoc-plugin</artifactId>
                           <executions>
                                  <!-- generate xml javadoc -->
                                  <execution>
                                         <id>xml-doclet</id>
                                         <phase>generate-resources</phase>
                                         <goals>
                                                <goal>javadoc</goal>
                                         </goals>
                                         <configuration>
                                                <doclet>com.github.markusbernhardt.xmldoclet.XmlDoclet</doclet>
                                                <additionalparam>-d
                                                       ${project.build.directory}/generated-resources/xml-javadoc
                                                       -filename ${project.artifactId}-${project.version}-javadoc.xml</additionalparam>
                                                <useStandardDocletOptions>false</useStandardDocletOptions>
                                                <docletArtifact>
                                                       <groupId>com.github.markusbernhardt</groupId>
                                                       <artifactId>xml-doclet</artifactId>
                                                       <version>1.0.4</version>
                                                </docletArtifact>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>
                     <!-- Transform xml javadoc to stripped down version containing only class/interface
                           comments and tags -->
                     <plugin>
                           <groupId>org.codehaus.mojo</groupId>
                           <artifactId>xml-maven-plugin</artifactId>
                           <version>1.0</version>
                           <executions>
                                  <execution>
                                         <id>transform-xmljavadoc</id>
                                         <phase>generate-resources</phase>
                                         <goals>
                                                <goal>transform</goal>
                                         </goals>
                                  </execution>
                           </executions>
                           <configuration>
                                  <transformationSets>
                                         <transformationSet>
                                                <dir>${project.build.directory}/generated-resources/xml-javadoc</dir>
                                                <includes>
                                                       <include>${project.artifactId}-${project.version}-javadoc.xml</include>
                                                </includes>
                                                <stylesheet>XmlJavadocCommentsExtractor.xsl</stylesheet>
                                                <outputDir>${project.build.directory}/generated-resources/xml-javadoc</outputDir>
                                         </transformationSet>
                                  </transformationSets>
                           </configuration>
                     </plugin>
                     <!-- copy xml javadoc to class jar -->
                     <plugin>
                           <artifactId>maven-resources-plugin</artifactId>
                           <version>2.6</version>
                           <executions>
                                  <execution>
                                         <id>copy-resources</id>
                                         <phase>process-resources</phase>
                                         <goals>
                                                <goal>copy-resources</goal>
                                         </goals>
                                         <configuration>
                                                <outputDirectory>${basedir}/target/classes</outputDirectory>
                                                <resources>
                                                       <resource>
                                                              <directory>${project.build.directory}/generated-resources/xml-javadoc</directory>
                                                              <includes>
                                                                     <include>${project.artifactId}-${project.version}-javadoc.xml</include>
                                                              </includes>
                                                              <filtering>true</filtering>
                                                       </resource>
                                                </resources>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

              </plugins>

              <pluginManagement>
                     <plugins>
                           <!--This plugin's configuration is used to store Eclipse m2e settings
                                  only. It has no influence on the Maven build itself. -->
                           <plugin>
                                  <groupId>org.eclipse.m2e</groupId>
                                  <artifactId>lifecycle-mapping</artifactId>
                                  <version>1.0.0</version>
                                  <configuration>
                                         <lifecycleMappingMetadata>
                                                <pluginExecutions>
                                                       <pluginExecution>
                                                              <pluginExecutionFilter>
                                                                     <groupId>org.codehaus.mojo</groupId>
                                                                     <artifactId>
                                                                           xml-maven-plugin
                                                                     </artifactId>
                                                                     <versionRange>[1.0,)</versionRange>
                                                                     <goals>
                                                                           <goal>transform</goal>
                                                                     </goals>
                                                              </pluginExecutionFilter>
                                                              <action>
                                                                     <ignore></ignore>
                                                              </action>
                                                       </pluginExecution>
                                                       <pluginExecution>
                                                              <pluginExecutionFilter>
                                                                     <groupId></groupId>
                                                                     <artifactId></artifactId>
                                                                     <versionRange>[,)</versionRange>
                                                                     <goals>
                                                                           <goal></goal>
                                                                     </goals>
                                                              </pluginExecutionFilter>
                                                              <action>
                                                                     <ignore></ignore>
                                                              </action>
                                                       </pluginExecution>
                                                </pluginExecutions>
                                         </lifecycleMappingMetadata>
                                  </configuration>
                           </plugin>
                     </plugins>
              </pluginManagement>
       </build>

       <dependencies>
              <!-- add your dependencies here -->
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>malhar-library</artifactId>
                     <version>${datatorrent.version}</version>
                     <!-- If you know that your application does not need transitive dependencies
                           pulled in by malhar-library, uncomment the following to reduce the size of
                           your app package. -->
                     <!-- <exclusions> <exclusion> <groupId>*</groupId> <artifactId>*</artifactId>
                           </exclusion> </exclusions> -->
              </dependency>
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>dt-common</artifactId>
                     <version>${datatorrent.version}</version>
                     <scope>provided</scope>
              </dependency>
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>dt-engine</artifactId>
                     <version>${datatorrent.version}</version>
                     <scope>test</scope>
              </dependency>
              <dependency>
                     <groupId>com.teradata.jdbc</groupId>
                     <artifactId>terajdbc4</artifactId>
                     <version>14.00.00.21</version>
              </dependency>
              <dependency>
                     <groupId>com.teradata.jdbc</groupId>
                     <artifactId>tdgssconfig</artifactId>
                     <version>14.00.00.21</version>
              </dependency>
              <dependency>
                     <groupId>com.ibm.db2</groupId>
                     <artifactId>db2jcc</artifactId>
                     <version>123</version>
              </dependency>
              <dependency>
                     <groupId>jdk.tools</groupId>
                     <artifactId>jdk.tools</artifactId>
                     <version>1.7</version>
                     <scope>system</scope>
                     <systemPath>C:/Program Files/Java/jdk1.7.0_79/lib/tools.jar</systemPath>
              </dependency>
              <dependency>
                     <groupId>org.apache.apex</groupId>
                     <artifactId>malhar-contrib</artifactId>
                     <version>3.2.0-incubating</version>
                     <!--<scope>provided</scope> -->
                     <exclusions>
                           <exclusion>
                                  <groupId>*</groupId>
                                  <artifactId>*</artifactId>
                           </exclusion>
                     </exclusions>
              </dependency>
              <dependency>
                     <groupId>junit</groupId>
                     <artifactId>junit</artifactId>
                     <version>4.10</version>
                     <scope>test</scope>
              </dependency>
              <dependency>
                     <groupId>com.vertica</groupId>
                     <artifactId>vertica-jdbc</artifactId>
                     <version>7.2.1-0</version>
              </dependency>
              <dependency>
                     <groupId>org.apache.hbase</groupId>
                     <artifactId>hbase-client</artifactId>
                     <version>1.1.2</version>
              </dependency>
              <dependency>
                     <groupId>org.slf4j</groupId>
                     <artifactId>slf4j-log4j12</artifactId>
                     <version>1.7.19</version>
              </dependency>

              <dependency>
                     <groupId>net.sf.flatpack</groupId>
                     <artifactId>flatpack</artifactId>
                     <version>3.4.2</version>
              </dependency>

              <dependency>
                     <groupId>org.jdom</groupId>
                     <artifactId>jdom</artifactId>
                     <version>1.1.3</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi-ooxml</artifactId>
                     <version>3.9</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.xmlbeans</groupId>
                     <artifactId>xmlbeans</artifactId>
                     <version>2.3.0</version>
              </dependency>

              <dependency>
                     <groupId>dom4j</groupId>
                     <artifactId>dom4j</artifactId>
                     <version>1.6.1</version>
              </dependency>

              <dependency>
                     <groupId>javax.xml.stream</groupId>
                     <artifactId>stax-api</artifactId>
                     <version>1.0-2</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi</artifactId>
                     <version>3.9</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi-ooxml-schemas</artifactId>
                     <version>3.9</version>
              </dependency>
              <dependency>
                     <groupId>com.jcraft</groupId>
                     <artifactId>jsch</artifactId>
                     <version>0.1.53</version>
              </dependency>
       </dependencies>

</project>

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, June, 16 4:37 PM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories

It looks like you may be including old Hadoop jars in your apa package since the stack trace
shows ConverterUtils.toContainerId calling ConverterUtils.toApplicationAttemptId but recent versions
don't have that call sequence. In 2.7.1 (which is what your cluster has) the function looks like this:
  public static ContainerId toContainerId(String containerIdStr) {
    return ContainerId.fromString(containerIdStr);
  }

Could you post the output of "jar tvf {your-apa-file}" as well as: "mvn dependency:tree"

Ram

On Thu, Jun 16, 2016 at 12:38 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi Ram,

Below is the information.


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   712    0   712    0     0   3807      0 --:--:-- --:--:-- --:--:--  3807
{
    "clusterInfo": {
        "haState": "ACTIVE",
        "haZooKeeperConnectionState": "CONNECTED",
        "hadoopBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bccaeaf36cb193c1c by jenkins source checksum 69a3bf8c667267c2c252a54fbbf23d",
        "hadoopVersion": "2.7.1.2.3.2.0-2950",
        "hadoopVersionBuiltOn": "2015-09-30T18:08Z",
        "id": 1465495186350,
        "resourceManagerBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bccaeaf36cb193c1c by jenkins source checksum 48db4b572827c2e9c2da66982d147626",
        "resourceManagerVersion": "2.7.1.2.3.2.0-2950",
        "resourceManagerVersionBuiltOn": "2015-09-30T18:20Z",
        "rmStateStoreName": "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",
        "startedOn": 1465495186350,
        "state": "STARTED"
    }
}

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, June, 16 2:57 PM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories

Can you ssh to one of the cluster nodes ? If so, can you run this command and show the output
(where {rm} is the host:port running the resource manager, aka YARN):

curl http://{rm}/ws/v1/cluster<http://%7brm%7d/ws/v1/cluster> | python -mjson.tool

Ram
ps. You can determine the node running YARN with:

hdfs getconf -confKey yarn.resourcemanager.webapp.address
hdfs getconf -confKey yarn.resourcemanager.webapp.https.address



On Thu, Jun 16, 2016 at 11:15 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi,

I am facing a weird issue, and the logs are not clear to me!

I have created an apa file which works fine within my local sandbox, but I am facing problems when I upload it on the enterprise Hadoop cluster using the DT Console.

Below is the error message from yarn logs. Please help in understanding the issue.

###################### Error Logs ########################################################

Log Type: AppMaster.stderr
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 1259
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/grid/06/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/filecache/36/slf4j-log4j12-1.7.19.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.IllegalArgumentException: Invalid ContainerId: container_e35_1465495186350_2224_01_000001
        at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182)
        at com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:90)
Caused by: java.lang.NumberFormatException: For input string: "e35"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:441)
        at java.lang.Long.parseLong(Long.java:483)
        at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
        at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
        ... 1 more

Log Type: AppMaster.stdout
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 0

Log Type: dt.log
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 29715
Showing 4096 bytes of 29715 total. Click here<http://guedlpdhdp001.saifg.rbc.com:19888/jobhistory/logs/guedlpdhdp012.saifg.rbc.com:45454/container_e35_1465495186350_2224_01_000001/container_e35_1465495186350_2224_01_000001/mukkamula/dt.log/?start=0> for the full log.
56m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms4096m -Xmx4096m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT
SHLVL=3
HADOOP_SSH_OPTS=-o ConnectTimeout=5 -o SendEnv=HADOOP_CONF_DIR
HADOOP_USER_NAME=datatorrent/gueulvahal003.saifg.rbc.com@SAIFG.RBC.COM<ma...@SAIFG.RBC.COM>
HADOOP_NAMENODE_OPTS=-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1
HADOOP_IDENT_STRING=yarn
HADOOP_MAPRED_LOG_DIR=/var/log/hadoop-mapreduce/yarn
NM_HOST=guedlpdhdp012.saifg.rbc.com<http://guedlpdhdp012.saifg.rbc.com>
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
HADOOP_SECURE_DN_LOG_DIR=/var/log/hadoop/hdfs
YARN_HISTORYSERVER_HEAPSIZE=1024
JVM_PID=2638
YARN_PID_DIR=/var/run/hadoop-yarn/yarn
HADOOP_HOME_WARN_SUPPRESS=1
NM_PORT=45454
LOGNAME=mukkamula
YARN_CONF_DIR=/usr/hdp/current/hadoop-client/conf
HADOOP_YARN_USER=yarn
QTDIR=/usr/lib64/qt-3.3
_=/usr/lib/jvm/java-1.7.0/bin/java
MSM_PRODUCT=MSM
HADOOP_HOME=/usr/hdp/2.3.2.0-2950/hadoop
MALLOC_ARENA_MAX=4
HADOOP_OPTS=-Dhdp.version=2.3.2.0-2950 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true  -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.3.2.0-2950 -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
SHELL=/bin/bash
YARN_ROOT_LOGGER=INFO,EWMA,RFA
HADOOP_TOKEN_FILE_LOCATION=/grid/11/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/container_e35_1465495186350_2224_01_000001/container_tokens
CLASSPATH=./*:/usr/hdp/current/hadoop-client/conf:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*
HADOOP_MAPRED_PID_DIR=/var/run/hadoop-mapreduce/yarn
YARN_NODEMANAGER_HEAPSIZE=1024
QTINC=/usr/lib64/qt-3.3/include
USER=mukkamula
HADOOP_CLIENT_OPTS=-Xmx2048m -XX:MaxPermSize=512m -Xmx2048m -XX:MaxPermSize=512m
CONTAINER_ID=container_e35_1465495186350_2224_01_000001
HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop/hdfs
HISTCONTROL=ignoredups
HOME=/home/
HADOOP_NAMENODE_INIT_HEAPSIZE=-Xms8192m
MSM_HOME=/usr/local/MegaRAID Storage Manager
LESSOPEN=||/usr/bin/lesspipe.sh %s
LANG=en_US.UTF-8
YARN_NICENESS=0
YARN_IDENT_STRING=yarn
HADOOP_MAPRED_HOME=/usr/hdp/2.3.2.0-2950/hadoop-mapreduce


Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR)
Sent: 2016, June, 16 8:58 AM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: RE: Multiple directories

Thank you for the inputs.

Regards,
Surya Vamshi
From: Thomas Weise [mailto:thomas.weise@gmail.com]
Sent: 2016, June, 15 5:08 PM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories


On Wed, Jun 15, 2016 at 1:55 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi Ram/Team,

I could create an operator which reads multiple directories and parses the each file with respect to an individual configuration file and generates output file to different directories.

However I have some questions regarding the design.


==>We have 120 directories to scan on HDFS. If we use parallel partitioning with operator memory around 250 MB, that comes to around 30 GB of RAM for this operator; are these figures going to create any problem in production?

You can benchmark this with a single partition. If the downstream operators can keep up with the rate at which the file reader emits, then the memory consumption should be minimal. Keep in mind though that the container memory is not just heap space for the operator, but also memory the JVM requires to run and the memory that the buffer server consumes. You see the allocated memory in the UI if you use the DT community edition (container list in the physical plan).


==>Should I use a scheduler for running the batch job, or define the next scan time and keep the DT job running continuously? If I run the DT job continuously, I assume its memory will be continuously utilized and not available to other applications on the cluster; please clarify.
It is possible to set this up elastically as well, so that when no input is available the number of reader partitions is reduced and the memory is given back (Apex supports dynamic scaling).
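
For reference, a minimal sketch of fixing the reader's partition count through the partitionCount property of AbstractFileInputOperator in properties.xml (the operator name is a placeholder); a custom definePartitions(), as in the linked example, replaces this when each partition needs its own directory and config:

<property>
  <name>dt.operator.fileReader.prop.partitionCount</name>
  <value>120</value>
</property>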


Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, June, 05 10:24 PM

To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories

Some sample code to monitor multiple directories is now available at:
https://github.com/DataTorrent/examples/tree/master/tutorials/fileIO-multiDir

It shows how to use a custom implementation of definePartitions() to create
multiple partitions of the file input operator and group them
into "slices" where each slice monitors a single directory.

Ram

On Wed, May 25, 2016 at 9:55 AM, Munagala Ramanath <ra...@datatorrent.com>> wrote:
I'm hoping to have a sample sometime next week.

Ram

On Wed, May 25, 2016 at 9:30 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Thank you so much ram, for your advice , Option (a) would be ideal for my requirement.

Do you have sample usage for partitioning with individual configuration set ups different partitions?

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, May, 25 12:11 PM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories

You have 2 options: (a) AbstractFileInputOperator (b) FileSplitter/BlockReader

For (a), each partition (i.e. replica of the operator) can scan only a single directory, so if you have 100
directories, you can simply start with 100 partitions; since each partition is scanning its own directory
you don't need to worry about which files the lines came from. This approach however needs a custom
definePartition() implementation in your subclass to assign the appropriate directory and XML parsing
config file to each partition; it also needs adequate cluster resources to be able to spin up the required
number of partitions.

For (b), there is some documentation in the Operators section at http://docs.datatorrent.com/ including
sample code. These operators support scanning multiple directories out of the box but have more
elaborate configuration options. Check this out and see if it works in your use case.

Ram

On Wed, May 25, 2016 at 8:17 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hello Ram/Team,

My requirement is to read input feeds from different locations on HDFS and parse those files by reading XML configuration files (each input feed has configuration file which defines the fields inside the input feeds).

My approach : I would like to define a mapping file which contains individual feed identifier, feed location , configuration file location. I would like to read this mapping file at initial load within setup() method and define my DirectoryScan.acceptFiles. Here my challenge is when I read the files , I should parse the lines by reading the individual configuration files. How do I know the line is from particular file , if I know this I can read the corresponding configuration file before parsing the line.

Please let me know how do I handle this.

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, May, 24 5:49 PM
To: Mukkamula, Suryavamshivardhan (CWM-NR)
Subject: Multiple directories

One way of addressing the issue is to use some sort of external tool (like a script) to
copy all the input files to a common directory (making sure that the file names are
unique to prevent one file from overwriting another) before the Apex application starts.

The Apex application then starts and processes files from this directory.

If you set the partition count of the file input operator to N, it will create N partitions and
the files will be automatically distributed among the partitions. The partitions will work
in parallel.

Ram

_______________________________________________________________________

This [email] may be privileged and/or confidential, and the sender does not waive any related rights and obligations. Any distribution, use or copying of this [email] or the information it contains by other than an intended recipient is unauthorized. If you received this [email] in error, please advise the sender (by return [email] or otherwise) immediately. You have consented to receive the attached electronically at the above-noted address; please retain a copy of this confirmation for future reference.



RE: Multiple directories

Posted by "Mukkamula, Suryavamshivardhan (CWM-NR)" <su...@rbc.com>.
Hi Thomas,

Though I requested 500MB, the Application Master log shows that it is requesting 4GB containers. My properties are being ignored. Please suggest.

2016-06-17 15:43:19,174 INFO  security.StramWSFilter (StramWSFilter.java:doFilter(157)) - 10.60.39.21: proxy access to URI /ws/v2/stram/info by user dr.who, no cookie created
2016-06-17 15:43:19,176 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp005.saifg.rbc.com:45454,numContainers=7,capability=<memory:245760, vCores:40>used=<memory:73728, vCores:11>state=RUNNING
2016-06-17 15:43:19,176 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp008.saifg.rbc.com:45454,numContainers=6,capability=<memory:245760, vCores:40>used=<memory:81920, vCores:11>state=RUNNING
2016-06-17 15:43:19,177 WARN  stram.StreamingContainerManager (StreamingContainerManager.java:getAppDataSources(593)) - DEBUG: looking at operator write 130
2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp006.saifg.rbc.com:45454,numContainers=5,capability=<memory:245760, vCores:40>used=<memory:57344, vCores:8>state=RUNNING
2016-06-17 15:43:19,177 WARN  stram.StreamingContainerManager (StreamingContainerManager.java:getAppDataSources(593)) - DEBUG: looking at operator read 130
2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp007.saifg.rbc.com:45454,numContainers=5,capability=<memory:245760, vCores:40>used=<memory:86016, vCores:10>state=RUNNING
2016-06-17 15:43:19,177 WARN  stram.StreamingContainerManager (StreamingContainerManager.java:getAppDataSources(620)) - DEBUG: looking at port output 130
2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp004.saifg.rbc.com:45454,numContainers=3,capability=<memory:245760, vCores:40>used=<memory:49152, vCores:6>state=RUNNING
2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp012.saifg.rbc.com:45454,numContainers=7,capability=<memory:204800, vCores:39>used=<memory:65536, vCores:10>state=RUNNING
2016-06-17 15:43:19,177 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp011.saifg.rbc.com:45454,numContainers=5,capability=<memory:204800, vCores:39>used=<memory:77824, vCores:9>state=RUNNING
2016-06-17 15:43:19,178 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp010.saifg.rbc.com:45454,numContainers=3,capability=<memory:245760, vCores:40>used=<memory:36864, vCores:5>state=RUNNING
2016-06-17 15:43:19,178 INFO  stram.ResourceRequestHandler (ResourceRequestHandler.java:updateNodeReports(105)) - Node report: rackName=/default-rack,nodeid=guedlpdhdp009.saifg.rbc.com:45454,numContainers=5,capability=<memory:245760, vCores:40>used=<memory:77824, vCores:9>state=RUNNING
2016-06-17 15:43:20,182 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1048)) - Asking RM for containers: [Capability[<memory:4096, vCores:1>]Priority[0], Capability[<memory:4096, vCores:1>]Priority[1], Capability[<memory:4096, vCores:1>]Priority[2], Capability[<memory:4096, vCores:1>]Priority[3]]
2016-06-17 15:43:20,182 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1050)) - Requested container: Capability[<memory:4096, vCores:1>]Priority[0]
2016-06-17 15:43:20,185 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1050)) - Requested container: Capability[<memory:4096, vCores:1>]Priority[1]
2016-06-17 15:43:20,185 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1050)) - Requested container: Capability[<memory:4096, vCores:1>]Priority[2]
2016-06-17 15:43:20,186 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:sendContainerAskToRM(1050)) - Requested container: Capability[<memory:4096, vCores:1>]Priority[3]
2016-06-17 15:43:21,213 INFO  impl.AMRMClientImpl (AMRMClientImpl.java:populateNMTokens(360)) - Received new token for : guedlpdhdp006.saifg.rbc.com:45454
2016-06-17 15:43:21,213 INFO  impl.AMRMClientImpl (AMRMClientImpl.java:populateNMTokens(360)) - Received new token for : guedlpdhdp004.saifg.rbc.com:45454
2016-06-17 15:43:21,213 INFO  impl.AMRMClientImpl (AMRMClientImpl.java:populateNMTokens(360)) - Received new token for : guedlpdhdp012.saifg.rbc.com:45454
2016-06-17 15:43:21,213 INFO  impl.AMRMClientImpl (AMRMClientImpl.java:populateNMTokens(360)) - Received new token for : guedlpdhdp009.saifg.rbc.com:45454
2016-06-17 15:43:21,215 INFO  stram.StreamingAppMasterService (StreamingAppMasterService.java:execute(822)) - Got new container., containerId=container_e35_1465495186350_2632_01_000002, containerNode=guedlpdhdp006.saifg.rbc.com:45454, containerNodeURI=guedlpdhdp006.saifg.rbc.com:8042, containerResourceMemory4096, priority0
2016-06-17 15:43:21,226 INFO  delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:createPassword(385)) - Creating password for identifier: owner=mukkamula, renewer=, realUser=, issueDate=1466192601226, maxDate=4611687484619989129, sequenceNumber=1, masterKeyId=2, currentKey: 2
2016-06-17 15:43:21,228 INFO  stram.LaunchContainerRunnable (LaunchContainerRunnable.java:run(143)) - Setting up container launch context for containerid=container_e35_1465495186350_2632_01_000002
2016-06-17 15:43:21,233 INFO  stram.LaunchContainerRunnable (LaunchContainerRunnable.java:setClasspath(118)) - CLASSPATH: ./*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:.
2016-06-17 15:43:21,323 INFO  util.BasicContainerOptConfigurator (BasicContainerOptConfigurator.java:getJVMOptions(65)) - property map for operator {Generic=null, -Xmx=768m}
2016-06-17 15:43:21,323 INFO  stram.LaunchContainerRunnable (LaunchContainerRunnable.java:getChildVMCommand(243)) - Jvm opts  -Xmx939524096  for container container_e35_1465495186350_2632_01_000002
2016


Regards,
Surya Vamshi

From: Thomas Weise [mailto:thomas.weise@gmail.com]
Sent: 2016, June, 18 12:03 AM
To: users@apex.apache.org
Subject: Re: Multiple directories

Please check in the Apex application master log (container 1) how much memory it is requesting. If that's the correct figure and you still end up with a larger container, the problem could be the minimum container size in the YARN scheduler configuration.
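For example, with the standard YARN setting below, any request smaller than the minimum is rounded up to that size, so a 4GB minimum would explain 4GB containers even when the application asks for 500MB (the value shown is illustrative):

    <!-- yarn-site.xml: requests below this value are rounded up to it -->
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>4096</value>
    </property>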


On Fri, Jun 17, 2016 at 12:58 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi Ram,

I tried that option of adding the memory properties under site/conf and selecting that file during launch, but no luck. The same is working with my local sandbox setup.

Is there any other way that I can understand the reason?

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, June, 17 3:06 PM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories

Please take a look at the section entitled "Properties source precedence" at
http://docs.datatorrent.com/application_packages/

It looks like the setting in dt-site.xml on the cluster is overriding your application-defined values.
If you add the properties to a file under site/conf in your application and then
select it during launch, those values should take effect.
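For example, a minimal file under site/conf could look like this (the operator name "reader" is a placeholder for whatever name you gave the operator in your DAG):

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
        <!-- cap the container memory of the operator named "reader" at 500MB -->
        <property>
            <name>dt.operator.reader.attr.MEMORY_MB</name>
            <value>500</value>
        </property>
    </configuration>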

For signalling EOF, another option is to use a separate control port to send the EOF marker, which
could just be the string "EOF", for example.
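A rough sketch of such a control port on the writer (the port name is illustrative; DefaultInputPort is com.datatorrent.api.DefaultInputPort):

    // extra input port on the file writer; the reader sends the finished
    // file's name here when it reaches EOF
    public final transient DefaultInputPort<String> control = new DefaultInputPort<String>()
    {
        @Override
        public void process(String fileName)
        {
            requestFinalize(fileName);    // finalize that file
        }
    };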


Ram

On Fri, Jun 17, 2016 at 11:48 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi,

I found a way for the 2nd question.


1)      I tried setting the operator memory to 500MB but the operator still takes 4GB by default; I did not understand why. (In my sandbox setup the memory was set correctly to 500MB, but not on the enterprise dev cluster)



2)      I am trying to send a null object from the file reader when EOF is reached so that the file writer can call the requestFinalize() method. But somehow I could not figure out how to send the EOF; I tried as below but no luck. (Solution: it seems that if readEntity() returns null, the emit() method is not called; I have managed to emit the null object from readEntity() itself, as sketched below.)
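A minimal sketch of that working readEntity(), using the same fields (br, filePath, output, getFileName()) as the reader code quoted below:

    @Override
    protected String readEntity() throws IOException {
        final String line = br.readLine();
        if (line != null) {
            return line;    // normal case; emit() is called with this line
        }
        // EOF: emit() is never invoked when we return null, so signal the
        // writer directly from here with a null-valued tuple, then let the
        // null return close the file
        output.emit(new KeyValue<String, String>(getFileName(), null));
        return null;
    }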


Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:suryavamshivardhan.mukkamula@rbc.com<ma...@rbc.com>]
Sent: 2016, June, 17 12:20 PM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: RE: Multiple directories

Hi,

Can you please help me understand the below issues.


1)      I tried setting the operator memory to 500MB but the operator still takes 4GB by default; I did not understand why. (In my sandbox setup the memory was set correctly to 500MB, but not on the enterprise dev cluster)

2)      I am trying to send a null object from the file reader when EOF is reached so that the file writer can call the requestFinalize() method. But somehow I could not figure out how to send the EOF; I tried as below but no luck.

################### File Reader ####################################################

@Override
protected String readEntity() throws IOException {
    // try to read a line
    final String line = br.readLine();
    if (null != line) { // normal case
        LOG.debug("readEntity: line = {}", line);
        return line;
    }

    // end-of-file (control tuple sent in closeFile())
    LOG.info("readEntity: EOF for {}", filePath);
    return null;
}

@Override
protected void emit(String line) {
    // parsing logic here: parse the line as per the input configuration and
    // create the output line as per the output configuration
    if (line == null) {
        // EOF marker: tell the writer which file finished, and stop here
        output.emit(new KeyValue<String, String>(getFileName(), null));
        return;
    }
    KeyValue<String, String> tuple = new KeyValue<String, String>();
    tuple.key = getFileName();
    tuple.value = line;
    KeyValue<String, String> newTuple = parseTuple(tuple);
    output.emit(newTuple);
}
######################File Writer######################################################
public class FileOutputOperator extends AbstractFileOutputOperator<KeyValue<String, String>> {
    private static final Logger LOG = LoggerFactory.getLogger(FileOutputOperator.class);
    private List<String> filesToFinalize = new ArrayList<>();

    @Override
    public void setup(Context.OperatorContext context) {
        super.setup(context);
        finalizeFiles();
    }

    @Override
    protected byte[] getBytesForTuple(KeyValue<String, String> tuple) {
        if (tuple.value == null) {
            LOG.info("File to finalize {}", tuple.key);
            filesToFinalize.add(tuple.key);
            return new byte[0];
        }
        else {
            return tuple.value.getBytes();
        }
    }

    @Override
    protected String getFileName(KeyValue<String, String> tuple) {
        return tuple.key;
    }

    @Override
    public void endWindow() {
        super.endWindow();
        finalizeFiles();
    }

    private void finalizeFiles() {
                LOG.info("Files to finalize {}",filesToFinalize.toArray());
        Iterator<String> fileIt = filesToFinalize.iterator();
        while(fileIt.hasNext()) {
            requestFinalize(fileIt.next());
            fileIt.remove();
        }
    }
}
##################################################################################################



Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:suryavamshivardhan.mukkamula@rbc.com]
Sent: 2016, June, 17 9:11 AM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: RE: Multiple directories

Hi Ram/Raja,

The HBase dependency was pulling older Hadoop jars into my classpath. I removed the HBase dependency, which I don't need for now, and the issue got resolved.

Thank you for your help.

Regards,
Surya Vamshi

From: Raja.Aravapalli [mailto:Raja.Aravapalli@target.com]
Sent: 2016, June, 17 7:06 AM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories


I also faced a similar problem with Hadoop jars when I used HBase jars in pom.xml!! It could be because the version of the Hadoop jars your apa is holding differs from the ones in the cluster!!


What I did to solve this:

I included the scope "provided" in the Maven pom.xml for the HBase jars, and then provided the HBase jars to the application package during submission using "-libjars" with the launch command. That solved my Invalid ContainerId problem!!

You can type "launch help" to learn the usage details.
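A sketch of that setup (the jar path and package name below are just examples):

In pom.xml:

    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
        <version>1.1.2</version>
        <scope>provided</scope>  <!-- keeps HBase and its Hadoop dependencies out of the .apa -->
    </dependency>

At submission time:

    launch -libjars /path/to/hbase-client-1.1.2.jar myapp-1.0-SNAPSHOT.apa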


Regards,
Raja.

From: Munagala Ramanath <ra...@datatorrent.com>>
Reply-To: "users@apex.apache.org<ma...@apex.apache.org>" <us...@apex.apache.org>>
Date: Thursday, June 16, 2016 at 4:10 PM
To: "users@apex.apache.org<ma...@apex.apache.org>" <us...@apex.apache.org>>
Subject: Re: Multiple directories

Those 6 hadoop jars are definitely a problem.

I didn't see the output of "mvn dependency:tree"; could you post that ?
It will show you why these hadoop jars are being pulled in.

Also, please refer to the section "Hadoop dependencies conflicts" in the troubleshooting guide:
http://docs.datatorrent.com/troubleshooting/

Ram

On Thu, Jun 16, 2016 at 1:56 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi Ram,

Below are the details.


0 Wed Jun 15 16:34:24 EDT 2016 META-INF/
   358 Wed Jun 15 16:34:22 EDT 2016 META-INF/MANIFEST.MF
     0 Wed Jun 15 16:34:24 EDT 2016 app/
52967 Wed Jun 15 16:34:22 EDT 2016 app/countrynamescan-1.0-SNAPSHOT.jar
     0 Wed Jun 15 16:34:22 EDT 2016 lib/
62983 Wed Jun 15 16:34:22 EDT 2016 lib/activation-1.1.jar
1143233 Wed Jun 15 16:34:22 EDT 2016 lib/activemq-client-5.8.0.jar
  4467 Wed Jun 15 16:34:22 EDT 2016 lib/aopalliance-1.0.jar
44925 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-i18n-2.0.0-M15.jar
691479 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-kerberos-codec-2.0.0-M15.jar
16560 Wed Jun 15 16:34:22 EDT 2016 lib/api-asn1-api-1.0.0-M20.jar
79912 Wed Jun 15 16:34:22 EDT 2016 lib/api-util-1.0.0-M20.jar
43033 Wed Jun 15 16:34:22 EDT 2016 lib/asm-3.1.jar
303139 Wed Jun 15 16:34:22 EDT 2016 lib/avro-1.7.4.jar
232019 Wed Jun 15 16:34:22 EDT 2016 lib/commons-beanutils-1.8.3.jar
41123 Wed Jun 15 16:34:22 EDT 2016 lib/commons-cli-1.2.jar
284184 Wed Jun 15 16:34:22 EDT 2016 lib/commons-codec-1.10.jar
575389 Wed Jun 15 16:34:22 EDT 2016 lib/commons-collections-3.2.1.jar
30595 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compiler-2.7.8.jar
241367 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compress-1.4.1.jar
298829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-configuration-1.6.jar
143602 Wed Jun 15 16:34:22 EDT 2016 lib/commons-digester-1.8.jar
112341 Wed Jun 15 16:34:22 EDT 2016 lib/commons-el-1.0.jar
305001 Wed Jun 15 16:34:22 EDT 2016 lib/commons-httpclient-3.1.jar
185140 Wed Jun 15 16:34:22 EDT 2016 lib/commons-io-2.4.jar
284220 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang-2.6.jar
315805 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang3-3.1.jar
61829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-logging-1.2.jar
1599627 Wed Jun 15 16:34:22 EDT 2016 lib/commons-math3-3.1.1.jar
273370 Wed Jun 15 16:34:22 EDT 2016 lib/commons-net-3.1.jar
3608597 Wed Jun 15 16:34:22 EDT 2016 lib/db2jcc-123.jar
313898 Wed Jun 15 16:34:22 EDT 2016 lib/dom4j-1.6.1.jar
17138265 Wed Jun 15 16:34:22 EDT 2016 lib/fastutil-6.6.4.jar
15322 Wed Jun 15 16:34:22 EDT 2016 lib/findbugs-annotations-1.3.9-1.jar
84946 Wed Jun 15 16:34:22 EDT 2016 lib/flatpack-3.4.2.jar
20220 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-j2ee-management_1.1_spec-1.0.1.jar
32359 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-jms_1.1_spec-1.1.1.jar
21817 Wed Jun 15 16:34:22 EDT 2016 lib/gmbal-api-only-3.0.0-b023.jar
690573 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-framework-2.1.2.jar
253086 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-2.1.2.jar
198255 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-server-2.1.2.jar
336904 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-servlet-2.1.2.jar
  8114 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-rcm-2.1.2.jar
1795932 Wed Jun 15 16:34:22 EDT 2016 lib/guava-12.0.1.jar
710492 Wed Jun 15 16:34:22 EDT 2016 lib/guice-3.0.jar
65012 Wed Jun 15 16:34:22 EDT 2016 lib/guice-servlet-3.0.jar
16778 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-annotations-2.2.0.jar
52449 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-auth-2.5.1.jar
2962685 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-common-2.5.1.jar
1498368 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-mapreduce-client-core-2.5.1.jar
1158936 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-api-2.2.0.jar
1301627 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-common-2.2.0.jar
50139 Wed Jun 15 16:34:22 EDT 2016 lib/hawtbuf-1.9.jar
20780 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-annotations-1.1.2.jar
1249004 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-client-1.1.2.jar
530078 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-common-1.1.2.jar
4201685 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-protocol-1.1.2.jar
1475955 Wed Jun 15 16:34:22 EDT 2016 lib/htrace-core-3.1.0-incubating.jar
590533 Wed Jun 15 16:34:22 EDT 2016 lib/httpclient-4.3.5.jar
282269 Wed Jun 15 16:34:22 EDT 2016 lib/httpcore-4.3.2.jar
228286 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-core-asl-1.9.2.jar
765648 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-mapper-asl-1.9.2.jar
  2497 Wed Jun 15 16:34:22 EDT 2016 lib/javax.inject-1.jar
521991 Wed Jun 15 16:34:22 EDT 2016 lib/javax.mail-1.5.0.jar
83945 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-3.1.jar
85353 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-api-3.0.1.jar
105134 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-api-2.2.2.jar
890168 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-impl-2.2.3-1.jar
1291164 Wed Jun 15 16:34:22 EDT 2016 lib/jcodings-1.0.8.jar
151304 Wed Jun 15 16:34:22 EDT 2016 lib/jdom-1.1.3.jar
130458 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-client-1.9.jar
458739 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-core-1.9.jar
17542 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-grizzly2-1.9.jar
14786 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-guice-1.9.jar
147952 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-json-1.9.jar
713089 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-server-1.9.jar
28100 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-core-1.9.jar
12976 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-grizzly2-1.9.jar
67758 Wed Jun 15 16:34:22 EDT 2016 lib/jettison-1.1.jar
21144 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-continuation-8.1.10.v20130312.jar
95709 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-http-8.1.10.v20130312.jar
103622 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-io-8.1.10.v20130312.jar
89691 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-security-8.1.10.v20130312.jar
347020 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-server-8.1.10.v20130312.jar
101052 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-servlet-8.1.10.v20130312.jar
177131 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-6.1.26.jar
284903 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-8.1.10.v20130312.jar
125928 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-websocket-8.1.10.v20130312.jar
39117 Wed Jun 15 16:34:22 EDT 2016 lib/jms-api-1.1-rev-1.jar
187292 Wed Jun 15 16:34:22 EDT 2016 lib/joni-2.1.2.jar
280205 Wed Jun 15 16:34:22 EDT 2016 lib/jsch-0.1.53.jar
33015 Wed Jun 15 16:34:22 EDT 2016 lib/jsr305-1.3.9.jar
489884 Wed Jun 15 16:34:22 EDT 2016 lib/log4j-1.2.17.jar
565387 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-contrib-3.2.0-incubating.jar
1062000 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-library-3.1.1.jar
42212 Wed Jun 15 16:34:22 EDT 2016 lib/management-api-3.0.0-b012.jar
  8798 Wed Jun 15 16:34:22 EDT 2016 lib/named-regexp-0.2.3.jar
1206119 Wed Jun 15 16:34:22 EDT 2016 lib/netty-3.6.6.Final.jar
1779991 Wed Jun 15 16:34:22 EDT 2016 lib/netty-all-4.0.23.Final.jar
29555 Wed Jun 15 16:34:22 EDT 2016 lib/paranamer-2.3.jar
1869113 Wed Jun 15 16:34:22 EDT 2016 lib/poi-3.9.jar
936648 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-3.9.jar
4802621 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-schemas-3.9.jar
533455 Wed Jun 15 16:34:22 EDT 2016 lib/protobuf-java-2.5.0.jar
26084 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-api-1.7.5.jar
  9943 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-log4j12-1.7.19.jar
995968 Wed Jun 15 16:34:22 EDT 2016 lib/snappy-java-1.0.4.1.jar
23346 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0-2.jar
26514 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0.1.jar
  2455 Wed Jun 15 16:34:22 EDT 2016 lib/tdgssconfig-14.00.00.21.jar
991265 Wed Jun 15 16:34:22 EDT 2016 lib/terajdbc4-14.00.00.21.jar
758309 Wed Jun 15 16:34:22 EDT 2016 lib/vertica-jdbc-7.2.1-0.jar
109318 Wed Jun 15 16:34:22 EDT 2016 lib/xml-apis-1.0.b2.jar
2666695 Wed Jun 15 16:34:22 EDT 2016 lib/xmlbeans-2.3.0.jar
15010 Wed Jun 15 16:34:22 EDT 2016 lib/xmlenc-0.52.jar
94672 Wed Jun 15 16:34:22 EDT 2016 lib/xz-1.0.jar
792964 Wed Jun 15 16:34:22 EDT 2016 lib/zookeeper-3.4.6.jar
     0 Wed Jun 15 16:34:28 EDT 2016 conf/
   334 Mon Apr 04 11:18:00 EDT 2016 conf/my-app-conf1.xml
  3432 Wed Jun 15 16:22:20 EDT 2016 META-INF/properties.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
       <modelVersion>4.0.0</modelVersion>
       <groupId>com.rbc.aml.cnscan</groupId>
       <version>1.0-SNAPSHOT</version>
       <artifactId>countrynamescan</artifactId>
       <packaging>jar</packaging>

       <!-- change these to the appropriate values -->
       <name>countrynamescan</name>
       <description>Country and Name Scan project</description>

       <properties>
              <!-- change this if you desire to use a different version of DataTorrent -->
              <datatorrent.version>3.1.1</datatorrent.version>
              <datatorrent.apppackage.classpath>lib/*.jar</datatorrent.apppackage.classpath>
       </properties>

       <!-- repository to provide the DataTorrent artifacts -->
       <!-- <repositories>
              <repository>
                     <snapshots>
                           <enabled>false</enabled>
                     </snapshots>
                     <id>Datatorrent-Releases</id>
                     <name>DataTorrent Release Repository</name>
                     <url>https://www.datatorrent.com/maven/content/repositories/releases/</url>
              </repository>
              <repository>
                     <releases>
                           <enabled>false</enabled>
                     </releases>
                     <id>DataTorrent-Snapshots</id>
                     <name>DataTorrent Early Access Program Snapshot Repository</name>
                     <url>https://www.datatorrent.com/maven/content/repositories/snapshots/</url>
              </repository>
       </repositories> -->


       <build>
              <plugins>
                     <plugin>
                           <groupId>org.apache.maven.plugins</groupId>
                           <artifactId>maven-eclipse-plugin</artifactId>
                           <version>2.9</version>
                           <configuration>
                                  <downloadSources>true</downloadSources>
                           </configuration>
                     </plugin>
                     <plugin>
                           <artifactId>maven-compiler-plugin</artifactId>
                           <version>3.3</version>
                           <configuration>
                                  <encoding>UTF-8</encoding>
                                  <source>1.7</source>
                                  <target>1.7</target>
                                  <debug>true</debug>
                                  <optimize>false</optimize>
                                  <showDeprecation>true</showDeprecation>
                                  <showWarnings>true</showWarnings>
                           </configuration>
                     </plugin>
                     <plugin>
                           <artifactId>maven-dependency-plugin</artifactId>
                           <version>2.8</version>
                           <executions>
                                  <execution>
                                         <id>copy-dependencies</id>
                                         <phase>prepare-package</phase>
                                         <goals>
                                                <goal>copy-dependencies</goal>
                                         </goals>
                                         <configuration>
                                                <outputDirectory>target/deps</outputDirectory>
                                                <includeScope>runtime</includeScope>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <artifactId>maven-assembly-plugin</artifactId>
                           <executions>
                                  <execution>
                                         <id>app-package-assembly</id>
                                         <phase>package</phase>
                                         <goals>
                                                <goal>single</goal>
                                         </goals>
                                         <configuration>
                                                <finalName>${project.artifactId}-${project.version}-apexapp</finalName>
                                                <appendAssemblyId>false</appendAssemblyId>
                                                <descriptors>
                                                       <descriptor>src/assemble/appPackage.xml</descriptor>
                                                </descriptors>
                                                <archiverConfig>
                                                       <defaultDirectoryMode>0755</defaultDirectoryMode>
                                                </archiverConfig>
                                                <archive>
                                                       <manifestEntries>
                                                              <Class-Path>${datatorrent.apppackage.classpath}</Class-Path>
                                                              <DT-Engine-Version>${datatorrent.version}</DT-Engine-Version>
                                                              <DT-App-Package-Name>${project.artifactId}</DT-App-Package-Name>
                                                              <DT-App-Package-Version>${project.version}</DT-App-Package-Version>
                                                               <DT-App-Package-Display-Name>${project.name}</DT-App-Package-Display-Name>
                                                              <DT-App-Package-Description>${project.description}</DT-App-Package-Description>
                                                      </manifestEntries>
                                                </archive>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <artifactId>maven-antrun-plugin</artifactId>
                           <version>1.7</version>
                           <executions>
                                  <execution>
                                         <phase>package</phase>
                                         <configuration>
                                                <target>
                                                       <move
                                                              file="${project.build.directory}/${project.artifactId}-${project.version}-apexapp.jar"
                                                              tofile="${project.build.directory}/${project.artifactId}-${project.version}.apa"/>
                                                </target>
                                         </configuration>
                                         <goals>
                                                <goal>run</goal>
                                         </goals>
                                  </execution>
                                  <execution>
                                         <!-- create resource directory for xml javadoc -->
                                         <id>createJavadocDirectory</id>
                                         <phase>generate-resources</phase>
                                         <configuration>
                                                <tasks>
                                                       <delete
                                                              dir="${project.build.directory}/generated-resources/xml-javadoc"/>
                                                       <mkdir
                                                              dir="${project.build.directory}/generated-resources/xml-javadoc"/>
                                                </tasks>
                                         </configuration>
                                         <goals>
                                                <goal>run</goal>
                                         </goals>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <groupId>org.codehaus.mojo</groupId>
                           <artifactId>build-helper-maven-plugin</artifactId>
                           <version>1.9.1</version>
                           <executions>
                                  <execution>
                                         <id>attach-artifacts</id>
                                         <phase>package</phase>
                                         <goals>
                                                <goal>attach-artifact</goal>
                                         </goals>
                                         <configuration>
                                                <artifacts>
                                                       <artifact>
                                                              <file>target/${project.artifactId}-${project.version}.apa</file>
                                                              <type>apa</type>
                                                       </artifact>
                                                </artifacts>
                                                <skipAttach>false</skipAttach>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <!-- generate javdoc -->
                     <plugin>
                           <groupId>org.apache.maven.plugins</groupId>
                           <artifactId>maven-javadoc-plugin</artifactId>
                           <executions>
                                  <!-- generate xml javadoc -->
                                  <execution>
                                         <id>xml-doclet</id>
                                         <phase>generate-resources</phase>
                                         <goals>
                                                <goal>javadoc</goal>
                                         </goals>
                                         <configuration>
                                                <doclet>com.github.markusbernhardt.xmldoclet.XmlDoclet</doclet>
                                                <additionalparam>-d
                                                       ${project.build.directory}/generated-resources/xml-javadoc
                                                       -filename ${project.artifactId}-${project.version}-javadoc.xml</additionalparam>
                                                <useStandardDocletOptions>false</useStandardDocletOptions>
                                                <docletArtifact>
                                                       <groupId>com.github.markusbernhardt</groupId>
                                                       <artifactId>xml-doclet</artifactId>
                                                       <version>1.0.4</version>
                                                </docletArtifact>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>
                     <!-- Transform xml javadoc to stripped down version containing only class/interface
                           comments and tags -->
                     <plugin>
                           <groupId>org.codehaus.mojo</groupId>
                           <artifactId>xml-maven-plugin</artifactId>
                           <version>1.0</version>
                           <executions>
                                  <execution>
                                         <id>transform-xmljavadoc</id>
                                         <phase>generate-resources</phase>
                                         <goals>
                                                <goal>transform</goal>
                                         </goals>
                                  </execution>
                           </executions>
                           <configuration>
                                  <transformationSets>
                                         <transformationSet>
                                                <dir>${project.build.directory}/generated-resources/xml-javadoc</dir>
                                                <includes>
                                                       <include>${project.artifactId}-${project.version}-javadoc.xml</include>
                                                </includes>
                                                <stylesheet>XmlJavadocCommentsExtractor.xsl</stylesheet>
                                                <outputDir>${project.build.directory}/generated-resources/xml-javadoc</outputDir>
                                         </transformationSet>
                                  </transformationSets>
                           </configuration>
                     </plugin>
                     <!-- copy xml javadoc to class jar -->
                     <plugin>
                           <artifactId>maven-resources-plugin</artifactId>
                           <version>2.6</version>
                           <executions>
                                  <execution>
                                         <id>copy-resources</id>
                                         <phase>process-resources</phase>
                                         <goals>
                                                <goal>copy-resources</goal>
                                         </goals>
                                         <configuration>
                                                <outputDirectory>${basedir}/target/classes</outputDirectory>
                                                <resources>
                                                       <resource>
                                                              <directory>${project.build.directory}/generated-resources/xml-javadoc</directory>
                                                              <includes>
                                                                     <include>${project.artifactId}-${project.version}-javadoc.xml</include>
                                                              </includes>
                                                              <filtering>true</filtering>
                                                       </resource>
                                                </resources>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

              </plugins>

              <pluginManagement>
                     <plugins>
                           <!--This plugin's configuration is used to store Eclipse m2e settings
                                  only. It has no influence on the Maven build itself. -->
                           <plugin>
                                  <groupId>org.eclipse.m2e</groupId>
                                  <artifactId>lifecycle-mapping</artifactId>
                                  <version>1.0.0</version>
                                  <configuration>
                                         <lifecycleMappingMetadata>
                                                <pluginExecutions>
                                                       <pluginExecution>
                                                              <pluginExecutionFilter>
                                                                     <groupId>org.codehaus.mojo</groupId>
                                                                     <artifactId>
                                                                           xml-maven-plugin
                                                                     </artifactId>
                                                                     <versionRange>[1.0,)</versionRange>
                                                                     <goals>
                                                                           <goal>transform</goal>
                                                                     </goals>
                                                              </pluginExecutionFilter>
                                                              <action>
                                                                     <ignore></ignore>
                                                              </action>
                                                       </pluginExecution>
                                                       <pluginExecution>
                                                              <pluginExecutionFilter>
                                                                     <groupId></groupId>
                                                                     <artifactId></artifactId>
                                                                     <versionRange>[,)</versionRange>
                                                                     <goals>
                                                                           <goal></goal>
                                                                     </goals>
                                                              </pluginExecutionFilter>
                                                              <action>
                                                                     <ignore></ignore>
                                                              </action>
                                                       </pluginExecution>
                                                </pluginExecutions>
                                         </lifecycleMappingMetadata>
                                  </configuration>
                           </plugin>
                     </plugins>
              </pluginManagement>
       </build>

       <dependencies>
              <!-- add your dependencies here -->
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>malhar-library</artifactId>
                     <version>${datatorrent.version}</version>
                     <!-- If you know that your application does not need transitive dependencies
                           pulled in by malhar-library, uncomment the following to reduce the size of
                           your app package. -->
                     <!-- <exclusions> <exclusion> <groupId>*</groupId> <artifactId>*</artifactId>
                           </exclusion> </exclusions> -->
              </dependency>
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>dt-common</artifactId>
                     <version>${datatorrent.version}</version>
                     <scope>provided</scope>
              </dependency>
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>malhar-library</artifactId>
                     <version>${datatorrent.version}</version>
                     <!-- If you know that your application does not need transitive dependencies
                           pulled in by malhar-library, uncomment the following to reduce the size of
                           your app package. -->
                     <!-- <exclusions> <exclusion> <groupId>*</groupId> <artifactId>*</artifactId>
                           </exclusion> </exclusions> -->
              </dependency>
              <dependency>
       <groupId>com.datatorrent</groupId>
       <artifactId>dt-engine</artifactId>
       <version>${datatorrent.version}</version>
       <scope>test</scope>
              </dependency>
              <dependency>
              <groupId>com.datatorrent</groupId>
              <artifactId>dt-common</artifactId>
              <version>${datatorrent.version}</version>
              <scope>provided</scope>
              </dependency>
              <dependency>
              <groupId>com.teradata.jdbc</groupId>
              <artifactId>terajdbc4</artifactId>
              <version>14.00.00.21</version>
              </dependency>
              <dependency>
              <groupId>com.teradata.jdbc</groupId>
              <artifactId>tdgssconfig</artifactId>
              <version>14.00.00.21</version>
              </dependency>
              <dependency>
              <groupId>com.ibm.db2</groupId>
              <artifactId>db2jcc</artifactId>
              <version>123</version>
              </dependency>
              <dependency>
                     <groupId>jdk.tools</groupId>
                     <artifactId>jdk.tools</artifactId>
                     <version>1.7</version>
                     <scope>system</scope>
                     <systemPath>C:/Program Files/Java/jdk1.7.0_79/lib/tools.jar</systemPath>
              </dependency>
              <dependency>
                     <groupId>org.apache.apex</groupId>
                     <artifactId>malhar-contrib</artifactId>
                     <version>3.2.0-incubating</version>
                     <!--<scope>provided</scope> -->
                     <exclusions>
                           <exclusion>
                                  <groupId>*</groupId>
                                  <artifactId>*</artifactId>
                           </exclusion>
                     </exclusions>
              </dependency>
              <dependency>
                     <groupId>junit</groupId>
                     <artifactId>junit</artifactId>
                     <version>4.10</version>
                     <scope>test</scope>
              </dependency>
              <dependency>
                     <groupId>com.vertica</groupId>
                     <artifactId>vertica-jdbc</artifactId>
                     <version>7.2.1-0</version>
              </dependency>
              <dependency>
              <groupId>org.apache.hbase</groupId>
              <artifactId>hbase-client</artifactId>
              <version>1.1.2</version>
              </dependency>
              <dependency>
                     <groupId>org.slf4j</groupId>
                     <artifactId>slf4j-log4j12</artifactId>
                     <version>1.7.19</version>
              </dependency>
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>dt-engine</artifactId>
                     <version>${datatorrent.version}</version>
                     <scope>test</scope>
              </dependency>

              <dependency>
                     <groupId>net.sf.flatpack</groupId>
                     <artifactId>flatpack</artifactId>
                     <version>3.4.2</version>
              </dependency>

              <dependency>
                     <groupId>org.jdom</groupId>
                     <artifactId>jdom</artifactId>
                     <version>1.1.3</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi-ooxml</artifactId>
                     <version>3.9</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.xmlbeans</groupId>
                     <artifactId>xmlbeans</artifactId>
                     <version>2.3.0</version>
              </dependency>

              <dependency>
                     <groupId>dom4j</groupId>
                     <artifactId>dom4j</artifactId>
                     <version>1.6.1</version>
              </dependency>

              <dependency>
                     <groupId>javax.xml.stream</groupId>
                     <artifactId>stax-api</artifactId>
                     <version>1.0-2</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi</artifactId>
                     <version>3.9</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi-ooxml-schemas</artifactId>
                     <version>3.9</version>
              </dependency>
              <dependency>
                     <groupId>com.jcraft</groupId>
                     <artifactId>jsch</artifactId>
                     <version>0.1.53</version>
              </dependency>
              <dependency>
                     <groupId>com.jcraft</groupId>
                     <artifactId>jsch</artifactId>
                     <version>0.1.53</version>
              </dependency>
       </dependencies>

</project>

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, June, 16 4:37 PM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories

It looks like you may be including old Hadoop jars in your apa package since the stack trace
shows ConverterUtils.toContainerId calling ConverterUtils.toApplicationAttemptId but recent versions
don't have that call sequence. In 2.7.1 (which is what your cluster has) the function looks like this:
  public static ContainerId toContainerId(String containerIdStr) {
    return ContainerId.fromString(containerIdStr);
  }

Could you post the output of "jar tvf {your-apa-file}" as well as: "mvn dependency:tree"

Ram

On Thu, Jun 16, 2016 at 12:38 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi Ram,

Below is the information.


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   712    0   712    0     0   3807      0 --:--:-- --:--:-- --:--:--  3807
{
    "clusterInfo": {
        "haState": "ACTIVE",
        "haZooKeeperConnectionState": "CONNECTED",
        "hadoopBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bccaeaf36cb193c1c by jenkins source checksum 69a3bf8c667267c2c252a54fbbf23d",
        "hadoopVersion": "2.7.1.2.3.2.0-2950",
        "hadoopVersionBuiltOn": "2015-09-30T18:08Z",
        "id": 1465495186350,
        "resourceManagerBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bccaeaf36cb193c1c by jenkins source checksum 48db4b572827c2e9c2da66982d147626",
        "resourceManagerVersion": "2.7.1.2.3.2.0-2950",
        "resourceManagerVersionBuiltOn": "2015-09-30T18:20Z",
        "rmStateStoreName": "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",
        "startedOn": 1465495186350,
        "state": "STARTED"
    }
}

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, June, 16 2:57 PM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories

Can you ssh to one of the cluster nodes ? If so, can you run this command and show the output
(where {rm} is the host:port running the resource manager, aka YARN):

curl http://{rm}/ws/v1/cluster | python -mjson.tool

Ram
ps. You can determine the node running YARN with:

hdfs getconf -confKey yarn.resourcemanager.webapp.address
hdfs getconf -confKey yarn.resourcemanager.webapp.https.address



On Thu, Jun 16, 2016 at 11:15 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi,

I am facing a weird issue and the logs are not clear to me!!

I have created an apa file which works fine within my local sandbox, but I am facing problems when I upload it on the enterprise Hadoop cluster using the DT Console.

Below is the error message from the yarn logs. Please help in understanding the issue.

###################### Error Logs ########################################################

Log Type: AppMaster.stderr
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 1259
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/grid/06/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/filecache/36/slf4j-log4j12-1.7.19.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.IllegalArgumentException: Invalid ContainerId: container_e35_1465495186350_2224_01_000001
        at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182)
        at com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:90)
Caused by: java.lang.NumberFormatException: For input string: "e35"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:441)
        at java.lang.Long.parseLong(Long.java:483)
        at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
        at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
        ... 1 more

Log Type: AppMaster.stdout
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 0

Log Type: dt.log
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 29715
Showing 4096 bytes of 29715 total. Click here<http://guedlpdhdp001.saifg.rbc.com:19888/jobhistory/logs/guedlpdhdp012.saifg.rbc.com:45454/container_e35_1465495186350_2224_01_000001/container_e35_1465495186350_2224_01_000001/mukkamula/dt.log/?start=0> for the full log.
56m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms4096m -Xmx4096m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT
SHLVL=3
HADOOP_SSH_OPTS=-o ConnectTimeout=5 -o SendEnv=HADOOP_CONF_DIR
HADOOP_USER_NAME=datatorrent/gueulvahal003.saifg.rbc.com@SAIFG.RBC.COM<ma...@SAIFG.RBC.COM>
HADOOP_NAMENODE_OPTS=-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1
HADOOP_IDENT_STRING=yarn
HADOOP_MAPRED_LOG_DIR=/var/log/hadoop-mapreduce/yarn
NM_HOST=guedlpdhdp012.saifg.rbc.com<http://guedlpdhdp012.saifg.rbc.com>
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
HADOOP_SECURE_DN_LOG_DIR=/var/log/hadoop/hdfs
YARN_HISTORYSERVER_HEAPSIZE=1024
JVM_PID=2638
YARN_PID_DIR=/var/run/hadoop-yarn/yarn
HADOOP_HOME_WARN_SUPPRESS=1
NM_PORT=45454
LOGNAME=mukkamula
YARN_CONF_DIR=/usr/hdp/current/hadoop-client/conf
HADOOP_YARN_USER=yarn
QTDIR=/usr/lib64/qt-3.3
_=/usr/lib/jvm/java-1.7.0/bin/java
MSM_PRODUCT=MSM
HADOOP_HOME=/usr/hdp/2.3.2.0-2950/hadoop
MALLOC_ARENA_MAX=4
HADOOP_OPTS=-Dhdp.version=2.3.2.0-2950 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true  -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.3.2.0-2950 -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
SHELL=/bin/bash
YARN_ROOT_LOGGER=INFO,EWMA,RFA
HADOOP_TOKEN_FILE_LOCATION=/grid/11/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/container_e35_1465495186350_2224_01_000001/container_tokens
CLASSPATH=./*:/usr/hdp/current/hadoop-client/conf:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*
HADOOP_MAPRED_PID_DIR=/var/run/hadoop-mapreduce/yarn
YARN_NODEMANAGER_HEAPSIZE=1024
QTINC=/usr/lib64/qt-3.3/include
USER=mukkamula
HADOOP_CLIENT_OPTS=-Xmx2048m -XX:MaxPermSize=512m -Xmx2048m -XX:MaxPermSize=512m
CONTAINER_ID=container_e35_1465495186350_2224_01_000001
HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop/hdfs
HISTCONTROL=ignoredups
HOME=/home/
HADOOP_NAMENODE_INIT_HEAPSIZE=-Xms8192m
MSM_HOME=/usr/local/MegaRAID Storage Manager
LESSOPEN=||/usr/bin/lesspipe.sh %s
LANG=en_US.UTF-8
YARN_NICENESS=0
YARN_IDENT_STRING=yarn
HADOOP_MAPRED_HOME=/usr/hdp/2.3.2.0-2950/hadoop-mapreduce


Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR)
Sent: 2016, June, 16 8:58 AM
To: users@apex.apache.org
Subject: RE: Multiple directories

Thank you for the inputs.

Regards,
Surya Vamshi
From: Thomas Weise [mailto:thomas.weise@gmail.com]
Sent: 2016, June, 15 5:08 PM
To: users@apex.apache.org
Subject: Re: Multiple directories


On Wed, Jun 15, 2016 at 1:55 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <suryavamshivardhan.mukkamula@rbc.com> wrote:
Hi Ram/Team,

I could create an operator which reads multiple directories, parses each file against its individual configuration file, and writes output files to different directories.

However I have some questions regarding the design.


==> We have 120 directories to scan on HDFS. If we use parallel partitioning with operator memory around 250MB, that comes to about 30GB of RAM for this operator. Are these figures going to create any problem in production?

You can benchmark this with a single partition. If the downstream operators can keep up with the rate at which the file reader emits, the memory consumption should be minimal. Keep in mind though that the container memory is not just heap space for the operator, but also the memory the JVM requires to run and the memory the buffer server consumes. You can see the allocated memory in the UI if you use the DT community edition (container list in the physical plan).
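
For reference, the operator's container memory can be capped in the application configuration; a minimal sketch, assuming the file reader operator is named "reader" in the DAG (the name and value are placeholders):

  <property>
    <name>dt.operator.reader.attr.MEMORY_MB</name>
    <value>512</value>
  </property>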


==> Should I use a scheduler for running the batch job, or define the next scan time and keep the DT job running continuously? If I run the DT job continuously, I assume its memory will be held continuously by the DT job and not be available to other applications on the cluster; please clarify.
It is possible to set this up elastically as well, so that when there is no input available the number of reader partitions is reduced and the memory given back (Apex supports dynamic scaling).


Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, June, 05 10:24 PM

To: users@apex.apache.org
Subject: Re: Multiple directories

Some sample code to monitor multiple directories is now available at:
https://github.com/DataTorrent/examples/tree/master/tutorials/fileIO-multiDir

It shows how to use a custom implementation of definePartitions() to create
multiple partitions of the file input operator and group them
into "slices" where each slice monitors a single directory.

Ram

On Wed, May 25, 2016 at 9:55 AM, Munagala Ramanath <ram@datatorrent.com> wrote:
I'm hoping to have a sample sometime next week.

Ram

On Wed, May 25, 2016 at 9:30 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <suryavamshivardhan.mukkamula@rbc.com> wrote:
Thank you so much ram, for your advice , Option (a) would be ideal for my requirement.

Do you have a sample usage for partitioning, with an individual configuration set up for different partitions?

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, May, 25 12:11 PM
To: users@apex.apache.org
Subject: Re: Multiple directories

You have 2 options: (a) AbstractFileInputOperator (b) FileSplitter/BlockReader

For (a), each partition (i.e. replica of the operator) can scan only a single directory, so if you have 100
directories, you can simply start with 100 partitions; since each partition is scanning its own directory
you don't need to worry about which files the lines came from. This approach however needs a custom
definePartitions() implementation in your subclass to assign the appropriate directory and XML parsing
config file to each partition; it also needs adequate cluster resources to be able to spin up the required
number of partitions.

For (b), there is some documentation in the Operators section at http://docs.datatorrent.com/ including
sample code. These operators support scanning multiple directories out of the box but have more
elaborate configuration options. Check this out and see if it works in your use case.

Ram

On Wed, May 25, 2016 at 8:17 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <suryavamshivardhan.mukkamula@rbc.com> wrote:
Hello Ram/Team,

My requirement is to read input feeds from different locations on HDFS and parse those files by reading XML configuration files (each input feed has a configuration file which defines the fields inside the input feed).

My approach: I would like to define a mapping file which contains the individual feed identifier, feed location, and configuration file location. I would like to read this mapping file at initial load within the setup() method and define my DirectoryScan.acceptFiles. My challenge is that when I read the files, I should parse the lines by reading the individual configuration files. How do I know which file a line came from? If I know this, I can read the corresponding configuration file before parsing the line.

Please let me know how do I handle this.

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, May, 24 5:49 PM
To: Mukkamula, Suryavamshivardhan (CWM-NR)
Subject: Multiple directories

One way of addressing the issue is to use some sort of external tool (like a script) to
copy all the input files to a common directory (making sure that the file names are
unique to prevent one file from overwriting another) before the Apex application starts.

The Apex application then starts and processes files from this directory.

If you set the partition count of the file input operator to N, it will create N partitions and
the files will be automatically distributed among the partitions. The partitions will work
in parallel.
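
For example, in properties.xml (a sketch; "reader" stands for whatever name the file input operator has in your DAG, and 4 is an illustrative count):

  <property>
    <name>dt.operator.reader.prop.partitionCount</name>
    <value>4</value>
  </property>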

Ram

_______________________________________________________________________

This [email] may be privileged and/or confidential, and the sender does not waive any related rights and obligations. Any distribution, use or copying of this [email] or the information it contains by other than an intended recipient is unauthorized. If you received this [email] in error, please advise the sender (by return [email] or otherwise) immediately. You have consented to receive the attached electronically at the above-noted address; please retain a copy of this confirmation for future reference.

Re: Multiple directories

Posted by Thomas Weise <thomas.weise@gmail.com>.
Please check in the Apex application master log (container 1) how much
memory it is requesting. If that's the correct figure and you still end up
with a larger container, the problem could be the minimum container size in
the YARN scheduler configuration.
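
Two settings are typically involved; a sketch with illustrative values (the first is an Apex attribute in the application configuration, the second lives in the cluster's yarn-site.xml):

  <!-- memory requested for the application master container -->
  <property>
    <name>dt.attr.MASTER_MEMORY_MB</name>
    <value>1024</value>
  </property>

  <!-- YARN rounds every container request up to at least this size -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>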


On Fri, Jun 17, 2016 at 12:58 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <
suryavamshivardhan.mukkamula@rbc.com> wrote:

> Hi Ram,
>
>
>
> I tried that option of adding the memory properties in site/conf and
> selected during the launch , but no luck. The same is working with my local
> sandbox set up.
>
>
>
> Is there any other way that I can understand the reason?
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, June, 17 3:06 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> Please take a look at the section entitled "Properties source precedence"
> at
>
> http://docs.datatorrent.com/application_packages/
>
>
>
> It looks like the setting in dt-site.xml on the cluster is overriding your
> application-defined values. If you add the properties to a file under
> site/conf in your application and then select it during launch, those
> values should take effect.
>
>
>
> For signalling EOF, another option is to use a separate control port to
> send the EOF, which could just be the string "EOF", for example.
>
>
>
>
>
> Ram
>
>
>
> On Fri, Jun 17, 2016 at 11:48 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi,
>
>
>
> I found a way for the 2nd question.
>
>
>
> 1)      I tried setting operator memory to 500MB, but the operator still
> takes 4GB by default; I did not understand why. (In my sandbox setup the
> memory was set correctly to 500MB, but not on the enterprise dev cluster.)
>
> 2)      I am trying to send a null object from the file reader when EOF is
> reached, so that the file writer can call the requestFinalize() method. But
> somehow I could not figure out how to send the EOF; I tried like below but
> no luck. (Solution: it seems that if readEntity() returns null, the
> emitTuple() method is not called; I have managed to emit the null object
> from readEntity() itself.)
>
>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:
> suryavamshivardhan.mukkamula@rbc.com]
> *Sent:* 2016, June, 17 12:20 PM
> *To:* users@apex.apache.org
> *Subject:* RE: Multiple directories
>
>
>
> Hi,
>
>
>
> Can you please help me understand the below issues.
>
>
>
> 1)      I tried setting operator memory to 500MB, but the operator still
> takes 4GB by default; I did not understand why. (In my sandbox setup the
> memory was set correctly to 500MB, but not on the enterprise dev cluster.)
>
> 2)      I am trying to send a null object from the file reader when EOF is
> reached, so that the file writer can call the requestFinalize() method. But
> somehow I could not figure out how to send the EOF; I tried like below but
> no luck.
>
>
>
> ################### File Reader ####################################################
>
> @Override
> protected String readEntity() throws IOException {
>   // try to read a line
>   final String line = br.readLine();
>   if (null != line) {  // normal case
>     LOG.debug("readEntity: line = {}", line);
>     return line;
>   }
>
>   // end-of-file (control tuple sent in closeFile())
>   LOG.info("readEntity: EOF for {}", filePath);
>   return null;
> }
>
> @Override
> protected void emit(String line) {
>   // parsing logic here, parse the line as per the input configuration and
>   // create the output line as per the output configuration
>   if (line == null) {
>     // EOF marker: emit a tuple with a null value and stop here -- without
>     // this return, the null value would fall through to parseTuple() below
>     output.emit(new KeyValue<String, String>(getFileName(), null));
>     return;
>   }
>   KeyValue<String, String> tuple = new KeyValue<String, String>();
>   tuple.key = getFileName();
>   tuple.value = line;
>   KeyValue<String, String> newTuple = parseTuple(tuple);
>   output.emit(newTuple);
> }
>
> ###################### File Writer ######################################################
>
> public class FileOutputOperator extends AbstractFileOutputOperator<KeyValue<String, String>> {
>   private static final Logger LOG = LoggerFactory.getLogger(FileOutputOperator.class);
>   private List<String> filesToFinalize = new ArrayList<>();
>
>   @Override
>   public void setup(Context.OperatorContext context) {
>     super.setup(context);
>     finalizeFiles();
>   }
>
>   @Override
>   protected byte[] getBytesForTuple(KeyValue<String, String> tuple) {
>     if (tuple.value == null) {
>       // a null value marks EOF: remember the file so it can be finalized
>       LOG.info("File to finalize {}", tuple.key);
>       filesToFinalize.add(tuple.key);
>       return new byte[0];
>     } else {
>       return tuple.value.getBytes();
>     }
>   }
>
>   @Override
>   protected String getFileName(KeyValue<String, String> tuple) {
>     return tuple.key;
>   }
>
>   @Override
>   public void endWindow() {
>     super.endWindow();
>     finalizeFiles();
>   }
>
>   private void finalizeFiles() {
>     LOG.info("Files to finalize {}", filesToFinalize.toArray());
>     Iterator<String> fileIt = filesToFinalize.iterator();
>     while (fileIt.hasNext()) {
>       requestFinalize(fileIt.next());
>       fileIt.remove();
>     }
>   }
> }
>
>
> ##################################################################################################
>
>
>
>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:suryavamshivardhan.mukkamula@rbc.com]
> *Sent:* 2016, June, 17 9:11 AM
> *To:* users@apex.apache.org
> *Subject:* RE: Multiple directories
>
>
>
> Hi Ram/Raja,
>
>
>
> The Hbase dependency was pulling older Hadoop jars into my classpath. I removed
> the Hbase dependency, which I don’t need for now, and the issue got resolved.
>
>
>
> Thank you for your help.
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Raja.Aravapalli [mailto:Raja.Aravapalli@target.com]
> *Sent:* 2016, June, 17 7:06 AM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
>
>
> I also faced a similar problem with Hadoop jars when I used HBase jars in
> pom.xml!! It could be because the version of the hadoop jars your apa is
> holding is different from the ones in the cluster!!
>
>
>
>
>
> What I did to solve this is:
>
> I included the provided scope in the Maven pom.xml for the hbase jars, and
> then supplied the hbase jars to the application package during submission
> using "-libjars" with the launch command, which solved my Invalid Container
> Id problem!!
>
>
>
> You can type “launch help” to see the usage details.
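>
> For example, the pom entry would look like this (a sketch; the version is
> illustrative), and the jar is then supplied at launch time with something
> like "launch -libjars hbase-client-1.1.2.jar yourapp.apa":
>
>   <dependency>
>     <groupId>org.apache.hbase</groupId>
>     <artifactId>hbase-client</artifactId>
>     <version>1.1.2</version>
>     <scope>provided</scope>
>   </dependency>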
>
>
>
>
>
> Regards,
>
> Raja.
>
>
>
> *From: *Munagala Ramanath <ram@datatorrent.com>
> *Reply-To: *"users@apex.apache.org" <users@apex.apache.org>
> *Date: *Thursday, June 16, 2016 at 4:10 PM
> *To: *"users@apex.apache.org" <users@apex.apache.org>
> *Subject: *Re: Multiple directories
>
>
>
> Those 6 hadoop jars are definitely a problem.
>
>
>
> I didn't see the output of "mvn dependency:tree"; could you post that?
>
> It will show you why these hadoop jars are being pulled in.
>
>
>
> Also, please refer to the section "Hadoop dependencies conflicts" in the
> troubleshooting guide:
>
> http://docs.datatorrent.com/troubleshooting/
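>
> A common fix is to exclude the transitive hadoop artifacts from whichever
> dependency pulls them in; a sketch for hbase-client (adjust to whatever
> dependency:tree implicates; the wildcard exclusion needs Maven 3.2.1+):
>
>   <dependency>
>     <groupId>org.apache.hbase</groupId>
>     <artifactId>hbase-client</artifactId>
>     <version>1.1.2</version>
>     <exclusions>
>       <exclusion>
>         <groupId>org.apache.hadoop</groupId>
>         <artifactId>*</artifactId>
>       </exclusion>
>     </exclusions>
>   </dependency>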
>
>
>
> Ram
>
>
>
> On Thu, Jun 16, 2016 at 1:56 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi Ram,
>
>
>
> Below are the details.
>
>
>
>
>
> 0 Wed Jun 15 16:34:24 EDT 2016 META-INF/
>
>    358 Wed Jun 15 16:34:22 EDT 2016 META-INF/MANIFEST.MF
>
>      0 Wed Jun 15 16:34:24 EDT 2016 app/
>
> 52967 Wed Jun 15 16:34:22 EDT 2016 app/countrynamescan-1.0-SNAPSHOT.jar
>
>      0 Wed Jun 15 16:34:22 EDT 2016 lib/
>
> 62983 Wed Jun 15 16:34:22 EDT 2016 lib/activation-1.1.jar
>
> 1143233 Wed Jun 15 16:34:22 EDT 2016 lib/activemq-client-5.8.0.jar
>
>   4467 Wed Jun 15 16:34:22 EDT 2016 lib/aopalliance-1.0.jar
>
> 44925 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-i18n-2.0.0-M15.jar
>
> 691479 Wed Jun 15 16:34:22 EDT 2016
> lib/apacheds-kerberos-codec-2.0.0-M15.jar
>
> 16560 Wed Jun 15 16:34:22 EDT 2016 lib/api-asn1-api-1.0.0-M20.jar
>
> 79912 Wed Jun 15 16:34:22 EDT 2016 lib/api-util-1.0.0-M20.jar
>
> 43033 Wed Jun 15 16:34:22 EDT 2016 lib/asm-3.1.jar
>
> 303139 Wed Jun 15 16:34:22 EDT 2016 lib/avro-1.7.4.jar
>
> 232019 Wed Jun 15 16:34:22 EDT 2016 lib/commons-beanutils-1.8.3.jar
>
> 41123 Wed Jun 15 16:34:22 EDT 2016 lib/commons-cli-1.2.jar
>
> 284184 Wed Jun 15 16:34:22 EDT 2016 lib/commons-codec-1.10.jar
>
> 575389 Wed Jun 15 16:34:22 EDT 2016 lib/commons-collections-3.2.1.jar
>
> 30595 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compiler-2.7.8.jar
>
> 241367 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compress-1.4.1.jar
>
> 298829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-configuration-1.6.jar
>
> 143602 Wed Jun 15 16:34:22 EDT 2016 lib/commons-digester-1.8.jar
>
> 112341 Wed Jun 15 16:34:22 EDT 2016 lib/commons-el-1.0.jar
>
> 305001 Wed Jun 15 16:34:22 EDT 2016 lib/commons-httpclient-3.1.jar
>
> 185140 Wed Jun 15 16:34:22 EDT 2016 lib/commons-io-2.4.jar
>
> 284220 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang-2.6.jar
>
> 315805 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang3-3.1.jar
>
> 61829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-logging-1.2.jar
>
> 1599627 Wed Jun 15 16:34:22 EDT 2016 lib/commons-math3-3.1.1.jar
>
> 273370 Wed Jun 15 16:34:22 EDT 2016 lib/commons-net-3.1.jar
>
> 3608597 Wed Jun 15 16:34:22 EDT 2016 lib/db2jcc-123.jar
>
> 313898 Wed Jun 15 16:34:22 EDT 2016 lib/dom4j-1.6.1.jar
>
> 17138265 Wed Jun 15 16:34:22 EDT 2016 lib/fastutil-6.6.4.jar
>
> 15322 Wed Jun 15 16:34:22 EDT 2016 lib/findbugs-annotations-1.3.9-1.jar
>
> 84946 Wed Jun 15 16:34:22 EDT 2016 lib/flatpack-3.4.2.jar
>
> 20220 Wed Jun 15 16:34:22 EDT 2016
> lib/geronimo-j2ee-management_1.1_spec-1.0.1.jar
>
> 32359 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-jms_1.1_spec-1.1.1.jar
>
> 21817 Wed Jun 15 16:34:22 EDT 2016 lib/gmbal-api-only-3.0.0-b023.jar
>
> 690573 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-framework-2.1.2.jar
>
> 253086 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-2.1.2.jar
>
> 198255 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-server-2.1.2.jar
>
> 336904 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-servlet-2.1.2.jar
>
>   8114 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-rcm-2.1.2.jar
>
> 1795932 Wed Jun 15 16:34:22 EDT 2016 lib/guava-12.0.1.jar
>
> 710492 Wed Jun 15 16:34:22 EDT 2016 lib/guice-3.0.jar
>
> 65012 Wed Jun 15 16:34:22 EDT 2016 lib/guice-servlet-3.0.jar
>
> 16778 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-annotations-2.2.0.jar
>
> 52449 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-auth-2.5.1.jar
>
> 2962685 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-common-2.5.1.jar
>
> 1498368 Wed Jun 15 16:34:22 EDT 2016
> lib/hadoop-mapreduce-client-core-2.5.1.jar
>
> 1158936 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-api-2.2.0.jar
>
> 1301627 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-common-2.2.0.jar
>
> 50139 Wed Jun 15 16:34:22 EDT 2016 lib/hawtbuf-1.9.jar
>
> 20780 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-annotations-1.1.2.jar
>
> 1249004 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-client-1.1.2.jar
>
> 530078 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-common-1.1.2.jar
>
> 4201685 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-protocol-1.1.2.jar
>
> 1475955 Wed Jun 15 16:34:22 EDT 2016 lib/htrace-core-3.1.0-incubating.jar
>
> 590533 Wed Jun 15 16:34:22 EDT 2016 lib/httpclient-4.3.5.jar
>
> 282269 Wed Jun 15 16:34:22 EDT 2016 lib/httpcore-4.3.2.jar
>
> 228286 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-core-asl-1.9.2.jar
>
> 765648 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-mapper-asl-1.9.2.jar
>
>   2497 Wed Jun 15 16:34:22 EDT 2016 lib/javax.inject-1.jar
>
> 521991 Wed Jun 15 16:34:22 EDT 2016 lib/javax.mail-1.5.0.jar
>
> 83945 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-3.1.jar
>
> 85353 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-api-3.0.1.jar
>
> 105134 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-api-2.2.2.jar
>
> 890168 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-impl-2.2.3-1.jar
>
> 1291164 Wed Jun 15 16:34:22 EDT 2016 lib/jcodings-1.0.8.jar
>
> 151304 Wed Jun 15 16:34:22 EDT 2016 lib/jdom-1.1.3.jar
>
> 130458 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-client-1.9.jar
>
> 458739 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-core-1.9.jar
>
> 17542 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-grizzly2-1.9.jar
>
> 14786 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-guice-1.9.jar
>
> 147952 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-json-1.9.jar
>
> 713089 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-server-1.9.jar
>
> 28100 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-core-1.9.jar
>
> 12976 Wed Jun 15 16:34:22 EDT 2016
> lib/jersey-test-framework-grizzly2-1.9.jar
>
> 67758 Wed Jun 15 16:34:22 EDT 2016 lib/jettison-1.1.jar
>
> 21144 Wed Jun 15 16:34:22 EDT 2016
> lib/jetty-continuation-8.1.10.v20130312.jar
>
> 95709 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-http-8.1.10.v20130312.jar
>
> 103622 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-io-8.1.10.v20130312.jar
>
> 89691 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-security-8.1.10.v20130312.jar
>
> 347020 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-server-8.1.10.v20130312.jar
>
> 101052 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-servlet-8.1.10.v20130312.jar
>
> 177131 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-6.1.26.jar
>
> 284903 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-8.1.10.v20130312.jar
>
> 125928 Wed Jun 15 16:34:22 EDT 2016
> lib/jetty-websocket-8.1.10.v20130312.jar
>
> 39117 Wed Jun 15 16:34:22 EDT 2016 lib/jms-api-1.1-rev-1.jar
>
> 187292 Wed Jun 15 16:34:22 EDT 2016 lib/joni-2.1.2.jar
>
> 280205 Wed Jun 15 16:34:22 EDT 2016 lib/jsch-0.1.53.jar
>
> 33015 Wed Jun 15 16:34:22 EDT 2016 lib/jsr305-1.3.9.jar
>
> 489884 Wed Jun 15 16:34:22 EDT 2016 lib/log4j-1.2.17.jar
>
> 565387 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-contrib-3.2.0-incubating.jar
>
> 1062000 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-library-3.1.1.jar
>
> 42212 Wed Jun 15 16:34:22 EDT 2016 lib/management-api-3.0.0-b012.jar
>
>   8798 Wed Jun 15 16:34:22 EDT 2016 lib/named-regexp-0.2.3.jar
>
> 1206119 Wed Jun 15 16:34:22 EDT 2016 lib/netty-3.6.6.Final.jar
>
> 1779991 Wed Jun 15 16:34:22 EDT 2016 lib/netty-all-4.0.23.Final.jar
>
> 29555 Wed Jun 15 16:34:22 EDT 2016 lib/paranamer-2.3.jar
>
> 1869113 Wed Jun 15 16:34:22 EDT 2016 lib/poi-3.9.jar
>
> 936648 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-3.9.jar
>
> 4802621 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-schemas-3.9.jar
>
> 533455 Wed Jun 15 16:34:22 EDT 2016 lib/protobuf-java-2.5.0.jar
>
> 26084 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-api-1.7.5.jar
>
>   9943 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-log4j12-1.7.19.jar
>
> 995968 Wed Jun 15 16:34:22 EDT 2016 lib/snappy-java-1.0.4.1.jar
>
> 23346 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0-2.jar
>
> 26514 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0.1.jar
>
>   2455 Wed Jun 15 16:34:22 EDT 2016 lib/tdgssconfig-14.00.00.21.jar
>
> 991265 Wed Jun 15 16:34:22 EDT 2016 lib/terajdbc4-14.00.00.21.jar
>
> 758309 Wed Jun 15 16:34:22 EDT 2016 lib/vertica-jdbc-7.2.1-0.jar
>
> 109318 Wed Jun 15 16:34:22 EDT 2016 lib/xml-apis-1.0.b2.jar
>
> 2666695 Wed Jun 15 16:34:22 EDT 2016 lib/xmlbeans-2.3.0.jar
>
> 15010 Wed Jun 15 16:34:22 EDT 2016 lib/xmlenc-0.52.jar
>
> 94672 Wed Jun 15 16:34:22 EDT 2016 lib/xz-1.0.jar
>
> 792964 Wed Jun 15 16:34:22 EDT 2016 lib/zookeeper-3.4.6.jar
>
>      0 Wed Jun 15 16:34:28 EDT 2016 conf/
>
>    334 Mon Apr 04 11:18:00 EDT 2016 conf/my-app-conf1.xml
>
>   3432 Wed Jun 15 16:22:20 EDT 2016 META-INF/properties.xml
>
>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>   xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
>   <modelVersion>4.0.0</modelVersion>
>   <groupId>com.rbc.aml.cnscan</groupId>
>   <version>1.0-SNAPSHOT</version>
>   <artifactId>countrynamescan</artifactId>
>   <packaging>jar</packaging>
>
>   <!-- change these to the appropriate values -->
>   <name>countrynamescan</name>
>   <description>Country and Name Scan project</description>
>
>   <properties>
>     <!-- change this if you desire to use a different version of DataTorrent -->
>     <datatorrent.version>3.1.1</datatorrent.version>
>     <datatorrent.apppackage.classpath>lib/*.jar</datatorrent.apppackage.classpath>
>   </properties>
>
>   <!-- repository to provide the DataTorrent artifacts -->
>   <!-- <repositories>
>     <repository>
>       <snapshots>
>         <enabled>false</enabled>
>       </snapshots>
>       <id>Datatorrent-Releases</id>
>       <name>DataTorrent Release Repository</name>
>       <url>https://www.datatorrent.com/maven/content/repositories/releases/</url>
>     </repository>
>     <repository>
>       <releases>
>         <enabled>false</enabled>
>       </releases>
>       <id>DataTorrent-Snapshots</id>
>       <name>DataTorrent Early Access Program Snapshot Repository</name>
>       <url>https://www.datatorrent.com/maven/content/repositories/snapshots/</url>
>     </repository>
>   </repositories> -->
>
>   <build>
>     <plugins>
>       <plugin>
>         <groupId>org.apache.maven.plugins</groupId>
>         <artifactId>maven-eclipse-plugin</artifactId>
>         <version>2.9</version>
>         <configuration>
>           <downloadSources>true</downloadSources>
>         </configuration>
>       </plugin>
>       <plugin>
>         <artifactId>maven-compiler-plugin</artifactId>
>         <version>3.3</version>
>         <configuration>
>           <encoding>UTF-8</encoding>
>           <source>1.7</source>
>           <target>1.7</target>
>           <debug>true</debug>
>           <optimize>false</optimize>
>           <showDeprecation>true</showDeprecation>
>           <showWarnings>true</showWarnings>
>         </configuration>
>       </plugin>
>       <plugin>
>         <artifactId>maven-dependency-plugin</artifactId>
>         <version>2.8</version>
>         <executions>
>           <execution>
>             <id>copy-dependencies</id>
>             <phase>prepare-package</phase>
>             <goals>
>               <goal>copy-dependencies</goal>
>             </goals>
>             <configuration>
>               <outputDirectory>target/deps</outputDirectory>
>               <includeScope>runtime</includeScope>
>             </configuration>
>           </execution>
>         </executions>
>       </plugin>
>
>       <plugin>
>         <artifactId>maven-assembly-plugin</artifactId>
>         <executions>
>           <execution>
>             <id>app-package-assembly</id>
>             <phase>package</phase>
>             <goals>
>               <goal>single</goal>
>             </goals>
>             <configuration>
>               <finalName>${project.artifactId}-${project.version}-apexapp</finalName>
>               <appendAssemblyId>false</appendAssemblyId>
>               <descriptors>
>                 <descriptor>src/assemble/appPackage.xml</descriptor>
>               </descriptors>
>               <archiverConfig>
>                 <defaultDirectoryMode>0755</defaultDirectoryMode>
>               </archiverConfig>
>               <archive>
>                 <manifestEntries>
>                   <Class-Path>${datatorrent.apppackage.classpath}</Class-Path>
>                   <DT-Engine-Version>${datatorrent.version}</DT-Engine-Version>
>                   <DT-App-Package-Name>${project.artifactId}</DT-App-Package-Name>
>                   <DT-App-Package-Version>${project.version}</DT-App-Package-Version>
>                   <DT-App-Package-Display-Name>${project.name}</DT-App-Package-Display-Name>
>                   <DT-App-Package-Description>${project.description}</DT-App-Package-Description>
>                 </manifestEntries>
>               </archive>
>             </configuration>
>           </execution>
>         </executions>
>       </plugin>
>
>       <plugin>
>         <artifactId>maven-antrun-plugin</artifactId>
>         <version>1.7</version>
>         <executions>
>           <execution>
>             <phase>package</phase>
>             <configuration>
>               <target>
>                 <move file="${project.build.directory}/${project.artifactId}-${project.version}-apexapp.jar"
>                   tofile="${project.build.directory}/${project.artifactId}-${project.version}.apa" />
>               </target>
>             </configuration>
>             <goals>
>               <goal>run</goal>
>             </goals>
>           </execution>
>           <execution>
>             <!-- create resource directory for xml javadoc -->
>             <id>createJavadocDirectory</id>
>             <phase>generate-resources</phase>
>             <configuration>
>               <tasks>
>                 <delete dir="${project.build.directory}/generated-resources/xml-javadoc"/>
>                 <mkdir dir="${project.build.directory}/generated-resources/xml-javadoc"/>
>               </tasks>
>             </configuration>
>             <goals>
>               <goal>run</goal>
>             </goals>
>           </execution>
>         </executions>
>       </plugin>
>
>       <plugin>
>         <groupId>org.codehaus.mojo</groupId>
>         <artifactId>build-helper-maven-plugin</artifactId>
>         <version>1.9.1</version>
>         <executions>
>           <execution>
>             <id>attach-artifacts</id>
>             <phase>package</phase>
>             <goals>
>               <goal>attach-artifact</goal>
>             </goals>
>             <configuration>
>               <artifacts>
>                 <artifact>
>                   <file>target/${project.artifactId}-${project.version}.apa</file>
>                   <type>apa</type>
>                 </artifact>
>               </artifacts>
>               <skipAttach>false</skipAttach>
>             </configuration>
>           </execution>
>         </executions>
>       </plugin>
>
>       <!-- generate javadoc -->
>       <plugin>
>         <groupId>org.apache.maven.plugins</groupId>
>         <artifactId>maven-javadoc-plugin</artifactId>
>         <executions>
>           <!-- generate xml javadoc -->
>           <execution>
>             <id>xml-doclet</id>
>             <phase>generate-resources</phase>
>             <goals>
>               <goal>javadoc</goal>
>             </goals>
>             <configuration>
>               <doclet>com.github.markusbernhardt.xmldoclet.XmlDoclet</doclet>
>               <additionalparam>-d ${project.build.directory}/generated-resources/xml-javadoc
>                 -filename ${project.artifactId}-${project.version}-javadoc.xml</additionalparam>
>               <useStandardDocletOptions>false</useStandardDocletOptions>
>               <docletArtifact>
>                 <groupId>com.github.markusbernhardt</groupId>
>                 <artifactId>xml-doclet</artifactId>
>                 <version>1.0.4</version>
>               </docletArtifact>
>             </configuration>
>           </execution>
>         </executions>
>       </plugin>
>
>       <!-- Transform xml javadoc to stripped down version containing only
>         class/interface comments and tags -->
>       <plugin>
>         <groupId>org.codehaus.mojo</groupId>
>         <artifactId>xml-maven-plugin</artifactId>
>         <version>1.0</version>
>         <executions>
>           <execution>
>             <id>transform-xmljavadoc</id>
>             <phase>generate-resources</phase>
>             <goals>
>               <goal>transform</goal>
>             </goals>
>           </execution>
>         </executions>
>         <configuration>
>           <transformationSets>
>             <transformationSet>
>               <dir>${project.build.directory}/generated-resources/xml-javadoc</dir>
>               <includes>
>                 <include>${project.artifactId}-${project.version}-javadoc.xml</include>
>               </includes>
>               <stylesheet>XmlJavadocCommentsExtractor.xsl</stylesheet>
>               <outputDir>${project.build.directory}/generated-resources/xml-javadoc</outputDir>
>             </transformationSet>
>           </transformationSets>
>         </configuration>
>       </plugin>
>
>       <!-- copy xml javadoc to class jar -->
>       <plugin>
>         <artifactId>maven-resources-plugin</artifactId>
>         <version>2.6</version>
>         <executions>
>           <execution>
>             <id>copy-resources</id>
>             <phase>process-resources</phase>
>             <goals>
>               <goal>copy-resources</goal>
>             </goals>
>             <configuration>
>               <outputDirectory>${basedir}/target/classes</outputDirectory>
>               <resources>
>                 <resource>
>                   <directory>${project.build.directory}/generated-resources/xml-javadoc</directory>
>                   <includes>
>                     <include>${project.artifactId}-${project.version}-javadoc.xml</include>
>                   </includes>
>                   <filtering>true</filtering>
>                 </resource>
>               </resources>
>             </configuration>
>           </execution>
>         </executions>
>       </plugin>
>
>     </plugins>
>
>     <pluginManagement>
>       <plugins>
>         <!-- This plugin's configuration is used to store Eclipse m2e settings
>           only. It has no influence on the Maven build itself. -->
>         <plugin>
>           <groupId>org.eclipse.m2e</groupId>
>           <artifactId>lifecycle-mapping</artifactId>
>           <version>1.0.0</version>
>           <configuration>
>             <lifecycleMappingMetadata>
>               <pluginExecutions>
>                 <pluginExecution>
>                   <pluginExecutionFilter>
>                     <groupId>org.codehaus.mojo</groupId>
>                     <artifactId>xml-maven-plugin</artifactId>
>                     <versionRange>[1.0,)</versionRange>
>                     <goals>
>                       <goal>transform</goal>
>                     </goals>
>                   </pluginExecutionFilter>
>                   <action>
>                     <ignore></ignore>
>                   </action>
>                 </pluginExecution>
>                 <pluginExecution>
>                   <pluginExecutionFilter>
>                     <groupId></groupId>
>                     <artifactId></artifactId>
>                     <versionRange>[,)</versionRange>
>                     <goals>
>                       <goal></goal>
>                     </goals>
>                   </pluginExecutionFilter>
>                   <action>
>                     <ignore></ignore>
>                   </action>
>                 </pluginExecution>
>               </pluginExecutions>
>             </lifecycleMappingMetadata>
>           </configuration>
>         </plugin>
>       </plugins>
>     </pluginManagement>
>   </build>
>
>   <dependencies>
>     <!-- add your dependencies here -->
>     <dependency>
>       <groupId>com.datatorrent</groupId>
>       <artifactId>malhar-library</artifactId>
>       <version>${datatorrent.version}</version>
>       <!-- If you know that your application does not need transitive dependencies
>         pulled in by malhar-library, uncomment the following to reduce the size of
>         your app package. -->
>       <!-- <exclusions> <exclusion> <groupId>*</groupId> <artifactId>*</artifactId>
>         </exclusion> </exclusions> -->
>     </dependency>
>     <dependency>
>       <groupId>com.datatorrent</groupId>
>       <artifactId>dt-common</artifactId>
>       <version>${datatorrent.version}</version>
>       <scope>provided</scope>
>     </dependency>
>     <dependency>
>       <groupId>com.datatorrent</groupId>
>       <artifactId>malhar-library</artifactId>
>       <version>${datatorrent.version}</version>
>       <!-- If you know that your application does not need transitive dependencies
>         pulled in by malhar-library, uncomment the following to reduce the size of
>         your app package. -->
>       <!-- <exclusions> <exclusion> <groupId>*</groupId> <artifactId>*</artifactId>
>         </exclusion> </exclusions> -->
>     </dependency>
>     <dependency>
>       <groupId>com.datatorrent</groupId>
>       <artifactId>dt-engine</artifactId>
>       <version>${datatorrent.version}</version>
>       <scope>test</scope>
>     </dependency>
>     <dependency>
>       <groupId>com.datatorrent</groupId>
>       <artifactId>dt-common</artifactId>
>       <version>${datatorrent.version}</version>
>       <scope>provided</scope>
>     </dependency>
>     <dependency>
>       <groupId>com.teradata.jdbc</groupId>
>       <artifactId>terajdbc4</artifactId>
>       <version>14.00.00.21</version>
>     </dependency>
>     <dependency>
>       <groupId>com.teradata.jdbc</groupId>
>       <artifactId>tdgssconfig</artifactId>
>       <version>14.00.00.21</version>
>     </dependency>
>     <dependency>
>       <groupId>com.ibm.db2</groupId>
>       <artifactId>db2jcc</artifactId>
>       <version>123</version>
>     </dependency>
>     <dependency>
>       <groupId>jdk.tools</groupId>
>       <artifactId>jdk.tools</artifactId>
>       <version>1.7</version>
>       <scope>system</scope>
>       <systemPath>C:/Program Files/Java/jdk1.7.0_79/lib/tools.jar</systemPath>
>     </dependency>
>     <dependency>
>       <groupId>org.apache.apex</groupId>
>       <artifactId>malhar-contrib</artifactId>
>       <version>3.2.0-incubating</version>
>       <!--<scope>provided</scope> -->
>       <exclusions>
>         <exclusion>
>           <groupId>*</groupId>
>           <artifactId>*</artifactId>
>         </exclusion>
>       </exclusions>
>     </dependency>
>     <dependency>
>       <groupId>junit</groupId>
>       <artifactId>junit</artifactId>
>       <version>4.10</version>
>       <scope>test</scope>
>     </dependency>
>     <dependency>
>       <groupId>com.vertica</groupId>
>       <artifactId>vertica-jdbc</artifactId>
>       <version>7.2.1-0</version>
>     </dependency>
>     <dependency>
>       <groupId>org.apache.hbase</groupId>
>       <artifactId>hbase-client</artifactId>
>       <version>1.1.2</version>
>     </dependency>
>     <dependency>
>       <groupId>org.slf4j</groupId>
>       <artifactId>slf4j-log4j12</artifactId>
>       <version>1.7.19</version>
>     </dependency>
>     <dependency>
>       <groupId>com.datatorrent</groupId>
>       <artifactId>dt-engine</artifactId>
>       <version>${datatorrent.version}</version>
>       <scope>test</scope>
>     </dependency>
>     <dependency>
>       <groupId>net.sf.flatpack</groupId>
>       <artifactId>flatpack</artifactId>
>       <version>3.4.2</version>
>     </dependency>
>     <dependency>
>       <groupId>org.jdom</groupId>
>       <artifactId>jdom</artifactId>
>       <version>1.1.3</version>
>     </dependency>
>     <dependency>
>       <groupId>org.apache.poi</groupId>
>       <artifactId>poi-ooxml</artifactId>
>       <version>3.9</version>
>     </dependency>
>     <dependency>
>       <groupId>org.apache.xmlbeans</groupId>
>       <artifactId>xmlbeans</artifactId>
>       <version>2.3.0</version>
>     </dependency>
>     <dependency>
>       <groupId>dom4j</groupId>
>       <artifactId>dom4j</artifactId>
>       <version>1.6.1</version>
>     </dependency>
>     <dependency>
>       <groupId>javax.xml.stream</groupId>
>       <artifactId>stax-api</artifactId>
>       <version>1.0-2</version>
>     </dependency>
>     <dependency>
>       <groupId>org.apache.poi</groupId>
>       <artifactId>poi</artifactId>
>       <version>3.9</version>
>     </dependency>
>     <dependency>
>       <groupId>org.apache.poi</groupId>
>       <artifactId>poi-ooxml-schemas</artifactId>
>       <version>3.9</version>
>     </dependency>
>     <dependency>
>       <groupId>com.jcraft</groupId>
>       <artifactId>jsch</artifactId>
>       <version>0.1.53</version>
>     </dependency>
>     <dependency>
>       <groupId>com.jcraft</groupId>
>       <artifactId>jsch</artifactId>
>       <version>0.1.53</version>
>     </dependency>
>   </dependencies>
>
> </project>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, June, 16 4:37 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> It looks like you may be including old Hadoop jars in your apa package
> since the stack trace
>
> shows *ConverterUtils.toContainerId* calling
> *ConverterUtils.toApplicationAttemptId* but recent versions
>
> don't have that call sequence. In 2.7.1 (which is what your cluster has)
> the function looks like this:
>
>  * public static ContainerId toContainerId(String containerIdStr) {*
>
> *    return ContainerId.fromString(containerIdStr);*
>
> *  }*
>
>
>
> Could you post the output of "*jar tvf {your-apa-file}*" as well as: "*mvn
> dependency:tree"*
>
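> A narrower variant, assuming maven-dependency-plugin 2.x or later (its
> artifact filter option), lists just the Hadoop artifacts being pulled in:
>
> *mvn dependency:tree -Dincludes=org.apache.hadoop*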
>
>
> Ram
>
>
>
> On Thu, Jun 16, 2016 at 12:38 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi Ram,
>
>
>
> Below is the information.
>
>
>
>
>
>
> {
>     "clusterInfo": {
>         "haState": "ACTIVE",
>         "haZooKeeperConnectionState": "CONNECTED",
>         "hadoopBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bccaeaf36cb193c1c by jenkins source checksum 69a3bf8c667267c2c252a54fbbf23d",
>         "hadoopVersion": "2.7.1.2.3.2.0-2950",
>         "hadoopVersionBuiltOn": "2015-09-30T18:08Z",
>         "id": 1465495186350,
>         "resourceManagerBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bccaeaf36cb193c1c by jenkins source checksum 48db4b572827c2e9c2da66982d147626",
>         "resourceManagerVersion": "2.7.1.2.3.2.0-2950",
>         "resourceManagerVersionBuiltOn": "2015-09-30T18:20Z",
>         "rmStateStoreName": "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",
>         "startedOn": 1465495186350,
>         "state": "STARTED"
>     }
> }
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, June, 16 2:57 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> Can you ssh to one of the cluster nodes? If so, can you run this command
> and show the output
>
> (where *{rm}* is the *host:port* running the resource manager, aka YARN):
>
>
>
> *curl http://{rm}/ws/v1/cluster | python -mjson.tool*
>
>
>
> Ram
>
> ps. You can determine the node running YARN with:
>
>
>
> *hdfs getconf -confKey yarn.resourcemanager.webapp.address*
>
> *hdfs getconf -confKey yarn.resourcemanager.webapp.https.address*
>
>
>
>
>
>
>
> On Thu, Jun 16, 2016 at 11:15 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi,
>
>
>
> I am facing a weird issue and the logs are not clear to me!!
>
>
>
> I have created an apa file which works fine within my local sandbox, but I am
> facing problems when I upload it to the enterprise Hadoop cluster using the DT
> Console.
>
>
>
> Below is the error message from yarn logs. Please help in understanding
> the issue.
>
>
>
> ###################### Error Logs
> ########################################################
>
>
>
> Log Type: AppMaster.stderr
>
> Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
>
> Log Length: 1259
>
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/grid/06/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/filecache/36/slf4j-log4j12-1.7.19.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> Exception in thread "main" java.lang.IllegalArgumentException: Invalid ContainerId: container_e35_1465495186350_2224_01_000001
>         at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182)
>         at com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:90)
> Caused by: java.lang.NumberFormatException: For input string: "e35"
>         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>         at java.lang.Long.parseLong(Long.java:441)
>         at java.lang.Long.parseLong(Long.java:483)
>         at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
>         at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
>         ... 1 more
>
>
>
> Log Type: AppMaster.stdout
>
> Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
>
> Log Length: 0
>
>
>
> Log Type: dt.log
>
> Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
>
> Log Length: 29715
>
> Showing 4096 bytes of 29715 total. Click here
> <http://guedlpdhdp001.saifg.rbc.com:19888/jobhistory/logs/guedlpdhdp012.saifg.rbc.com:45454/container_e35_1465495186350_2224_01_000001/container_e35_1465495186350_2224_01_000001/mukkamula/dt.log/?start=0> for
> the full log.
>
> 56m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms4096m -Xmx4096m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT
> SHLVL=3
> HADOOP_SSH_OPTS=-o ConnectTimeout=5 -o SendEnv=HADOOP_CONF_DIR
> HADOOP_USER_NAME=datatorrent/gueulvahal003.saifg.rbc.com@SAIFG.RBC.COM
> HADOOP_NAMENODE_OPTS=-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1
> HADOOP_IDENT_STRING=yarn
> HADOOP_MAPRED_LOG_DIR=/var/log/hadoop-mapreduce/yarn
> NM_HOST=guedlpdhdp012.saifg.rbc.com
> XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
> HADOOP_SECURE_DN_LOG_DIR=/var/log/hadoop/hdfs
> YARN_HISTORYSERVER_HEAPSIZE=1024
> JVM_PID=2638
> YARN_PID_DIR=/var/run/hadoop-yarn/yarn
> HADOOP_HOME_WARN_SUPPRESS=1
> NM_PORT=45454
> LOGNAME=mukkamula
> YARN_CONF_DIR=/usr/hdp/current/hadoop-client/conf
> HADOOP_YARN_USER=yarn
> QTDIR=/usr/lib64/qt-3.3
> _=/usr/lib/jvm/java-1.7.0/bin/java
> MSM_PRODUCT=MSM
> HADOOP_HOME=/usr/hdp/2.3.2.0-2950/hadoop
> MALLOC_ARENA_MAX=4
> HADOOP_OPTS=-Dhdp.version=2.3.2.0-2950 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.3.2.0-2950 -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
> SHELL=/bin/bash
> YARN_ROOT_LOGGER=INFO,EWMA,RFA
> HADOOP_TOKEN_FILE_LOCATION=/grid/11/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/container_e35_1465495186350_2224_01_000001/container_tokens
> CLASSPATH=./*:/usr/hdp/current/hadoop-client/conf:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*
> HADOOP_MAPRED_PID_DIR=/var/run/hadoop-mapreduce/yarn
> YARN_NODEMANAGER_HEAPSIZE=1024
> QTINC=/usr/lib64/qt-3.3/include
> USER=mukkamula
> HADOOP_CLIENT_OPTS=-Xmx2048m -XX:MaxPermSize=512m -Xmx2048m -XX:MaxPermSize=512m
> CONTAINER_ID=container_e35_1465495186350_2224_01_000001
> HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop/hdfs
> HISTCONTROL=ignoredups
> HOME=/home/
> HADOOP_NAMENODE_INIT_HEAPSIZE=-Xms8192m
> MSM_HOME=/usr/local/MegaRAID Storage Manager
> LESSOPEN=||/usr/bin/lesspipe.sh %s
> LANG=en_US.UTF-8
> YARN_NICENESS=0
> YARN_IDENT_STRING=yarn
> HADOOP_MAPRED_HOME=/usr/hdp/2.3.2.0-2950/hadoop-mapreduce
>
>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Mukkamula, Suryavamshivardhan (CWM-NR)
> *Sent:* 2016, June, 16 8:58 AM
> *To:* users@apex.apache.org
> *Subject:* RE: Multiple directories
>
>
>
> Thank you for the inputs.
>
>
>
> Regards,
>
> Surya Vamshi
>
> *From:* Thomas Weise [mailto:thomas.weise@gmail.com
> <th...@gmail.com>]
> *Sent:* 2016, June, 15 5:08 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
>
>
> On Wed, Jun 15, 2016 at 1:55 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi Ram/Team,
>
>
>
> I could create an operator which reads multiple directories, parses each
> file with respect to an individual configuration file, and generates
> output files to different directories.
>
>
>
> However I have some questions regarding the design.
>
>
>
> - We have 120 directories to scan on HDFS. If we use parallel partitioning
> with operator memory around 250MB, that is roughly 30GB of RAM for the
> processing of this operator; are these figures going to create any problem
> in production?
>
>
>
> You can benchmark this with a single partition. If the downstream
> operators can keep up with the rate at which the file reader emits, then
> the memory consumption should be minimal. Keep in mind though that the
> container memory is not just heap space for the operator, but also memory
> the JVM requires to run and the memory that the buffer server consumes. You
> see the allocated memory in the UI if you use the DT community edition
> (container list in the physical plan).
>
>
>
> - Should I use a scheduler for running the batch job, or define the next scan
> time and keep the DT job running continuously? If the DT job runs
> continuously, I assume its memory stays allocated to the job and is not
> available to other applications on the cluster; please clarify.
>
> It is also possible to set this up elastically, so that when there is no
> input available, the number of reader partitions is reduced and the memory
> given back (Apex supports dynamic scaling).
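>
> A rough sketch of the hook involved, as an assumption rather than code from
> this thread ("noPendingFiles" is a hypothetical flag the operator would
> track): a StatsListener requests repartitioning, and the custom
> definePartitions() then returns fewer (or more) partitions.
>
> @Override
> public StatsListener.Response processStats(StatsListener.BatchedOperatorStats stats) {
>     StatsListener.Response response = new StatsListener.Response();
>     // ask the engine to invoke definePartitions() when the readers go idle
>     response.repartitionRequired = noPendingFiles;
>     return response;
> }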
>
>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, June, 05 10:24 PM
>
>
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> Some sample code to monitor multiple directories is now available at:
>
>
> https://github.com/DataTorrent/examples/tree/master/tutorials/fileIO-multiDir
>
>
>
> It shows how to use a custom implementation of definePartitions() to create
>
> multiple partitions of the file input operator and group them
>
> into "slices" where each slice monitors a single directory.
>
>
>
> Ram
>
>
>
> On Wed, May 25, 2016 at 9:55 AM, Munagala Ramanath <ra...@datatorrent.com>
> wrote:
>
> I'm hoping to have a sample sometime next week.
>
>
>
> Ram
>
>
>
> On Wed, May 25, 2016 at 9:30 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Thank you so much ram, for your advice , Option (a) would be ideal for my
> requirement.
>
>
>
> Do you have sample usage for partitioning with individual configuration
> set ups different partitions?
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, May, 25 12:11 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> You have 2 options: (a) AbstractFileInputOperator (b)
> FileSplitter/BlockReader
>
>
>
> For (a), each partition (i.e. replica of the operator) can scan only a
> single directory, so if you have 100
>
> directories, you can simply start with 100 partitions; since each
> partition is scanning its own directory
>
> you don't need to worry about which files the lines came from. This
> approach however needs a custom
>
> definePartition() implementation in your subclass to assign the
> appropriate directory and XML parsing
>
> config file to each partition; it also needs adequate cluster resources to
> be able to spin up the required
>
> number of partitions.
>
>
>
> For (b), there is some documentation in the Operators section at
> http://docs.datatorrent.com/ including
>
> sample code. These operators support scanning multiple directories out of
> the box but have more
>
> elaborate configuration options. Check this out and see if it works in
> your use case.
>
>
>
> Ram
>
>
>
> On Wed, May 25, 2016 at 8:17 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hello Ram/Team,
>
>
>
> My requirement is to read input feeds from different locations on HDFS and
> parse those files by reading XML configuration files (each input feed has
> configuration file which defines the fields inside the input feeds).
>
>
>
> My approach : I would like to define a mapping file which contains
> individual feed identifier, feed location , configuration file location. I
> would like to read this mapping file at initial load within setup() method
> and define my DirectoryScan.acceptFiles. Here my challenge is when I read
> the files , I should parse the lines by reading the individual
> configuration files. How do I know the line is from particular file , if I
> know this I can read the corresponding configuration file before parsing
> the line.
>
>
>
> Please let me know how do I handle this.
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, May, 24 5:49 PM
> *To:* Mukkamula, Suryavamshivardhan (CWM-NR)
> *Subject:* Multiple directories
>
>
>
> One way of addressing the issue is to use some sort of external tool (like
> a script) to
>
> copy all the input files to a common directory (making sure that the file
> names are
>
> unique to prevent one file from overwriting another) before the Apex
> application starts.
>
>
>
> The Apex application then starts and processes files from this directory.
>
>
>
> If you set the partition count of the file input operator to N, it will
> create N partitions and
>
> the files will be automatically distributed among the partitions. The
> partitions will work
>
> in parallel.
>
>
>
> Ram
>
> _______________________________________________________________________
>
> This [email] may be privileged and/or confidential, and the sender does
> not waive any related rights and obligations. Any distribution, use or
> copying of this [email] or the information it contains by other than an
> intended recipient is unauthorized. If you received this [email] in error,
> please advise the sender (by return [email] or otherwise) immediately. You
> have consented to receive the attached electronically at the above-noted
> address; please retain a copy of this confirmation for future reference.
>
>

RE: Multiple directories

Posted by "Mukkamula, Suryavamshivardhan (CWM-NR)" <su...@rbc.com>.
Hi Ram,

Below are the properties that I am using in properties.xml

<?xml version="1.0"?>
<configuration>

  <property>
   <name>dt.application.ClientGffCreation.operator.read.attr.MEMORY_MB</name>
   <value>500</value>
  </property>

  <property>
   <name>dt.application.ClientGffCreation.operator.write.attr.MEMORY_MB</name>
   <value>500</value>
  </property>

  <property>
    <name>dt.application.ClientGffCreation.operator.*.port.*.attr.BUFFER_MEMORY_MB</name>
    <value>128</value>
  </property>

  <property>
   <name>dt.application.ClientGffCreation.stream.data.locality</name>
   <value>CONTAINER_LOCAL</value>
  </property>
  <property>
   <name>dt.application.ClientGffCreation.stream.control.locality</name>
   <value>CONTAINER_LOCAL</value>
  </property>

  <property>
    <name>dt.loggers.level</name>
    <value>com.datatorrent.*:INFO,org.apache.*:INFO</value>
  </property>

  <!--
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.directory</name>
    <value>/tmp/fileIO/output</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.write.prop.filePath</name>
    <value>/tmp/fileIO/output</value>
  </property>
   -->

  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.directory</name>
    <value>/user/mukkamula/cnscan/output</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.write.prop.filePath</name>
    <value>/user/mukkamula/cnscan/output</value>
  </property>

  <!-- Source : 804 properties
   <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.inputDirectory(804)</name>
    <value>tmp/fileIO/804/input</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.inputConfigFile(804)</name>
    <value>tmp/fileIO/804/config/804_input_config.xml</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.partCount(804)</name>
    <value>1</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.outputDirectory(804)</name>
    <value>tmp/fileIO/804/output</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.outputConfigFile(804)</name>
    <value>tmp/fileIO/804/config/804_output_config.xml</value>
  </property>
  -->

   <!-- Source : 805 properties
   <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.inputDirectory(805)</name>
    <value>tmp/fileIO/805/input</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.inputConfigFile(805)</name>
    <value>tmp/fileIO/805/config/805_input_config.xml</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.partCount(805)</name>
    <value>1</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.outputDirectory(805)</name>
    <value>tmp/fileIO/805/output</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.outputConfigFile(805)</name>
    <value>tmp/fileIO/805/config/805_output_config.xml</value>
  </property>
  -->

  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.inputDirectory(804)</name>
    <value>/dev/up20/data/dm/804/2016/05/16</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.inputConfigFile(804)</name>
    <value>/user/mukkamula/cnscan/config/804/804_input_config.xml</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.partCount(804)</name>
    <value>1</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.outputDirectory(804)</name>
    <value>/user/mukkamula/cnscan/output</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.outputConfigFile(804)</name>
    <value>/user/mukkamula/cnscan/config/804/804_output_config.xml</value>
  </property>

  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.inputDirectory(600)</name>
    <value>/dev/up20/data/dm/600/2016/05/20</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.inputConfigFile(600)</name>
    <value>/user/mukkamula/cnscan/config/600/600_input_config.xml</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.partCount(600)</name>
    <value>1</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.outputDirectory(600)</name>
    <value>/user/mukkamula/cnscan/output</value>
  </property>
  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.outputConfigFile(600)</name>
    <value>/user/mukkamula/cnscan/config/600/600_output_config.xml</value>
  </property>

  <property>
    <name>dt.application.ClientGffCreation.operator.read.prop.fileNamePattern</name>
    <value>$[source]_$[yyyy]_$[MM]_$[dd].dat</value>
    </property>
</configuration>

###################################################################################################################################


Properties from site/conf/my-app-conf1.xml, which I am using at launch time.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
  <property>
    <name>dt.attr.MASTER_MEMORY_MB</name>
    <value>1024</value>
  </property>
  <property>
    <name>dt.application.MyFirstApplication.operator.randomGenerator.prop.numTuples</name>
    <value>1000</value>
  </property>
  <property>
       <name>dt.attr.MASTER_MEMORY_MB</name>
       <value>500</value>
       </property>
       <property>
       <name>dt.application.*.operator.*.attr.MEMORY_MB</name>
       <value>500</value>
       </property>
</configuration>


Regards,
Surya Vamshi


From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, June, 17 4:58 PM
To: users@apex.apache.org
Subject: Re: Multiple directories

Can you share your properties file ?

Ram

On Fri, Jun 17, 2016 at 12:58 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi Ram,

I tried that option of adding the memory properties under site/conf and selecting that file during launch, but no luck. The same works with my local sandbox setup.

Is there any other way that I can understand the reason?

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, June, 17 3:06 PM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories

Please take a look at the section entitled "Properties source precedence" at
http://docs.datatorrent.com/application_packages/

It looks like the setting in dt-site.xml on the cluster is overriding your application-defined values.
If you add the properties to a file under site/conf in your application and then
select it during launch, those values should take effect.
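
For example, from the DT CLI (file names as in this thread; see "launch help" for the exact flags; in the console UI the same file can be chosen in the launch dialog):

launch -conf site/conf/my-app-conf1.xml countrynamescan-1.0-SNAPSHOT.apa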

For signalling EOF, another option is to use a separate control port to send the EOF which could
just be the string "EOF" for example.
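
A minimal sketch of that idea (port and marker names are illustrative, not from the thread):

public final transient DefaultOutputPort<String> control = new DefaultOutputPort<String>();

// in the reader, once end-of-file is detected for the current file:
control.emit("EOF");

The writer then connects a corresponding input port and finalizes the file when the marker arrives.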


Ram

On Fri, Jun 17, 2016 at 11:48 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi,

I found a way for the 2nd question.


1)      I tried setting the operator memory to 500MB, but the operator still takes 4GB by default; I did not understand why. (In my sandbox setup the memory was set correctly to 500MB, but not on the enterprise dev cluster.)



2)      I am trying to send a null object from the file reader when EOF is reached so that the file writer can call the requestFinalize() method. Somehow I could not figure out how to send the EOF; I tried as below but no luck. (Solution: it seems that when readEntity() returns null, emit() is not called, so I managed to emit the null object from readEntity() itself.)
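
A minimal sketch of that workaround, reusing the fields from the reader code quoted below (br, output, getFileName()):

@Override
protected String readEntity() throws IOException {
    final String line = br.readLine();
    if (line != null) {
        return line; // normal case: the engine passes this line to emit()
    }
    // EOF: emit() is skipped for null entities, so send the marker from here
    output.emit(new KeyValue<String, String>(getFileName(), null));
    return null;
}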


Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:suryavamshivardhan.mukkamula@rbc.com<ma...@rbc.com>]
Sent: 2016, June, 17 12:20 PM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: RE: Multiple directories

Hi,

Can you please help me understand the issues below.


1)      I tried setting the operator memory to 500MB, but the operator still takes 4GB by default; I did not understand why. (In my sandbox setup the memory was set correctly to 500MB, but not on the enterprise dev cluster.)

2)      I am trying to send a null object from the file reader when EOF is reached so that the file writer can call the requestFinalize() method. But somehow I could not figure out how to send the EOF; I tried as below, but no luck.

################### File Reader ####################################################

@Override
protected String readEntity() throws IOException {
    // try to read a line
    final String line = br.readLine();
    if (null != line) { // normal case
        LOG.debug("readEntity: line = {}", line);
        return line;
    }

    // end-of-file (control tuple sent in closeFile())
    LOG.info("readEntity: EOF for {}", filePath);
    return null;
}

@Override
protected void emit(String line) {
    // parsing logic here: parse the line as per the input configuration and
    // create the output line as per the output configuration
    if (line == null) {
        // EOF marker: tell the writer which file is complete
        output.emit(new KeyValue<String, String>(getFileName(), null));
        return; // do not fall through and try to parse the null line
    }
    KeyValue<String, String> tuple = new KeyValue<String, String>();
    tuple.key = getFileName();
    tuple.value = line;
    KeyValue<String, String> newTuple = parseTuple(tuple);
    output.emit(newTuple);
}
######################File Writer######################################################
public class FileOutputOperator extends AbstractFileOutputOperator<KeyValue<String, String>> {
    private static final Logger LOG = LoggerFactory.getLogger(FileOutputOperator.class);
    private List<String> filesToFinalize = new ArrayList<>();

    @Override
    public void setup(Context.OperatorContext context) {
        super.setup(context);
        // finalize any files carried over in restored (checkpointed) state
        finalizeFiles();
    }

    @Override
    protected byte[] getBytesForTuple(KeyValue<String, String> tuple) {
        if (tuple.value == null) { // EOF marker from the reader
            LOG.info("File to finalize {}", tuple.key);
            filesToFinalize.add(tuple.key);
            return new byte[0];
        } else {
            return tuple.value.getBytes();
        }
    }

    @Override
    protected String getFileName(KeyValue<String, String> tuple) {
        return tuple.key;
    }

    @Override
    public void endWindow() {
        super.endWindow();
        finalizeFiles();
    }

    private void finalizeFiles() {
        LOG.info("Files to finalize {}", filesToFinalize);
        Iterator<String> fileIt = filesToFinalize.iterator();
        while (fileIt.hasNext()) {
            requestFinalize(fileIt.next());
            fileIt.remove();
        }
    }
}
##################################################################################################



Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:suryavamshivardhan.mukkamula@rbc.com]
Sent: 2016, June, 17 9:11 AM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: RE: Multiple directories

Hi Ram/Raja,

The HBase dependency was pulling older Hadoop jars into my classpath. I removed the HBase dependency, which I don't need for now, and the issue got resolved.

Thank you for your help.

Regards,
Surya Vamshi

From: Raja.Aravapalli [mailto:Raja.Aravapalli@target.com]
Sent: 2016, June, 17 7:06 AM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories


I also faced a similar problem with Hadoop jars when I used HBase jars in pom.xml!! It could be because the version of the Hadoop jars your apa is holding differs from the ones on the cluster!!


What I did to solve this is:

Included the scope "provided" in the Maven pom.xml for the HBase jars, and then provided the HBase jars to the application package during submission using "-libjars" with the launch command, which solved my Invalid ContainerId problem!!
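
A rough sketch of that change (artifact and version from this thread; the jar path below is a placeholder):

       <dependency>
              <groupId>org.apache.hbase</groupId>
              <artifactId>hbase-client</artifactId>
              <version>1.1.2</version>
              <scope>provided</scope>
       </dependency>

launch -libjars /path/to/hbase-client-1.1.2.jar yourapp.apa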

You can type "launch help" for usage details.


Regards,
Raja.

From: Munagala Ramanath <ra...@datatorrent.com>>
Reply-To: "users@apex.apache.org<ma...@apex.apache.org>" <us...@apex.apache.org>>
Date: Thursday, June 16, 2016 at 4:10 PM
To: "users@apex.apache.org<ma...@apex.apache.org>" <us...@apex.apache.org>>
Subject: Re: Multiple directories

Those 6 hadoop jars are definitely a problem.

I didn't see the output of "mvn dependency:tree"; could you post that?
It will show you why these hadoop jars are being pulled in.

Also, please refer to the section "Hadoop dependencies conflicts" in the troubleshooting guide:
http://docs.datatorrent.com/troubleshooting/
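
One possible shape of the fix, sketched here as an assumption rather than quoted from that guide (wildcard exclusions require Maven 3.2.1 or newer): keep hbase-client but exclude its transitive Hadoop artifacts so the cluster's own jars are used:

       <dependency>
              <groupId>org.apache.hbase</groupId>
              <artifactId>hbase-client</artifactId>
              <version>1.1.2</version>
              <exclusions>
                     <exclusion>
                            <groupId>org.apache.hadoop</groupId>
                            <artifactId>*</artifactId>
                     </exclusion>
              </exclusions>
       </dependency>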

Ram

On Thu, Jun 16, 2016 at 1:56 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi Ram,

Below are the details.


0 Wed Jun 15 16:34:24 EDT 2016 META-INF/
   358 Wed Jun 15 16:34:22 EDT 2016 META-INF/MANIFEST.MF
     0 Wed Jun 15 16:34:24 EDT 2016 app/
52967 Wed Jun 15 16:34:22 EDT 2016 app/countrynamescan-1.0-SNAPSHOT.jar
     0 Wed Jun 15 16:34:22 EDT 2016 lib/
62983 Wed Jun 15 16:34:22 EDT 2016 lib/activation-1.1.jar
1143233 Wed Jun 15 16:34:22 EDT 2016 lib/activemq-client-5.8.0.jar
  4467 Wed Jun 15 16:34:22 EDT 2016 lib/aopalliance-1.0.jar
44925 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-i18n-2.0.0-M15.jar
691479 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-kerberos-codec-2.0.0-M15.jar
16560 Wed Jun 15 16:34:22 EDT 2016 lib/api-asn1-api-1.0.0-M20.jar
79912 Wed Jun 15 16:34:22 EDT 2016 lib/api-util-1.0.0-M20.jar
43033 Wed Jun 15 16:34:22 EDT 2016 lib/asm-3.1.jar
303139 Wed Jun 15 16:34:22 EDT 2016 lib/avro-1.7.4.jar
232019 Wed Jun 15 16:34:22 EDT 2016 lib/commons-beanutils-1.8.3.jar
41123 Wed Jun 15 16:34:22 EDT 2016 lib/commons-cli-1.2.jar
284184 Wed Jun 15 16:34:22 EDT 2016 lib/commons-codec-1.10.jar
575389 Wed Jun 15 16:34:22 EDT 2016 lib/commons-collections-3.2.1.jar
30595 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compiler-2.7.8.jar
241367 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compress-1.4.1.jar
298829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-configuration-1.6.jar
143602 Wed Jun 15 16:34:22 EDT 2016 lib/commons-digester-1.8.jar
112341 Wed Jun 15 16:34:22 EDT 2016 lib/commons-el-1.0.jar
305001 Wed Jun 15 16:34:22 EDT 2016 lib/commons-httpclient-3.1.jar
185140 Wed Jun 15 16:34:22 EDT 2016 lib/commons-io-2.4.jar
284220 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang-2.6.jar
315805 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang3-3.1.jar
61829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-logging-1.2.jar
1599627 Wed Jun 15 16:34:22 EDT 2016 lib/commons-math3-3.1.1.jar
273370 Wed Jun 15 16:34:22 EDT 2016 lib/commons-net-3.1.jar
3608597 Wed Jun 15 16:34:22 EDT 2016 lib/db2jcc-123.jar
313898 Wed Jun 15 16:34:22 EDT 2016 lib/dom4j-1.6.1.jar
17138265 Wed Jun 15 16:34:22 EDT 2016 lib/fastutil-6.6.4.jar
15322 Wed Jun 15 16:34:22 EDT 2016 lib/findbugs-annotations-1.3.9-1.jar
84946 Wed Jun 15 16:34:22 EDT 2016 lib/flatpack-3.4.2.jar
20220 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-j2ee-management_1.1_spec-1.0.1.jar
32359 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-jms_1.1_spec-1.1.1.jar
21817 Wed Jun 15 16:34:22 EDT 2016 lib/gmbal-api-only-3.0.0-b023.jar
690573 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-framework-2.1.2.jar
253086 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-2.1.2.jar
198255 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-server-2.1.2.jar
336904 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-servlet-2.1.2.jar
  8114 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-rcm-2.1.2.jar
1795932 Wed Jun 15 16:34:22 EDT 2016 lib/guava-12.0.1.jar
710492 Wed Jun 15 16:34:22 EDT 2016 lib/guice-3.0.jar
65012 Wed Jun 15 16:34:22 EDT 2016 lib/guice-servlet-3.0.jar
16778 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-annotations-2.2.0.jar
52449 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-auth-2.5.1.jar
2962685 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-common-2.5.1.jar
1498368 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-mapreduce-client-core-2.5.1.jar
1158936 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-api-2.2.0.jar
1301627 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-common-2.2.0.jar
50139 Wed Jun 15 16:34:22 EDT 2016 lib/hawtbuf-1.9.jar
20780 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-annotations-1.1.2.jar
1249004 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-client-1.1.2.jar
530078 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-common-1.1.2.jar
4201685 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-protocol-1.1.2.jar
1475955 Wed Jun 15 16:34:22 EDT 2016 lib/htrace-core-3.1.0-incubating.jar
590533 Wed Jun 15 16:34:22 EDT 2016 lib/httpclient-4.3.5.jar
282269 Wed Jun 15 16:34:22 EDT 2016 lib/httpcore-4.3.2.jar
228286 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-core-asl-1.9.2.jar
765648 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-mapper-asl-1.9.2.jar
  2497 Wed Jun 15 16:34:22 EDT 2016 lib/javax.inject-1.jar
521991 Wed Jun 15 16:34:22 EDT 2016 lib/javax.mail-1.5.0.jar
83945 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-3.1.jar
85353 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-api-3.0.1.jar
105134 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-api-2.2.2.jar
890168 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-impl-2.2.3-1.jar
1291164 Wed Jun 15 16:34:22 EDT 2016 lib/jcodings-1.0.8.jar
151304 Wed Jun 15 16:34:22 EDT 2016 lib/jdom-1.1.3.jar
130458 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-client-1.9.jar
458739 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-core-1.9.jar
17542 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-grizzly2-1.9.jar
14786 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-guice-1.9.jar
147952 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-json-1.9.jar
713089 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-server-1.9.jar
28100 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-core-1.9.jar
12976 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-grizzly2-1.9.jar
67758 Wed Jun 15 16:34:22 EDT 2016 lib/jettison-1.1.jar
21144 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-continuation-8.1.10.v20130312.jar
95709 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-http-8.1.10.v20130312.jar
103622 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-io-8.1.10.v20130312.jar
89691 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-security-8.1.10.v20130312.jar
347020 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-server-8.1.10.v20130312.jar
101052 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-servlet-8.1.10.v20130312.jar
177131 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-6.1.26.jar
284903 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-8.1.10.v20130312.jar
125928 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-websocket-8.1.10.v20130312.jar
39117 Wed Jun 15 16:34:22 EDT 2016 lib/jms-api-1.1-rev-1.jar
187292 Wed Jun 15 16:34:22 EDT 2016 lib/joni-2.1.2.jar
280205 Wed Jun 15 16:34:22 EDT 2016 lib/jsch-0.1.53.jar
33015 Wed Jun 15 16:34:22 EDT 2016 lib/jsr305-1.3.9.jar
489884 Wed Jun 15 16:34:22 EDT 2016 lib/log4j-1.2.17.jar
565387 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-contrib-3.2.0-incubating.jar
1062000 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-library-3.1.1.jar
42212 Wed Jun 15 16:34:22 EDT 2016 lib/management-api-3.0.0-b012.jar
  8798 Wed Jun 15 16:34:22 EDT 2016 lib/named-regexp-0.2.3.jar
1206119 Wed Jun 15 16:34:22 EDT 2016 lib/netty-3.6.6.Final.jar
1779991 Wed Jun 15 16:34:22 EDT 2016 lib/netty-all-4.0.23.Final.jar
29555 Wed Jun 15 16:34:22 EDT 2016 lib/paranamer-2.3.jar
1869113 Wed Jun 15 16:34:22 EDT 2016 lib/poi-3.9.jar
936648 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-3.9.jar
4802621 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-schemas-3.9.jar
533455 Wed Jun 15 16:34:22 EDT 2016 lib/protobuf-java-2.5.0.jar
26084 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-api-1.7.5.jar
  9943 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-log4j12-1.7.19.jar
995968 Wed Jun 15 16:34:22 EDT 2016 lib/snappy-java-1.0.4.1.jar
23346 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0-2.jar
26514 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0.1.jar
  2455 Wed Jun 15 16:34:22 EDT 2016 lib/tdgssconfig-14.00.00.21.jar
991265 Wed Jun 15 16:34:22 EDT 2016 lib/terajdbc4-14.00.00.21.jar
758309 Wed Jun 15 16:34:22 EDT 2016 lib/vertica-jdbc-7.2.1-0.jar
109318 Wed Jun 15 16:34:22 EDT 2016 lib/xml-apis-1.0.b2.jar
2666695 Wed Jun 15 16:34:22 EDT 2016 lib/xmlbeans-2.3.0.jar
15010 Wed Jun 15 16:34:22 EDT 2016 lib/xmlenc-0.52.jar
94672 Wed Jun 15 16:34:22 EDT 2016 lib/xz-1.0.jar
792964 Wed Jun 15 16:34:22 EDT 2016 lib/zookeeper-3.4.6.jar
     0 Wed Jun 15 16:34:28 EDT 2016 conf/
   334 Mon Apr 04 11:18:00 EDT 2016 conf/my-app-conf1.xml
  3432 Wed Jun 15 16:22:20 EDT 2016 META-INF/properties.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
       <modelVersion>4.0.0</modelVersion>
       <groupId>com.rbc.aml.cnscan</groupId>
       <version>1.0-SNAPSHOT</version>
       <artifactId>countrynamescan</artifactId>
       <packaging>jar</packaging>

       <!-- change these to the appropriate values -->
       <name>countrynamescan</name>
       <description>Country and Name Scan project</description>

       <properties>
              <!-- change this if you desire to use a different version of DataTorrent -->
              <datatorrent.version>3.1.1</datatorrent.version>
              <datatorrent.apppackage.classpath>lib/*.jar</datatorrent.apppackage.classpath>
       </properties>

       <!-- repository to provide the DataTorrent artifacts -->
       <!-- <repositories>
              <repository>
                     <snapshots>
                           <enabled>false</enabled>
                     </snapshots>
                     <id>Datatorrent-Releases</id>
                     <name>DataTorrent Release Repository</name>
                     <url>https://www.datatorrent.com/maven/content/repositories/releases/</url>
              </repository>
              <repository>
                     <releases>
                           <enabled>false</enabled>
                     </releases>
                     <id>DataTorrent-Snapshots</id>
                     <name>DataTorrent Early Access Program Snapshot Repository</name>
                     <url>https://www.datatorrent.com/maven/content/repositories/snapshots/</url>
              </repository>
       </repositories> -->


       <build>
              <plugins>
                     <plugin>
                           <groupId>org.apache.maven.plugins</groupId>
                           <artifactId>maven-eclipse-plugin</artifactId>
                           <version>2.9</version>
                           <configuration>
                                  <downloadSources>true</downloadSources>
                           </configuration>
                     </plugin>
                     <plugin>
                           <artifactId>maven-compiler-plugin</artifactId>
                           <version>3.3</version>
                           <configuration>
                                  <encoding>UTF-8</encoding>
                                  <source>1.7</source>
                                  <target>1.7</target>
                                  <debug>true</debug>
                                  <optimize>false</optimize>
                                  <showDeprecation>true</showDeprecation>
                                  <showWarnings>true</showWarnings>
                           </configuration>
                     </plugin>
                     <plugin>
                           <artifactId>maven-dependency-plugin</artifactId>
                           <version>2.8</version>
                           <executions>
                                  <execution>
                                         <id>copy-dependencies</id>
                                         <phase>prepare-package</phase>
                                         <goals>
                                                <goal>copy-dependencies</goal>
                                         </goals>
                                         <configuration>
                                                <outputDirectory>target/deps</outputDirectory>
                                                <includeScope>runtime</includeScope>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <artifactId>maven-assembly-plugin</artifactId>
                           <executions>
                                  <execution>
                                         <id>app-package-assembly</id>
                                         <phase>package</phase>
                                         <goals>
                                                <goal>single</goal>
                                         </goals>
                                         <configuration>
                                                <finalName>${project.artifactId}-${project.version}-apexapp</finalName>
                                                <appendAssemblyId>false</appendAssemblyId>
                                                <descriptors>
                                                       <descriptor>src/assemble/appPackage.xml</descriptor>
                                                </descriptors>
                                                <archiverConfig>
                                                       <defaultDirectoryMode>0755</defaultDirectoryMode>
                                                </archiverConfig>
                                                <archive>
                                                       <manifestEntries>
                                                              <Class-Path>${datatorrent.apppackage.classpath}</Class-Path>
                                                              <DT-Engine-Version>${datatorrent.version}</DT-Engine-Version>
                                                              <DT-App-Package-Name>${project.artifactId}</DT-App-Package-Name>
                                                              <DT-App-Package-Version>${project.version}</DT-App-Package-Version>
                                                              <DT-App-Package-Display-Name>${project.name}</DT-App-Package-Display-Name>
                                                              <DT-App-Package-Description>${project.description}</DT-App-Package-Description>
                                                      </manifestEntries>
                                                </archive>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <artifactId>maven-antrun-plugin</artifactId>
                           <version>1.7</version>
                           <executions>
                                  <execution>
                                         <phase>package</phase>
                                         <configuration>
                                                <target>
                                                       <move
                                                              file="${project.build.directory}/${project.artifactId}-${project.version}-apexapp.jar"
                                                              tofile="${project.build.directory}/${project.artifactId}-${project.version}.apa"/>
                                                </target>
                                         </configuration>
                                         <goals>
                                                <goal>run</goal>
                                         </goals>
                                  </execution>
                                  <execution>
                                         <!-- create resource directory for xml javadoc -->
                                         <id>createJavadocDirectory</id>
                                         <phase>generate-resources</phase>
                                         <configuration>
                                                <tasks>
                                                       <delete
                                                              dir="${project.build.directory}/generated-resources/xml-javadoc"/>
                                                       <mkdir
                                                              dir="${project.build.directory}/generated-resources/xml-javadoc"/>
                                                </tasks>
                                         </configuration>
                                         <goals>
                                                <goal>run</goal>
                                         </goals>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <groupId>org.codehaus.mojo</groupId>
                           <artifactId>build-helper-maven-plugin</artifactId>
                           <version>1.9.1</version>
                           <executions>
                                  <execution>
                                         <id>attach-artifacts</id>
                                         <phase>package</phase>
                                         <goals>
                                                <goal>attach-artifact</goal>
                                         </goals>
                                         <configuration>
                                                <artifacts>
                                                       <artifact>
                                                              <file>target/${project.artifactId}-${project.version}.apa</file>
                                                              <type>apa</type>
                                                       </artifact>
                                                </artifacts>
                                                <skipAttach>false</skipAttach>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <!-- generate javdoc -->
                     <plugin>
                           <groupId>org.apache.maven.plugins</groupId>
                           <artifactId>maven-javadoc-plugin</artifactId>
                           <executions>
                                  <!-- generate xml javadoc -->
                                  <execution>
                                         <id>xml-doclet</id>
                                         <phase>generate-resources</phase>
                                         <goals>
                                                <goal>javadoc</goal>
                                         </goals>
                                         <configuration>
                                                <doclet>com.github.markusbernhardt.xmldoclet.XmlDoclet</doclet>
                                                <additionalparam>-d
                                                       ${project.build.directory}/generated-resources/xml-javadoc
                                                       -filename ${project.artifactId}-${project.version}-javadoc.xml</additionalparam>
                                                <useStandardDocletOptions>false</useStandardDocletOptions>
                                                <docletArtifact>
                                                       <groupId>com.github.markusbernhardt</groupId>
                                                       <artifactId>xml-doclet</artifactId>
                                                       <version>1.0.4</version>
                                                </docletArtifact>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>
                     <!-- Transform xml javadoc to stripped down version containing only class/interface
                           comments and tags -->
                     <plugin>
                           <groupId>org.codehaus.mojo</groupId>
                           <artifactId>xml-maven-plugin</artifactId>
                           <version>1.0</version>
                           <executions>
                                  <execution>
                                         <id>transform-xmljavadoc</id>
                                         <phase>generate-resources</phase>
                                         <goals>
                                                <goal>transform</goal>
                                         </goals>
                                  </execution>
                           </executions>
                           <configuration>
                                  <transformationSets>
                                         <transformationSet>
                                                <dir>${project.build.directory}/generated-resources/xml-javadoc</dir>
                                                <includes>
                                                       <include>${project.artifactId}-${project.version}-javadoc.xml</include>
                                                </includes>
                                                <stylesheet>XmlJavadocCommentsExtractor.xsl</stylesheet>
                                                <outputDir>${project.build.directory}/generated-resources/xml-javadoc</outputDir>
                                         </transformationSet>
                                  </transformationSets>
                           </configuration>
                     </plugin>
                     <!-- copy xml javadoc to class jar -->
                     <plugin>
                           <artifactId>maven-resources-plugin</artifactId>
                           <version>2.6</version>
                           <executions>
                                  <execution>
                                         <id>copy-resources</id>
                                         <phase>process-resources</phase>
                                         <goals>
                                                <goal>copy-resources</goal>
                                         </goals>
                                         <configuration>
                                                <outputDirectory>${basedir}/target/classes</outputDirectory>
                                                <resources>
                                                       <resource>
                                                              <directory>${project.build.directory}/generated-resources/xml-javadoc</directory>
                                                              <includes>
                                                                     <include>${project.artifactId}-${project.version}-javadoc.xml</include>
                                                              </includes>
                                                              <filtering>true</filtering>
                                                       </resource>
                                                </resources>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

              </plugins>

              <pluginManagement>
                     <plugins>
                           <!--This plugin's configuration is used to store Eclipse m2e settings
                                  only. It has no influence on the Maven build itself. -->
                           <plugin>
                                  <groupId>org.eclipse.m2e</groupId>
                                  <artifactId>lifecycle-mapping</artifactId>
                                  <version>1.0.0</version>
                                  <configuration>
                                         <lifecycleMappingMetadata>
                                                <pluginExecutions>
                                                       <pluginExecution>
                                                              <pluginExecutionFilter>
                                                                     <groupId>org.codehaus.mojo</groupId>
                                                                     <artifactId>
                                                                           xml-maven-plugin
                                                                     </artifactId>
                                                                     <versionRange>[1.0,)</versionRange>
                                                                     <goals>
                                                                           <goal>transform</goal>
                                                                     </goals>
                                                              </pluginExecutionFilter>
                                                              <action>
                                                                     <ignore></ignore>
                                                              </action>
                                                       </pluginExecution>
                                                       <pluginExecution>
                                                              <pluginExecutionFilter>
                                                                     <groupId></groupId>
                                                                     <artifactId></artifactId>
                                                                     <versionRange>[,)</versionRange>
                                                                     <goals>
                                                                           <goal></goal>
                                                                     </goals>
                                                              </pluginExecutionFilter>
                                                              <action>
                                                                     <ignore></ignore>
                                                              </action>
                                                       </pluginExecution>
                                                </pluginExecutions>
                                         </lifecycleMappingMetadata>
                                  </configuration>
                           </plugin>
                     </plugins>
              </pluginManagement>
       </build>

       <dependencies>
              <!-- add your dependencies here -->
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>malhar-library</artifactId>
                     <version>${datatorrent.version}</version>
                     <!-- If you know that your application does not need transitive dependencies
                           pulled in by malhar-library, uncomment the following to reduce the size of
                           your app package. -->
                     <!-- <exclusions> <exclusion> <groupId>*</groupId> <artifactId>*</artifactId>
                           </exclusion> </exclusions> -->
              </dependency>
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>dt-common</artifactId>
                     <version>${datatorrent.version}</version>
                     <scope>provided</scope>
              </dependency>
               <dependency>
                      <groupId>com.teradata.jdbc</groupId>
                      <artifactId>terajdbc4</artifactId>
                      <version>14.00.00.21</version>
               </dependency>
               <dependency>
                      <groupId>com.teradata.jdbc</groupId>
                      <artifactId>tdgssconfig</artifactId>
                      <version>14.00.00.21</version>
               </dependency>
               <dependency>
                      <groupId>com.ibm.db2</groupId>
                      <artifactId>db2jcc</artifactId>
                      <version>123</version>
               </dependency>
              <dependency>
                     <groupId>jdk.tools</groupId>
                     <artifactId>jdk.tools</artifactId>
                     <version>1.7</version>
                     <scope>system</scope>
                     <systemPath>C:/Program Files/Java/jdk1.7.0_79/lib/tools.jar</systemPath>
              </dependency>
              <dependency>
                     <groupId>org.apache.apex</groupId>
                     <artifactId>malhar-contrib</artifactId>
                     <version>3.2.0-incubating</version>
                     <!--<scope>provided</scope> -->
                     <exclusions>
                           <exclusion>
                                  <groupId>*</groupId>
                                  <artifactId>*</artifactId>
                           </exclusion>
                     </exclusions>
              </dependency>
              <dependency>
                     <groupId>junit</groupId>
                     <artifactId>junit</artifactId>
                     <version>4.10</version>
                     <scope>test</scope>
              </dependency>
              <dependency>
                     <groupId>com.vertica</groupId>
                     <artifactId>vertica-jdbc</artifactId>
                     <version>7.2.1-0</version>
              </dependency>
               <dependency>
                      <groupId>org.apache.hbase</groupId>
                      <artifactId>hbase-client</artifactId>
                      <version>1.1.2</version>
               </dependency>
              <dependency>
                     <groupId>org.slf4j</groupId>
                     <artifactId>slf4j-log4j12</artifactId>
                     <version>1.7.19</version>
              </dependency>
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>dt-engine</artifactId>
                     <version>${datatorrent.version}</version>
                     <scope>test</scope>
              </dependency>

              <dependency>
                     <groupId>net.sf.flatpack</groupId>
                     <artifactId>flatpack</artifactId>
                     <version>3.4.2</version>
              </dependency>

              <dependency>
                     <groupId>org.jdom</groupId>
                     <artifactId>jdom</artifactId>
                     <version>1.1.3</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi-ooxml</artifactId>
                     <version>3.9</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.xmlbeans</groupId>
                     <artifactId>xmlbeans</artifactId>
                     <version>2.3.0</version>
              </dependency>

              <dependency>
                     <groupId>dom4j</groupId>
                     <artifactId>dom4j</artifactId>
                     <version>1.6.1</version>
              </dependency>

              <dependency>
                     <groupId>javax.xml.stream</groupId>
                     <artifactId>stax-api</artifactId>
                     <version>1.0-2</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi</artifactId>
                     <version>3.9</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi-ooxml-schemas</artifactId>
                     <version>3.9</version>
              </dependency>
              <dependency>
                     <groupId>com.jcraft</groupId>
                     <artifactId>jsch</artifactId>
                     <version>0.1.53</version>
              </dependency>
       </dependencies>

</project>

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, June, 16 4:37 PM
To: users@apex.apache.org
Subject: Re: Multiple directories

It looks like you may be including old Hadoop jars in your apa package since the stack trace
shows ConverterUtils.toContainerId calling ConverterUtils.toApplicationAttemptId but recent versions
don't have that call sequence. In 2.7.1 (which is what your cluster has) the function looks like this:
  public static ContainerId toContainerId(String containerIdStr) {
    return ContainerId.fromString(containerIdStr);
  }

Could you post the output of "jar tvf {your-apa-file}" as well as: "mvn dependency:tree"
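
If a dependency such as hbase-client turns out to be dragging the old Hadoop jars in, a common fix is to exclude the transitive Hadoop artifacts in the pom. A minimal sketch (the wildcard exclusion needs Maven 3.2.1 or later; with older Maven, list each hadoop-* artifact explicitly):

<dependency>
       <groupId>org.apache.hbase</groupId>
       <artifactId>hbase-client</artifactId>
       <version>1.1.2</version>
       <exclusions>
              <exclusion>
                     <groupId>org.apache.hadoop</groupId>
                     <artifactId>*</artifactId>
              </exclusion>
       </exclusions>
</dependency>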

Ram

On Thu, Jun 16, 2016 at 12:38 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Hi Ram,

Below is the information.


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   712    0   712    0     0   3807      0 --:--:-- --:--:-- --:--:--  3807
{
    "clusterInfo": {
        "haState": "ACTIVE",
        "haZooKeeperConnectionState": "CONNECTED",
        "hadoopBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bc
caeaf36cb193c1c by jenkins source checksum 69a3bf8c667267c2c252a54fbbf23d",
        "hadoopVersion": "2.7.1.2.3.2.0-2950",
        "hadoopVersionBuiltOn": "2015-09-30T18:08Z",
        "id": 1465495186350,
        "resourceManagerBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa9
8205f18bccaeaf36cb193c1c by jenkins source checksum 48db4b572827c2e9c2da66982d14
7626",
        "resourceManagerVersion": "2.7.1.2.3.2.0-2950",
       "resourceManagerVersionBuiltOn": "2015-09-30T18:20Z",
        "rmStateStoreName": "org.apache.hadoop.yarn.server.resourcemanager.recov
ery.ZKRMStateStore",
        "startedOn": 1465495186350,
        "state": "STARTED"
    }
}

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, June, 16 2:57 PM
To: users@apex.apache.org
Subject: Re: Multiple directories

Can you ssh to one of the cluster nodes ? If so, can you run this command and show the output
(where {rm} is the host:port running the resource manager, aka YARN):

curl http://{rm}/ws/v1/cluster | python -mjson.tool

Ram
ps. You can determine the node running YARN with:

hdfs getconf -confKey yarn.resourcemanager.webapp.address
hdfs getconf -confKey yarn.resourcemanager.webapp.https.address



On Thu, Jun 16, 2016 at 11:15 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Hi,

I am facing a weird issue and the logs are not clear to me!

I have created an apa file which works fine within my local sandbox, but I am facing problems when I upload it on the enterprise Hadoop cluster using the DT Console.

Below is the error message from the yarn logs. Please help in understanding the issue.

###################### Error Logs ########################################################

Log Type: AppMaster.stderr
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 1259
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/grid/06/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/filecache/36/slf4j-log4j12-1.7.19.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.IllegalArgumentException: Invalid ContainerId: container_e35_1465495186350_2224_01_000001
        at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182)
        at com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:90)
Caused by: java.lang.NumberFormatException: For input string: "e35"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:441)
        at java.lang.Long.parseLong(Long.java:483)
        at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
        at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
        ... 1 more

Log Type: AppMaster.stdout
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 0

Log Type: dt.log
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 29715
Showing 4096 bytes of 29715 total. Click here<http://guedlpdhdp001.saifg.rbc.com:19888/jobhistory/logs/guedlpdhdp012.saifg.rbc.com:45454/container_e35_1465495186350_2224_01_000001/container_e35_1465495186350_2224_01_000001/mukkamula/dt.log/?start=0> for the full log.
56m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms4096m -Xmx4096m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT
SHLVL=3
HADOOP_SSH_OPTS=-o ConnectTimeout=5 -o SendEnv=HADOOP_CONF_DIR
HADOOP_USER_NAME=datatorrent/gueulvahal003.saifg.rbc.com@SAIFG.RBC.COM
HADOOP_NAMENODE_OPTS=-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1
HADOOP_IDENT_STRING=yarn
HADOOP_MAPRED_LOG_DIR=/var/log/hadoop-mapreduce/yarn
NM_HOST=guedlpdhdp012.saifg.rbc.com
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
HADOOP_SECURE_DN_LOG_DIR=/var/log/hadoop/hdfs
YARN_HISTORYSERVER_HEAPSIZE=1024
JVM_PID=2638
YARN_PID_DIR=/var/run/hadoop-yarn/yarn
HADOOP_HOME_WARN_SUPPRESS=1
NM_PORT=45454
LOGNAME=mukkamula
YARN_CONF_DIR=/usr/hdp/current/hadoop-client/conf
HADOOP_YARN_USER=yarn
QTDIR=/usr/lib64/qt-3.3
_=/usr/lib/jvm/java-1.7.0/bin/java
MSM_PRODUCT=MSM
HADOOP_HOME=/usr/hdp/2.3.2.0-2950/hadoop
MALLOC_ARENA_MAX=4
HADOOP_OPTS=-Dhdp.version=2.3.2.0-2950 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true  -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.3.2.0-2950 -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
SHELL=/bin/bash
YARN_ROOT_LOGGER=INFO,EWMA,RFA
HADOOP_TOKEN_FILE_LOCATION=/grid/11/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/container_e35_1465495186350_2224_01_000001/container_tokens
CLASSPATH=./*:/usr/hdp/current/hadoop-client/conf:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*
HADOOP_MAPRED_PID_DIR=/var/run/hadoop-mapreduce/yarn
YARN_NODEMANAGER_HEAPSIZE=1024
QTINC=/usr/lib64/qt-3.3/include
USER=mukkamula
HADOOP_CLIENT_OPTS=-Xmx2048m -XX:MaxPermSize=512m -Xmx2048m -XX:MaxPermSize=512m
CONTAINER_ID=container_e35_1465495186350_2224_01_000001
HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop/hdfs
HISTCONTROL=ignoredups
HOME=/home/
HADOOP_NAMENODE_INIT_HEAPSIZE=-Xms8192m
MSM_HOME=/usr/local/MegaRAID Storage Manager
LESSOPEN=||/usr/bin/lesspipe.sh %s
LANG=en_US.UTF-8
YARN_NICENESS=0
YARN_IDENT_STRING=yarn
HADOOP_MAPRED_HOME=/usr/hdp/2.3.2.0-2950/hadoop-mapreduce


Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR)
Sent: 2016, June, 16 8:58 AM
To: users@apex.apache.org
Subject: RE: Multiple directories

Thank you for the inputs.

Regards,
Surya Vamshi
From: Thomas Weise [mailto:thomas.weise@gmail.com]
Sent: 2016, June, 15 5:08 PM
To: users@apex.apache.org
Subject: Re: Multiple directories


On Wed, Jun 15, 2016 at 1:55 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Hi Ram/Team,

I could create an operator which reads multiple directories and parses the each file with respect to an individual configuration file and generates output file to different directories.

However I have some questions regarding the design.


==> We have 120 directories to scan on HDFS; if we use parallel partitioning with operator memory around 250MB, that works out to around 30GB of RAM for this operator. Are these figures going to create any problem in production?

You can benchmark this with a single partition. If the downstream operators can keep up with the rate at which the file reader emits, then the memory consumption should be minimal. Keep in mind though that the container memory is not just heap space for the operator, but also memory the JVM requires to run and the memory that the buffer server consumes. You can see the allocated memory in the UI if you use the DT community edition (container list in the physical plan).
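
As a point of reference, per-operator memory can be capped with the MEMORY_MB attribute in the application properties. A minimal sketch, where "reader" is a placeholder for the actual operator name:

<property>
  <name>dt.operator.reader.attr.MEMORY_MB</name>
  <value>512</value>
</property>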


==> Should I use a scheduler for running the batch job, or define the next scan time and keep the DT job running continuously? If I run the DT job continuously, I assume memory will be continuously utilized by the DT job and will not be available to other applications on the cluster; please clarify.
It is possible to set this up elastically as well, so that when there is no input available, the number of reader partitions is reduced and the memory given back (Apex supports dynamic scaling).


Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, June, 05 10:24 PM

To: users@apex.apache.org
Subject: Re: Multiple directories

Some sample code to monitor multiple directories is now available at:
https://github.com/DataTorrent/examples/tree/master/tutorials/fileIO-multiDir

It shows how to use a custom implementation of definePartitions() to create
multiple partitions of the file input operator and group them
into "slices" where each slice monitors a single directory.

Ram

On Wed, May 25, 2016 at 9:55 AM, Munagala Ramanath <ra...@datatorrent.com> wrote:
I'm hoping to have a sample sometime next week.

Ram

On Wed, May 25, 2016 at 9:30 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Thank you so much ram, for your advice , Option (a) would be ideal for my requirement.

Do you have sample usage for partitioning with individual configuration set ups different partitions?

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, May, 25 12:11 PM
To: users@apex.apache.org
Subject: Re: Multiple directories

You have 2 options: (a) AbstractFileInputOperator (b) FileSplitter/BlockReader

For (a), each partition (i.e. replica of the operator) can scan only a single directory, so if you have 100
directories, you can simply start with 100 partitions; since each partition is scanning its own directory
you don't need to worry about which files the lines came from. This approach, however, needs a custom
definePartitions() implementation in your subclass to assign the appropriate directory and XML parsing
config file to each partition; it also needs adequate cluster resources to be able to spin up the required
number of partitions.

For (b), there is some documentation in the Operators section at http://docs.datatorrent.com/ including
sample code. These operators support scanning multiple directories out of the box but have more
elaborate configuration options. Check this out and see if it works in your use case.

Ram

On Wed, May 25, 2016 at 8:17 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Hello Ram/Team,

My requirement is to read input feeds from different locations on HDFS and parse those files by reading XML configuration files (each input feed has a configuration file which defines the fields inside the input feed).

My approach: I would like to define a mapping file which contains the individual feed identifier, feed location, and configuration file location. I would like to read this mapping file at initial load within the setup() method and define my DirectoryScan.acceptFiles. My challenge here is that when I read the files, I should parse each line by reading the corresponding configuration file. How do I know which file a line came from? If I know this, I can read the corresponding configuration file before parsing the line.

Please let me know how do I handle this.
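
(One way to handle this, sketched under the assumption that the reader extends AbstractFileInputOperator: openFile() is invoked once per file, so the file name can be captured there and associated with every line read afterwards.)

private transient String currentFileName;

@Override
protected InputStream openFile(Path path) throws IOException {
    currentFileName = path.getName(); // remember which file subsequent lines belong to
    return super.openFile(path);
}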

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, May, 24 5:49 PM
To: Mukkamula, Suryavamshivardhan (CWM-NR)
Subject: Multiple directories

One way of addressing the issue is to use some sort of external tool (like a script) to
copy all the input files to a common directory (making sure that the file names are
unique to prevent one file from overwriting another) before the Apex application starts.

The Apex application then starts and processes files from this directory.

If you set the partition count of the file input operator to N, it will create N partitions and
the files will be automatically distributed among the partitions. The partitions will work
in parallel.
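
For reference, the partition count can be set from the properties file. A minimal sketch, assuming the operator is named "reader" and exposes the partitionCount property of AbstractFileInputOperator:

<property>
  <name>dt.operator.reader.prop.partitionCount</name>
  <value>4</value>
</property>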

Ram

_______________________________________________________________________

This [email] may be privileged and/or confidential, and the sender does not waive any related rights and obligations. Any distribution, use or copying of this [email] or the information it contains by other than an intended recipient is unauthorized. If you received this [email] in error, please advise the sender (by return [email] or otherwise) immediately. You have consented to receive the attached electronically at the above-noted address; please retain a copy of this confirmation for future reference.

Re: Multiple directories

Posted by Munagala Ramanath <ra...@datatorrent.com>.
Can you share your properties file?

Ram

On Fri, Jun 17, 2016 at 12:58 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <
suryavamshivardhan.mukkamula@rbc.com> wrote:

> Hi Ram,
>
>
>
> I tried that option of adding the memory properties in site/conf and
> selected it during the launch, but no luck. The same is working with my
> local sandbox setup.
>
>
>
> Is there any other way that I can understand the reason?
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, June, 17 3:06 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> Please take a look at the section entitled "Properties source precedence"
> at
>
> http://docs.datatorrent.com/application_packages/
>
>
>
> It looks like the setting in dt-site.xml on the cluster is overriding your
> application-defined values.
>
> If you add the properties to a file under site/conf in your application and
> then select it during launch, those values should take effect.
>
>
>
> For signalling EOF, another option is to use a separate control port to
> send the EOF which could
>
> just be the string "EOF" for example.
>
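> A minimal sketch of that idea, assuming the reader extends
> AbstractFileInputOperator (names here are illustrative):
>
>     // second output port carrying the EOF marker
>     public final transient DefaultOutputPort<String> control =
>         new DefaultOutputPort<String>();
>
>     @Override
>     protected void closeFile(InputStream is) throws IOException {
>         super.closeFile(is);
>         control.emit("EOF"); // downstream writer finalizes on receipt
>     }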
>
>
>
>
> Ram
>
>
>
> On Fri, Jun 17, 2016 at 11:48 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi,
>
>
>
> I found a way for the 2nd question.
>
>
>
> 1)      I tried setting operator memory to 500MB, but the operator is still
> taking 4GB by default; I do not understand why. (In my sandbox setup the
> memory was set correctly to 500MB, but not on the enterprise dev cluster.)
>
>
>
> 2)      I am trying to send a null object from the file reader when EOF is
> reached so that the file writer can call the requestFinalize() method. But
> somehow I could not figure out how to send the EOF; I tried as below but had
> no luck. (Solution: it seems that when readEntity() returns null, the
> emitTuple() method is not called; I have managed to emit the null
> object from readEntity() itself.)
>
>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:
> suryavamshivardhan.mukkamula@rbc.com]
> *Sent:* 2016, June, 17 12:20 PM
> *To:* users@apex.apache.org
> *Subject:* RE: Multiple directories
>
>
>
> Hi,
>
>
>
> Can you please help me understand the below issues.
>
>
>
> 1)      I tried setting operator memory to 500MB, but the operator is still
> taking 4GB by default; I do not understand why. (In my sandbox setup the
> memory was set correctly to 500MB, but not on the enterprise dev cluster.)
>
> 2)      I am trying to send a null object from the file reader when EOF is
> reached so that the file writer can call the requestFinalize() method. But
> somehow I could not figure out how to send the EOF; I tried as below but had
> no luck.
>
>
>
> ################### File Reader ####################################################
>
> @Override
> protected String readEntity() throws IOException {
>     // try to read a line
>     final String line = br.readLine();
>     if (null != line) { // normal case
>         LOG.debug("readEntity: line = {}", line);
>         return line;
>     }
>
>     // end-of-file (control tuple sent in closeFile())
>     LOG.info("readEntity: EOF for {}", filePath);
>     return null;
> }
>
> @Override
> protected void emit(String line) {
>     // parsing logic here: parse the line as per the input configuration and
>     // create the output line as per the output configuration
>     if (line == null) {
>         // EOF marker: tell the writer which file to finalize
>         output.emit(new KeyValue<String, String>(getFileName(), null));
>         return; // nothing to parse for the EOF marker
>     }
>     KeyValue<String, String> tuple = new KeyValue<String, String>();
>     tuple.key = getFileName();
>     tuple.value = line;
>     KeyValue<String, String> newTuple = parseTuple(tuple);
>     output.emit(newTuple);
> }
>
> ###################### File Writer ######################################################
>
> public class FileOutputOperator extends AbstractFileOutputOperator<KeyValue<String, String>> {
>     private static final Logger LOG = LoggerFactory.getLogger(FileOutputOperator.class);
>     private List<String> filesToFinalize = new ArrayList<>();
>
>     @Override
>     public void setup(Context.OperatorContext context) {
>         super.setup(context);
>         finalizeFiles();
>     }
>
>     @Override
>     protected byte[] getBytesForTuple(KeyValue<String, String> tuple) {
>         if (tuple.value == null) {
>             LOG.info("File to finalize {}", tuple.key);
>             filesToFinalize.add(tuple.key);
>             return new byte[0];
>         }
>         else {
>             return tuple.value.getBytes();
>         }
>     }
>
>     @Override
>     protected String getFileName(KeyValue<String, String> tuple) {
>         return tuple.key;
>     }
>
>     @Override
>     public void endWindow() {
>         super.endWindow();
>         finalizeFiles();
>     }
>
>     private void finalizeFiles() {
>         LOG.info("Files to finalize {}", filesToFinalize.toArray());
>         Iterator<String> fileIt = filesToFinalize.iterator();
>         while (fileIt.hasNext()) {
>             requestFinalize(fileIt.next());
>             fileIt.remove();
>         }
>     }
> }
>
>
> ##################################################################################################
>
>
>
>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:suryavamshivardhan.mukkamula@rbc.com]
> *Sent:* 2016, June, 17 9:11 AM
> *To:* users@apex.apache.org
> *Subject:* RE: Multiple directories
>
>
>
> Hi Ram/Raja,
>
>
>
> The HBase dependency was pulling older Hadoop jars into my classpath. I
> removed the HBase dependency, which I don't need for now, and the issue got
> resolved.
>
>
>
> Thank you for your help.
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Raja.Aravapalli [mailto:Raja.Aravapalli@target.com]
> *Sent:* 2016, June, 17 7:06 AM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
>
>
> I also faced a similar problem with Hadoop jars when I used HBase jars in
> pom.xml!! It could be because the version of the Hadoop jars your apa is
> holding is different from the ones in the cluster!!
>
>
>
>
>
> What I did to solve this:
>
> I included the provided scope in the Maven pom.xml for the HBase jars, and
> then supplied the HBase jars to the application package during submission
> using "-libjars" with the launch command, which solved my Invalid
> ContainerId problem!!
>
>
>
> You can type “launch help” to learn the usage details.
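>
> For illustration, the submission might then look like this (jar paths are
> placeholders):
>
>     launch -libjars lib/hbase-client-1.1.2.jar,lib/hbase-common-1.1.2.jar target/countrynamescan-1.0-SNAPSHOT.apa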
>
>
>
>
>
> Regards,
>
> Raja.
>
>
>
> *From: *Munagala Ramanath <ra...@datatorrent.com>
> *Reply-To: *"users@apex.apache.org" <us...@apex.apache.org>
> *Date: *Thursday, June 16, 2016 at 4:10 PM
> *To: *"users@apex.apache.org" <us...@apex.apache.org>
> *Subject: *Re: Multiple directories
>
>
>
> Those 6 hadoop jars are definitely a problem.
>
>
>
> I didn't see the output of "*mvn dependency:tree*"; could you post that?
>
> It will show you why these hadoop jars are being pulled in.
>
>
>
> Also, please refer to the section "Hadoop dependencies conflicts" in the
> troubleshooting guide:
>
> http://docs.datatorrent.com/troubleshooting/
>
>
>
> Ram
>
>
>
> On Thu, Jun 16, 2016 at 1:56 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi Ram,
>
>
>
> Below are the details.
>
>
>
>
>
> 0 Wed Jun 15 16:34:24 EDT 2016 META-INF/
>
>    358 Wed Jun 15 16:34:22 EDT 2016 META-INF/MANIFEST.MF
>
>      0 Wed Jun 15 16:34:24 EDT 2016 app/
>
> 52967 Wed Jun 15 16:34:22 EDT 2016 app/countrynamescan-1.0-SNAPSHOT.jar
>
>      0 Wed Jun 15 16:34:22 EDT 2016 lib/
>
> 62983 Wed Jun 15 16:34:22 EDT 2016 lib/activation-1.1.jar
>
> 1143233 Wed Jun 15 16:34:22 EDT 2016 lib/activemq-client-5.8.0.jar
>
>   4467 Wed Jun 15 16:34:22 EDT 2016 lib/aopalliance-1.0.jar
>
> 44925 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-i18n-2.0.0-M15.jar
>
> 691479 Wed Jun 15 16:34:22 EDT 2016
> lib/apacheds-kerberos-codec-2.0.0-M15.jar
>
> 16560 Wed Jun 15 16:34:22 EDT 2016 lib/api-asn1-api-1.0.0-M20.jar
>
> 79912 Wed Jun 15 16:34:22 EDT 2016 lib/api-util-1.0.0-M20.jar
>
> 43033 Wed Jun 15 16:34:22 EDT 2016 lib/asm-3.1.jar
>
> 303139 Wed Jun 15 16:34:22 EDT 2016 lib/avro-1.7.4.jar
>
> 232019 Wed Jun 15 16:34:22 EDT 2016 lib/commons-beanutils-1.8.3.jar
>
> 41123 Wed Jun 15 16:34:22 EDT 2016 lib/commons-cli-1.2.jar
>
> 284184 Wed Jun 15 16:34:22 EDT 2016 lib/commons-codec-1.10.jar
>
> 575389 Wed Jun 15 16:34:22 EDT 2016 lib/commons-collections-3.2.1.jar
>
> 30595 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compiler-2.7.8.jar
>
> 241367 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compress-1.4.1.jar
>
> 298829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-configuration-1.6.jar
>
> 143602 Wed Jun 15 16:34:22 EDT 2016 lib/commons-digester-1.8.jar
>
> 112341 Wed Jun 15 16:34:22 EDT 2016 lib/commons-el-1.0.jar
>
> 305001 Wed Jun 15 16:34:22 EDT 2016 lib/commons-httpclient-3.1.jar
>
> 185140 Wed Jun 15 16:34:22 EDT 2016 lib/commons-io-2.4.jar
>
> 284220 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang-2.6.jar
>
> 315805 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang3-3.1.jar
>
> 61829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-logging-1.2.jar
>
> 1599627 Wed Jun 15 16:34:22 EDT 2016 lib/commons-math3-3.1.1.jar
>
> 273370 Wed Jun 15 16:34:22 EDT 2016 lib/commons-net-3.1.jar
>
> 3608597 Wed Jun 15 16:34:22 EDT 2016 lib/db2jcc-123.jar
>
> 313898 Wed Jun 15 16:34:22 EDT 2016 lib/dom4j-1.6.1.jar
>
> 17138265 Wed Jun 15 16:34:22 EDT 2016 lib/fastutil-6.6.4.jar
>
> 15322 Wed Jun 15 16:34:22 EDT 2016 lib/findbugs-annotations-1.3.9-1.jar
>
> 84946 Wed Jun 15 16:34:22 EDT 2016 lib/flatpack-3.4.2.jar
>
> 20220 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-j2ee-management_1.1_spec-1.0.1.jar
>
> 32359 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-jms_1.1_spec-1.1.1.jar
>
> 21817 Wed Jun 15 16:34:22 EDT 2016 lib/gmbal-api-only-3.0.0-b023.jar
>
> 690573 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-framework-2.1.2.jar
>
> 253086 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-2.1.2.jar
>
> 198255 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-server-2.1.2.jar
>
> 336904 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-servlet-2.1.2.jar
>
>   8114 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-rcm-2.1.2.jar
>
> 1795932 Wed Jun 15 16:34:22 EDT 2016 lib/guava-12.0.1.jar
>
> 710492 Wed Jun 15 16:34:22 EDT 2016 lib/guice-3.0.jar
>
> 65012 Wed Jun 15 16:34:22 EDT 2016 lib/guice-servlet-3.0.jar
>
> 16778 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-annotations-2.2.0.jar
>
> 52449 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-auth-2.5.1.jar
>
> 2962685 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-common-2.5.1.jar
>
> 1498368 Wed Jun 15 16:34:22 EDT 2016
> lib/hadoop-mapreduce-client-core-2.5.1.jar
>
> 1158936 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-api-2.2.0.jar
>
> 1301627 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-common-2.2.0.jar
>
> 50139 Wed Jun 15 16:34:22 EDT 2016 lib/hawtbuf-1.9.jar
>
> 20780 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-annotations-1.1.2.jar
>
> 1249004 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-client-1.1.2.jar
>
> 530078 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-common-1.1.2.jar
>
> 4201685 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-protocol-1.1.2.jar
>
> 1475955 Wed Jun 15 16:34:22 EDT 2016 lib/htrace-core-3.1.0-incubating.jar
>
> 590533 Wed Jun 15 16:34:22 EDT 2016 lib/httpclient-4.3.5.jar
>
> 282269 Wed Jun 15 16:34:22 EDT 2016 lib/httpcore-4.3.2.jar
>
> 228286 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-core-asl-1.9.2.jar
>
> 765648 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-mapper-asl-1.9.2.jar
>
>   2497 Wed Jun 15 16:34:22 EDT 2016 lib/javax.inject-1.jar
>
> 521991 Wed Jun 15 16:34:22 EDT 2016 lib/javax.mail-1.5.0.jar
>
> 83945 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-3.1.jar
>
> 85353 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-api-3.0.1.jar
>
> 105134 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-api-2.2.2.jar
>
> 890168 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-impl-2.2.3-1.jar
>
> 1291164 Wed Jun 15 16:34:22 EDT 2016 lib/jcodings-1.0.8.jar
>
> 151304 Wed Jun 15 16:34:22 EDT 2016 lib/jdom-1.1.3.jar
>
> 130458 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-client-1.9.jar
>
> 458739 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-core-1.9.jar
>
> 17542 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-grizzly2-1.9.jar
>
> 14786 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-guice-1.9.jar
>
> 147952 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-json-1.9.jar
>
> 713089 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-server-1.9.jar
>
> 28100 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-core-1.9.jar
>
> 12976 Wed Jun 15 16:34:22 EDT 2016
> lib/jersey-test-framework-grizzly2-1.9.jar
>
> 67758 Wed Jun 15 16:34:22 EDT 2016 lib/jettison-1.1.jar
>
> 21144 Wed Jun 15 16:34:22 EDT 2016
> lib/jetty-continuation-8.1.10.v20130312.jar
>
> 95709 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-http-8.1.10.v20130312.jar
>
> 103622 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-io-8.1.10.v20130312.jar
>
> 89691 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-security-8.1.10.v20130312.jar
>
> 347020 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-server-8.1.10.v20130312.jar
>
> 101052 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-servlet-8.1.10.v20130312.jar
>
> 177131 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-6.1.26.jar
>
> 284903 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-8.1.10.v20130312.jar
>
> 125928 Wed Jun 15 16:34:22 EDT 2016
> lib/jetty-websocket-8.1.10.v20130312.jar
>
> 39117 Wed Jun 15 16:34:22 EDT 2016 lib/jms-api-1.1-rev-1.jar
>
> 187292 Wed Jun 15 16:34:22 EDT 2016 lib/joni-2.1.2.jar
>
> 280205 Wed Jun 15 16:34:22 EDT 2016 lib/jsch-0.1.53.jar
>
> 33015 Wed Jun 15 16:34:22 EDT 2016 lib/jsr305-1.3.9.jar
>
> 489884 Wed Jun 15 16:34:22 EDT 2016 lib/log4j-1.2.17.jar
>
> 565387 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-contrib-3.2.0-incubating.jar
>
> 1062000 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-library-3.1.1.jar
>
> 42212 Wed Jun 15 16:34:22 EDT 2016 lib/management-api-3.0.0-b012.jar
>
>   8798 Wed Jun 15 16:34:22 EDT 2016 lib/named-regexp-0.2.3.jar
>
> 1206119 Wed Jun 15 16:34:22 EDT 2016 lib/netty-3.6.6.Final.jar
>
> 1779991 Wed Jun 15 16:34:22 EDT 2016 lib/netty-all-4.0.23.Final.jar
>
> 29555 Wed Jun 15 16:34:22 EDT 2016 lib/paranamer-2.3.jar
>
> 1869113 Wed Jun 15 16:34:22 EDT 2016 lib/poi-3.9.jar
>
> 936648 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-3.9.jar
>
> 4802621 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-schemas-3.9.jar
>
> 533455 Wed Jun 15 16:34:22 EDT 2016 lib/protobuf-java-2.5.0.jar
>
> 26084 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-api-1.7.5.jar
>
>   9943 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-log4j12-1.7.19.jar
>
> 995968 Wed Jun 15 16:34:22 EDT 2016 lib/snappy-java-1.0.4.1.jar
>
> 23346 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0-2.jar
>
> 26514 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0.1.jar
>
>   2455 Wed Jun 15 16:34:22 EDT 2016 lib/tdgssconfig-14.00.00.21.jar
>
> 991265 Wed Jun 15 16:34:22 EDT 2016 lib/terajdbc4-14.00.00.21.jar
>
> 758309 Wed Jun 15 16:34:22 EDT 2016 lib/vertica-jdbc-7.2.1-0.jar
>
> 109318 Wed Jun 15 16:34:22 EDT 2016 lib/xml-apis-1.0.b2.jar
>
> 2666695 Wed Jun 15 16:34:22 EDT 2016 lib/xmlbeans-2.3.0.jar
>
> 15010 Wed Jun 15 16:34:22 EDT 2016 lib/xmlenc-0.52.jar
>
> 94672 Wed Jun 15 16:34:22 EDT 2016 lib/xz-1.0.jar
>
> 792964 Wed Jun 15 16:34:22 EDT 2016 lib/zookeeper-3.4.6.jar
>
>      0 Wed Jun 15 16:34:28 EDT 2016 conf/
>
>    334 Mon Apr 04 11:18:00 EDT 2016 conf/my-app-conf1.xml
>
>   3432 Wed Jun 15 16:22:20 EDT 2016 META-INF/properties.xml
>
>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>        xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
>        <modelVersion>4.0.0</modelVersion>
>        <groupId>com.rbc.aml.cnscan</groupId>
>        <version>1.0-SNAPSHOT</version>
>        <artifactId>countrynamescan</artifactId>
>        <packaging>jar</packaging>
>
>        <!-- change these to the appropriate values -->
>        <name>countrynamescan</name>
>        <description>Country and Name Scan project</description>
>
>        <properties>
>               <!-- change this if you desire to use a different version of DataTorrent -->
>               <datatorrent.version>3.1.1</datatorrent.version>
>               <datatorrent.apppackage.classpath>lib/*.jar</datatorrent.apppackage.classpath>
>        </properties>
>
>        <!-- repository to provide the DataTorrent artifacts -->
>        <!-- <repositories>
>               <repository>
>                      <snapshots>
>                            <enabled>false</enabled>
>                      </snapshots>
>                      <id>Datatorrent-Releases</id>
>                      <name>DataTorrent Release Repository</name>
>                      <url>https://www.datatorrent.com/maven/content/repositories/releases/</url>
>               </repository>
>               <repository>
>                      <releases>
>                            <enabled>false</enabled>
>                      </releases>
>                      <id>DataTorrent-Snapshots</id>
>                      <name>DataTorrent Early Access Program Snapshot Repository</name>
>                      <url>https://www.datatorrent.com/maven/content/repositories/snapshots/</url>
>               </repository>
>        </repositories> -->
>
>        <build>
>               <plugins>
>                      <plugin>
>                            <groupId>org.apache.maven.plugins</groupId>
>                            <artifactId>maven-eclipse-plugin</artifactId>
>                            <version>2.9</version>
>                            <configuration>
>                                   <downloadSources>true</downloadSources>
>                            </configuration>
>                      </plugin>
>                      <plugin>
>                            <artifactId>maven-compiler-plugin</artifactId>
>                            <version>3.3</version>
>                            <configuration>
>                                   <encoding>UTF-8</encoding>
>                                   <source>1.7</source>
>                                   <target>1.7</target>
>                                   <debug>true</debug>
>                                   <optimize>false</optimize>
>                                   <showDeprecation>true</showDeprecation>
>                                   <showWarnings>true</showWarnings>
>                            </configuration>
>                      </plugin>
>                      <plugin>
>                            <artifactId>maven-dependency-plugin</artifactId>
>                            <version>2.8</version>
>                            <executions>
>                                   <execution>
>                                          <id>copy-dependencies</id>
>                                          <phase>prepare-package</phase>
>                                          <goals>
>                                                 <goal>copy-dependencies</goal>
>                                          </goals>
>                                          <configuration>
>                                                 <outputDirectory>target/deps</outputDirectory>
>                                                 <includeScope>runtime</includeScope>
>                                          </configuration>
>                                   </execution>
>                            </executions>
>                      </plugin>
>
>                      <plugin>
>                            <artifactId>maven-assembly-plugin</artifactId>
>                            <executions>
>                                   <execution>
>                                          <id>app-package-assembly</id>
>                                          <phase>package</phase>
>                                          <goals>
>                                                 <goal>single</goal>
>                                          </goals>
>                                          <configuration>
>                                                 <finalName>${project.artifactId}-${project.version}-apexapp</finalName>
>                                                 <appendAssemblyId>false</appendAssemblyId>
>                                                 <descriptors>
>                                                        <descriptor>src/assemble/appPackage.xml</descriptor>
>                                                 </descriptors>
>                                                 <archiverConfig>
>                                                        <defaultDirectoryMode>0755</defaultDirectoryMode>
>                                                 </archiverConfig>
>                                                 <archive>
>                                                        <manifestEntries>
>                                                               <Class-Path>${datatorrent.apppackage.classpath}</Class-Path>
>                                                               <DT-Engine-Version>${datatorrent.version}</DT-Engine-Version>
>                                                               <DT-App-Package-Name>${project.artifactId}</DT-App-Package-Name>
>                                                               <DT-App-Package-Version>${project.version}</DT-App-Package-Version>
>                                                               <DT-App-Package-Display-Name>${project.name}</DT-App-Package-Display-Name>
>                                                               <DT-App-Package-Description>${project.description}</DT-App-Package-Description>
>                                                        </manifestEntries>
>                                                 </archive>
>                                          </configuration>
>                                   </execution>
>                            </executions>
>                      </plugin>
>
>                      <plugin>
>                            <artifactId>maven-antrun-plugin</artifactId>
>                            <version>1.7</version>
>                            <executions>
>                                   <execution>
>                                          <phase>package</phase>
>                                          <configuration>
>                                                 <target>
>                                                        <move file="${project.build.directory}/${project.artifactId}-${project.version}-apexapp.jar"
>                                                              tofile="${project.build.directory}/${project.artifactId}-${project.version}.apa" />
>                                                 </target>
>                                          </configuration>
>                                          <goals>
>                                                 <goal>run</goal>
>                                          </goals>
>                                   </execution>
>                                   <execution>
>                                          <!-- create resource directory for xml javadoc -->
>                                          <id>createJavadocDirectory</id>
>                                          <phase>generate-resources</phase>
>                                          <configuration>
>                                                 <tasks>
>                                                        <delete dir="${project.build.directory}/generated-resources/xml-javadoc"/>
>                                                        <mkdir dir="${project.build.directory}/generated-resources/xml-javadoc"/>
>                                                 </tasks>
>                                          </configuration>
>                                          <goals>
>                                                 <goal>run</goal>
>                                          </goals>
>                                   </execution>
>                            </executions>
>                      </plugin>
>
>                      <plugin>
>                            <groupId>org.codehaus.mojo</groupId>
>                            <artifactId>build-helper-maven-plugin</artifactId>
>                            <version>1.9.1</version>
>                            <executions>
>                                   <execution>
>                                          <id>attach-artifacts</id>
>                                          <phase>package</phase>
>                                          <goals>
>                                                 <goal>attach-artifact</goal>
>                                          </goals>
>                                          <configuration>
>                                                 <artifacts>
>                                                        <artifact>
>                                                               <file>
> target/${project.artifactId}-${project.version}.apa</file>
>
>                                                               <type>apa</
> type>
>
>                                                        </artifact>
>
>                                                 </artifacts>
>
>                                                 <skipAttach>false</
> skipAttach>
>
>                                          </configuration>
>
>                                   </execution>
>
>                            </executions>
>
>                      </plugin>
>
>
>
>                      <!-- generate *javdoc* -->
>
>                      <plugin>
>
>                            <groupId>org.apache.maven.plugins</groupId>
>
>                            <artifactId>*maven*-*javadoc*-*plugin*</
> artifactId>
>
>                            <executions>
>
>                                   <!-- generate *xml* *javadoc* -->
>
>                                   <execution>
>
>                                          <id>*xml*-*doclet*</id>
>
>                                          <phase>generate-resources</phase>
>
>                                          <goals>
>
>                                                 <goal>*javadoc*</goal>
>
>                                          </goals>
>
>                                          <configuration>
>
>                                                 <doclet>
> com.github.markusbernhardt.xmldoclet.XmlDoclet</doclet>
>
>                                                 <additionalparam>-d
>
>
> ${project.build.directory}/generated-resources/xml-javadoc
>
>                                                        -filename
> ${project.artifactId}-${project.version}-javadoc.xml</additionalparam>
>
>                                                 <useStandardDocletOptions>
> false</useStandardDocletOptions>
>
>                                                 <docletArtifact>
>
>                                                        <groupId>
> com.github.markusbernhardt</groupId>
>
>                                                        <artifactId>
> xml-doclet</artifactId>
>
>                                                        <version>1.0.4</
> version>
>
>                                                 </docletArtifact>
>
>                                          </configuration>
>
>                                   </execution>
>
>                            </executions>
>
>                      </plugin>
>
>                      <!-- Transform *xml* *javadoc* to stripped down
> version containing only class/interface
>
>                            comments and tags -->
>
>                      <plugin>
>
>                            <groupId>org.codehaus.mojo</groupId>
>
>                            <artifactId>*xml*-*maven*-*plugin*</artifactId>
>
>                            <version>1.0</version>
>
>                            <executions>
>
>                                   <execution>
>
>                                          <id>transform-*xmljavadoc*</id>
>
>                                          <phase>generate-resources</phase>
>
>                                          <goals>
>
>                                                 <goal>transform</goal>
>
>                                          </goals>
>
>                                   </execution>
>
>                            </executions>
>
>                            <configuration>
>
>                                   <transformationSets>
>
>                                          <transformationSet>
>
>                                                 <dir>
> ${project.build.directory}/generated-resources/xml-javadoc</dir>
>
>                                                 <includes>
>
>                                                        <include>
> ${project.artifactId}-${project.version}-javadoc.xml</include>
>
>                                                 </includes>
>
>                                                 <stylesheet>
> XmlJavadocCommentsExtractor.xsl</stylesheet>
>
>                                                 <outputDir>
> ${project.build.directory}/generated-resources/xml-javadoc</outputDir>
>
>                                          </transformationSet>
>
>                                   </transformationSets>
>
>                            </configuration>
>
>                      </plugin>
>
>                      <!-- copy *xml* *javadoc* to class jar -->
>
>                      <plugin>
>
>                            <artifactId>*maven*-resources-*plugin*</
> artifactId>
>
>                            <version>2.6</version>
>
>                            <executions>
>
>                                   <execution>
>
>                                          <id>copy-resources</id>
>
>                                          <phase>process-resources</phase>
>
>                                          <goals>
>
>                                                 <goal>copy-resources</goal
> >
>
>                                          </goals>
>
>                                          <configuration>
>
>                                                 <outputDirectory>
> ${basedir}/target/classes</outputDirectory>
>
>                                                 <resources>
>
>                                                        <resource>
>
>                                                               <directory>
> ${project.build.directory}/generated-resources/xml-javadoc</directory>
>
>                                                               <includes>
>
>                                                                      <
> include>${project.artifactId}-${project.version}-javadoc.xml</include>
>
>                                                               </includes>
>
>                                                               <filtering>
> true</filtering>
>
>                                                        </resource>
>
>                                                 </resources>
>
>                                          </configuration>
>
>                                   </execution>
>
>                            </executions>
>
>                      </plugin>
>
>
>
>               </plugins>
>
>
>
>               <pluginManagement>
>
>                      <plugins>
>
>                            <!--This plugin's configuration is used to
> store Eclipse m2e settings
>
>                                   only. It has no influence on the *Maven*
> build itself. -->
>
>                            <plugin>
>
>                                   <groupId>org.eclipse.m2e</groupId>
>
>                                   <artifactId>*lifecycle*-mapping</
> artifactId>
>
>                                   <version>1.0.0</version>
>
>                                   <configuration>
>
>                                          <lifecycleMappingMetadata>
>
>                                                 <pluginExecutions>
>
>                                                        <pluginExecution>
>
>                                                               <
> pluginExecutionFilter>
>
>                                                                      <
> groupId>org.codehaus.mojo</groupId>
>
>                                                                      <
> artifactId>
>
>
> xml-maven-plugin
>
>                                                                      </
> artifactId>
>
>                                                                      <
> versionRange>[1.0,)</versionRange>
>
>                                                                      <
> goals>
>
>
> <goal>transform</goal>
>
>                                                                      </
> goals>
>
>                                                               </
> pluginExecutionFilter>
>
>                                                               <action>
>
>                                                                      <
> ignore></ignore>
>
>                                                               </action>
>
>                                                        </pluginExecution>
>
>                                                        <pluginExecution>
>
>                                                               <
> pluginExecutionFilter>
>
>                                                                      <
> groupId></groupId>
>
>                                                                      <
> artifactId></artifactId>
>
>                                                                      <
> versionRange>[,)</versionRange>
>
>                                                                      <
> goals>
>
>
> <goal></goal>
>
>                                                                      </
> goals>
>
>                                                               </
> pluginExecutionFilter>
>
>                                                               <action>
>
>                                                                      <
> ignore></ignore>
>
>                                                               </action>
>
>                                                        </pluginExecution>
>
>                                                 </pluginExecutions>
>
>                                          </lifecycleMappingMetadata>
>
>                                   </configuration>
>
>                            </plugin>
>
>                      </plugins>
>
>               </pluginManagement>
>
>        </build>
>
>
>
>        <dependencies>
>
>               <!-- add your dependencies here -->
>
>               <dependency>
>
>                      <groupId>com.datatorrent</groupId>
>
>                      <artifactId>*malhar*-library</artifactId>
>
>                      <version>${datatorrent.version}</version>
>
>                      <!-- If you know that your application does not need
> transitive dependencies
>
>                            pulled in by *malhar*-library, uncomment the
> following to reduce the size of
>
>                            your *app* package. -->
>
>                      <!-- <exclusions> <exclusion> <groupId>*</groupId>
> <artifactId>*</artifactId>
>
>                            </exclusion> </exclusions> -->
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>com.datatorrent</groupId>
>
>                      <artifactId>*dt*-common</artifactId>
>
>                      <version>${datatorrent.version}</version>
>
>                      <scope>provided</scope>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>com.datatorrent</groupId>
>
>                      <artifactId>*malhar*-library</artifactId>
>
>                      <version>${datatorrent.version}</version>
>
>                      <!-- If you know that your application does not need
> transitive dependencies
>
>                            pulled in by *malhar*-library, uncomment the
> following to reduce the size of
>
>                            your *app* package. -->
>
>                      <!-- <exclusions> <exclusion> <groupId>*</groupId>
> <artifactId>*</artifactId>
>
>                            </exclusion> </exclusions> -->
>
>               </dependency>
>
>               <dependency>
>
>        <groupId>com.datatorrent</groupId>
>
>        <artifactId>*dt*-engine</artifactId>
>
>        <version>${datatorrent.version}</version>
>
>        <scope>test</scope>
>
>               </dependency>
>
>               <dependency>
>
>               <groupId>com.datatorrent</groupId>
>
>               <artifactId>*dt*-common</artifactId>
>
>               <version>${datatorrent.version}</version>
>
>               <scope>provided</scope>
>
>               </dependency>
>
>               <dependency>
>
>               <groupId>com.teradata.jdbc</groupId>
>
>               <artifactId>terajdbc4</artifactId>
>
>               <version>14.00.00.21</version>
>
>               </dependency>
>
>               <dependency>
>
>               <groupId>com.teradata.jdbc</groupId>
>
>               <artifactId>*tdgssconfig*</artifactId>
>
>               <version>14.00.00.21</version>
>
>               </dependency>
>
>               <dependency>
>
>               <groupId>com.ibm.db2</groupId>
>
>               <artifactId>db2jcc</artifactId>
>
>               <version>123</version>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>jdk.tools</groupId>
>
>                      <artifactId>jdk.tools</artifactId>
>
>                      <version>1.7</version>
>
>                      <scope>system</scope>
>
>                      <systemPath>C:/Program Files/Java/jdk1.7.0_79/*lib*
> /tools.jar</systemPath>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>org.apache.apex</groupId>
>
>                      <artifactId>*malhar*-*contrib*</artifactId>
>
>                      <version>3.2.0-incubating</version>
>
>                      <!--<scope>provided</scope> -->
>
>                      <exclusions>
>
>                            <exclusion>
>
>                                   <groupId>*</groupId>
>
>                                   <artifactId>*</artifactId>
>
>                            </exclusion>
>
>                      </exclusions>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>*junit*</groupId>
>
>                      <artifactId>*junit*</artifactId>
>
>                      <version>4.10</version>
>
>                      <scope>test</scope>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>com.vertica</groupId>
>
>                      <artifactId>*vertica*-*jdbc*</artifactId>
>
>                      <version>7.2.1-0</version>
>
>               </dependency>
>
>               <dependency>
>
>               <groupId>org.apache.hbase</groupId>
>
>               <artifactId>*hbase*-client</artifactId>
>
>               <version>1.1.2</version>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>org.slf4j</groupId>
>
>                      <artifactId>slf4j-log4j12</artifactId>
>
>                      <version>1.7.19</version>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>com.datatorrent</groupId>
>
>                      <artifactId>*dt*-engine</artifactId>
>
>                      <version>${datatorrent.version}</version>
>
>                      <scope>test</scope>
>
>               </dependency>
>
>
>
>               <dependency>
>
>                      <groupId>net.sf.flatpack</groupId>
>
>                      <artifactId>*flatpack*</artifactId>
>
>                      <version>3.4.2</version>
>
>               </dependency>
>
>
>
>               <dependency>
>
>                      <groupId>org.jdom</groupId>
>
>                      <artifactId>*jdom*</artifactId>
>
>                      <version>1.1.3</version>
>
>               </dependency>
>
>
>
>               <dependency>
>
>                      <groupId>org.apache.poi</groupId>
>
>                      <artifactId>*poi*-*ooxml*</artifactId>
>
>                      <version>3.9</version>
>
>               </dependency>
>
>
>
>               <dependency>
>
>                      <groupId>org.apache.xmlbeans</groupId>
>
>                      <artifactId>*xmlbeans*</artifactId>
>
>                      <version>2.3.0</version>
>
>               </dependency>
>
>
>
>               <dependency>
>
>                      <groupId>dom4j</groupId>
>
>                      <artifactId>dom4j</artifactId>
>
>                      <version>1.6.1</version>
>
>               </dependency>
>
>
>
>               <dependency>
>
>                      <groupId>javax.xml.stream</groupId>
>
>                      <artifactId>*stax*-*api*</artifactId>
>
>                      <version>1.0-2</version>
>
>               </dependency>
>
>
>
>               <dependency>
>
>                      <groupId>org.apache.poi</groupId>
>
>                      <artifactId>*poi*</artifactId>
>
>                      <version>3.9</version>
>
>               </dependency>
>
>
>
>               <dependency>
>
>                      <groupId>org.apache.poi</groupId>
>
>                      <artifactId>*poi*-*ooxml*-schemas</artifactId>
>
>                      <version>3.9</version>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>com.jcraft</groupId>
>
>                      <artifactId>*jsch*</artifactId>
>
>                      <version>0.1.53</version>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>com.jcraft</groupId>
>
>                      <artifactId>*jsch*</artifactId>
>
>                      <version>0.1.53</version>
>
>               </dependency>
>
>        </dependencies>
>
>
>
> </project>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, June, 16 4:37 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> It looks like you may be including old Hadoop jars in your apa package
> since the stack trace
>
> shows *ConverterUtils.toContainerId* calling
> *ConverterUtils.toApplicationAttemptId* but recent versions
>
> don't have that call sequence. In 2.7.1 (which is what your cluster has)
> the function looks like this:
>
>   public static ContainerId toContainerId(String containerIdStr) {
>     return ContainerId.fromString(containerIdStr);
>   }
>
>
>
> Could you post the output of "*jar tvf {your-apa-file}*" as well as: "*mvn
> dependency:tree"*
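>
> For example (a sketch, assuming the apa built by the pom in this thread):
>
>   jar tvf target/countrynamescan-1.0-SNAPSHOT.apa | grep hadoop
>   mvn dependency:tree | grep -i hadoop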
>
>
>
> Ram
>
>
>
> On Thu, Jun 16, 2016 at 12:38 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi Ram,
>
>
>
> Below is the information.
>
>
>
>
>
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100   712    0   712    0     0   3807      0 --:--:-- --:--:-- --:--:--  3807
>
> {
>     "clusterInfo": {
>         "haState": "ACTIVE",
>         "haZooKeeperConnectionState": "CONNECTED",
>         "hadoopBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bccaeaf36cb193c1c by jenkins source checksum 69a3bf8c667267c2c252a54fbbf23d",
>         "hadoopVersion": "2.7.1.2.3.2.0-2950",
>         "hadoopVersionBuiltOn": "2015-09-30T18:08Z",
>         "id": 1465495186350,
>         "resourceManagerBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bccaeaf36cb193c1c by jenkins source checksum 48db4b572827c2e9c2da66982d147626",
>         "resourceManagerVersion": "2.7.1.2.3.2.0-2950",
>         "resourceManagerVersionBuiltOn": "2015-09-30T18:20Z",
>         "rmStateStoreName": "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",
>         "startedOn": 1465495186350,
>         "state": "STARTED"
>     }
> }
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, June, 16 2:57 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> Can you ssh to one of the cluster nodes ? If so, can you run this command
> and show the output
>
> (where *{rm} *is the *host:port* running the resource manager, aka YARN):
>
>
>
> *curl http://{rm}/ws/v1/cluster | python -mjson.tool*
>
>
>
> Ram
>
> ps. You can determine the node running YARN with:
>
>
>
> *hdfs getconf -confKey yarn.resourcemanager.webapp.address*
>
> *hdfs getconf -confKey yarn.resourcemanager.webapp.https.address*
>
>
>
>
>
>
>
> On Thu, Jun 16, 2016 at 11:15 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi,
>
>
>
> I am facing a weird issue, and the logs are not clear to me!
>
>
>
> I have created an apa file which works fine within my local sandbox, but I am
> facing problems when I upload it to the enterprise Hadoop cluster using the DT
> Console.
>
>
>
> Below is the error message from yarn logs. Please help in understanding
> the issue.
>
>
>
> ###################### Error Logs
> ########################################################
>
>
>
> Log Type: AppMaster.stderr
>
> Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
>
> Log Length: 1259
>
> SLF4J: Class path contains multiple SLF4J bindings.
>
> SLF4J: Found binding in
> [jar:file:/grid/06/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/filecache/36/slf4j-log4j12-1.7.19.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>
> SLF4J: Found binding in
> [jar:file:/usr/hdp/2.3.2.0-2950/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
>
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>
> Exception in thread "main" java.lang.IllegalArgumentException: Invalid
> ContainerId: container_e35_1465495186350_2224_01_000001
>
>         at
> org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182)
>
>         at
> com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:90)
>
> Caused by: java.lang.NumberFormatException: For input string: "e35"
>
>         at
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>
>         at java.lang.Long.parseLong(Long.java:441)
>
>         at java.lang.Long.parseLong(Long.java:483)
>
>         at
> org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
>
>         at
> org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
>
>         ... 1 more
>
>
>
> Log Type: AppMaster.stdout
>
> Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
>
> Log Length: 0
>
>
>
> Log Type: dt.log
>
> Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
>
> Log Length: 29715
>
> Showing 4096 bytes of 29715 total. Click here
> <http://guedlpdhdp001.saifg.rbc.com:19888/jobhistory/logs/guedlpdhdp012.saifg.rbc.com:45454/container_e35_1465495186350_2224_01_000001/container_e35_1465495186350_2224_01_000001/mukkamula/dt.log/?start=0> for
> the full log.
>
> 56m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms4096m
> -Xmx4096m -Dhadoop.security.logger=INFO,DRFAS
> -Dhdfs.audit.logger=INFO,DRFAAUDIT
>
> SHLVL=3
>
> HADOOP_SSH_OPTS=-o ConnectTimeout=5 -o SendEnv=HADOOP_CONF_DIR
>
> HADOOP_USER_NAME=datatorrent/gueulvahal003.saifg.rbc.com@SAIFG.RBC.COM
>
> HADOOP_NAMENODE_OPTS=-server -XX:ParallelGCThreads=8
> -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log
> -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m
> -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m
> -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS
> -Dhdfs.audit.logger=INFO,DRFAAUDIT
> -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node"
> -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server
> -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC
> -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m
> -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m
> -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m
> -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS
> -Dhdfs.audit.logger=INFO,DRFAAUDIT
> -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node"
> -Dorg.mortbay.jetty.Request.maxFormContentSize=-1
>
> HADOOP_IDENT_STRING=yarn
>
> HADOOP_MAPRED_LOG_DIR=/var/log/hadoop-mapreduce/yarn
>
> NM_HOST=guedlpdhdp012.saifg.rbc.com
>
> XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
>
> HADOOP_SECURE_DN_LOG_DIR=/var/log/hadoop/hdfs
>
> YARN_HISTORYSERVER_HEAPSIZE=1024
>
> JVM_PID=2638
>
> YARN_PID_DIR=/var/run/hadoop-yarn/yarn
>
> HADOOP_HOME_WARN_SUPPRESS=1
>
> NM_PORT=45454
>
> LOGNAME=mukkamula
>
> YARN_CONF_DIR=/usr/hdp/current/hadoop-client/conf
>
> HADOOP_YARN_USER=yarn
>
> QTDIR=/usr/lib64/qt-3.3
>
> _=/usr/lib/jvm/java-1.7.0/bin/java
>
> MSM_PRODUCT=MSM
>
> HADOOP_HOME=/usr/hdp/2.3.2.0-2950/hadoop
>
> MALLOC_ARENA_MAX=4
>
> HADOOP_OPTS=-Dhdp.version=2.3.2.0-2950 -Djava.net.preferIPv4Stack=true
> -Dhdp.version= -Djava.net.preferIPv4Stack=true
> -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log
> -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn
> -Dhadoop.root.logger=INFO,console
> -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native
> -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
> -Dhdp.version=2.3.2.0-2950 -Dhadoop.log.dir=/var/log/hadoop/yarn
> -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop
> -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console
> -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native
> -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
>
> SHELL=/bin/bash
>
> YARN_ROOT_LOGGER=INFO,EWMA,RFA
>
>
> HADOOP_TOKEN_FILE_LOCATION=/grid/11/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/container_e35_1465495186350_2224_01_000001/container_tokens
>
>
> CLASSPATH=./*:/usr/hdp/current/hadoop-client/conf:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*
>
> HADOOP_MAPRED_PID_DIR=/var/run/hadoop-mapreduce/yarn
>
> YARN_NODEMANAGER_HEAPSIZE=1024
>
> QTINC=/usr/lib64/qt-3.3/include
>
> USER=mukkamula
>
> HADOOP_CLIENT_OPTS=-Xmx2048m -XX:MaxPermSize=512m -Xmx2048m
> -XX:MaxPermSize=512m
>
> CONTAINER_ID=container_e35_1465495186350_2224_01_000001
>
> HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop/hdfs
>
> HISTCONTROL=ignoredups
>
> HOME=/home/
>
> HADOOP_NAMENODE_INIT_HEAPSIZE=-Xms8192m
>
> MSM_HOME=/usr/local/MegaRAID Storage Manager
>
> LESSOPEN=||/usr/bin/lesspipe.sh %s
>
> LANG=en_US.UTF-8
>
> YARN_NICENESS=0
>
> YARN_IDENT_STRING=yarn
>
> HADOOP_MAPRED_HOME=/usr/hdp/2.3.2.0-2950/hadoop-mapreduce
>
>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Mukkamula, Suryavamshivardhan (CWM-NR)
> *Sent:* 2016, June, 16 8:58 AM
> *To:* users@apex.apache.org
> *Subject:* RE: Multiple directories
>
>
>
> Thank you for the inputs.
>
>
>
> Regards,
>
> Surya Vamshi
>
> *From:* Thomas Weise [mailto:thomas.weise@gmail.com
> <th...@gmail.com>]
> *Sent:* 2016, June, 15 5:08 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
>
>
> On Wed, Jun 15, 2016 at 1:55 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi Ram/Team,
>
>
>
> I could create an operator which reads multiple directories and parses the
> each file with respect to an individual configuration file and generates
> output file to different directories.
>
>
>
> However I have some questions regarding the design.
>
>
>
> -> We have 120 directories to scan on HDFS. If we use parallel partitioning
> with operator memory around 250MB, it might be around 30GB of RAM for the
> processing of this operator; are these figures going to create any problem
> in production?
>
>
>
> You can benchmark this with a single partition. If the downstream
> operators can keep up with the rate at which the file reader emits, then
> the memory consumption should be minimal. Keep in mind though that the
> container memory is not just heap space for the operator, but also memory
> the JVM requires to run and the memory that the buffer server consumes. You
> see the allocated memory in the UI if you use the DT community edition
> (container list in the physical plan).
>
>
>
> -> Should I use a scheduler for running the batch job, or define the next scan
> time and keep the DT job running continuously? If I run the DT job
> continuously, I assume memory will be continuously utilized by the DT job and
> not available to other resources on the cluster; please clarify.
>
> It is possible to set this up elastically as well, so that when there is no
> input available, the number of reader partitions is reduced and the memory
> given back (Apex supports dynamic scaling).
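>
> A rough sketch of the hook involved (the idle test below is an assumption;
> StatsListener and Response.repartitionRequired are the stock Apex API):
>
>   import java.io.Serializable;
>   import com.datatorrent.api.StatsListener;
>
>   public class IdleRepartitioner implements StatsListener, Serializable
>   {
>     @Override
>     public Response processStats(BatchedOperatorStats stats)
>     {
>       Response r = new Response();
>       // ask the platform to re-run the operator's Partitioner when this
>       // partition emitted nothing recently
>       r.repartitionRequired = stats.getTuplesEmittedPSMA() == 0;
>       return r;
>     }
>   }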
>
>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, June, 05 10:24 PM
>
>
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> Some sample code to monitor multiple directories is now available at:
>
>
> https://github.com/DataTorrent/examples/tree/master/tutorials/fileIO-multiDir
>
>
>
> It shows how to use a custom implementation of definePartitions() to create
>
> multiple partitions of the file input operator and group them
>
> into "slices" where each slice monitors a single directory.
>
>
>
> Ram
>
>
>
> On Wed, May 25, 2016 at 9:55 AM, Munagala Ramanath <ra...@datatorrent.com>
> wrote:
>
> I'm hoping to have a sample sometime next week.
>
>
>
> Ram
>
>
>
> On Wed, May 25, 2016 at 9:30 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Thank you so much ram, for your advice , Option (a) would be ideal for my
> requirement.
>
>
>
> Do you have sample usage for partitioning with individual configuration
> set ups different partitions?
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, May, 25 12:11 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> You have 2 options: (a) AbstractFileInputOperator (b)
> FileSplitter/BlockReader
>
>
>
> For (a), each partition (i.e. replica or the operator) can scan only a
> single directory, so if you have 100
>
> directories, you can simply start with 100 partitions; since each
> partition is scanning its own directory
>
> you don't need to worry about which files the lines came from. This
> approach however needs a custom
>
> definePartition() implementation in your subclass to assign the
> appropriate directory and XML parsing
>
> config file to each partition; it also needs adequate cluster resources to
> be able to spin up the required
>
> number of partitions.
>
>
>
> For (b), there is some documentation in the Operators section at
> http://docs.datatorrent.com/ including
>
> sample code. There operators support scanning multiple directories out of
> the box but have more
>
> elaborate configuration options. Check this out and see if it works in
> your use case.
>
>
>
> Ram
>
>
>
> On Wed, May 25, 2016 at 8:17 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hello Ram/Team,
>
>
>
> My requirement is to read input feeds from different locations on HDFS and
> parse those files by reading XML configuration files (each input feed has
> configuration file which defines the fields inside the input feeds).
>
>
>
> My approach : I would like to define a mapping file which contains
> individual feed identifier, feed location , configuration file location. I
> would like to read this mapping file at initial load within setup() method
> and define my DirectoryScan.acceptFiles. Here my challenge is when I read
> the files , I should parse the lines by reading the individual
> configuration files. How do I know the line is from particular file , if I
> know this I can read the corresponding configuration file before parsing
> the line.
>
>
>
> Please let me know how do I handle this.
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, May, 24 5:49 PM
> *To:* Mukkamula, Suryavamshivardhan (CWM-NR)
> *Subject:* Multiple directories
>
>
>
> One way of addressing the issue is to use some sort of external tool (like
> a script) to
>
> copy all the input files to a common directory (making sure that the file
> names are
>
> unique to prevent one file from overwriting another) before the Apex
> application starts.
>
>
>
> The Apex application then starts and processes files from this directory.
>
>
>
> If you set the partition count of the file input operator to N, it will
> create N partitions and
>
> the files will be automatically distributed among the partitions. The
> partitions will work
>
> in parallel.
>
>
>
> Ram
>
> _______________________________________________________________________
>
> This [email] may be privileged and/or confidential, and the sender does
> not waive any related rights and obligations. Any distribution, use or
> copying of this [email] or the information it contains by other than an
> intended recipient is unauthorized. If you received this [email] in error,
> please advise the sender (by return [email] or otherwise) immediately. You
> have consented to receive the attached electronically at the above-noted
> address; please retain a copy of this confirmation for future reference.
>

RE: Multiple directories

Posted by "Mukkamula, Suryavamshivardhan (CWM-NR)" <su...@rbc.com>.
Hi Ram,

I tried that option of adding the memory properties in site/conf and selecting the file during launch, but no luck. The same works in my local sandbox setup.

Is there any other way that I can understand the reason?

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, June, 17 3:06 PM
To: users@apex.apache.org
Subject: Re: Multiple directories

Please take a look at the section entitled "Properties source precedence" at
http://docs.datatorrent.com/application_packages/

It looks like the setting in dt-site.xml on the cluster is overriding your application-defined values.
If you add the properties to a file under site/conf in your application and then
select it during launch, those values should take effect.
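
For example, a site/conf properties file might look like this (a sketch; the
application name "countrynamescan" and operator name "reader" are assumptions,
adjust them to match your DAG):

<?xml version="1.0"?>
<configuration>
  <property>
    <name>dt.application.countrynamescan.operator.reader.attr.MEMORY_MB</name>
    <value>500</value>
  </property>
</configuration>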

For signalling EOF, another option is to use a separate control port to send the EOF which could
just be the string "EOF" for example.
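
A minimal sketch of that approach (class, port, and field names here are
invented for illustration; it assumes the stock AbstractFileInputOperator
callbacks):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

import org.apache.hadoop.fs.Path;

import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.lib.io.fs.AbstractFileInputOperator;

public class ReaderWithControlPort extends AbstractFileInputOperator<String>
{
  public final transient DefaultOutputPort<String> output = new DefaultOutputPort<>();
  // separate control port carrying the name of each finished file
  public final transient DefaultOutputPort<String> controlOut = new DefaultOutputPort<>();

  private transient BufferedReader br;
  private transient String currentFileName;

  @Override
  protected InputStream openFile(Path path) throws IOException
  {
    currentFileName = path.getName();
    InputStream is = super.openFile(path);
    br = new BufferedReader(new InputStreamReader(is));
    return is;
  }

  @Override
  protected void closeFile(InputStream is) throws IOException
  {
    // emit the EOF control tuple before the stream goes away
    controlOut.emit(currentFileName);
    super.closeFile(is);
    br = null;
  }

  @Override
  protected String readEntity() throws IOException
  {
    return br.readLine(); // null at EOF, after which the file is closed
  }

  @Override
  protected void emit(String line)
  {
    output.emit(line);
  }
}

The file writer can then add a control input port whose process() method simply
calls requestFinalize() with the received file name.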


Ram

On Fri, Jun 17, 2016 at 11:48 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi,

I found a way for the 2nd question.


1) I tried setting operator memory to 500MB, but the operator still takes 4GB by default; I do not understand why. (In my sandbox setup the memory was set correctly to 500MB, but not on the enterprise dev cluster.)

2) I was trying to send a null object from the file reader when EOF is reached so that the file writer can call the requestFinalize() method, but I could not figure out how to send the EOF. (Solution: it seems that if readEntity() returns null, the emit() method is not called, so I managed to emit the null object from readEntity() itself.)
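
For reference, the working readEntity() now looks roughly like this (emitting
the EOF marker tuple directly, since emit() is not invoked for a null return):

@Override
protected String readEntity() throws IOException {
    // try to read a line
    final String line = br.readLine();
    if (null != line) { // normal case
        return line;
    }
    // EOF: emit the end-of-file marker from here
    LOG.info("readEntity: EOF for {}", filePath);
    output.emit(new KeyValue<String, String>(getFileName(), null));
    return null;
}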


Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:suryavamshivardhan.mukkamula@rbc.com<ma...@rbc.com>]
Sent: 2016, June, 17 12:20 PM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: RE: Multiple directories

Hi,

Can you please help me understand the below issues.


1) I tried setting operator memory to 500MB, but the operator still takes 4GB by default; I do not understand why. (In my sandbox setup the memory was set correctly to 500MB, but not on the enterprise dev cluster.)

2) I am trying to send a null object from the file reader when EOF is reached so that the file writer can call the requestFinalize() method. But somehow I could not figure out how to send the EOF; I tried like below but no luck.

################### File Reader ####################################################

@Override
protected String readEntity() throws IOException {
    // try to read a line
    final String line = br.readLine();
    if (null != line) { // normal case
        LOG.debug("readEntity: line = {}", line);
        return line;
    }

    // end-of-file (control tuple sent in closeFile())
    LOG.info("readEntity: EOF for {}", filePath);
    return null;
}

@Override
protected void emit(String line) {
    // parsing logic here: parse the line as per the input configuration and
    // create the output line as per the output configuration
    if (line == null) {
        output.emit(new KeyValue<String, String>(getFileName(), null));
        return; // stop here: otherwise the null marker would fall through to parseTuple()
    }
    KeyValue<String, String> tuple = new KeyValue<String, String>();
    tuple.key = getFileName();
    tuple.value = line;
    KeyValue<String, String> newTuple = parseTuple(tuple);
    output.emit(newTuple);
}
######################File Writer######################################################
public class FileOutputOperator extends AbstractFileOutputOperator<KeyValue<String, String>> {
    private static final Logger LOG = LoggerFactory.getLogger(FileOutputOperator.class);
    private List<String> filesToFinalize = new ArrayList<>();

    @Override
    public void setup(Context.OperatorContext context) {
        super.setup(context);
        // finalize any files still pending from before a restart
        finalizeFiles();
    }

    @Override
    protected byte[] getBytesForTuple(KeyValue<String, String> tuple) {
        if (tuple.value == null) { // EOF marker: schedule this file for finalization
            LOG.info("File to finalize {}", tuple.key);
            filesToFinalize.add(tuple.key);
            return new byte[0];
        }
        else {
            return tuple.value.getBytes();
        }
    }

    @Override
    protected String getFileName(KeyValue<String, String> tuple) {
        return tuple.key;
    }

    @Override
    public void endWindow() {
        super.endWindow();
        finalizeFiles();
    }

    private void finalizeFiles() {
        LOG.info("Files to finalize {}", filesToFinalize.toArray());
        Iterator<String> fileIt = filesToFinalize.iterator();
        while (fileIt.hasNext()) {
            requestFinalize(fileIt.next());
            fileIt.remove();
        }
    }
}
##################################################################################################



Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:suryavamshivardhan.mukkamula@rbc.com]
Sent: 2016, June, 17 9:11 AM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: RE: Multiple directories

Hi Ram/Raja,

The HBase dependency was pulling older Hadoop jars into my classpath. I removed the HBase dependency, which I don't need for now, and the issue got resolved.

Thank you for your help.

Regards,
Surya Vamshi

From: Raja.Aravapalli [mailto:Raja.Aravapalli@target.com]
Sent: 2016, June, 17 7:06 AM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories


I also faced a similar problem with Hadoop jars when I used HBase jars in pom.xml! It could be because the version of the Hadoop jars your apa is holding differs from the ones in the cluster!


What I did to solve this is:

I included the scope "provided" in the Maven pom.xml for the HBase jars, and then supplied the HBase jars to the application package during submission using "-libjars" with the launch command, which solved my Invalid ContainerId problem!
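
For example (a sketch, reusing the versions from the pom in this thread; the
exact jar list for -libjars depends on what HBase needs at runtime):

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.1.2</version>
    <scope>provided</scope>
</dependency>

and then at submission time something like:

launch -libjars hbase-client-1.1.2.jar,hbase-common-1.1.2.jar yourapp.apa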

You can type "launch help" for usage details.


Regards,
Raja.

From: Munagala Ramanath <ra...@datatorrent.com>>
Reply-To: "users@apex.apache.org<ma...@apex.apache.org>" <us...@apex.apache.org>>
Date: Thursday, June 16, 2016 at 4:10 PM
To: "users@apex.apache.org<ma...@apex.apache.org>" <us...@apex.apache.org>>
Subject: Re: Multiple directories

Those 6 hadoop jars are definitely a problem.

I didn't see the output of "mvn dependency:tree"; could you post that?
It will show you why these hadoop jars are being pulled in.

Also, please refer to the section "Hadoop dependencies conflicts" in the troubleshooting guide:
http://docs.datatorrent.com/troubleshooting/
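
If the HBase jars are needed at runtime, another option (a sketch; wildcard
exclusions need a reasonably recent Maven) is to keep the dependency but keep
its transitive Hadoop jars out of the apa:

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.1.2</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>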

Ram

On Thu, Jun 16, 2016 at 1:56 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi Ram,

Below are the details.


0 Wed Jun 15 16:34:24 EDT 2016 META-INF/
   358 Wed Jun 15 16:34:22 EDT 2016 META-INF/MANIFEST.MF
     0 Wed Jun 15 16:34:24 EDT 2016 app/
52967 Wed Jun 15 16:34:22 EDT 2016 app/countrynamescan-1.0-SNAPSHOT.jar
     0 Wed Jun 15 16:34:22 EDT 2016 lib/
62983 Wed Jun 15 16:34:22 EDT 2016 lib/activation-1.1.jar
1143233 Wed Jun 15 16:34:22 EDT 2016 lib/activemq-client-5.8.0.jar
  4467 Wed Jun 15 16:34:22 EDT 2016 lib/aopalliance-1.0.jar
44925 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-i18n-2.0.0-M15.jar
691479 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-kerberos-codec-2.0.0-M15.jar
16560 Wed Jun 15 16:34:22 EDT 2016 lib/api-asn1-api-1.0.0-M20.jar
79912 Wed Jun 15 16:34:22 EDT 2016 lib/api-util-1.0.0-M20.jar
43033 Wed Jun 15 16:34:22 EDT 2016 lib/asm-3.1.jar
303139 Wed Jun 15 16:34:22 EDT 2016 lib/avro-1.7.4.jar
232019 Wed Jun 15 16:34:22 EDT 2016 lib/commons-beanutils-1.8.3.jar
41123 Wed Jun 15 16:34:22 EDT 2016 lib/commons-cli-1.2.jar
284184 Wed Jun 15 16:34:22 EDT 2016 lib/commons-codec-1.10.jar
575389 Wed Jun 15 16:34:22 EDT 2016 lib/commons-collections-3.2.1.jar
30595 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compiler-2.7.8.jar
241367 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compress-1.4.1.jar
298829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-configuration-1.6.jar
143602 Wed Jun 15 16:34:22 EDT 2016 lib/commons-digester-1.8.jar
112341 Wed Jun 15 16:34:22 EDT 2016 lib/commons-el-1.0.jar
305001 Wed Jun 15 16:34:22 EDT 2016 lib/commons-httpclient-3.1.jar
185140 Wed Jun 15 16:34:22 EDT 2016 lib/commons-io-2.4.jar
284220 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang-2.6.jar
315805 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang3-3.1.jar
61829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-logging-1.2.jar
1599627 Wed Jun 15 16:34:22 EDT 2016 lib/commons-math3-3.1.1.jar
273370 Wed Jun 15 16:34:22 EDT 2016 lib/commons-net-3.1.jar
3608597 Wed Jun 15 16:34:22 EDT 2016 lib/db2jcc-123.jar
313898 Wed Jun 15 16:34:22 EDT 2016 lib/dom4j-1.6.1.jar
17138265 Wed Jun 15 16:34:22 EDT 2016 lib/fastutil-6.6.4.jar
15322 Wed Jun 15 16:34:22 EDT 2016 lib/findbugs-annotations-1.3.9-1.jar
84946 Wed Jun 15 16:34:22 EDT 2016 lib/flatpack-3.4.2.jar
20220 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-j2ee-management_1.1_spec-1.0.1.jar
32359 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-jms_1.1_spec-1.1.1.jar
21817 Wed Jun 15 16:34:22 EDT 2016 lib/gmbal-api-only-3.0.0-b023.jar
690573 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-framework-2.1.2.jar
253086 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-2.1.2.jar
198255 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-server-2.1.2.jar
336904 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-servlet-2.1.2.jar
  8114 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-rcm-2.1.2.jar
1795932 Wed Jun 15 16:34:22 EDT 2016 lib/guava-12.0.1.jar
710492 Wed Jun 15 16:34:22 EDT 2016 lib/guice-3.0.jar
65012 Wed Jun 15 16:34:22 EDT 2016 lib/guice-servlet-3.0.jar
16778 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-annotations-2.2.0.jar
52449 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-auth-2.5.1.jar
2962685 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-common-2.5.1.jar
1498368 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-mapreduce-client-core-2.5.1.jar
1158936 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-api-2.2.0.jar
1301627 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-common-2.2.0.jar
50139 Wed Jun 15 16:34:22 EDT 2016 lib/hawtbuf-1.9.jar
20780 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-annotations-1.1.2.jar
1249004 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-client-1.1.2.jar
530078 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-common-1.1.2.jar
4201685 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-protocol-1.1.2.jar
1475955 Wed Jun 15 16:34:22 EDT 2016 lib/htrace-core-3.1.0-incubating.jar
590533 Wed Jun 15 16:34:22 EDT 2016 lib/httpclient-4.3.5.jar
282269 Wed Jun 15 16:34:22 EDT 2016 lib/httpcore-4.3.2.jar
228286 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-core-asl-1.9.2.jar
765648 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-mapper-asl-1.9.2.jar
  2497 Wed Jun 15 16:34:22 EDT 2016 lib/javax.inject-1.jar
521991 Wed Jun 15 16:34:22 EDT 2016 lib/javax.mail-1.5.0.jar
83945 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-3.1.jar
85353 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-api-3.0.1.jar
105134 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-api-2.2.2.jar
890168 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-impl-2.2.3-1.jar
1291164 Wed Jun 15 16:34:22 EDT 2016 lib/jcodings-1.0.8.jar
151304 Wed Jun 15 16:34:22 EDT 2016 lib/jdom-1.1.3.jar
130458 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-client-1.9.jar
458739 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-core-1.9.jar
17542 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-grizzly2-1.9.jar
14786 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-guice-1.9.jar
147952 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-json-1.9.jar
713089 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-server-1.9.jar
28100 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-core-1.9.jar
12976 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-grizzly2-1.9.jar
67758 Wed Jun 15 16:34:22 EDT 2016 lib/jettison-1.1.jar
21144 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-continuation-8.1.10.v20130312.jar
95709 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-http-8.1.10.v20130312.jar
103622 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-io-8.1.10.v20130312.jar
89691 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-security-8.1.10.v20130312.jar
347020 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-server-8.1.10.v20130312.jar
101052 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-servlet-8.1.10.v20130312.jar
177131 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-6.1.26.jar
284903 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-8.1.10.v20130312.jar
125928 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-websocket-8.1.10.v20130312.jar
39117 Wed Jun 15 16:34:22 EDT 2016 lib/jms-api-1.1-rev-1.jar
187292 Wed Jun 15 16:34:22 EDT 2016 lib/joni-2.1.2.jar
280205 Wed Jun 15 16:34:22 EDT 2016 lib/jsch-0.1.53.jar
33015 Wed Jun 15 16:34:22 EDT 2016 lib/jsr305-1.3.9.jar
489884 Wed Jun 15 16:34:22 EDT 2016 lib/log4j-1.2.17.jar
565387 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-contrib-3.2.0-incubating.jar
1062000 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-library-3.1.1.jar
42212 Wed Jun 15 16:34:22 EDT 2016 lib/management-api-3.0.0-b012.jar
  8798 Wed Jun 15 16:34:22 EDT 2016 lib/named-regexp-0.2.3.jar
1206119 Wed Jun 15 16:34:22 EDT 2016 lib/netty-3.6.6.Final.jar
1779991 Wed Jun 15 16:34:22 EDT 2016 lib/netty-all-4.0.23.Final.jar
29555 Wed Jun 15 16:34:22 EDT 2016 lib/paranamer-2.3.jar
1869113 Wed Jun 15 16:34:22 EDT 2016 lib/poi-3.9.jar
936648 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-3.9.jar
4802621 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-schemas-3.9.jar
533455 Wed Jun 15 16:34:22 EDT 2016 lib/protobuf-java-2.5.0.jar
26084 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-api-1.7.5.jar
  9943 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-log4j12-1.7.19.jar
995968 Wed Jun 15 16:34:22 EDT 2016 lib/snappy-java-1.0.4.1.jar
23346 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0-2.jar
26514 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0.1.jar
  2455 Wed Jun 15 16:34:22 EDT 2016 lib/tdgssconfig-14.00.00.21.jar
991265 Wed Jun 15 16:34:22 EDT 2016 lib/terajdbc4-14.00.00.21.jar
758309 Wed Jun 15 16:34:22 EDT 2016 lib/vertica-jdbc-7.2.1-0.jar
109318 Wed Jun 15 16:34:22 EDT 2016 lib/xml-apis-1.0.b2.jar
2666695 Wed Jun 15 16:34:22 EDT 2016 lib/xmlbeans-2.3.0.jar
15010 Wed Jun 15 16:34:22 EDT 2016 lib/xmlenc-0.52.jar
94672 Wed Jun 15 16:34:22 EDT 2016 lib/xz-1.0.jar
792964 Wed Jun 15 16:34:22 EDT 2016 lib/zookeeper-3.4.6.jar
     0 Wed Jun 15 16:34:28 EDT 2016 conf/
   334 Mon Apr 04 11:18:00 EDT 2016 conf/my-app-conf1.xml
  3432 Wed Jun 15 16:22:20 EDT 2016 META-INF/properties.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
       <modelVersion>4.0.0</modelVersion>
       <groupId>com.rbc.aml.cnscan</groupId>
       <version>1.0-SNAPSHOT</version>
       <artifactId>countrynamescan</artifactId>
       <packaging>jar</packaging>

       <!-- change these to the appropriate values -->
       <name>countrynamescan</name>
       <description>Country and Name Scan project</description>

       <properties>
              <!-- change this if you desire to use a different version of DataTorrent -->
              <datatorrent.version>3.1.1</datatorrent.version>
              <datatorrent.apppackage.classpath>lib/*.jar</datatorrent.apppackage.classpath>
       </properties>

       <!-- repository to provide the DataTorrent artifacts -->
       <!-- <repositories>
              <repository>
                     <snapshots>
                           <enabled>false</enabled>
                     </snapshots>
                     <id>Datatorrent-Releases</id>
                     <name>DataTorrent Release Repository</name>
                     <url>https://www.datatorrent.com/maven/content/repositories/releases/</url>
              </repository>
              <repository>
                     <releases>
                           <enabled>false</enabled>
                     </releases>
                     <id>DataTorrent-Snapshots</id>
                     <name>DataTorrent Early Access Program Snapshot Repository</name>
                     <url>https://www.datatorrent.com/maven/content/repositories/snapshots/</url>
              </repository>
       </repositories> -->


       <build>
              <plugins>
                     <plugin>
                           <groupId>org.apache.maven.plugins</groupId>
                           <artifactId>maven-eclipse-plugin</artifactId>
                           <version>2.9</version>
                           <configuration>
                                  <downloadSources>true</downloadSources>
                           </configuration>
                     </plugin>
                     <plugin>
                           <artifactId>maven-compiler-plugin</artifactId>
                           <version>3.3</version>
                           <configuration>
                                  <encoding>UTF-8</encoding>
                                  <source>1.7</source>
                                  <target>1.7</target>
                                  <debug>true</debug>
                                  <optimize>false</optimize>
                                  <showDeprecation>true</showDeprecation>
                                  <showWarnings>true</showWarnings>
                           </configuration>
                     </plugin>
                     <plugin>
                           <artifactId>maven-dependency-plugin</artifactId>
                           <version>2.8</version>
                           <executions>
                                  <execution>
                                         <id>copy-dependencies</id>
                                         <phase>prepare-package</phase>
                                         <goals>
                                                <goal>copy-dependencies</goal>
                                         </goals>
                                         <configuration>
                                                <outputDirectory>target/deps</outputDirectory>
                                                <includeScope>runtime</includeScope>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <artifactId>maven-assembly-plugin</artifactId>
                           <executions>
                                  <execution>
                                         <id>app-package-assembly</id>
                                         <phase>package</phase>
                                         <goals>
                                                <goal>single</goal>
                                         </goals>
                                         <configuration>
                                                <finalName>${project.artifactId}-${project.version}-apexapp</finalName>
                                                <appendAssemblyId>false</appendAssemblyId>
                                                <descriptors>
                                                       <descriptor>src/assemble/appPackage.xml</descriptor>
                                                </descriptors>
                                                <archiverConfig>
                                                       <defaultDirectoryMode>0755</defaultDirectoryMode>
                                                </archiverConfig>
                                                <archive>
                                                       <manifestEntries>
                                                              <Class-Path>${datatorrent.apppackage.classpath}</Class-Path>
                                                              <DT-Engine-Version>${datatorrent.version}</DT-Engine-Version>
                                                              <DT-App-Package-Name>${project.artifactId}</DT-App-Package-Name>
                                                              <DT-App-Package-Version>${project.version}</DT-App-Package-Version>
                                                              <DT-App-Package-Display-Name>${project.name}</DT-App-Package-Display-Name>
                                                              <DT-App-Package-Description>${project.description}</DT-App-Package-Description>
                                                      </manifestEntries>
                                                </archive>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <artifactId>maven-antrun-plugin</artifactId>
                           <version>1.7</version>
                           <executions>
                                  <execution>
                                         <phase>package</phase>
                                         <configuration>
                                                <target>
                                                       <move
                                                              file="${project.build.directory}/${project.artifactId}-${project.version}-apexapp.jar"
                                                              tofile="${project.build.directory}/${project.artifactId}-${project.version}.apa"/>
                                                </target>
                                         </configuration>
                                         <goals>
                                                <goal>run</goal>
                                         </goals>
                                  </execution>
                                  <execution>
                                         <!-- create resource directory for xml javadoc -->
                                         <id>createJavadocDirectory</id>
                                         <phase>generate-resources</phase>
                                         <configuration>
                                                <tasks>
                                                       <delete
                                                              dir="${project.build.directory}/generated-resources/xml-javadoc"/>
                                                       <mkdir
                                                              dir="${project.build.directory}/generated-resources/xml-javadoc"/>
                                                </tasks>
                                         </configuration>
                                         <goals>
                                                <goal>run</goal>
                                         </goals>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <groupId>org.codehaus.mojo</groupId>
                           <artifactId>build-helper-maven-plugin</artifactId>
                           <version>1.9.1</version>
                           <executions>
                                  <execution>
                                         <id>attach-artifacts</id>
                                         <phase>package</phase>
                                         <goals>
                                                <goal>attach-artifact</goal>
                                         </goals>
                                         <configuration>
                                                <artifacts>
                                                       <artifact>
                                                              <file>target/${project.artifactId}-${project.version}.apa</file>
                                                              <type>apa</type>
                                                       </artifact>
                                                </artifacts>
                                                <skipAttach>false</skipAttach>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <!-- generate javadoc -->
                     <plugin>
                           <groupId>org.apache.maven.plugins</groupId>
                           <artifactId>maven-javadoc-plugin</artifactId>
                           <executions>
                                  <!-- generate xml javadoc -->
                                  <execution>
                                         <id>xml-doclet</id>
                                         <phase>generate-resources</phase>
                                         <goals>
                                                <goal>javadoc</goal>
                                         </goals>
                                         <configuration>
                                                <doclet>com.github.markusbernhardt.xmldoclet.XmlDoclet</doclet>
                                                <additionalparam>-d
                                                       ${project.build.directory}/generated-resources/xml-javadoc
                                                       -filename ${project.artifactId}-${project.version}-javadoc.xml</additionalparam>
                                                <useStandardDocletOptions>false</useStandardDocletOptions>
                                                <docletArtifact>
                                                       <groupId>com.github.markusbernhardt</groupId>
                                                       <artifactId>xml-doclet</artifactId>
                                                       <version>1.0.4</version>
                                                </docletArtifact>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>
                     <!-- Transform xml javadoc to stripped down version containing only class/interface
                           comments and tags -->
                     <plugin>
                           <groupId>org.codehaus.mojo</groupId>
                           <artifactId>xml-maven-plugin</artifactId>
                           <version>1.0</version>
                           <executions>
                                  <execution>
                                         <id>transform-xmljavadoc</id>
                                         <phase>generate-resources</phase>
                                         <goals>
                                                <goal>transform</goal>
                                         </goals>
                                  </execution>
                           </executions>
                           <configuration>
                                  <transformationSets>
                                         <transformationSet>
                                                <dir>${project.build.directory}/generated-resources/xml-javadoc</dir>
                                                <includes>
                                                       <include>${project.artifactId}-${project.version}-javadoc.xml</include>
                                                </includes>
                                                <stylesheet>XmlJavadocCommentsExtractor.xsl</stylesheet>
                                                <outputDir>${project.build.directory}/generated-resources/xml-javadoc</outputDir>
                                         </transformationSet>
                                  </transformationSets>
                           </configuration>
                     </plugin>
                     <!-- copy xml javadoc to class jar -->
                     <plugin>
                           <artifactId>maven-resources-plugin</artifactId>
                           <version>2.6</version>
                           <executions>
                                  <execution>
                                         <id>copy-resources</id>
                                         <phase>process-resources</phase>
                                         <goals>
                                                <goal>copy-resources</goal>
                                         </goals>
                                         <configuration>
                                                <outputDirectory>${basedir}/target/classes</outputDirectory>
                                                <resources>
                                                       <resource>
                                                              <directory>${project.build.directory}/generated-resources/xml-javadoc</directory>
                                                              <includes>
                                                                     <include>${project.artifactId}-${project.version}-javadoc.xml</include>
                                                              </includes>
                                                              <filtering>true</filtering>
                                                       </resource>
                                                </resources>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

              </plugins>

              <pluginManagement>
                     <plugins>
                           <!--This plugin's configuration is used to store Eclipse m2e settings
                                  only. It has no influence on the Maven build itself. -->
                           <plugin>
                                  <groupId>org.eclipse.m2e</groupId>
                                  <artifactId>lifecycle-mapping</artifactId>
                                  <version>1.0.0</version>
                                  <configuration>
                                         <lifecycleMappingMetadata>
                                                <pluginExecutions>
                                                       <pluginExecution>
                                                              <pluginExecutionFilter>
                                                                     <groupId>org.codehaus.mojo</groupId>
                                                                     <artifactId>
                                                                           xml-maven-plugin
                                                                     </artifactId>
                                                                     <versionRange>[1.0,)</versionRange>
                                                                     <goals>
                                                                           <goal>transform</goal>
                                                                     </goals>
                                                              </pluginExecutionFilter>
                                                              <action>
                                                                     <ignore></ignore>
                                                              </action>
                                                       </pluginExecution>
                                                       <pluginExecution>
                                                              <pluginExecutionFilter>
                                                                     <groupId></groupId>
                                                                     <artifactId></artifactId>
                                                                     <versionRange>[,)</versionRange>
                                                                     <goals>
                                                                           <goal></goal>
                                                                     </goals>
                                                              </pluginExecutionFilter>
                                                              <action>
                                                                     <ignore></ignore>
                                                              </action>
                                                       </pluginExecution>
                                                </pluginExecutions>
                                         </lifecycleMappingMetadata>
                                  </configuration>
                           </plugin>
                     </plugins>
              </pluginManagement>
       </build>

       <dependencies>
              <!-- add your dependencies here -->
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>malhar-library</artifactId>
                     <version>${datatorrent.version}</version>
                     <!-- If you know that your application does not need transitive dependencies
                           pulled in by malhar-library, uncomment the following to reduce the size of
                           your app package. -->
                     <!-- <exclusions> <exclusion> <groupId>*</groupId> <artifactId>*</artifactId>
                           </exclusion> </exclusions> -->
              </dependency>
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>dt-common</artifactId>
                     <version>${datatorrent.version}</version>
                     <scope>provided</scope>
              </dependency>
               <dependency>
                      <groupId>com.datatorrent</groupId>
                      <artifactId>dt-engine</artifactId>
                      <version>${datatorrent.version}</version>
                      <scope>test</scope>
               </dependency>
               <dependency>
                      <groupId>com.teradata.jdbc</groupId>
                      <artifactId>terajdbc4</artifactId>
                      <version>14.00.00.21</version>
               </dependency>
               <dependency>
                      <groupId>com.teradata.jdbc</groupId>
                      <artifactId>tdgssconfig</artifactId>
                      <version>14.00.00.21</version>
               </dependency>
               <dependency>
                      <groupId>com.ibm.db2</groupId>
                      <artifactId>db2jcc</artifactId>
                      <version>123</version>
               </dependency>
              <dependency>
                     <groupId>jdk.tools</groupId>
                     <artifactId>jdk.tools</artifactId>
                     <version>1.7</version>
                     <scope>system</scope>
                     <systemPath>C:/Program Files/Java/jdk1.7.0_79/lib/tools.jar</systemPath>
              </dependency>
              <dependency>
                     <groupId>org.apache.apex</groupId>
                     <artifactId>malhar-contrib</artifactId>
                     <version>3.2.0-incubating</version>
                     <!--<scope>provided</scope> -->
                     <exclusions>
                           <exclusion>
                                  <groupId>*</groupId>
                                  <artifactId>*</artifactId>
                           </exclusion>
                     </exclusions>
              </dependency>
              <dependency>
                     <groupId>junit</groupId>
                     <artifactId>junit</artifactId>
                     <version>4.10</version>
                     <scope>test</scope>
              </dependency>
              <dependency>
                     <groupId>com.vertica</groupId>
                     <artifactId>vertica-jdbc</artifactId>
                     <version>7.2.1-0</version>
              </dependency>
               <dependency>
                      <groupId>org.apache.hbase</groupId>
                      <artifactId>hbase-client</artifactId>
                      <version>1.1.2</version>
               </dependency>
              <dependency>
                     <groupId>org.slf4j</groupId>
                     <artifactId>slf4j-log4j12</artifactId>
                     <version>1.7.19</version>
              </dependency>

              <dependency>
                     <groupId>net.sf.flatpack</groupId>
                     <artifactId>flatpack</artifactId>
                     <version>3.4.2</version>
              </dependency>

              <dependency>
                     <groupId>org.jdom</groupId>
                     <artifactId>jdom</artifactId>
                     <version>1.1.3</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi-ooxml</artifactId>
                     <version>3.9</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.xmlbeans</groupId>
                     <artifactId>xmlbeans</artifactId>
                     <version>2.3.0</version>
              </dependency>

              <dependency>
                     <groupId>dom4j</groupId>
                     <artifactId>dom4j</artifactId>
                     <version>1.6.1</version>
              </dependency>

              <dependency>
                     <groupId>javax.xml.stream</groupId>
                     <artifactId>stax-api</artifactId>
                     <version>1.0-2</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi</artifactId>
                     <version>3.9</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi-ooxml-schemas</artifactId>
                     <version>3.9</version>
              </dependency>
              <dependency>
                     <groupId>com.jcraft</groupId>
                     <artifactId>jsch</artifactId>
                     <version>0.1.53</version>
              </dependency>
       </dependencies>

</project>

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, June, 16 4:37 PM
To: users@apex.apache.org
Subject: Re: Multiple directories

It looks like you may be including old Hadoop jars in your apa package since the stack trace
shows ConverterUtils.toContainerId calling ConverterUtils.toApplicationAttemptId but recent versions
don't have that call sequence. In 2.7.1 (which is what your cluster has) the function looks like this:
  public static ContainerId toContainerId(String containerIdStr) {
    return ContainerId.fromString(containerIdStr);
  }

Could you post the output of "jar tvf {your-apa-file}" as well as: "mvn dependency:tree"

Ram

On Thu, Jun 16, 2016 at 12:38 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Hi Ram,

Below is the information.


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   712    0   712    0     0   3807      0 --:--:-- --:--:-- --:--:--  3807
{
    "clusterInfo": {
        "haState": "ACTIVE",
        "haZooKeeperConnectionState": "CONNECTED",
        "hadoopBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bccaeaf36cb193c1c by jenkins source checksum 69a3bf8c667267c2c252a54fbbf23d",
        "hadoopVersion": "2.7.1.2.3.2.0-2950",
        "hadoopVersionBuiltOn": "2015-09-30T18:08Z",
        "id": 1465495186350,
        "resourceManagerBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bccaeaf36cb193c1c by jenkins source checksum 48db4b572827c2e9c2da66982d147626",
        "resourceManagerVersion": "2.7.1.2.3.2.0-2950",
        "resourceManagerVersionBuiltOn": "2015-09-30T18:20Z",
        "rmStateStoreName": "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",
        "startedOn": 1465495186350,
        "state": "STARTED"
    }
}

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, June, 16 2:57 PM
To: users@apex.apache.org
Subject: Re: Multiple directories

Can you ssh to one of the cluster nodes ? If so, can you run this command and show the output
(where {rm} is the host:port running the resource manager, aka YARN):

curl http://{rm}/ws/v1/cluster | python -mjson.tool

Ram
ps. You can determine the node running YARN with:

hdfs getconf -confKey yarn.resourcemanager.webapp.address
hdfs getconf -confKey yarn.resourcemanager.webapp.https.address



On Thu, Jun 16, 2016 at 11:15 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Hi,

I am facing a weird issue and the logs are not clear to me!

I have created an apa file which works fine within my local sandbox, but I am facing problems when I upload it on the enterprise Hadoop cluster using the DT Console.

Below is the error message from yarn logs. Please help in understanding the issue.

###################### Error Logs ########################################################

Log Type: AppMaster.stderr
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 1259
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/grid/06/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/filecache/36/slf4j-log4j12-1.7.19.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.IllegalArgumentException: Invalid ContainerId: container_e35_1465495186350_2224_01_000001
        at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182)
        at com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:90)
Caused by: java.lang.NumberFormatException: For input string: "e35"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:441)
        at java.lang.Long.parseLong(Long.java:483)
        at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
        at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
        ... 1 more

Log Type: AppMaster.stdout
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 0

Log Type: dt.log
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 29715
Showing 4096 bytes of 29715 total. Click here<http://guedlpdhdp001.saifg.rbc.com:19888/jobhistory/logs/guedlpdhdp012.saifg.rbc.com:45454/container_e35_1465495186350_2224_01_000001/container_e35_1465495186350_2224_01_000001/mukkamula/dt.log/?start=0> for the full log.
56m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms4096m -Xmx4096m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT
SHLVL=3
HADOOP_SSH_OPTS=-o ConnectTimeout=5 -o SendEnv=HADOOP_CONF_DIR
HADOOP_USER_NAME=datatorrent/gueulvahal003.saifg.rbc.com@SAIFG.RBC.COM
HADOOP_NAMENODE_OPTS=-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1
HADOOP_IDENT_STRING=yarn
HADOOP_MAPRED_LOG_DIR=/var/log/hadoop-mapreduce/yarn
NM_HOST=guedlpdhdp012.saifg.rbc.com
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
HADOOP_SECURE_DN_LOG_DIR=/var/log/hadoop/hdfs
YARN_HISTORYSERVER_HEAPSIZE=1024
JVM_PID=2638
YARN_PID_DIR=/var/run/hadoop-yarn/yarn
HADOOP_HOME_WARN_SUPPRESS=1
NM_PORT=45454
LOGNAME=mukkamula
YARN_CONF_DIR=/usr/hdp/current/hadoop-client/conf
HADOOP_YARN_USER=yarn
QTDIR=/usr/lib64/qt-3.3
_=/usr/lib/jvm/java-1.7.0/bin/java
MSM_PRODUCT=MSM
HADOOP_HOME=/usr/hdp/2.3.2.0-2950/hadoop
MALLOC_ARENA_MAX=4
HADOOP_OPTS=-Dhdp.version=2.3.2.0-2950 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true  -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.3.2.0-2950 -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
SHELL=/bin/bash
YARN_ROOT_LOGGER=INFO,EWMA,RFA
HADOOP_TOKEN_FILE_LOCATION=/grid/11/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/container_e35_1465495186350_2224_01_000001/container_tokens
CLASSPATH=./*:/usr/hdp/current/hadoop-client/conf:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*
HADOOP_MAPRED_PID_DIR=/var/run/hadoop-mapreduce/yarn
YARN_NODEMANAGER_HEAPSIZE=1024
QTINC=/usr/lib64/qt-3.3/include
USER=mukkamula
HADOOP_CLIENT_OPTS=-Xmx2048m -XX:MaxPermSize=512m -Xmx2048m -XX:MaxPermSize=512m
CONTAINER_ID=container_e35_1465495186350_2224_01_000001
HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop/hdfs
HISTCONTROL=ignoredups
HOME=/home/
HADOOP_NAMENODE_INIT_HEAPSIZE=-Xms8192m
MSM_HOME=/usr/local/MegaRAID Storage Manager
LESSOPEN=||/usr/bin/lesspipe.sh %s
LANG=en_US.UTF-8
YARN_NICENESS=0
YARN_IDENT_STRING=yarn
HADOOP_MAPRED_HOME=/usr/hdp/2.3.2.0-2950/hadoop-mapreduce


Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR)
Sent: 2016, June, 16 8:58 AM
To: users@apex.apache.org
Subject: RE: Multiple directories

Thank you for the inputs.

Regards,
Surya Vamshi
From: Thomas Weise [mailto:thomas.weise@gmail.com]
Sent: 2016, June, 15 5:08 PM
To: users@apex.apache.org
Subject: Re: Multiple directories


On Wed, Jun 15, 2016 at 1:55 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Hi Ram/Team,

I could create an operator which reads multiple directories, parses each file against its individual configuration file, and generates output files to different directories.

However I have some questions regarding the design.


==> We have 120 directories to scan on HDFS. If we use parallel partitioning with operator memory around 250MB, that comes to about 30GB of RAM for this operator; are these figures going to create any problem in production?

You can benchmark this with a single partition. If the downstream operators can keep up with the rate at which the file reader emits, then the memory consumption should be minimal. Keep in mind, though, that the container memory is not just heap space for the operator, but also memory the JVM requires to run and memory that the buffer server consumes. You can see the allocated memory in the UI if you use the DT community edition (container list in the physical plan).


==> Should I use a scheduler to run the batch job, or define the next scan time and keep the DT job running continuously? If I run the DT job continuously, I assume its memory will be continuously utilized by the DT job and not available to other applications on the cluster; please clarify.
It is possible to set this up elastically as well, so that when there is no input available the number of reader partitions is reduced and the memory given back (Apex supports dynamic scaling).


Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, June, 05 10:24 PM

To: users@apex.apache.org
Subject: Re: Multiple directories

Some sample code to monitor multiple directories is now available at:
https://github.com/DataTorrent/examples/tree/master/tutorials/fileIO-multiDir

It shows how to use a custom implementation of definePartitions() to create
multiple partitions of the file input operator and group them
into "slices" where each slice monitors a single directory.

Ram

On Wed, May 25, 2016 at 9:55 AM, Munagala Ramanath <ra...@datatorrent.com> wrote:
I'm hoping to have a sample sometime next week.

Ram

On Wed, May 25, 2016 at 9:30 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Thank you so much ram, for your advice , Option (a) would be ideal for my requirement.

Do you have sample usage for partitioning with individual configuration set ups different partitions?

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, May, 25 12:11 PM
To: users@apex.apache.org
Subject: Re: Multiple directories

You have 2 options: (a) AbstractFileInputOperator (b) FileSplitter/BlockReader

For (a), each partition (i.e. replica or the operator) can scan only a single directory, so if you have 100
directories, you can simply start with 100 partitions; since each partition is scanning its own directory
you don't need to worry about which files the lines came from. This approach however needs a custom
definePartition() implementation in your subclass to assign the appropriate directory and XML parsing
config file to each partition; it also needs adequate cluster resources to be able to spin up the required
number of partitions.

For (b), there is some documentation in the Operators section at http://docs.datatorrent.com/ including
sample code. These operators support scanning multiple directories out of the box but have more
elaborate configuration options. Check this out and see if it works in your use case.
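
For example, a hedged properties.xml fragment for option (b) (the operator name
"fileSplitter" and the exact property path are assumptions based on those docs):

<property>
  <name>dt.operator.fileSplitter.prop.scanner.files</name>
  <value>hdfs:///data/dir1,hdfs:///data/dir2</value>
</property>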

Ram

On Wed, May 25, 2016 at 8:17 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Hello Ram/Team,

My requirement is to read input feeds from different locations on HDFS and parse those files by reading XML configuration files (each input feed has configuration file which defines the fields inside the input feeds).

My approach : I would like to define a mapping file which contains individual feed identifier, feed location , configuration file location. I would like to read this mapping file at initial load within setup() method and define my DirectoryScan.acceptFiles. Here my challenge is when I read the files , I should parse the lines by reading the individual configuration files. How do I know the line is from particular file , if I know this I can read the corresponding configuration file before parsing the line.

Please let me know how do I handle this.

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, May, 24 5:49 PM
To: Mukkamula, Suryavamshivardhan (CWM-NR)
Subject: Multiple directories

One way of addressing the issue is to use some sort of external tool (like a script) to
copy all the input files to a common directory (making sure that the file names are
unique to prevent one file from overwriting another) before the Apex application starts.

The Apex application then starts and processes files from this directory.

If you set the partition count of the file input operator to N, it will create N partitions and
the files will be automatically distributed among the partitions. The partitions will work
in parallel.
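
For example, a hedged properties.xml snippet for that (the operator name
"reader" is a placeholder; partitionCount is the relevant property on
AbstractFileInputOperator):

<property>
  <!-- create 4 partitions of the file input operator -->
  <name>dt.operator.reader.prop.partitionCount</name>
  <value>4</value>
</property>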

Ram

Re: Multiple directories

Posted by Munagala Ramanath <ra...@datatorrent.com>.
Please take a look at the section entitled "Properties source precedence" at
http://docs.datatorrent.com/application_packages/

It looks like the setting in dt-site.xml on the cluster is overriding your
application-defined values. If you add the properties to a file under site/conf
in your application and then select that file during launch, those values
should take effect.
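
For instance, a minimal sketch of such a file (the name site/conf/my-cluster.xml,
the operator name "reader", and the 500MB figure are assumptions for illustration):

<?xml version="1.0"?>
<configuration>
  <!-- request 500MB of container memory for the operator named "reader" -->
  <property>
    <name>dt.operator.reader.attr.MEMORY_MB</name>
    <value>500</value>
  </property>
</configuration>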

For signalling EOF, another option is to use a separate control port to send
the EOF marker, which could just be the string "EOF", for example.


Ram

On Fri, Jun 17, 2016 at 11:48 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
suryavamshivardhan.mukkamula@rbc.com> wrote:

> Hi,
>
>
>
> I found a way for the 2nd question.
>
>
>
> 1)      I tried setting operator memory to 500MB but the operator is still
> taking 4GB by default; I did not understand why. (In my sandbox setup the
> memory was set correctly to 500MB, but not on the enterprise dev cluster.)
>
>
>
> 2)      I am trying to send a null object from the file reader when EOF is
> reached so that the file writer can call the requestFinalize() method. But
> somehow I could not figure out how to send the EOF; I tried as below but had
> no luck. (Solution: it seems that if readEntity() returns null, the
> emitTuple() method is not called, so I have managed to emit the null object
> from readEntity() itself.)
>
>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:
> suryavamshivardhan.mukkamula@rbc.com]
> *Sent:* 2016, June, 17 12:20 PM
> *To:* users@apex.apache.org
> *Subject:* RE: Multiple directories
>
>
>
> Hi,
>
>
>
> Can you please help me understand the below issues.
>
>
>
> 1)      I tried setting operator memory to 500MB but the operator is still
> taking 4GB by default; I did not understand why. (In my sandbox setup the
> memory was set correctly to 500MB, but not on the enterprise dev cluster.)
>
> 2)      I am trying to send a null object from the file reader when EOF is
> reached so that the file writer can call the requestFinalize() method. But
> somehow I could not figure out how to send the EOF; I tried as below but had
> no luck.
>
>
>
> ################### File Reader ####################################################
>
> @Override
> protected String readEntity() throws IOException {
>     // try to read a line
>     final String line = br.readLine();
>     if (null != line) { // normal case
>         LOG.debug("readEntity: line = {}", line);
>         return line;
>     }
>     // end-of-file (control tuple sent in closeFile())
>     LOG.info("readEntity: EOF for {}", filePath);
>     return null;
> }
>
> @Override
> protected void emit(String line) {
>     // parsing logic here: parse the line as per the input configuration and
>     // create the output line as per the output configuration
>     if (line == null) {
>         // EOF marker: forward a null-value tuple so the writer can finalize the file
>         output.emit(new KeyValue<String, String>(getFileName(), null));
>         return; // fix: don't fall through and try to parse the EOF marker
>     }
>     KeyValue<String, String> tuple = new KeyValue<String, String>();
>     tuple.key = getFileName();
>     tuple.value = line;
>     KeyValue<String, String> newTuple = parseTuple(tuple);
>     output.emit(newTuple);
> }
>
> ###################### File Writer ######################################################
>
> import java.util.ArrayList;
> import java.util.Iterator;
> import java.util.List;
>
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
>
> import com.datatorrent.api.Context;
> import com.datatorrent.lib.io.fs.AbstractFileOutputOperator;
> // (plus the import for the KeyValue class used below, from whichever package it lives in)
>
> public class FileOutputOperator extends AbstractFileOutputOperator<KeyValue<String, String>> {
>
>     private static final Logger LOG = LoggerFactory.getLogger(FileOutputOperator.class);
>
>     private List<String> filesToFinalize = new ArrayList<>();
>
>     @Override
>     public void setup(Context.OperatorContext context) {
>         super.setup(context);
>         finalizeFiles();
>     }
>
>     @Override
>     protected byte[] getBytesForTuple(KeyValue<String, String> tuple) {
>         if (tuple.value == null) {
>             LOG.info("File to finalize {}", tuple.key);
>             filesToFinalize.add(tuple.key);
>             return new byte[0];
>         } else {
>             return tuple.value.getBytes();
>         }
>     }
>
>     @Override
>     protected String getFileName(KeyValue<String, String> tuple) {
>         return tuple.key;
>     }
>
>     @Override
>     public void endWindow() {
>         super.endWindow();
>         finalizeFiles();
>     }
>
>     private void finalizeFiles() {
>         LOG.info("Files to finalize {}", filesToFinalize.toArray());
>         Iterator<String> fileIt = filesToFinalize.iterator();
>         while (fileIt.hasNext()) {
>             requestFinalize(fileIt.next());
>             fileIt.remove();
>         }
>     }
> }
>
>
> ##################################################################################################
>
>
>
>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:suryavamshivardhan.mukkamula@rbc.com]
> *Sent:* 2016, June, 17 9:11 AM
> *To:* users@apex.apache.org
> *Subject:* RE: Multiple directories
>
>
>
> Hi Ram/Raja,
>
>
>
> Hbase dependency was creating older Hadoop jars in my classpath. I removed
> the Hbase dependency which I don’t need for now and issue got resolved.
>
>
>
> Thank you for your help.
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Raja.Aravapalli [mailto:Raja.Aravapalli@target.com]
> *Sent:* 2016, June, 17 7:06 AM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
>
>
> I also faced a similar problem with Hadoop jars when I used HBase jars in
> pom.xml!! It could be because the version of the Hadoop jars your apa is
> holding is different from the ones in the cluster!!
>
>
>
>
>
> What I did to solve this is,
>
>
>
> Included the scope "provided" in the Maven pom.xml for the HBase jars, and then
> provided the HBase jars to the application package during submission using
> "-libjars" with the launch command, which solved my Invalid ContainerId
> problem!!
>
>
>
> You can type "launch help" to learn the usage details.
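>
> For example (a hedged sketch; the jar names and the .apa path are placeholders):
>
>     launch -libjars hbase-client-1.1.2.jar,hbase-common-1.1.2.jar target/myapp-1.0.apa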
>
>
>
>
>
> Regards,
>
> Raja.
>
>
>
> *From: *Munagala Ramanath <ra...@datatorrent.com>
> *Reply-To: *"users@apex.apache.org" <us...@apex.apache.org>
> *Date: *Thursday, June 16, 2016 at 4:10 PM
> *To: *"users@apex.apache.org" <us...@apex.apache.org>
> *Subject: *Re: Multiple directories
>
>
>
> Those 6 hadoop jars are definitely a problem.
>
>
>
> I didn't see the output of "*mvn dependency:tree*"; could you post that ?
>
> It will show you why these hadoop jars are being pulled in.
>
>
>
> Also, please refer to the section "Hadoop dependencies conflicts" in the
> troubleshooting guide:
>
> http://docs.datatorrent.com/troubleshooting/
>
>
>
> Ram
>
>
>
> On Thu, Jun 16, 2016 at 1:56 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi Ram,
>
>
>
> Below are the details.
>
>
>
>
>
> 0 Wed Jun 15 16:34:24 EDT 2016 META-INF/
>
>    358 Wed Jun 15 16:34:22 EDT 2016 META-INF/MANIFEST.MF
>
>      0 Wed Jun 15 16:34:24 EDT 2016 app/
>
> 52967 Wed Jun 15 16:34:22 EDT 2016 app/countrynamescan-1.0-SNAPSHOT.jar
>
>      0 Wed Jun 15 16:34:22 EDT 2016 lib/
>
> 62983 Wed Jun 15 16:34:22 EDT 2016 lib/activation-1.1.jar
>
> 1143233 Wed Jun 15 16:34:22 EDT 2016 lib/activemq-client-5.8.0.jar
>
>   4467 Wed Jun 15 16:34:22 EDT 2016 lib/aopalliance-1.0.jar
>
> 44925 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-i18n-2.0.0-M15.jar
>
> 691479 Wed Jun 15 16:34:22 EDT 2016
> lib/apacheds-kerberos-codec-2.0.0-M15.jar
>
> 16560 Wed Jun 15 16:34:22 EDT 2016 lib/api-asn1-api-1.0.0-M20.jar
>
> 79912 Wed Jun 15 16:34:22 EDT 2016 lib/api-util-1.0.0-M20.jar
>
> 43033 Wed Jun 15 16:34:22 EDT 2016 lib/asm-3.1.jar
>
> 303139 Wed Jun 15 16:34:22 EDT 2016 lib/avro-1.7.4.jar
>
> 232019 Wed Jun 15 16:34:22 EDT 2016 lib/commons-beanutils-1.8.3.jar
>
> 41123 Wed Jun 15 16:34:22 EDT 2016 lib/commons-cli-1.2.jar
>
> 284184 Wed Jun 15 16:34:22 EDT 2016 lib/commons-codec-1.10.jar
>
> 575389 Wed Jun 15 16:34:22 EDT 2016 lib/commons-collections-3.2.1.jar
>
> 30595 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compiler-2.7.8.jar
>
> 241367 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compress-1.4.1.jar
>
> 298829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-configuration-1.6.jar
>
> 143602 Wed Jun 15 16:34:22 EDT 2016 lib/commons-digester-1.8.jar
>
> 112341 Wed Jun 15 16:34:22 EDT 2016 lib/commons-el-1.0.jar
>
> 305001 Wed Jun 15 16:34:22 EDT 2016 lib/commons-httpclient-3.1.jar
>
> 185140 Wed Jun 15 16:34:22 EDT 2016 lib/commons-io-2.4.jar
>
> 284220 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang-2.6.jar
>
> 315805 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang3-3.1.jar
>
> 61829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-logging-1.2.jar
>
> 1599627 Wed Jun 15 16:34:22 EDT 2016 lib/commons-math3-3.1.1.jar
>
> 273370 Wed Jun 15 16:34:22 EDT 2016 lib/commons-net-3.1.jar
>
> 3608597 Wed Jun 15 16:34:22 EDT 2016 lib/db2jcc-123.jar
>
> 313898 Wed Jun 15 16:34:22 EDT 2016 lib/dom4j-1.6.1.jar
>
> 17138265 Wed Jun 15 16:34:22 EDT 2016 lib/fastutil-6.6.4.jar
>
> 15322 Wed Jun 15 16:34:22 EDT 2016 lib/findbugs-annotations-1.3.9-1.jar
>
> 84946 Wed Jun 15 16:34:22 EDT 2016 lib/flatpack-3.4.2.jar
>
> 20220 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-j2ee-management_1.1_spec-1.0.1.jar
>
> 32359 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-jms_1.1_spec-1.1.1.jar
>
> 21817 Wed Jun 15 16:34:22 EDT 2016 lib/gmbal-api-only-3.0.0-b023.jar
>
> 690573 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-framework-2.1.2.jar
>
> 253086 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-2.1.2.jar
>
> 198255 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-server-2.1.2.jar
>
> 336904 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-servlet-2.1.2.jar
>
>   8114 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-rcm-2.1.2.jar
>
> 1795932 Wed Jun 15 16:34:22 EDT 2016 lib/guava-12.0.1.jar
>
> 710492 Wed Jun 15 16:34:22 EDT 2016 lib/guice-3.0.jar
>
> 65012 Wed Jun 15 16:34:22 EDT 2016 lib/guice-servlet-3.0.jar
>
> 16778 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-annotations-2.2.0.jar
>
> 52449 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-auth-2.5.1.jar
>
> 2962685 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-common-2.5.1.jar
>
> 1498368 Wed Jun 15 16:34:22 EDT 2016
> lib/hadoop-mapreduce-client-core-2.5.1.jar
>
> 1158936 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-api-2.2.0.jar
>
> 1301627 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-common-2.2.0.jar
>
> 50139 Wed Jun 15 16:34:22 EDT 2016 lib/hawtbuf-1.9.jar
>
> 20780 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-annotations-1.1.2.jar
>
> 1249004 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-client-1.1.2.jar
>
> 530078 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-common-1.1.2.jar
>
> 4201685 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-protocol-1.1.2.jar
>
> 1475955 Wed Jun 15 16:34:22 EDT 2016 lib/htrace-core-3.1.0-incubating.jar
>
> 590533 Wed Jun 15 16:34:22 EDT 2016 lib/httpclient-4.3.5.jar
>
> 282269 Wed Jun 15 16:34:22 EDT 2016 lib/httpcore-4.3.2.jar
>
> 228286 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-core-asl-1.9.2.jar
>
> 765648 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-mapper-asl-1.9.2.jar
>
>   2497 Wed Jun 15 16:34:22 EDT 2016 lib/javax.inject-1.jar
>
> 521991 Wed Jun 15 16:34:22 EDT 2016 lib/javax.mail-1.5.0.jar
>
> 83945 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-3.1.jar
>
> 85353 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-api-3.0.1.jar
>
> 105134 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-api-2.2.2.jar
>
> 890168 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-impl-2.2.3-1.jar
>
> 1291164 Wed Jun 15 16:34:22 EDT 2016 lib/jcodings-1.0.8.jar
>
> 151304 Wed Jun 15 16:34:22 EDT 2016 lib/jdom-1.1.3.jar
>
> 130458 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-client-1.9.jar
>
> 458739 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-core-1.9.jar
>
> 17542 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-grizzly2-1.9.jar
>
> 14786 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-guice-1.9.jar
>
> 147952 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-json-1.9.jar
>
> 713089 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-server-1.9.jar
>
> 28100 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-core-1.9.jar
>
> 12976 Wed Jun 15 16:34:22 EDT 2016
> lib/jersey-test-framework-grizzly2-1.9.jar
>
> 67758 Wed Jun 15 16:34:22 EDT 2016 lib/jettison-1.1.jar
>
> 21144 Wed Jun 15 16:34:22 EDT 2016
> lib/jetty-continuation-8.1.10.v20130312.jar
>
> 95709 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-http-8.1.10.v20130312.jar
>
> 103622 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-io-8.1.10.v20130312.jar
>
> 89691 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-security-8.1.10.v20130312.jar
>
> 347020 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-server-8.1.10.v20130312.jar
>
> 101052 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-servlet-8.1.10.v20130312.jar
>
> 177131 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-6.1.26.jar
>
> 284903 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-8.1.10.v20130312.jar
>
> 125928 Wed Jun 15 16:34:22 EDT 2016
> lib/jetty-websocket-8.1.10.v20130312.jar
>
> 39117 Wed Jun 15 16:34:22 EDT 2016 lib/jms-api-1.1-rev-1.jar
>
> 187292 Wed Jun 15 16:34:22 EDT 2016 lib/joni-2.1.2.jar
>
> 280205 Wed Jun 15 16:34:22 EDT 2016 lib/jsch-0.1.53.jar
>
> 33015 Wed Jun 15 16:34:22 EDT 2016 lib/jsr305-1.3.9.jar
>
> 489884 Wed Jun 15 16:34:22 EDT 2016 lib/log4j-1.2.17.jar
>
> 565387 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-contrib-3.2.0-incubating.jar
>
> 1062000 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-library-3.1.1.jar
>
> 42212 Wed Jun 15 16:34:22 EDT 2016 lib/management-api-3.0.0-b012.jar
>
>   8798 Wed Jun 15 16:34:22 EDT 2016 lib/named-regexp-0.2.3.jar
>
> 1206119 Wed Jun 15 16:34:22 EDT 2016 lib/netty-3.6.6.Final.jar
>
> 1779991 Wed Jun 15 16:34:22 EDT 2016 lib/netty-all-4.0.23.Final.jar
>
> 29555 Wed Jun 15 16:34:22 EDT 2016 lib/paranamer-2.3.jar
>
> 1869113 Wed Jun 15 16:34:22 EDT 2016 lib/poi-3.9.jar
>
> 936648 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-3.9.jar
>
> 4802621 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-schemas-3.9.jar
>
> 533455 Wed Jun 15 16:34:22 EDT 2016 lib/protobuf-java-2.5.0.jar
>
> 26084 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-api-1.7.5.jar
>
>   9943 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-log4j12-1.7.19.jar
>
> 995968 Wed Jun 15 16:34:22 EDT 2016 lib/snappy-java-1.0.4.1.jar
>
> 23346 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0-2.jar
>
> 26514 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0.1.jar
>
>   2455 Wed Jun 15 16:34:22 EDT 2016 lib/tdgssconfig-14.00.00.21.jar
>
> 991265 Wed Jun 15 16:34:22 EDT 2016 lib/terajdbc4-14.00.00.21.jar
>
> 758309 Wed Jun 15 16:34:22 EDT 2016 lib/vertica-jdbc-7.2.1-0.jar
>
> 109318 Wed Jun 15 16:34:22 EDT 2016 lib/xml-apis-1.0.b2.jar
>
> 2666695 Wed Jun 15 16:34:22 EDT 2016 lib/xmlbeans-2.3.0.jar
>
> 15010 Wed Jun 15 16:34:22 EDT 2016 lib/xmlenc-0.52.jar
>
> 94672 Wed Jun 15 16:34:22 EDT 2016 lib/xz-1.0.jar
>
> 792964 Wed Jun 15 16:34:22 EDT 2016 lib/zookeeper-3.4.6.jar
>
>      0 Wed Jun 15 16:34:28 EDT 2016 conf/
>
>    334 Mon Apr 04 11:18:00 EDT 2016 conf/my-app-conf1.xml
>
>   3432 Wed Jun 15 16:22:20 EDT 2016 META-INF/properties.xml
>
>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>     xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
>   <modelVersion>4.0.0</modelVersion>
>   <groupId>com.rbc.aml.cnscan</groupId>
>   <version>1.0-SNAPSHOT</version>
>   <artifactId>countrynamescan</artifactId>
>   <packaging>jar</packaging>
>
>   <!-- change these to the appropriate values -->
>   <name>countrynamescan</name>
>   <description>Country and Name Scan project</description>
>
>   <properties>
>     <!-- change this if you desire to use a different version of DataTorrent -->
>     <datatorrent.version>3.1.1</datatorrent.version>
>     <datatorrent.apppackage.classpath>lib/*.jar</datatorrent.apppackage.classpath>
>   </properties>
>
>   <!-- repository to provide the DataTorrent artifacts -->
>   <!-- <repositories>
>     <repository>
>       <snapshots>
>         <enabled>false</enabled>
>       </snapshots>
>       <id>Datatorrent-Releases</id>
>       <name>DataTorrent Release Repository</name>
>       <url>https://www.datatorrent.com/maven/content/repositories/releases/</url>
>     </repository>
>     <repository>
>       <releases>
>         <enabled>false</enabled>
>       </releases>
>       <id>DataTorrent-Snapshots</id>
>       <name>DataTorrent Early Access Program Snapshot Repository</name>
>       <url>https://www.datatorrent.com/maven/content/repositories/snapshots/</url>
>     </repository>
>   </repositories> -->
>
>   <build>
>     <plugins>
>       <plugin>
>         <groupId>org.apache.maven.plugins</groupId>
>         <artifactId>maven-eclipse-plugin</artifactId>
>         <version>2.9</version>
>         <configuration>
>           <downloadSources>true</downloadSources>
>         </configuration>
>       </plugin>
>       <plugin>
>         <artifactId>maven-compiler-plugin</artifactId>
>         <version>3.3</version>
>         <configuration>
>           <encoding>UTF-8</encoding>
>           <source>1.7</source>
>           <target>1.7</target>
>           <debug>true</debug>
>           <optimize>false</optimize>
>           <showDeprecation>true</showDeprecation>
>           <showWarnings>true</showWarnings>
>         </configuration>
>       </plugin>
>       <plugin>
>         <artifactId>maven-dependency-plugin</artifactId>
>         <version>2.8</version>
>         <executions>
>           <execution>
>             <id>copy-dependencies</id>
>             <phase>prepare-package</phase>
>             <goals>
>               <goal>copy-dependencies</goal>
>             </goals>
>             <configuration>
>               <outputDirectory>target/deps</outputDirectory>
>               <includeScope>runtime</includeScope>
>             </configuration>
>           </execution>
>         </executions>
>       </plugin>
>
>       <plugin>
>         <artifactId>maven-assembly-plugin</artifactId>
>         <executions>
>           <execution>
>             <id>app-package-assembly</id>
>             <phase>package</phase>
>             <goals>
>               <goal>single</goal>
>             </goals>
>             <configuration>
>               <finalName>${project.artifactId}-${project.version}-apexapp</finalName>
>               <appendAssemblyId>false</appendAssemblyId>
>               <descriptors>
>                 <descriptor>src/assemble/appPackage.xml</descriptor>
>               </descriptors>
>               <archiverConfig>
>                 <defaultDirectoryMode>0755</defaultDirectoryMode>
>               </archiverConfig>
>               <archive>
>                 <manifestEntries>
>                   <Class-Path>${datatorrent.apppackage.classpath}</Class-Path>
>                   <DT-Engine-Version>${datatorrent.version}</DT-Engine-Version>
>                   <DT-App-Package-Name>${project.artifactId}</DT-App-Package-Name>
>                   <DT-App-Package-Version>${project.version}</DT-App-Package-Version>
>                   <DT-App-Package-Display-Name>${project.name}</DT-App-Package-Display-Name>
>                   <DT-App-Package-Description>${project.description}</DT-App-Package-Description>
>                 </manifestEntries>
>               </archive>
>             </configuration>
>           </execution>
>         </executions>
>       </plugin>
>
>       <plugin>
>         <artifactId>maven-antrun-plugin</artifactId>
>         <version>1.7</version>
>         <executions>
>           <execution>
>             <phase>package</phase>
>             <configuration>
>               <target>
>                 <move
>                     file="${project.build.directory}/${project.artifactId}-${project.version}-apexapp.jar"
>                     tofile="${project.build.directory}/${project.artifactId}-${project.version}.apa" />
>               </target>
>             </configuration>
>             <goals>
>               <goal>run</goal>
>             </goals>
>           </execution>
>           <execution>
>             <!-- create resource directory for xml javadoc -->
>             <id>createJavadocDirectory</id>
>             <phase>generate-resources</phase>
>             <configuration>
>               <tasks>
>                 <delete dir="${project.build.directory}/generated-resources/xml-javadoc"/>
>                 <mkdir dir="${project.build.directory}/generated-resources/xml-javadoc"/>
>               </tasks>
>             </configuration>
>             <goals>
>               <goal>run</goal>
>             </goals>
>           </execution>
>         </executions>
>       </plugin>
>
>       <plugin>
>         <groupId>org.codehaus.mojo</groupId>
>         <artifactId>build-helper-maven-plugin</artifactId>
>         <version>1.9.1</version>
>         <executions>
>           <execution>
>             <id>attach-artifacts</id>
>             <phase>package</phase>
>             <goals>
>               <goal>attach-artifact</goal>
>             </goals>
>             <configuration>
>               <artifacts>
>                 <artifact>
>                   <file>target/${project.artifactId}-${project.version}.apa</file>
>                   <type>apa</type>
>                 </artifact>
>               </artifacts>
>               <skipAttach>false</skipAttach>
>             </configuration>
>           </execution>
>         </executions>
>       </plugin>
>
>       <!-- generate javadoc -->
>       <plugin>
>         <groupId>org.apache.maven.plugins</groupId>
>         <artifactId>maven-javadoc-plugin</artifactId>
>         <executions>
>           <!-- generate xml javadoc -->
>           <execution>
>             <id>xml-doclet</id>
>             <phase>generate-resources</phase>
>             <goals>
>               <goal>javadoc</goal>
>             </goals>
>             <configuration>
>               <doclet>com.github.markusbernhardt.xmldoclet.XmlDoclet</doclet>
>               <additionalparam>-d ${project.build.directory}/generated-resources/xml-javadoc
>                 -filename ${project.artifactId}-${project.version}-javadoc.xml</additionalparam>
>               <useStandardDocletOptions>false</useStandardDocletOptions>
>               <docletArtifact>
>                 <groupId>com.github.markusbernhardt</groupId>
>                 <artifactId>xml-doclet</artifactId>
>                 <version>1.0.4</version>
>               </docletArtifact>
>             </configuration>
>           </execution>
>         </executions>
>       </plugin>
>       <!-- Transform xml javadoc to stripped down version containing only
>            class/interface comments and tags -->
>       <plugin>
>         <groupId>org.codehaus.mojo</groupId>
>         <artifactId>xml-maven-plugin</artifactId>
>         <version>1.0</version>
>         <executions>
>           <execution>
>             <id>transform-xmljavadoc</id>
>             <phase>generate-resources</phase>
>             <goals>
>               <goal>transform</goal>
>             </goals>
>           </execution>
>         </executions>
>         <configuration>
>           <transformationSets>
>             <transformationSet>
>               <dir>${project.build.directory}/generated-resources/xml-javadoc</dir>
>               <includes>
>                 <include>${project.artifactId}-${project.version}-javadoc.xml</include>
>               </includes>
>               <stylesheet>XmlJavadocCommentsExtractor.xsl</stylesheet>
>               <outputDir>${project.build.directory}/generated-resources/xml-javadoc</outputDir>
>             </transformationSet>
>           </transformationSets>
>         </configuration>
>       </plugin>
>       <!-- copy xml javadoc to class jar -->
>       <plugin>
>         <artifactId>maven-resources-plugin</artifactId>
>         <version>2.6</version>
>         <executions>
>           <execution>
>             <id>copy-resources</id>
>             <phase>process-resources</phase>
>             <goals>
>               <goal>copy-resources</goal>
>             </goals>
>             <configuration>
>               <outputDirectory>${basedir}/target/classes</outputDirectory>
>               <resources>
>                 <resource>
>                   <directory>${project.build.directory}/generated-resources/xml-javadoc</directory>
>                   <includes>
>                     <include>${project.artifactId}-${project.version}-javadoc.xml</include>
>                   </includes>
>                   <filtering>true</filtering>
>                 </resource>
>               </resources>
>             </configuration>
>           </execution>
>         </executions>
>       </plugin>
>
>     </plugins>
>
>     <pluginManagement>
>       <plugins>
>         <!-- This plugin's configuration is used to store Eclipse m2e settings
>              only. It has no influence on the Maven build itself. -->
>         <plugin>
>           <groupId>org.eclipse.m2e</groupId>
>           <artifactId>lifecycle-mapping</artifactId>
>           <version>1.0.0</version>
>           <configuration>
>             <lifecycleMappingMetadata>
>               <pluginExecutions>
>                 <pluginExecution>
>                   <pluginExecutionFilter>
>                     <groupId>org.codehaus.mojo</groupId>
>                     <artifactId>xml-maven-plugin</artifactId>
>                     <versionRange>[1.0,)</versionRange>
>                     <goals>
>                       <goal>transform</goal>
>                     </goals>
>                   </pluginExecutionFilter>
>                   <action>
>                     <ignore></ignore>
>                   </action>
>                 </pluginExecution>
>                 <pluginExecution>
>                   <pluginExecutionFilter>
>                     <groupId></groupId>
>                     <artifactId></artifactId>
>                     <versionRange>[,)</versionRange>
>                     <goals>
>                       <goal></goal>
>                     </goals>
>                   </pluginExecutionFilter>
>                   <action>
>                     <ignore></ignore>
>                   </action>
>                 </pluginExecution>
>               </pluginExecutions>
>             </lifecycleMappingMetadata>
>           </configuration>
>         </plugin>
>       </plugins>
>     </pluginManagement>
>   </build>
>
>   <dependencies>
>     <!-- add your dependencies here -->
>     <dependency>
>       <groupId>com.datatorrent</groupId>
>       <artifactId>malhar-library</artifactId>
>       <version>${datatorrent.version}</version>
>       <!-- If you know that your application does not need transitive dependencies
>            pulled in by malhar-library, uncomment the following to reduce the size of
>            your app package. -->
>       <!-- <exclusions> <exclusion> <groupId>*</groupId> <artifactId>*</artifactId>
>            </exclusion> </exclusions> -->
>     </dependency>
>     <dependency>
>       <groupId>com.datatorrent</groupId>
>       <artifactId>dt-common</artifactId>
>       <version>${datatorrent.version}</version>
>       <scope>provided</scope>
>     </dependency>
>     <dependency>
>       <groupId>com.datatorrent</groupId>
>       <artifactId>malhar-library</artifactId>
>       <version>${datatorrent.version}</version>
>       <!-- If you know that your application does not need transitive dependencies
>            pulled in by malhar-library, uncomment the following to reduce the size of
>            your app package. -->
>       <!-- <exclusions> <exclusion> <groupId>*</groupId> <artifactId>*</artifactId>
>            </exclusion> </exclusions> -->
>     </dependency>
>     <dependency>
>       <groupId>com.datatorrent</groupId>
>       <artifactId>dt-engine</artifactId>
>       <version>${datatorrent.version}</version>
>       <scope>test</scope>
>     </dependency>
>     <dependency>
>       <groupId>com.datatorrent</groupId>
>       <artifactId>dt-common</artifactId>
>       <version>${datatorrent.version}</version>
>       <scope>provided</scope>
>     </dependency>
>     <dependency>
>       <groupId>com.teradata.jdbc</groupId>
>       <artifactId>terajdbc4</artifactId>
>       <version>14.00.00.21</version>
>     </dependency>
>     <dependency>
>       <groupId>com.teradata.jdbc</groupId>
>       <artifactId>tdgssconfig</artifactId>
>       <version>14.00.00.21</version>
>     </dependency>
>     <dependency>
>       <groupId>com.ibm.db2</groupId>
>       <artifactId>db2jcc</artifactId>
>       <version>123</version>
>     </dependency>
>     <dependency>
>       <groupId>jdk.tools</groupId>
>       <artifactId>jdk.tools</artifactId>
>       <version>1.7</version>
>       <scope>system</scope>
>       <systemPath>C:/Program Files/Java/jdk1.7.0_79/lib/tools.jar</systemPath>
>     </dependency>
>     <dependency>
>       <groupId>org.apache.apex</groupId>
>       <artifactId>malhar-contrib</artifactId>
>       <version>3.2.0-incubating</version>
>       <!--<scope>provided</scope> -->
>       <exclusions>
>         <exclusion>
>           <groupId>*</groupId>
>           <artifactId>*</artifactId>
>         </exclusion>
>       </exclusions>
>     </dependency>
>     <dependency>
>       <groupId>junit</groupId>
>       <artifactId>junit</artifactId>
>       <version>4.10</version>
>       <scope>test</scope>
>     </dependency>
>     <dependency>
>       <groupId>com.vertica</groupId>
>       <artifactId>vertica-jdbc</artifactId>
>       <version>7.2.1-0</version>
>     </dependency>
>     <dependency>
>       <groupId>org.apache.hbase</groupId>
>       <artifactId>hbase-client</artifactId>
>       <version>1.1.2</version>
>     </dependency>
>     <dependency>
>       <groupId>org.slf4j</groupId>
>       <artifactId>slf4j-log4j12</artifactId>
>       <version>1.7.19</version>
>     </dependency>
>     <dependency>
>       <groupId>com.datatorrent</groupId>
>       <artifactId>dt-engine</artifactId>
>       <version>${datatorrent.version}</version>
>       <scope>test</scope>
>     </dependency>
>     <dependency>
>       <groupId>net.sf.flatpack</groupId>
>       <artifactId>flatpack</artifactId>
>       <version>3.4.2</version>
>     </dependency>
>     <dependency>
>       <groupId>org.jdom</groupId>
>       <artifactId>jdom</artifactId>
>       <version>1.1.3</version>
>     </dependency>
>     <dependency>
>       <groupId>org.apache.poi</groupId>
>       <artifactId>poi-ooxml</artifactId>
>       <version>3.9</version>
>     </dependency>
>     <dependency>
>       <groupId>org.apache.xmlbeans</groupId>
>       <artifactId>xmlbeans</artifactId>
>       <version>2.3.0</version>
>     </dependency>
>     <dependency>
>       <groupId>dom4j</groupId>
>       <artifactId>dom4j</artifactId>
>       <version>1.6.1</version>
>     </dependency>
>     <dependency>
>       <groupId>javax.xml.stream</groupId>
>       <artifactId>stax-api</artifactId>
>       <version>1.0-2</version>
>     </dependency>
>     <dependency>
>       <groupId>org.apache.poi</groupId>
>       <artifactId>poi</artifactId>
>       <version>3.9</version>
>     </dependency>
>     <dependency>
>       <groupId>org.apache.poi</groupId>
>       <artifactId>poi-ooxml-schemas</artifactId>
>       <version>3.9</version>
>     </dependency>
>     <dependency>
>       <groupId>com.jcraft</groupId>
>       <artifactId>jsch</artifactId>
>       <version>0.1.53</version>
>     </dependency>
>     <dependency>
>       <groupId>com.jcraft</groupId>
>       <artifactId>jsch</artifactId>
>       <version>0.1.53</version>
>     </dependency>
>   </dependencies>
>
> </project>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, June, 16 4:37 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> It looks like you may be including old Hadoop jars in your apa package
> since the stack trace
>
> shows *ConverterUtils.toContainerId* calling
> *ConverterUtils.toApplicationAttemptId* but recent versions
>
> don't have that call sequence. In 2.7.1 (which is what your cluster has)
> the function looks like this:
>
>   public static ContainerId toContainerId(String containerIdStr) {
>     return ContainerId.fromString(containerIdStr);
>   }
>
>
>
> Could you post the output of "*jar tvf {your-apa-file}*" as well as "*mvn
> dependency:tree*"?
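>
> (For illustration: the tree can be filtered to just the hadoop artifacts
> with the dependency plugin's standard includes option,
>
>   mvn dependency:tree -Dincludes=org.apache.hadoop
>
> Any hadoop jar that is not in provided scope ends up bundled in the .apa,
> since the app package assembly copies the runtime-scope dependencies.)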
>
>
>
> Ram
>
>
>
> On Thu, Jun 16, 2016 at 12:38 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi Ram,
>
>
>
> Below is the information.
>
>
>
>
>
>
> {
>
>     "clusterInfo": {
>
>         "haState": "ACTIVE",
>
>         "haZooKeeperConnectionState": "CONNECTED",
>
>         "hadoopBuildVersion": "2.7.1.2.3.2.0-2950 from
> 5cc60e0003e33aa98205f18bc
>
> caeaf36cb193c1c by jenkins source checksum 69a3bf8c667267c2c252a54fbbf23d",
>
>         "hadoopVersion": "2.7.1.2.3.2.0-2950",
>
>         "hadoopVersionBuiltOn": "2015-09-30T18:08Z",
>
>         "id": 1465495186350,
>
>         "resourceManagerBuildVersion": "2.7.1.2.3.2.0-2950 from
> 5cc60e0003e33aa9
>
> 8205f18bccaeaf36cb193c1c by jenkins source checksum
> 48db4b572827c2e9c2da66982d14
>
> 7626",
>
>         "resourceManagerVersion": "2.7.1.2.3.2.0-2950",
>
>        "resourceManagerVersionBuiltOn": "2015-09-30T18:20Z",
>
>         "rmStateStoreName":
> "org.apache.hadoop.yarn.server.resourcemanager.recov
>
> ery.ZKRMStateStore",
>
>         "startedOn": 1465495186350,
>
>         "state": "STARTED"
>
>     }
>
> }
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, June, 16 2:57 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> Can you ssh to one of the cluster nodes? If so, can you run this command
> and show the output
>
> (where *{rm} *is the *host:port* running the resource manager, aka YARN):
>
>
>
> *curl http://{rm}/ws/v1/cluster | python -mjson.tool*
>
>
>
> Ram
>
> ps. You can determine the node running YARN with:
>
>
>
> *hdfs getconf -confKey yarn.resourcemanager.webapp.address*
>
> *hdfs getconf -confKey yarn.resourcemanager.webapp.https.address*
>
>
>
>
>
>
>
> On Thu, Jun 16, 2016 at 11:15 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi,
>
>
>
> I am facing a weird issue and the logs are not clear to me!!
>
>
>
> I have created apa file which works fine within my local sandbox but
> facing problems when I upload on the enterprise Hadoop cluster using DT
> Console.
>
>
>
> Below is the error message from yarn logs. Please help in understanding
> the issue.
>
>
>
> ###################### Error Logs
> ########################################################
>
>
>
> Log Type: AppMaster.stderr
>
> Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
>
> Log Length: 1259
>
> SLF4J: Class path contains multiple SLF4J bindings.
>
> SLF4J: Found binding in
> [jar:file:/grid/06/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/filecache/36/slf4j-log4j12-1.7.19.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>
> SLF4J: Found binding in
> [jar:file:/usr/hdp/2.3.2.0-2950/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
>
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>
> Exception in thread "main" java.lang.IllegalArgumentException: Invalid
> ContainerId: container_e35_1465495186350_2224_01_000001
>
>         at
> org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182)
>
>         at
> com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:90)
>
> Caused by: java.lang.NumberFormatException: For input string: "e35"
>
>         at
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>
>         at java.lang.Long.parseLong(Long.java:441)
>
>         at java.lang.Long.parseLong(Long.java:483)
>
>         at
> org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
>
>         at
> org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
>
>         ... 1 more
>
>
>
> Log Type: AppMaster.stdout
>
> Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
>
> Log Length: 0
>
>
>
> Log Type: dt.log
>
> Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
>
> Log Length: 29715
>
> Showing 4096 bytes of 29715 total. Click here
> <http://guedlpdhdp001.saifg.rbc.com:19888/jobhistory/logs/guedlpdhdp012.saifg.rbc.com:45454/container_e35_1465495186350_2224_01_000001/container_e35_1465495186350_2224_01_000001/mukkamula/dt.log/?start=0> for
> the full log.
>
> 56m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms4096m
> -Xmx4096m -Dhadoop.security.logger=INFO,DRFAS
> -Dhdfs.audit.logger=INFO,DRFAAUDIT
>
> SHLVL=3
>
> HADOOP_SSH_OPTS=-o ConnectTimeout=5 -o SendEnv=HADOOP_CONF_DIR
>
> HADOOP_USER_NAME=datatorrent/gueulvahal003.saifg.rbc.com@SAIFG.RBC.COM
>
> HADOOP_NAMENODE_OPTS=-server -XX:ParallelGCThreads=8
> -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log
> -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m
> -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m
> -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS
> -Dhdfs.audit.logger=INFO,DRFAAUDIT
> -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node"
> -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server
> -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC
> -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m
> -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m
> -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m
> -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS
> -Dhdfs.audit.logger=INFO,DRFAAUDIT
> -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node"
> -Dorg.mortbay.jetty.Request.maxFormContentSize=-1
>
> HADOOP_IDENT_STRING=yarn
>
> HADOOP_MAPRED_LOG_DIR=/var/log/hadoop-mapreduce/yarn
>
> NM_HOST=guedlpdhdp012.saifg.rbc.com
>
> XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
>
> HADOOP_SECURE_DN_LOG_DIR=/var/log/hadoop/hdfs
>
> YARN_HISTORYSERVER_HEAPSIZE=1024
>
> JVM_PID=2638
>
> YARN_PID_DIR=/var/run/hadoop-yarn/yarn
>
> HADOOP_HOME_WARN_SUPPRESS=1
>
> NM_PORT=45454
>
> LOGNAME=mukkamula
>
> YARN_CONF_DIR=/usr/hdp/current/hadoop-client/conf
>
> HADOOP_YARN_USER=yarn
>
> QTDIR=/usr/lib64/qt-3.3
>
> _=/usr/lib/jvm/java-1.7.0/bin/java
>
> MSM_PRODUCT=MSM
>
> HADOOP_HOME=/usr/hdp/2.3.2.0-2950/hadoop
>
> MALLOC_ARENA_MAX=4
>
> HADOOP_OPTS=-Dhdp.version=2.3.2.0-2950 -Djava.net.preferIPv4Stack=true
> -Dhdp.version= -Djava.net.preferIPv4Stack=true
> -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log
> -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn
> -Dhadoop.root.logger=INFO,console
> -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native
> -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
> -Dhdp.version=2.3.2.0-2950 -Dhadoop.log.dir=/var/log/hadoop/yarn
> -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop
> -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console
> -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native
> -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
>
> SHELL=/bin/bash
>
> YARN_ROOT_LOGGER=INFO,EWMA,RFA
>
>
> HADOOP_TOKEN_FILE_LOCATION=/grid/11/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/container_e35_1465495186350_2224_01_000001/container_tokens
>
>
> CLASSPATH=./*:/usr/hdp/current/hadoop-client/conf:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*
>
> HADOOP_MAPRED_PID_DIR=/var/run/hadoop-mapreduce/yarn
>
> YARN_NODEMANAGER_HEAPSIZE=1024
>
> QTINC=/usr/lib64/qt-3.3/include
>
> USER=mukkamula
>
> HADOOP_CLIENT_OPTS=-Xmx2048m -XX:MaxPermSize=512m -Xmx2048m
> -XX:MaxPermSize=512m
>
> CONTAINER_ID=container_e35_1465495186350_2224_01_000001
>
> HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop/hdfs
>
> HISTCONTROL=ignoredups
>
> HOME=/home/
>
> HADOOP_NAMENODE_INIT_HEAPSIZE=-Xms8192m
>
> MSM_HOME=/usr/local/MegaRAID Storage Manager
>
> LESSOPEN=||/usr/bin/lesspipe.sh %s
>
> LANG=en_US.UTF-8
>
> YARN_NICENESS=0
>
> YARN_IDENT_STRING=yarn
>
> HADOOP_MAPRED_HOME=/usr/hdp/2.3.2.0-2950/hadoop-mapreduce
>
>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Mukkamula, Suryavamshivardhan (CWM-NR)
> *Sent:* 2016, June, 16 8:58 AM
> *To:* users@apex.apache.org
> *Subject:* RE: Multiple directories
>
>
>
> Thank you for the inputs.
>
>
>
> Regards,
>
> Surya Vamshi
>
> *From:* Thomas Weise [mailto:thomas.weise@gmail.com
> <th...@gmail.com>]
> *Sent:* 2016, June, 15 5:08 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
>
>
> On Wed, Jun 15, 2016 at 1:55 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi Ram/Team,
>
>
>
> I could create an operator which reads multiple directories, parses each
> file with respect to an individual configuration file, and generates
> output files to different directories.
>
>
>
> However I have some questions regarding the design.
>
>
>
> -> We have 120 directories to scan on HDFS; if we use parallel partitioning
> with operator memory around 250MB, it might be around 30GB of RAM for the
> processing of this operator. Are these figures going to create any problem
> in production?
>
>
>
> You can benchmark this with a single partition. If the downstream
> operators can keep up with the rate at which the file reader emits, then
> the memory consumption should be minimal. Keep in mind though that the
> container memory is not just heap space for the operator, but also memory
> the JVM requires to run and the memory that the buffer server consumes. You
> see the allocated memory in the UI if you use the DT community edition
> (container list in the physical plan).
>
>
>
> -> Should I use a scheduler for running the batch job, or define the next
> scan time and keep the DT job running continuously? If I run the DT job
> continuously, I assume memory will be continuously utilized by the DT job
> and not available to other resources on the cluster; please clarify.
>
> It is possible to set this up elastically also, so that when there is no
> input available, the number of reader partitions are reduced and the memory
> given back (Apex supports dynamic scaling).
>
>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, June, 05 10:24 PM
>
>
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> Some sample code to monitor multiple directories is now available at:
>
>
> https://github.com/DataTorrent/examples/tree/master/tutorials/fileIO-multiDir
>
>
>
> It shows how to use a custom implementation of definePartitions() to create
>
> multiple partitions of the file input operator and group them
>
> into "slices" where each slice monitors a single directory.
>
>
>
> Ram
>
>
>
> On Wed, May 25, 2016 at 9:55 AM, Munagala Ramanath <ra...@datatorrent.com>
> wrote:
>
> I'm hoping to have a sample sometime next week.
>
>
>
> Ram
>
>
>
> On Wed, May 25, 2016 at 9:30 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Thank you so much ram, for your advice , Option (a) would be ideal for my
> requirement.
>
>
>
> Do you have sample usage for partitioning with individual configuration
> set ups different partitions?
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, May, 25 12:11 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> You have 2 options: (a) AbstractFileInputOperator (b)
> FileSplitter/BlockReader
>
>
>
> For (a), each partition (i.e. replica or the operator) can scan only a
> single directory, so if you have 100
>
> directories, you can simply start with 100 partitions; since each
> partition is scanning its own directory
>
> you don't need to worry about which files the lines came from. This
> approach however needs a custom
>
> definePartition() implementation in your subclass to assign the
> appropriate directory and XML parsing
>
> config file to each partition; it also needs adequate cluster resources to
> be able to spin up the required
>
> number of partitions.
>
>
>
> For (b), there is some documentation in the Operators section at
> http://docs.datatorrent.com/ including
>
> sample code. There operators support scanning multiple directories out of
> the box but have more
>
> elaborate configuration options. Check this out and see if it works in
> your use case.
>
>
>
> Ram
>
>
>
> On Wed, May 25, 2016 at 8:17 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hello Ram/Team,
>
>
>
> My requirement is to read input feeds from different locations on HDFS and
> parse those files by reading XML configuration files (each input feed has
> configuration file which defines the fields inside the input feeds).
>
>
>
> My approach : I would like to define a mapping file which contains
> individual feed identifier, feed location , configuration file location. I
> would like to read this mapping file at initial load within setup() method
> and define my DirectoryScan.acceptFiles. Here my challenge is when I read
> the files , I should parse the lines by reading the individual
> configuration files. How do I know the line is from particular file , if I
> know this I can read the corresponding configuration file before parsing
> the line.
>
>
>
> Please let me know how do I handle this.
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, May, 24 5:49 PM
> *To:* Mukkamula, Suryavamshivardhan (CWM-NR)
> *Subject:* Multiple directories
>
>
>
> One way of addressing the issue is to use some sort of external tool (like
> a script) to
>
> copy all the input files to a common directory (making sure that the file
> names are
>
> unique to prevent one file from overwriting another) before the Apex
> application starts.
>
>
>
> The Apex application then starts and processes files from this directory.
>
>
>
> If you set the partition count of the file input operator to N, it will
> create N partitions and
>
> the files will be automatically distributed among the partitions. The
> partitions will work
>
> in parallel.
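>
> For illustration: partitionCount is a property on the file input operator,
> so with an operator named "reader" in the DAG (the name is an assumption),
> the properties.xml entry would look like
>
>   <property>
>     <name>dt.operator.reader.prop.partitionCount</name>
>     <value>4</value>
>   </property>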
>
>
>
> Ram
>

RE: Multiple directories

Posted by "Mukkamula, Suryavamshivardhan (CWM-NR)" <su...@rbc.com>.
Hi,

I found a way for the 2nd question.


1)      I tried setting the operator memory to 500MB, but the operator still takes 4GB by default; I do not understand why. (In my sandbox setup the memory was set correctly to 500MB, but not on the enterprise dev cluster.)



2)      I am trying to send a null object from the file reader when EOF is reached so that the file writer can call the requestFinalize() method. Somehow I could not figure out how to send the EOF; I tried as below, but no luck. (Solution: it seems that when readEntity() returns null, emit() is not called, so I managed to emit the null object from readEntity() itself.)
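
For illustration, the workaround described in (2) might look like this in the reader (a sketch; it reuses getFileName() and the output port from the code below):

@Override
protected String readEntity() throws IOException {
    final String line = br.readLine();
    if (line != null) {
        return line;
    }
    // EOF: emit the marker here, since emit() is not called when we return null
    output.emit(new KeyValue<String, String>(getFileName(), null));
    return null;
}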


Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:suryavamshivardhan.mukkamula@rbc.com]
Sent: 2016, June, 17 12:20 PM
To: users@apex.apache.org
Subject: RE: Multiple directories

Hi,

Can you please help me understand the below issues.


1)      I tried setting the operator memory to 500MB, but the operator still takes 4GB by default; I do not understand why. (In my sandbox setup the memory was set correctly to 500MB, but not on the enterprise dev cluster.)

2)      I am trying to send a null object from the file reader when EOF is reached so that the file writer can call the requestFinalize() method. Somehow I could not figure out how to send the EOF; I tried as below, but no luck.

################### File Reader ####################################################

@Override
protected String readEntity() throws IOException {
    // try to read a line
    final String line = br.readLine();
    if (null != line) { // normal case
        LOG.debug("readEntity: line = {}", line);
        return line;
    }

    // end-of-file (control tuple sent in closeFile())
    LOG.info("readEntity: EOF for {}", filePath);
    return null;
}

@Override
protected void emit(String line) {
    // parsing logic here: parse the line as per the input configuration and
    // create the output line as per the output configuration
    if (line == null) {
        // EOF marker: emit a tuple with a null value so the writer can finalize the file
        output.emit(new KeyValue<String, String>(getFileName(), null));
        return; // without this return, a null-valued tuple would also be parsed below
    }
    KeyValue<String, String> tuple = new KeyValue<String, String>();
    tuple.key = getFileName();
    tuple.value = line;
    KeyValue<String, String> newTuple = parseTuple(tuple);
    output.emit(newTuple);
}
######################File Writer######################################################
public class FileOutputOperator extends AbstractFileOutputOperator<KeyValue<String, String>> {
    private static final Logger LOG = LoggerFactory.getLogger(FileOutputOperator.class);
    private List<String> filesToFinalize = new ArrayList<>();

    @Override
    public void setup(Context.OperatorContext context) {
        super.setup(context);
        // finalize any files whose EOF marker arrived before a restart
        finalizeFiles();
    }

    @Override
    protected byte[] getBytesForTuple(KeyValue<String, String> tuple) {
        if (tuple.value == null) {
            // EOF marker from the reader: remember the file for finalization
            LOG.info("File to finalize {}", tuple.key);
            filesToFinalize.add(tuple.key);
            return new byte[0];
        }
        else {
            return tuple.value.getBytes();
        }
    }

    @Override
    protected String getFileName(KeyValue<String, String> tuple) {
        return tuple.key;
    }

    @Override
    public void endWindow() {
        super.endWindow();
        finalizeFiles();
    }

    private void finalizeFiles() {
        // log the list itself; passing toArray() to a {} placeholder would
        // only print the array reference
        LOG.info("Files to finalize {}", filesToFinalize);
        Iterator<String> fileIt = filesToFinalize.iterator();
        while (fileIt.hasNext()) {
            requestFinalize(fileIt.next());
            fileIt.remove();
        }
    }
}
##################################################################################################
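
For completeness, here is a minimal sketch of how the two operators might be wired together (FileReader is a hypothetical name for the reader subclass containing the readEntity()/emit() overrides above; its output port is assumed to be called output):

import org.apache.hadoop.conf.Configuration;

import com.datatorrent.api.DAG;
import com.datatorrent.api.StreamingApplication;
import com.datatorrent.api.annotation.ApplicationAnnotation;

@ApplicationAnnotation(name = "FileCopyApp")
public class Application implements StreamingApplication {
    @Override
    public void populateDAG(DAG dag, Configuration conf) {
        FileReader reader = dag.addOperator("read", new FileReader());
        FileOutputOperator writer = dag.addOperator("write", new FileOutputOperator());
        // KeyValue<fileName, line> tuples flow from the reader to the writer;
        // the writer uses the key to pick the output file name
        dag.addStream("lines", reader.output, writer.input);
    }
}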



Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:suryavamshivardhan.mukkamula@rbc.com]
Sent: 2016, June, 17 9:11 AM
To: users@apex.apache.org
Subject: RE: Multiple directories

Hi Ram/Raja,

The HBase dependency was pulling older Hadoop jars into my classpath. I removed the HBase dependency, which I don't need for now, and the issue got resolved.

Thank you for your help.

Regards,
Surya Vamshi

From: Raja.Aravapalli [mailto:Raja.Aravapalli@target.com]
Sent: 2016, June, 17 7:06 AM
To: users@apex.apache.org
Subject: Re: Multiple directories


I also faced a similar problem with Hadoop jars when I used HBase jars in pom.xml!! It could be because the version of the Hadoop jars in your apa is different from the ones on the cluster!!


What I did to solve this is:

Included the provided scope in the Maven pom.xml for the HBase jars, and then supplied the HBase jars to the application package during submission using "-libjars" with the launch command, which solved my Invalid ContainerId problem!!

You can type "launch help" to see the usage details.
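
For example, the submission might look something like this (hypothetical jar and package names, to be adapted to your build):

launch -libjars lib/hbase-client-1.1.2.jar,lib/hbase-common-1.1.2.jar target/myapp-1.0.apa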


Regards,
Raja.

From: Munagala Ramanath <ra...@datatorrent.com>
Reply-To: "users@apex.apache.org" <us...@apex.apache.org>
Date: Thursday, June 16, 2016 at 4:10 PM
To: "users@apex.apache.org" <us...@apex.apache.org>
Subject: Re: Multiple directories

Those 6 hadoop jars are definitely a problem.

I didn't see the output of "mvn dependency:tree"; could you post that?
It will show you why these hadoop jars are being pulled in.

Also, please refer to the section "Hadoop dependencies conflicts" in the troubleshooting guide:
http://docs.datatorrent.com/troubleshooting/

Ram

On Thu, Jun 16, 2016 at 1:56 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Hi Ram,

Below are the details.


0 Wed Jun 15 16:34:24 EDT 2016 META-INF/
   358 Wed Jun 15 16:34:22 EDT 2016 META-INF/MANIFEST.MF
     0 Wed Jun 15 16:34:24 EDT 2016 app/
52967 Wed Jun 15 16:34:22 EDT 2016 app/countrynamescan-1.0-SNAPSHOT.jar
     0 Wed Jun 15 16:34:22 EDT 2016 lib/
62983 Wed Jun 15 16:34:22 EDT 2016 lib/activation-1.1.jar
1143233 Wed Jun 15 16:34:22 EDT 2016 lib/activemq-client-5.8.0.jar
  4467 Wed Jun 15 16:34:22 EDT 2016 lib/aopalliance-1.0.jar
44925 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-i18n-2.0.0-M15.jar
691479 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-kerberos-codec-2.0.0-M15.jar
16560 Wed Jun 15 16:34:22 EDT 2016 lib/api-asn1-api-1.0.0-M20.jar
79912 Wed Jun 15 16:34:22 EDT 2016 lib/api-util-1.0.0-M20.jar
43033 Wed Jun 15 16:34:22 EDT 2016 lib/asm-3.1.jar
303139 Wed Jun 15 16:34:22 EDT 2016 lib/avro-1.7.4.jar
232019 Wed Jun 15 16:34:22 EDT 2016 lib/commons-beanutils-1.8.3.jar
41123 Wed Jun 15 16:34:22 EDT 2016 lib/commons-cli-1.2.jar
284184 Wed Jun 15 16:34:22 EDT 2016 lib/commons-codec-1.10.jar
575389 Wed Jun 15 16:34:22 EDT 2016 lib/commons-collections-3.2.1.jar
30595 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compiler-2.7.8.jar
241367 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compress-1.4.1.jar
298829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-configuration-1.6.jar
143602 Wed Jun 15 16:34:22 EDT 2016 lib/commons-digester-1.8.jar
112341 Wed Jun 15 16:34:22 EDT 2016 lib/commons-el-1.0.jar
305001 Wed Jun 15 16:34:22 EDT 2016 lib/commons-httpclient-3.1.jar
185140 Wed Jun 15 16:34:22 EDT 2016 lib/commons-io-2.4.jar
284220 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang-2.6.jar
315805 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang3-3.1.jar
61829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-logging-1.2.jar
1599627 Wed Jun 15 16:34:22 EDT 2016 lib/commons-math3-3.1.1.jar
273370 Wed Jun 15 16:34:22 EDT 2016 lib/commons-net-3.1.jar
3608597 Wed Jun 15 16:34:22 EDT 2016 lib/db2jcc-123.jar
313898 Wed Jun 15 16:34:22 EDT 2016 lib/dom4j-1.6.1.jar
17138265 Wed Jun 15 16:34:22 EDT 2016 lib/fastutil-6.6.4.jar
15322 Wed Jun 15 16:34:22 EDT 2016 lib/findbugs-annotations-1.3.9-1.jar
84946 Wed Jun 15 16:34:22 EDT 2016 lib/flatpack-3.4.2.jar
20220 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-j2ee-management_1.1_spec-1.0.1.jar
32359 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-jms_1.1_spec-1.1.1.jar
21817 Wed Jun 15 16:34:22 EDT 2016 lib/gmbal-api-only-3.0.0-b023.jar
690573 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-framework-2.1.2.jar
253086 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-2.1.2.jar
198255 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-server-2.1.2.jar
336904 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-servlet-2.1.2.jar
  8114 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-rcm-2.1.2.jar
1795932 Wed Jun 15 16:34:22 EDT 2016 lib/guava-12.0.1.jar
710492 Wed Jun 15 16:34:22 EDT 2016 lib/guice-3.0.jar
65012 Wed Jun 15 16:34:22 EDT 2016 lib/guice-servlet-3.0.jar
16778 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-annotations-2.2.0.jar
52449 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-auth-2.5.1.jar
2962685 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-common-2.5.1.jar
1498368 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-mapreduce-client-core-2.5.1.jar
1158936 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-api-2.2.0.jar
1301627 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-common-2.2.0.jar
50139 Wed Jun 15 16:34:22 EDT 2016 lib/hawtbuf-1.9.jar
20780 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-annotations-1.1.2.jar
1249004 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-client-1.1.2.jar
530078 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-common-1.1.2.jar
4201685 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-protocol-1.1.2.jar
1475955 Wed Jun 15 16:34:22 EDT 2016 lib/htrace-core-3.1.0-incubating.jar
590533 Wed Jun 15 16:34:22 EDT 2016 lib/httpclient-4.3.5.jar
282269 Wed Jun 15 16:34:22 EDT 2016 lib/httpcore-4.3.2.jar
228286 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-core-asl-1.9.2.jar
765648 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-mapper-asl-1.9.2.jar
  2497 Wed Jun 15 16:34:22 EDT 2016 lib/javax.inject-1.jar
521991 Wed Jun 15 16:34:22 EDT 2016 lib/javax.mail-1.5.0.jar
83945 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-3.1.jar
85353 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-api-3.0.1.jar
105134 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-api-2.2.2.jar
890168 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-impl-2.2.3-1.jar
1291164 Wed Jun 15 16:34:22 EDT 2016 lib/jcodings-1.0.8.jar
151304 Wed Jun 15 16:34:22 EDT 2016 lib/jdom-1.1.3.jar
130458 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-client-1.9.jar
458739 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-core-1.9.jar
17542 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-grizzly2-1.9.jar
14786 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-guice-1.9.jar
147952 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-json-1.9.jar
713089 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-server-1.9.jar
28100 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-core-1.9.jar
12976 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-grizzly2-1.9.jar
67758 Wed Jun 15 16:34:22 EDT 2016 lib/jettison-1.1.jar
21144 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-continuation-8.1.10.v20130312.jar
95709 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-http-8.1.10.v20130312.jar
103622 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-io-8.1.10.v20130312.jar
89691 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-security-8.1.10.v20130312.jar
347020 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-server-8.1.10.v20130312.jar
101052 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-servlet-8.1.10.v20130312.jar
177131 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-6.1.26.jar
284903 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-8.1.10.v20130312.jar
125928 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-websocket-8.1.10.v20130312.jar
39117 Wed Jun 15 16:34:22 EDT 2016 lib/jms-api-1.1-rev-1.jar
187292 Wed Jun 15 16:34:22 EDT 2016 lib/joni-2.1.2.jar
280205 Wed Jun 15 16:34:22 EDT 2016 lib/jsch-0.1.53.jar
33015 Wed Jun 15 16:34:22 EDT 2016 lib/jsr305-1.3.9.jar
489884 Wed Jun 15 16:34:22 EDT 2016 lib/log4j-1.2.17.jar
565387 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-contrib-3.2.0-incubating.jar
1062000 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-library-3.1.1.jar
42212 Wed Jun 15 16:34:22 EDT 2016 lib/management-api-3.0.0-b012.jar
  8798 Wed Jun 15 16:34:22 EDT 2016 lib/named-regexp-0.2.3.jar
1206119 Wed Jun 15 16:34:22 EDT 2016 lib/netty-3.6.6.Final.jar
1779991 Wed Jun 15 16:34:22 EDT 2016 lib/netty-all-4.0.23.Final.jar
29555 Wed Jun 15 16:34:22 EDT 2016 lib/paranamer-2.3.jar
1869113 Wed Jun 15 16:34:22 EDT 2016 lib/poi-3.9.jar
936648 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-3.9.jar
4802621 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-schemas-3.9.jar
533455 Wed Jun 15 16:34:22 EDT 2016 lib/protobuf-java-2.5.0.jar
26084 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-api-1.7.5.jar
  9943 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-log4j12-1.7.19.jar
995968 Wed Jun 15 16:34:22 EDT 2016 lib/snappy-java-1.0.4.1.jar
23346 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0-2.jar
26514 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0.1.jar
  2455 Wed Jun 15 16:34:22 EDT 2016 lib/tdgssconfig-14.00.00.21.jar
991265 Wed Jun 15 16:34:22 EDT 2016 lib/terajdbc4-14.00.00.21.jar
758309 Wed Jun 15 16:34:22 EDT 2016 lib/vertica-jdbc-7.2.1-0.jar
109318 Wed Jun 15 16:34:22 EDT 2016 lib/xml-apis-1.0.b2.jar
2666695 Wed Jun 15 16:34:22 EDT 2016 lib/xmlbeans-2.3.0.jar
15010 Wed Jun 15 16:34:22 EDT 2016 lib/xmlenc-0.52.jar
94672 Wed Jun 15 16:34:22 EDT 2016 lib/xz-1.0.jar
792964 Wed Jun 15 16:34:22 EDT 2016 lib/zookeeper-3.4.6.jar
     0 Wed Jun 15 16:34:28 EDT 2016 conf/
   334 Mon Apr 04 11:18:00 EDT 2016 conf/my-app-conf1.xml
  3432 Wed Jun 15 16:22:20 EDT 2016 META-INF/properties.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
       <modelVersion>4.0.0</modelVersion>
       <groupId>com.rbc.aml.cnscan</groupId>
       <version>1.0-SNAPSHOT</version>
       <artifactId>countrynamescan</artifactId>
       <packaging>jar</packaging>

       <!-- change these to the appropriate values -->
       <name>countrynamescan</name>
       <description>Country and Name Scan project</description>

       <properties>
              <!-- change this if you desire to use a different version of DataTorrent -->
              <datatorrent.version>3.1.1</datatorrent.version>
              <datatorrent.apppackage.classpath>lib/*.jar</datatorrent.apppackage.classpath>
       </properties>

       <!-- repository to provide the DataTorrent artifacts -->
       <!-- <repositories>
              <repository>
                     <snapshots>
                           <enabled>false</enabled>
                     </snapshots>
                     <id>Datatorrent-Releases</id>
                     <name>DataTorrent Release Repository</name>
                     <url>https://www.datatorrent.com/maven/content/repositories/releases/</url>
              </repository>
              <repository>
                     <releases>
                           <enabled>false</enabled>
                     </releases>
                     <id>DataTorrent-Snapshots</id>
                     <name>DataTorrent Early Access Program Snapshot Repository</name>
                     <url>https://www.datatorrent.com/maven/content/repositories/snapshots/</url>
              </repository>
       </repositories> -->


       <build>
              <plugins>
                     <plugin>
                           <groupId>org.apache.maven.plugins</groupId>
                           <artifactId>maven-eclipse-plugin</artifactId>
                           <version>2.9</version>
                           <configuration>
                                  <downloadSources>true</downloadSources>
                           </configuration>
                     </plugin>
                     <plugin>
                           <artifactId>maven-compiler-plugin</artifactId>
                           <version>3.3</version>
                           <configuration>
                                  <encoding>UTF-8</encoding>
                                  <source>1.7</source>
                                  <target>1.7</target>
                                  <debug>true</debug>
                                  <optimize>false</optimize>
                                  <showDeprecation>true</showDeprecation>
                                  <showWarnings>true</showWarnings>
                           </configuration>
                     </plugin>
                     <plugin>
                           <artifactId>maven-dependency-plugin</artifactId>
                           <version>2.8</version>
                           <executions>
                                  <execution>
                                         <id>copy-dependencies</id>
                                         <phase>prepare-package</phase>
                                         <goals>
                                                <goal>copy-dependencies</goal>
                                         </goals>
                                         <configuration>
                                                <outputDirectory>target/deps</outputDirectory>
                                                <includeScope>runtime</includeScope>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <artifactId>maven-assembly-plugin</artifactId>
                           <executions>
                                  <execution>
                                         <id>app-package-assembly</id>
                                         <phase>package</phase>
                                         <goals>
                                                <goal>single</goal>
                                         </goals>
                                         <configuration>
                                                <finalName>${project.artifactId}-${project.version}-apexapp</finalName>
                                                <appendAssemblyId>false</appendAssemblyId>
                                                <descriptors>
                                                       <descriptor>src/assemble/appPackage.xml</descriptor>
                                                </descriptors>
                                                <archiverConfig>
                                                       <defaultDirectoryMode>0755</defaultDirectoryMode>
                                                </archiverConfig>
                                                <archive>
                                                       <manifestEntries>
                                                              <Class-Path>${datatorrent.apppackage.classpath}</Class-Path>
                                                              <DT-Engine-Version>${datatorrent.version}</DT-Engine-Version>
                                                              <DT-App-Package-Name>${project.artifactId}</DT-App-Package-Name>
                                                              <DT-App-Package-Version>${project.version}</DT-App-Package-Version>
                                                              <DT-App-Package-Display-Name>${project.name}</DT-App-Package-Display-Name>
                                                              <DT-App-Package-Description>${project.description}</DT-App-Package-Description>
                                                      </manifestEntries>
                                                </archive>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <artifactId>maven-antrun-plugin</artifactId>
                           <version>1.7</version>
                           <executions>
                                  <execution>
                                         <phase>package</phase>
                                         <configuration>
                                                <target>
                                                       <move
                                                              file="${project.build.directory}/${project.artifactId}-${project.version}-apexapp.jar"
                                                              tofile="${project.build.directory}/${project.artifactId}-${project.version}.apa"/>
                                                </target>
                                         </configuration>
                                         <goals>
                                                <goal>run</goal>
                                         </goals>
                                  </execution>
                                  <execution>
                                         <!-- create resource directory for xml javadoc -->
                                         <id>createJavadocDirectory</id>
                                         <phase>generate-resources</phase>
                                         <configuration>
                                                <tasks>
                                                       <delete
                                                              dir="${project.build.directory}/generated-resources/xml-javadoc"/>
                                                       <mkdir
                                                              dir="${project.build.directory}/generated-resources/xml-javadoc"/>
                                                </tasks>
                                         </configuration>
                                         <goals>
                                                <goal>run</goal>
                                         </goals>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <groupId>org.codehaus.mojo</groupId>
                           <artifactId>build-helper-maven-plugin</artifactId>
                           <version>1.9.1</version>
                           <executions>
                                  <execution>
                                         <id>attach-artifacts</id>
                                         <phase>package</phase>
                                         <goals>
                                                <goal>attach-artifact</goal>
                                         </goals>
                                         <configuration>
                                                <artifacts>
                                                       <artifact>
                                                              <file>target/${project.artifactId}-${project.version}.apa</file>
                                                              <type>apa</type>
                                                       </artifact>
                                                </artifacts>
                                                <skipAttach>false</skipAttach>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <!-- generate javdoc -->
                     <plugin>
                           <groupId>org.apache.maven.plugins</groupId>
                           <artifactId>maven-javadoc-plugin</artifactId>
                           <executions>
                                  <!-- generate xml javadoc -->
                                  <execution>
                                         <id>xml-doclet</id>
                                         <phase>generate-resources</phase>
                                         <goals>
                                                <goal>javadoc</goal>
                                         </goals>
                                         <configuration>
                                                <doclet>com.github.markusbernhardt.xmldoclet.XmlDoclet</doclet>
                                                <additionalparam>-d
                                                       ${project.build.directory}/generated-resources/xml-javadoc
                                                       -filename ${project.artifactId}-${project.version}-javadoc.xml</additionalparam>
                                                <useStandardDocletOptions>false</useStandardDocletOptions>
                                                <docletArtifact>
                                                       <groupId>com.github.markusbernhardt</groupId>
                                                       <artifactId>xml-doclet</artifactId>
                                                       <version>1.0.4</version>
                                                </docletArtifact>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>
                     <!-- Transform xml javadoc to stripped down version containing only class/interface
                           comments and tags -->
                     <plugin>
                           <groupId>org.codehaus.mojo</groupId>
                           <artifactId>xml-maven-plugin</artifactId>
                           <version>1.0</version>
                           <executions>
                                  <execution>
                                         <id>transform-xmljavadoc</id>
                                         <phase>generate-resources</phase>
                                         <goals>
                                                <goal>transform</goal>
                                         </goals>
                                  </execution>
                           </executions>
                           <configuration>
                                  <transformationSets>
                                         <transformationSet>
                                                <dir>${project.build.directory}/generated-resources/xml-javadoc</dir>
                                                <includes>
                                                       <include>${project.artifactId}-${project.version}-javadoc.xml</include>
                                                </includes>
                                                <stylesheet>XmlJavadocCommentsExtractor.xsl</stylesheet>
                                                <outputDir>${project.build.directory}/generated-resources/xml-javadoc</outputDir>
                                         </transformationSet>
                                  </transformationSets>
                           </configuration>
                     </plugin>
                     <!-- copy xml javadoc to class jar -->
                     <plugin>
                           <artifactId>maven-resources-plugin</artifactId>
                           <version>2.6</version>
                           <executions>
                                  <execution>
                                         <id>copy-resources</id>
                                         <phase>process-resources</phase>
                                         <goals>
                                                <goal>copy-resources</goal>
                                         </goals>
                                         <configuration>
                                                <outputDirectory>${basedir}/target/classes</outputDirectory>
                                                <resources>
                                                       <resource>
                                                              <directory>${project.build.directory}/generated-resources/xml-javadoc</directory>
                                                              <includes>
                                                                     <include>${project.artifactId}-${project.version}-javadoc.xml</include>
                                                              </includes>
                                                              <filtering>true</filtering>
                                                       </resource>
                                                </resources>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

              </plugins>

              <pluginManagement>
                     <plugins>
                           <!--This plugin's configuration is used to store Eclipse m2e settings
                                  only. It has no influence on the Maven build itself. -->
                           <plugin>
                                  <groupId>org.eclipse.m2e</groupId>
                                  <artifactId>lifecycle-mapping</artifactId>
                                  <version>1.0.0</version>
                                  <configuration>
                                         <lifecycleMappingMetadata>
                                                <pluginExecutions>
                                                       <pluginExecution>
                                                              <pluginExecutionFilter>
                                                                     <groupId>org.codehaus.mojo</groupId>
                                                                     <artifactId>
                                                                           xml-maven-plugin
                                                                     </artifactId>
                                                                     <versionRange>[1.0,)</versionRange>
                                                                     <goals>
                                                                           <goal>transform</goal>
                                                                     </goals>
                                                              </pluginExecutionFilter>
                                                              <action>
                                                                     <ignore></ignore>
                                                              </action>
                                                       </pluginExecution>
                                                       <pluginExecution>
                                                              <pluginExecutionFilter>
                                                                     <groupId></groupId>
                                                                     <artifactId></artifactId>
                                                                     <versionRange>[,)</versionRange>
                                                                     <goals>
                                                                           <goal></goal>
                                                                     </goals>
                                                              </pluginExecutionFilter>
                                                              <action>
                                                                     <ignore></ignore>
                                                              </action>
                                                       </pluginExecution>
                                                </pluginExecutions>
                                         </lifecycleMappingMetadata>
                                  </configuration>
                           </plugin>
                     </plugins>
              </pluginManagement>
       </build>

       <dependencies>
              <!-- add your dependencies here -->
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>malhar-library</artifactId>
                     <version>${datatorrent.version}</version>
                     <!-- If you know that your application does not need transitive dependencies
                           pulled in by malhar-library, uncomment the following to reduce the size of
                           your app package. -->
                     <!-- <exclusions> <exclusion> <groupId>*</groupId> <artifactId>*</artifactId>
                           </exclusion> </exclusions> -->
              </dependency>
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>dt-common</artifactId>
                     <version>${datatorrent.version}</version>
                     <scope>provided</scope>
              </dependency>
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>malhar-library</artifactId>
                     <version>${datatorrent.version}</version>
                     <!-- If you know that your application does not need transitive dependencies
                           pulled in by malhar-library, uncomment the following to reduce the size of
                           your app package. -->
                     <!-- <exclusions> <exclusion> <groupId>*</groupId> <artifactId>*</artifactId>
                           </exclusion> </exclusions> -->
              </dependency>
              <dependency>
       <groupId>com.datatorrent</groupId>
       <artifactId>dt-engine</artifactId>
       <version>${datatorrent.version}</version>
       <scope>test</scope>
              </dependency>
              <dependency>
              <groupId>com.datatorrent</groupId>
              <artifactId>dt-common</artifactId>
              <version>${datatorrent.version}</version>
              <scope>provided</scope>
              </dependency>
              <dependency>
              <groupId>com.teradata.jdbc</groupId>
              <artifactId>terajdbc4</artifactId>
              <version>14.00.00.21</version>
              </dependency>
              <dependency>
              <groupId>com.teradata.jdbc</groupId>
              <artifactId>tdgssconfig</artifactId>
              <version>14.00.00.21</version>
              </dependency>
              <dependency>
              <groupId>com.ibm.db2</groupId>
              <artifactId>db2jcc</artifactId>
              <version>123</version>
              </dependency>
              <dependency>
                     <groupId>jdk.tools</groupId>
                     <artifactId>jdk.tools</artifactId>
                     <version>1.7</version>
                     <scope>system</scope>
                     <systemPath>C:/Program Files/Java/jdk1.7.0_79/lib/tools.jar</systemPath>
              </dependency>
              <dependency>
                     <groupId>org.apache.apex</groupId>
                     <artifactId>malhar-contrib</artifactId>
                     <version>3.2.0-incubating</version>
                     <!--<scope>provided</scope> -->
                     <exclusions>
                           <exclusion>
                                  <groupId>*</groupId>
                                  <artifactId>*</artifactId>
                           </exclusion>
                     </exclusions>
              </dependency>
              <dependency>
                     <groupId>junit</groupId>
                     <artifactId>junit</artifactId>
                     <version>4.10</version>
                     <scope>test</scope>
              </dependency>
              <dependency>
                     <groupId>com.vertica</groupId>
                     <artifactId>vertica-jdbc</artifactId>
                     <version>7.2.1-0</version>
              </dependency>
              <dependency>
              <groupId>org.apache.hbase</groupId>
              <artifactId>hbase-client</artifactId>
              <version>1.1.2</version>
              </dependency>
              <dependency>
                     <groupId>org.slf4j</groupId>
                     <artifactId>slf4j-log4j12</artifactId>
                     <version>1.7.19</version>
              </dependency>
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>dt-engine</artifactId>
                     <version>${datatorrent.version}</version>
                     <scope>test</scope>
              </dependency>

              <dependency>
                     <groupId>net.sf.flatpack</groupId>
                     <artifactId>flatpack</artifactId>
                     <version>3.4.2</version>
              </dependency>

              <dependency>
                     <groupId>org.jdom</groupId>
                     <artifactId>jdom</artifactId>
                     <version>1.1.3</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi-ooxml</artifactId>
                     <version>3.9</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.xmlbeans</groupId>
                     <artifactId>xmlbeans</artifactId>
                     <version>2.3.0</version>
              </dependency>

              <dependency>
                     <groupId>dom4j</groupId>
                     <artifactId>dom4j</artifactId>
                     <version>1.6.1</version>
              </dependency>

              <dependency>
                     <groupId>javax.xml.stream</groupId>
                     <artifactId>stax-api</artifactId>
                     <version>1.0-2</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi</artifactId>
                     <version>3.9</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi-ooxml-schemas</artifactId>
                     <version>3.9</version>
              </dependency>
              <dependency>
                     <groupId>com.jcraft</groupId>
                     <artifactId>jsch</artifactId>
                     <version>0.1.53</version>
              </dependency>
       </dependencies>

</project>

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, June, 16 4:37 PM
To: users@apex.apache.org
Subject: Re: Multiple directories

It looks like you may be including old Hadoop jars in your apa package since the stack trace
shows ConverterUtils.toContainerId calling ConverterUtils.toApplicationAttemptId but recent versions
don't have that call sequence. In 2.7.1 (which is what your cluster has) the function looks like this:
  public static ContainerId toContainerId(String containerIdStr) {
    return ContainerId.fromString(containerIdStr);
  }

Could you post the output of "jar tvf {your-apa-file}" as well as: "mvn dependency:tree"

Ram

On Thu, Jun 16, 2016 at 12:38 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Hi Ram,

Below is the information.


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   712    0   712    0     0   3807      0 --:--:-- --:--:-- --:--:--  3807
{
    "clusterInfo": {
        "haState": "ACTIVE",
        "haZooKeeperConnectionState": "CONNECTED",
        "hadoopBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bc
caeaf36cb193c1c by jenkins source checksum 69a3bf8c667267c2c252a54fbbf23d",
        "hadoopVersion": "2.7.1.2.3.2.0-2950",
        "hadoopVersionBuiltOn": "2015-09-30T18:08Z",
        "id": 1465495186350,
        "resourceManagerBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa9
8205f18bccaeaf36cb193c1c by jenkins source checksum 48db4b572827c2e9c2da66982d14
7626",
        "resourceManagerVersion": "2.7.1.2.3.2.0-2950",
       "resourceManagerVersionBuiltOn": "2015-09-30T18:20Z",
        "rmStateStoreName": "org.apache.hadoop.yarn.server.resourcemanager.recov
ery.ZKRMStateStore",
        "startedOn": 1465495186350,
        "state": "STARTED"
    }
}

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, June, 16 2:57 PM
To: users@apex.apache.org
Subject: Re: Multiple directories

Can you ssh to one of the cluster nodes ? If so, can you run this command and show the output
(where {rm} is the host:port running the resource manager, aka YARN):

curl http://{rm}/ws/v1/cluster | python -mjson.tool

Ram
ps. You can determine the node running YARN with:

hdfs getconf -confKey yarn.resourcemanager.webapp.address
hdfs getconf -confKey yarn.resourcemanager.webapp.https.address



On Thu, Jun 16, 2016 at 11:15 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Hi,

I am facing a weird issue and the logs are not clear to me!!

I have created an apa file which works fine within my local sandbox, but I am facing problems when I upload it on the enterprise Hadoop cluster using the DT Console.

Below is the error message from yarn logs. Please help in understanding the issue.

###################### Error Logs ########################################################

Log Type: AppMaster.stderr
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 1259
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/grid/06/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/filecache/36/slf4j-log4j12-1.7.19.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.IllegalArgumentException: Invalid ContainerId: container_e35_1465495186350_2224_01_000001
        at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182)
        at com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:90)
Caused by: java.lang.NumberFormatException: For input string: "e35"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:441)
        at java.lang.Long.parseLong(Long.java:483)
        at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
        at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
        ... 1 more

Log Type: AppMaster.stdout
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 0

Log Type: dt.log
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 29715
Showing 4096 bytes of 29715 total. Click here for the full log.
56m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms4096m -Xmx4096m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT
SHLVL=3
HADOOP_SSH_OPTS=-o ConnectTimeout=5 -o SendEnv=HADOOP_CONF_DIR
HADOOP_USER_NAME=datatorrent/gueulvahal003.saifg.rbc.com@SAIFG.RBC.COM
HADOOP_NAMENODE_OPTS=-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1
HADOOP_IDENT_STRING=yarn
HADOOP_MAPRED_LOG_DIR=/var/log/hadoop-mapreduce/yarn
NM_HOST=guedlpdhdp012.saifg.rbc.com
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
HADOOP_SECURE_DN_LOG_DIR=/var/log/hadoop/hdfs
YARN_HISTORYSERVER_HEAPSIZE=1024
JVM_PID=2638
YARN_PID_DIR=/var/run/hadoop-yarn/yarn
HADOOP_HOME_WARN_SUPPRESS=1
NM_PORT=45454
LOGNAME=mukkamula
YARN_CONF_DIR=/usr/hdp/current/hadoop-client/conf
HADOOP_YARN_USER=yarn
QTDIR=/usr/lib64/qt-3.3
_=/usr/lib/jvm/java-1.7.0/bin/java
MSM_PRODUCT=MSM
HADOOP_HOME=/usr/hdp/2.3.2.0-2950/hadoop
MALLOC_ARENA_MAX=4
HADOOP_OPTS=-Dhdp.version=2.3.2.0-2950 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true  -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.3.2.0-2950 -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
SHELL=/bin/bash
YARN_ROOT_LOGGER=INFO,EWMA,RFA
HADOOP_TOKEN_FILE_LOCATION=/grid/11/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/container_e35_1465495186350_2224_01_000001/container_tokens
CLASSPATH=./*:/usr/hdp/current/hadoop-client/conf:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*
HADOOP_MAPRED_PID_DIR=/var/run/hadoop-mapreduce/yarn
YARN_NODEMANAGER_HEAPSIZE=1024
QTINC=/usr/lib64/qt-3.3/include
USER=mukkamula
HADOOP_CLIENT_OPTS=-Xmx2048m -XX:MaxPermSize=512m -Xmx2048m -XX:MaxPermSize=512m
CONTAINER_ID=container_e35_1465495186350_2224_01_000001
HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop/hdfs
HISTCONTROL=ignoredups
HOME=/home/
HADOOP_NAMENODE_INIT_HEAPSIZE=-Xms8192m
MSM_HOME=/usr/local/MegaRAID Storage Manager
LESSOPEN=||/usr/bin/lesspipe.sh %s
LANG=en_US.UTF-8
YARN_NICENESS=0
YARN_IDENT_STRING=yarn
HADOOP_MAPRED_HOME=/usr/hdp/2.3.2.0-2950/hadoop-mapreduce


Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR)
Sent: 2016, June, 16 8:58 AM
To: users@apex.apache.org
Subject: RE: Multiple directories

Thank you for the inputs.

Regards,
Surya Vamshi
From: Thomas Weise [mailto:thomas.weise@gmail.com]
Sent: 2016, June, 15 5:08 PM
To: users@apex.apache.org
Subject: Re: Multiple directories


On Wed, Jun 15, 2016 at 1:55 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Hi Ram/Team,

I could create an operator which reads multiple directories, parses each file with respect to an individual configuration file, and generates output files to different directories.

However I have some questions regarding the design.


==> We have 120 directories to scan on HDFS; if we use parallel partitioning with operator memory around 250MB, it might take around 30GB of RAM for this operator. Are these figures going to create any problem in production?

You can benchmark this with a single partition. If the downstream operators can keep up with the rate at which the file reader emits, then the memory consumption should be minimal. Keep in mind, though, that the container memory is not just heap space for the operator, but also memory the JVM requires to run and the memory that the buffer server consumes. You can see the allocated memory in the UI if you use the DT community edition (container list in the physical plan).


==> Should I use a scheduler for running the batch job, or define the next scan time and keep the DT job running continuously? If I run the DT job continuously, I assume memory will be continuously utilized by the DT job and not available to other applications on the cluster; please clarify.
It is possible to set this up elastically as well, so that when there is no input available, the number of reader partitions is reduced and the memory given back (Apex supports dynamic scaling).
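
As a rough sketch of that idea (ElasticFileReader and the idle policy are hypothetical; the actual shrink/grow logic would go into a definePartitions() override, which is not shown):

import com.datatorrent.api.StatsListener;

// ask the platform to repartition when the processed-tuple moving average
// drops to zero, i.e. the reader partitions have gone idle
public class ElasticFileReader extends FileReader implements StatsListener {
    @Override
    public Response processStats(BatchedOperatorStats stats) {
        Response res = new Response();
        res.repartitionRequired = stats.getTuplesProcessedPSMA() == 0;
        return res;
    }
}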


Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, June, 05 10:24 PM

To: users@apex.apache.org
Subject: Re: Multiple directories

Some sample code to monitor multiple directories is now available at:
https://github.com/DataTorrent/examples/tree/master/tutorials/fileIO-multiDir

It shows how to use a custom implementation of definePartitions() to create
multiple partitions of the file input operator and group them
into "slices" where each slice monitors a single directory.

Ram

On Wed, May 25, 2016 at 9:55 AM, Munagala Ramanath <ra...@datatorrent.com> wrote:
I'm hoping to have a sample sometime next week.

Ram

On Wed, May 25, 2016 at 9:30 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Thank you so much ram, for your advice , Option (a) would be ideal for my requirement.

Do you have sample usage for partitioning with individual configuration set ups different partitions?

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, May, 25 12:11 PM
To: users@apex.apache.org
Subject: Re: Multiple directories

You have 2 options: (a) AbstractFileInputOperator (b) FileSplitter/BlockReader

For (a), each partition (i.e. replica or the operator) can scan only a single directory, so if you have 100
directories, you can simply start with 100 partitions; since each partition is scanning its own directory
you don't need to worry about which files the lines came from. This approach however needs a custom
definePartition() implementation in your subclass to assign the appropriate directory and XML parsing
config file to each partition; it also needs adequate cluster resources to be able to spin up the required
number of partitions.

For (b), there is some documentation in the Operators section at http://docs.datatorrent.com/ including
sample code. There operators support scanning multiple directories out of the box but have more
elaborate configuration options. Check this out and see if it works in your use case.

Ram

On Wed, May 25, 2016 at 8:17 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Hello Ram/Team,

My requirement is to read input feeds from different locations on HDFS and parse those files by reading XML configuration files (each input feed has configuration file which defines the fields inside the input feeds).

My approach : I would like to define a mapping file which contains individual feed identifier, feed location , configuration file location. I would like to read this mapping file at initial load within setup() method and define my DirectoryScan.acceptFiles. Here my challenge is when I read the files , I should parse the lines by reading the individual configuration files. How do I know the line is from particular file , if I know this I can read the corresponding configuration file before parsing the line.

Please let me know how do I handle this.

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, May, 24 5:49 PM
To: Mukkamula, Suryavamshivardhan (CWM-NR)
Subject: Multiple directories

One way of addressing the issue is to use some sort of external tool (like a script) to
copy all the input files to a common directory (making sure that the file names are
unique to prevent one file from overwriting another) before the Apex application starts.

The Apex application then starts and processes files from this directory.

If you set the partition count of the file input operator to N, it will create N partitions and
the files will be automatically distributed among the partitions. The partitions will work
in parallel.
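
As a minimal sketch of that last step (assuming a FileReader subclass of AbstractFileInputOperator and the usual populateDAG() wiring):

FileReader reader = dag.addOperator("read", new FileReader());
// the platform creates N partitions and distributes the files among them
reader.setPartitionCount(8);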

Ram

_______________________________________________________________________

This [email] may be privileged and/or confidential, and the sender does not waive any related rights and obligations. Any distribution, use or copying of this [email] or the information it contains by other than an intended recipient is unauthorized. If you received this [email] in error, please advise the sender (by return [email] or otherwise) immediately. You have consented to receive the attached electronically at the above-noted address; please retain a copy of this confirmation for future reference.



RE: Multiple directories

Posted by "Mukkamula, Suryavamshivardhan (CWM-NR)" <su...@rbc.com>.
Hi,

Can you please help me understand the below issues.


1)      I tried setting the operator memory to 500MB, but the operator still takes 4GB by default; I don't understand why. (In my sandbox setup the memory was set correctly to 500MB, but not on the enterprise dev cluster.)

2)      I am trying to send a null object from the file reader when EOF is reached, so that the file writer can call the requestFinalize() method. But I could not figure out how to send the EOF; I tried the code below with no luck.

################### File Reader ####################################################

@Override
protected String readEntity() throws IOException {
    // try to read a line
    final String line = br.readLine();
    if (null != line) { // normal case
        LOG.debug("readEntity: line = {}", line);
        return line;
    }

    // end-of-file (control tuple sent in closeFile())
    LOG.info("readEntity: EOF for {}", filePath);
    return null;
}

@Override
protected void emit(String line) {
    // parsing logic here: parse the line as per the input configuration and
    // create the output line as per the output configuration
    if (line == null) {
        // EOF marker for the writer: a tuple with a null value
        output.emit(new KeyValue<String, String>(getFileName(), null));
        return; // without this return, the null line also falls through to parseTuple() below
    }
    KeyValue<String, String> tuple = new KeyValue<String, String>();
    tuple.key = getFileName();
    tuple.value = line;
    KeyValue<String, String> newTuple = parseTuple(tuple);
    output.emit(newTuple);
}
######################File Writer######################################################
public class FileOutputOperator extends AbstractFileOutputOperator<KeyValue<String, String>> {
    private static final Logger LOG = LoggerFactory.getLogger(FileOutputOperator.class);
    // non-transient, so the pending list is checkpointed and survives recovery
    private List<String> filesToFinalize = new ArrayList<>();

    @Override
    public void setup(Context.OperatorContext context) {
        super.setup(context);
        // finalize any files left pending in a recovered checkpoint
        finalizeFiles();
    }

    @Override
    protected byte[] getBytesForTuple(KeyValue<String, String> tuple) {
        if (tuple.value == null) { // EOF marker from the reader
            LOG.info("File to finalize {}", tuple.key);
            filesToFinalize.add(tuple.key);
            return new byte[0];
        }
        else {
            return tuple.value.getBytes();
        }
    }

    @Override
    protected String getFileName(KeyValue<String, String> tuple) {
        return tuple.key;
    }

    @Override
    public void endWindow() {
        super.endWindow();
        finalizeFiles();
    }

    private void finalizeFiles() {
        LOG.info("Files to finalize {}", filesToFinalize.toArray());
        Iterator<String> fileIt = filesToFinalize.iterator();
        while (fileIt.hasNext()) {
            requestFinalize(fileIt.next());
            fileIt.remove();
        }
    }
}
##################################################################################################
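A possible reason for the "no luck": if the platform stops calling emit() once readEntity() returns null, the null marker above never reaches the writer. A minimal sketch of emitting the marker from the closeFile() hook instead (assuming AbstractFileInputOperator exposes closeFile(InputStream) and the getFileName() helper used above):

    @Override
    protected void closeFile(InputStream is) throws IOException {
        // capture the name before the superclass clears the current-file state
        String fileName = getFileName();
        super.closeFile(is);
        // EOF control tuple: a null value tells the writer to finalize this file
        output.emit(new KeyValue<String, String>(fileName, null));
    }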



Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:suryavamshivardhan.mukkamula@rbc.com]
Sent: 2016, June, 17 9:11 AM
To: users@apex.apache.org
Subject: RE: Multiple directories

Hi Ram/Raja,

The HBase dependency was pulling older Hadoop jars into my classpath. I removed the HBase dependency, which I don't need for now, and the issue was resolved.

Thank you for your help.

Regards,
Surya Vamshi

From: Raja.Aravapalli [mailto:Raja.Aravapalli@target.com]
Sent: 2016, June, 17 7:06 AM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories


I also faced a similar problem with Hadoop jars when I used HBase jars in pom.xml. It could be because the version of the hadoop jars your apa is carrying differs from the ones in the cluster.


What I did to solve this:

I set the scope to "provided" in the Maven pom.xml for the hbase jars, and then supplied the hbase jars to the application package during submission using "-libjars" with the launch command. That solved my Invalid ContainerId problem.

You can type "launch help" for usage details.
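
A hedged sketch of what that looks like (the version shown is illustrative, taken from the jar listing below):

    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
        <version>1.1.2</version>
        <!-- provided: keep hbase and its transitive hadoop jars out of the .apa -->
        <scope>provided</scope>
    </dependency>

followed by something like "launch -libjars hbase-client-1.1.2.jar yourapp.apa" at submission time (the apa name is a placeholder).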


Regards,
Raja.

From: Munagala Ramanath <ra...@datatorrent.com>>
Reply-To: "users@apex.apache.org<ma...@apex.apache.org>" <us...@apex.apache.org>>
Date: Thursday, June 16, 2016 at 4:10 PM
To: "users@apex.apache.org<ma...@apex.apache.org>" <us...@apex.apache.org>>
Subject: Re: Multiple directories

Those 6 hadoop jars are definitely a problem.

I didn't see the output of "mvn dependency:tree"; could you post that ?
It will show you why these hadoop jars are being pulled in.
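For example, an illustrative invocation that narrows the tree to just the hadoop artifacts:

    mvn dependency:tree -Dincludes=org.apache.hadoop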

Also, please refer to the section "Hadoop dependencies conflicts" in the troubleshooting guide:
http://docs.datatorrent.com/troubleshooting/

Ram

On Thu, Jun 16, 2016 at 1:56 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi Ram,

Below are the details.


0 Wed Jun 15 16:34:24 EDT 2016 META-INF/
   358 Wed Jun 15 16:34:22 EDT 2016 META-INF/MANIFEST.MF
     0 Wed Jun 15 16:34:24 EDT 2016 app/
52967 Wed Jun 15 16:34:22 EDT 2016 app/countrynamescan-1.0-SNAPSHOT.jar
     0 Wed Jun 15 16:34:22 EDT 2016 lib/
62983 Wed Jun 15 16:34:22 EDT 2016 lib/activation-1.1.jar
1143233 Wed Jun 15 16:34:22 EDT 2016 lib/activemq-client-5.8.0.jar
  4467 Wed Jun 15 16:34:22 EDT 2016 lib/aopalliance-1.0.jar
44925 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-i18n-2.0.0-M15.jar
691479 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-kerberos-codec-2.0.0-M15.jar
16560 Wed Jun 15 16:34:22 EDT 2016 lib/api-asn1-api-1.0.0-M20.jar
79912 Wed Jun 15 16:34:22 EDT 2016 lib/api-util-1.0.0-M20.jar
43033 Wed Jun 15 16:34:22 EDT 2016 lib/asm-3.1.jar
303139 Wed Jun 15 16:34:22 EDT 2016 lib/avro-1.7.4.jar
232019 Wed Jun 15 16:34:22 EDT 2016 lib/commons-beanutils-1.8.3.jar
41123 Wed Jun 15 16:34:22 EDT 2016 lib/commons-cli-1.2.jar
284184 Wed Jun 15 16:34:22 EDT 2016 lib/commons-codec-1.10.jar
575389 Wed Jun 15 16:34:22 EDT 2016 lib/commons-collections-3.2.1.jar
30595 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compiler-2.7.8.jar
241367 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compress-1.4.1.jar
298829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-configuration-1.6.jar
143602 Wed Jun 15 16:34:22 EDT 2016 lib/commons-digester-1.8.jar
112341 Wed Jun 15 16:34:22 EDT 2016 lib/commons-el-1.0.jar
305001 Wed Jun 15 16:34:22 EDT 2016 lib/commons-httpclient-3.1.jar
185140 Wed Jun 15 16:34:22 EDT 2016 lib/commons-io-2.4.jar
284220 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang-2.6.jar
315805 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang3-3.1.jar
61829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-logging-1.2.jar
1599627 Wed Jun 15 16:34:22 EDT 2016 lib/commons-math3-3.1.1.jar
273370 Wed Jun 15 16:34:22 EDT 2016 lib/commons-net-3.1.jar
3608597 Wed Jun 15 16:34:22 EDT 2016 lib/db2jcc-123.jar
313898 Wed Jun 15 16:34:22 EDT 2016 lib/dom4j-1.6.1.jar
17138265 Wed Jun 15 16:34:22 EDT 2016 lib/fastutil-6.6.4.jar
15322 Wed Jun 15 16:34:22 EDT 2016 lib/findbugs-annotations-1.3.9-1.jar
84946 Wed Jun 15 16:34:22 EDT 2016 lib/flatpack-3.4.2.jar
20220 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-j2ee-management_1.1_spec-1.0.1.jar
32359 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-jms_1.1_spec-1.1.1.jar
21817 Wed Jun 15 16:34:22 EDT 2016 lib/gmbal-api-only-3.0.0-b023.jar
690573 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-framework-2.1.2.jar
253086 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-2.1.2.jar
198255 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-server-2.1.2.jar
336904 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-servlet-2.1.2.jar
  8114 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-rcm-2.1.2.jar
1795932 Wed Jun 15 16:34:22 EDT 2016 lib/guava-12.0.1.jar
710492 Wed Jun 15 16:34:22 EDT 2016 lib/guice-3.0.jar
65012 Wed Jun 15 16:34:22 EDT 2016 lib/guice-servlet-3.0.jar
16778 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-annotations-2.2.0.jar
52449 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-auth-2.5.1.jar
2962685 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-common-2.5.1.jar
1498368 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-mapreduce-client-core-2.5.1.jar
1158936 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-api-2.2.0.jar
1301627 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-common-2.2.0.jar
50139 Wed Jun 15 16:34:22 EDT 2016 lib/hawtbuf-1.9.jar
20780 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-annotations-1.1.2.jar
1249004 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-client-1.1.2.jar
530078 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-common-1.1.2.jar
4201685 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-protocol-1.1.2.jar
1475955 Wed Jun 15 16:34:22 EDT 2016 lib/htrace-core-3.1.0-incubating.jar
590533 Wed Jun 15 16:34:22 EDT 2016 lib/httpclient-4.3.5.jar
282269 Wed Jun 15 16:34:22 EDT 2016 lib/httpcore-4.3.2.jar
228286 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-core-asl-1.9.2.jar
765648 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-mapper-asl-1.9.2.jar
  2497 Wed Jun 15 16:34:22 EDT 2016 lib/javax.inject-1.jar
521991 Wed Jun 15 16:34:22 EDT 2016 lib/javax.mail-1.5.0.jar
83945 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-3.1.jar
85353 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-api-3.0.1.jar
105134 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-api-2.2.2.jar
890168 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-impl-2.2.3-1.jar
1291164 Wed Jun 15 16:34:22 EDT 2016 lib/jcodings-1.0.8.jar
151304 Wed Jun 15 16:34:22 EDT 2016 lib/jdom-1.1.3.jar
130458 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-client-1.9.jar
458739 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-core-1.9.jar
17542 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-grizzly2-1.9.jar
14786 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-guice-1.9.jar
147952 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-json-1.9.jar
713089 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-server-1.9.jar
28100 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-core-1.9.jar
12976 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-grizzly2-1.9.jar
67758 Wed Jun 15 16:34:22 EDT 2016 lib/jettison-1.1.jar
21144 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-continuation-8.1.10.v20130312.jar
95709 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-http-8.1.10.v20130312.jar
103622 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-io-8.1.10.v20130312.jar
89691 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-security-8.1.10.v20130312.jar
347020 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-server-8.1.10.v20130312.jar
101052 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-servlet-8.1.10.v20130312.jar
177131 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-6.1.26.jar
284903 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-8.1.10.v20130312.jar
125928 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-websocket-8.1.10.v20130312.jar
39117 Wed Jun 15 16:34:22 EDT 2016 lib/jms-api-1.1-rev-1.jar
187292 Wed Jun 15 16:34:22 EDT 2016 lib/joni-2.1.2.jar
280205 Wed Jun 15 16:34:22 EDT 2016 lib/jsch-0.1.53.jar
33015 Wed Jun 15 16:34:22 EDT 2016 lib/jsr305-1.3.9.jar
489884 Wed Jun 15 16:34:22 EDT 2016 lib/log4j-1.2.17.jar
565387 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-contrib-3.2.0-incubating.jar
1062000 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-library-3.1.1.jar
42212 Wed Jun 15 16:34:22 EDT 2016 lib/management-api-3.0.0-b012.jar
  8798 Wed Jun 15 16:34:22 EDT 2016 lib/named-regexp-0.2.3.jar
1206119 Wed Jun 15 16:34:22 EDT 2016 lib/netty-3.6.6.Final.jar
1779991 Wed Jun 15 16:34:22 EDT 2016 lib/netty-all-4.0.23.Final.jar
29555 Wed Jun 15 16:34:22 EDT 2016 lib/paranamer-2.3.jar
1869113 Wed Jun 15 16:34:22 EDT 2016 lib/poi-3.9.jar
936648 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-3.9.jar
4802621 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-schemas-3.9.jar
533455 Wed Jun 15 16:34:22 EDT 2016 lib/protobuf-java-2.5.0.jar
26084 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-api-1.7.5.jar
  9943 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-log4j12-1.7.19.jar
995968 Wed Jun 15 16:34:22 EDT 2016 lib/snappy-java-1.0.4.1.jar
23346 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0-2.jar
26514 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0.1.jar
  2455 Wed Jun 15 16:34:22 EDT 2016 lib/tdgssconfig-14.00.00.21.jar
991265 Wed Jun 15 16:34:22 EDT 2016 lib/terajdbc4-14.00.00.21.jar
758309 Wed Jun 15 16:34:22 EDT 2016 lib/vertica-jdbc-7.2.1-0.jar
109318 Wed Jun 15 16:34:22 EDT 2016 lib/xml-apis-1.0.b2.jar
2666695 Wed Jun 15 16:34:22 EDT 2016 lib/xmlbeans-2.3.0.jar
15010 Wed Jun 15 16:34:22 EDT 2016 lib/xmlenc-0.52.jar
94672 Wed Jun 15 16:34:22 EDT 2016 lib/xz-1.0.jar
792964 Wed Jun 15 16:34:22 EDT 2016 lib/zookeeper-3.4.6.jar
     0 Wed Jun 15 16:34:28 EDT 2016 conf/
   334 Mon Apr 04 11:18:00 EDT 2016 conf/my-app-conf1.xml
  3432 Wed Jun 15 16:22:20 EDT 2016 META-INF/properties.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
       <modelVersion>4.0.0</modelVersion>
       <groupId>com.rbc.aml.cnscan</groupId>
       <version>1.0-SNAPSHOT</version>
       <artifactId>countrynamescan</artifactId>
       <packaging>jar</packaging>

       <!-- change these to the appropriate values -->
       <name>countrynamescan</name>
       <description>Country and Name Scan project</description>

       <properties>
              <!-- change this if you desire to use a different version of DataTorrent -->
              <datatorrent.version>3.1.1</datatorrent.version>
              <datatorrent.apppackage.classpath>lib/*.jar</datatorrent.apppackage.classpath>
       </properties>

       <!-- repository to provide the DataTorrent artifacts -->
       <!-- <repositories>
              <repository>
                     <snapshots>
                           <enabled>false</enabled>
                     </snapshots>
                     <id>Datatorrent-Releases</id>
                     <name>DataTorrent Release Repository</name>
                     <url>https://www.datatorrent.com/maven/content/repositories/releases/</url>
              </repository>
              <repository>
                     <releases>
                           <enabled>false</enabled>
                     </releases>
                     <id>DataTorrent-Snapshots</id>
                     <name>DataTorrent Early Access Program Snapshot Repository</name>
                     <url>https://www.datatorrent.com/maven/content/repositories/snapshots/</url>
              </repository>
       </repositories> -->


       <build>
              <plugins>
                     <plugin>
                           <groupId>org.apache.maven.plugins</groupId>
                           <artifactId>maven-eclipse-plugin</artifactId>
                           <version>2.9</version>
                           <configuration>
                                  <downloadSources>true</downloadSources>
                           </configuration>
                     </plugin>
                     <plugin>
                           <artifactId>maven-compiler-plugin</artifactId>
                           <version>3.3</version>
                           <configuration>
                                  <encoding>UTF-8</encoding>
                                  <source>1.7</source>
                                  <target>1.7</target>
                                  <debug>true</debug>
                                  <optimize>false</optimize>
                                  <showDeprecation>true</showDeprecation>
                                  <showWarnings>true</showWarnings>
                           </configuration>
                     </plugin>
                     <plugin>
                           <artifactId>maven-dependency-plugin</artifactId>
                           <version>2.8</version>
                           <executions>
                                  <execution>
                                         <id>copy-dependencies</id>
                                         <phase>prepare-package</phase>
                                         <goals>
                                                <goal>copy-dependencies</goal>
                                         </goals>
                                         <configuration>
                                                <outputDirectory>target/deps</outputDirectory>
                                                <includeScope>runtime</includeScope>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <artifactId>maven-assembly-plugin</artifactId>
                           <executions>
                                  <execution>
                                         <id>app-package-assembly</id>
                                         <phase>package</phase>
                                         <goals>
                                                <goal>single</goal>
                                         </goals>
                                         <configuration>
                                                <finalName>${project.artifactId}-${project.version}-apexapp</finalName>
                                                <appendAssemblyId>false</appendAssemblyId>
                                                <descriptors>
                                                       <descriptor>src/assemble/appPackage.xml</descriptor>
                                                </descriptors>
                                                <archiverConfig>
                                                       <defaultDirectoryMode>0755</defaultDirectoryMode>
                                                </archiverConfig>
                                                <archive>
                                                       <manifestEntries>
                                                              <Class-Path>${datatorrent.apppackage.classpath}</Class-Path>
                                                              <DT-Engine-Version>${datatorrent.version}</DT-Engine-Version>
                                                              <DT-App-Package-Name>${project.artifactId}</DT-App-Package-Name>
                                                              <DT-App-Package-Version>${project.version}</DT-App-Package-Version>
                                                              <DT-App-Package-Display-Name>${project.name}</DT-App-Package-Display-Name>
                                                              <DT-App-Package-Description>${project.description}</DT-App-Package-Description>
                                                      </manifestEntries>
                                                </archive>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <artifactId>maven-antrun-plugin</artifactId>
                           <version>1.7</version>
                           <executions>
                                  <execution>
                                         <phase>package</phase>
                                         <configuration>
                                                <target>
                                                       <move
                                                              file="${project.build.directory}/${project.artifactId}-${project.version}-apexapp.jar"
                                                              tofile="${project.build.directory}/${project.artifactId}-${project.version}.apa"/>
                                                </target>
                                         </configuration>
                                         <goals>
                                                <goal>run</goal>
                                         </goals>
                                  </execution>
                                  <execution>
                                         <!-- create resource directory for xml javadoc -->
                                         <id>createJavadocDirectory</id>
                                         <phase>generate-resources</phase>
                                         <configuration>
                                                <tasks>
                                                       <delete
                                                              dir="${project.build.directory}/generated-resources/xml-javadoc"/>
                                                       <mkdir
                                                              dir="${project.build.directory}/generated-resources/xml-javadoc"/>
                                                </tasks>
                                         </configuration>
                                         <goals>
                                                <goal>run</goal>
                                         </goals>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <groupId>org.codehaus.mojo</groupId>
                           <artifactId>build-helper-maven-plugin</artifactId>
                           <version>1.9.1</version>
                           <executions>
                                  <execution>
                                         <id>attach-artifacts</id>
                                         <phase>package</phase>
                                         <goals>
                                                <goal>attach-artifact</goal>
                                         </goals>
                                         <configuration>
                                                <artifacts>
                                                       <artifact>
                                                              <file>target/${project.artifactId}-${project.version}.apa</file>
                                                              <type>apa</type>
                                                       </artifact>
                                                </artifacts>
                                                <skipAttach>false</skipAttach>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <!-- generate javdoc -->
                     <plugin>
                           <groupId>org.apache.maven.plugins</groupId>
                           <artifactId>maven-javadoc-plugin</artifactId>
                           <executions>
                                  <!-- generate xml javadoc -->
                                  <execution>
                                         <id>xml-doclet</id>
                                         <phase>generate-resources</phase>
                                         <goals>
                                                <goal>javadoc</goal>
                                         </goals>
                                         <configuration>
                                                <doclet>com.github.markusbernhardt.xmldoclet.XmlDoclet</doclet>
                                                <additionalparam>-d
                                                       ${project.build.directory}/generated-resources/xml-javadoc
                                                       -filename ${project.artifactId}-${project.version}-javadoc.xml</additionalparam>
                                                <useStandardDocletOptions>false</useStandardDocletOptions>
                                                <docletArtifact>
                                                       <groupId>com.github.markusbernhardt</groupId>
                                                       <artifactId>xml-doclet</artifactId>
                                                       <version>1.0.4</version>
                                                </docletArtifact>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>
                     <!-- Transform xml javadoc to stripped down version containing only class/interface
                           comments and tags -->
                     <plugin>
                           <groupId>org.codehaus.mojo</groupId>
                           <artifactId>xml-maven-plugin</artifactId>
                           <version>1.0</version>
                           <executions>
                                  <execution>
                                         <id>transform-xmljavadoc</id>
                                         <phase>generate-resources</phase>
                                         <goals>
                                                <goal>transform</goal>
                                         </goals>
                                  </execution>
                           </executions>
                           <configuration>
                                  <transformationSets>
                                         <transformationSet>
                                                <dir>${project.build.directory}/generated-resources/xml-javadoc</dir>
                                                <includes>
                                                       <include>${project.artifactId}-${project.version}-javadoc.xml</include>
                                                </includes>
                                                <stylesheet>XmlJavadocCommentsExtractor.xsl</stylesheet>
                                                <outputDir>${project.build.directory}/generated-resources/xml-javadoc</outputDir>
                                         </transformationSet>
                                  </transformationSets>
                           </configuration>
                     </plugin>
                     <!-- copy xml javadoc to class jar -->
                     <plugin>
                           <artifactId>maven-resources-plugin</artifactId>
                           <version>2.6</version>
                           <executions>
                                  <execution>
                                         <id>copy-resources</id>
                                         <phase>process-resources</phase>
                                         <goals>
                                                <goal>copy-resources</goal>
                                         </goals>
                                         <configuration>
                                                <outputDirectory>${basedir}/target/classes</outputDirectory>
                                                <resources>
                                                       <resource>
                                                              <directory>${project.build.directory}/generated-resources/xml-javadoc</directory>
                                                              <includes>
                                                                     <include>${project.artifactId}-${project.version}-javadoc.xml</include>
                                                              </includes>
                                                              <filtering>true</filtering>
                                                       </resource>
                                                </resources>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

              </plugins>

              <pluginManagement>
                     <plugins>
                           <!--This plugin's configuration is used to store Eclipse m2e settings
                                  only. It has no influence on the Maven build itself. -->
                           <plugin>
                                  <groupId>org.eclipse.m2e</groupId>
                                  <artifactId>lifecycle-mapping</artifactId>
                                  <version>1.0.0</version>
                                  <configuration>
                                         <lifecycleMappingMetadata>
                                                <pluginExecutions>
                                                       <pluginExecution>
                                                              <pluginExecutionFilter>
                                                                     <groupId>org.codehaus.mojo</groupId>
                                                                     <artifactId>
                                                                           xml-maven-plugin
                                                                     </artifactId>
                                                                     <versionRange>[1.0,)</versionRange>
                                                                     <goals>
                                                                           <goal>transform</goal>
                                                                     </goals>
                                                              </pluginExecutionFilter>
                                                              <action>
                                                                     <ignore></ignore>
                                                              </action>
                                                       </pluginExecution>
                                                       <pluginExecution>
                                                              <pluginExecutionFilter>
                                                                     <groupId></groupId>
                                                                     <artifactId></artifactId>
                                                                     <versionRange>[,)</versionRange>
                                                                     <goals>
                                                                           <goal></goal>
                                                                     </goals>
                                                              </pluginExecutionFilter>
                                                              <action>
                                                                     <ignore></ignore>
                                                              </action>
                                                       </pluginExecution>
                                                </pluginExecutions>
                                         </lifecycleMappingMetadata>
                                  </configuration>
                           </plugin>
                     </plugins>
              </pluginManagement>
       </build>

       <dependencies>
              <!-- add your dependencies here -->
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>malhar-library</artifactId>
                     <version>${datatorrent.version}</version>
                     <!-- If you know that your application does not need transitive dependencies
                           pulled in by malhar-library, uncomment the following to reduce the size of
                           your app package. -->
                     <!-- <exclusions> <exclusion> <groupId>*</groupId> <artifactId>*</artifactId>
                           </exclusion> </exclusions> -->
              </dependency>
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>dt-common</artifactId>
                     <version>${datatorrent.version}</version>
                     <scope>provided</scope>
              </dependency>
              <dependency>
       <groupId>com.datatorrent</groupId>
       <artifactId>dt-engine</artifactId>
       <version>${datatorrent.version}</version>
       <scope>test</scope>
              </dependency>
              <dependency>
              <groupId>com.teradata.jdbc</groupId>
              <artifactId>terajdbc4</artifactId>
              <version>14.00.00.21</version>
              </dependency>
              <dependency>
              <groupId>com.teradata.jdbc</groupId>
              <artifactId>tdgssconfig</artifactId>
              <version>14.00.00.21</version>
              </dependency>
              <dependency>
              <groupId>com.ibm.db2</groupId>
              <artifactId>db2jcc</artifactId>
              <version>123</version>
              </dependency>
              <dependency>
                     <groupId>jdk.tools</groupId>
                     <artifactId>jdk.tools</artifactId>
                     <version>1.7</version>
                     <scope>system</scope>
                     <systemPath>C:/Program Files/Java/jdk1.7.0_79/lib/tools.jar</systemPath>
              </dependency>
              <dependency>
                     <groupId>org.apache.apex</groupId>
                     <artifactId>malhar-contrib</artifactId>
                     <version>3.2.0-incubating</version>
                     <!--<scope>provided</scope> -->
                     <exclusions>
                           <exclusion>
                                  <groupId>*</groupId>
                                  <artifactId>*</artifactId>
                           </exclusion>
                     </exclusions>
              </dependency>
              <dependency>
                     <groupId>junit</groupId>
                     <artifactId>junit</artifactId>
                     <version>4.10</version>
                     <scope>test</scope>
              </dependency>
              <dependency>
                     <groupId>com.vertica</groupId>
                     <artifactId>vertica-jdbc</artifactId>
                     <version>7.2.1-0</version>
              </dependency>
              <dependency>
              <groupId>org.apache.hbase</groupId>
              <artifactId>hbase-client</artifactId>
              <version>1.1.2</version>
              </dependency>
              <dependency>
                     <groupId>org.slf4j</groupId>
                     <artifactId>slf4j-log4j12</artifactId>
                     <version>1.7.19</version>
              </dependency>

              <dependency>
                     <groupId>net.sf.flatpack</groupId>
                     <artifactId>flatpack</artifactId>
                     <version>3.4.2</version>
              </dependency>

              <dependency>
                     <groupId>org.jdom</groupId>
                     <artifactId>jdom</artifactId>
                     <version>1.1.3</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi-ooxml</artifactId>
                     <version>3.9</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.xmlbeans</groupId>
                     <artifactId>xmlbeans</artifactId>
                     <version>2.3.0</version>
              </dependency>

              <dependency>
                     <groupId>dom4j</groupId>
                     <artifactId>dom4j</artifactId>
                     <version>1.6.1</version>
              </dependency>

              <dependency>
                     <groupId>javax.xml.stream</groupId>
                     <artifactId>stax-api</artifactId>
                     <version>1.0-2</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi</artifactId>
                     <version>3.9</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi-ooxml-schemas</artifactId>
                     <version>3.9</version>
              </dependency>
              <dependency>
                     <groupId>com.jcraft</groupId>
                     <artifactId>jsch</artifactId>
                     <version>0.1.53</version>
              </dependency>
       </dependencies>

</project>

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, June, 16 4:37 PM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories

It looks like you may be including old Hadoop jars in your apa package since the stack trace
shows ConverterUtils.toContainerId calling ConverterUtils.toApplicationAttemptId but recent versions
don't have that call sequence. In 2.7.1 (which is what your cluster has) the function looks like this:
  public static ContainerId toContainerId(String containerIdStr) {
    return ContainerId.fromString(containerIdStr);
  }

Could you post the output of "jar tvf {your-apa-file}" as well as: "mvn dependency:tree"

Ram

On Thu, Jun 16, 2016 at 12:38 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi Ram,

Below is the information.


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   712    0   712    0     0   3807      0 --:--:-- --:--:-- --:--:--  3807
{
    "clusterInfo": {
        "haState": "ACTIVE",
        "haZooKeeperConnectionState": "CONNECTED",
        "hadoopBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bc
caeaf36cb193c1c by jenkins source checksum 69a3bf8c667267c2c252a54fbbf23d",
        "hadoopVersion": "2.7.1.2.3.2.0-2950",
        "hadoopVersionBuiltOn": "2015-09-30T18:08Z",
        "id": 1465495186350,
        "resourceManagerBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa9
8205f18bccaeaf36cb193c1c by jenkins source checksum 48db4b572827c2e9c2da66982d14
7626",
        "resourceManagerVersion": "2.7.1.2.3.2.0-2950",
       "resourceManagerVersionBuiltOn": "2015-09-30T18:20Z",
        "rmStateStoreName": "org.apache.hadoop.yarn.server.resourcemanager.recov
ery.ZKRMStateStore",
        "startedOn": 1465495186350,
        "state": "STARTED"
    }
}

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, June, 16 2:57 PM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories

Can you ssh to one of the cluster nodes ? If so, can you run this command and show the output
(where {rm} is the host:port running the resource manager, aka YARN):

curl http://{rm}/ws/v1/cluster | python -mjson.tool

Ram
ps. You can determine the node running YARN with:

hdfs getconf -confKey yarn.resourcemanager.webapp.address
hdfs getconf -confKey yarn.resourcemanager.webapp.https.address



On Thu, Jun 16, 2016 at 11:15 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi,

I am facing a weird issue and the logs are not clear to me.

I have created apa file which works fine within my local sandbox but facing problems when I upload on the enterprise Hadoop cluster using DT Console.

Below is the error message from yarn logs. Please help in understanding the issue.

###################### Error Logs ########################################################

Log Type: AppMaster.stderr
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 1259
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/grid/06/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/filecache/36/slf4j-log4j12-1.7.19.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.IllegalArgumentException: Invalid ContainerId: container_e35_1465495186350_2224_01_000001
        at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182)
        at com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:90)
Caused by: java.lang.NumberFormatException: For input string: "e35"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:441)
        at java.lang.Long.parseLong(Long.java:483)
        at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
        at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
        ... 1 more

Log Type: AppMaster.stdout
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 0

Log Type: dt.log
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 29715
Showing 4096 bytes of 29715 total. Click here<http://guedlpdhdp001.saifg.rbc.com:19888/jobhistory/logs/guedlpdhdp012.saifg.rbc.com:45454/container_e35_1465495186350_2224_01_000001/container_e35_1465495186350_2224_01_000001/mukkamula/dt.log/?start=0> for the full log.
56m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms4096m -Xmx4096m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT
SHLVL=3
HADOOP_SSH_OPTS=-o ConnectTimeout=5 -o SendEnv=HADOOP_CONF_DIR
HADOOP_USER_NAME=datatorrent/gueulvahal003.saifg.rbc.com@SAIFG.RBC.COM
HADOOP_NAMENODE_OPTS=-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1
HADOOP_IDENT_STRING=yarn
HADOOP_MAPRED_LOG_DIR=/var/log/hadoop-mapreduce/yarn
NM_HOST=guedlpdhdp012.saifg.rbc.com
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
HADOOP_SECURE_DN_LOG_DIR=/var/log/hadoop/hdfs
YARN_HISTORYSERVER_HEAPSIZE=1024
JVM_PID=2638
YARN_PID_DIR=/var/run/hadoop-yarn/yarn
HADOOP_HOME_WARN_SUPPRESS=1
NM_PORT=45454
LOGNAME=mukkamula
YARN_CONF_DIR=/usr/hdp/current/hadoop-client/conf
HADOOP_YARN_USER=yarn
QTDIR=/usr/lib64/qt-3.3
_=/usr/lib/jvm/java-1.7.0/bin/java
MSM_PRODUCT=MSM
HADOOP_HOME=/usr/hdp/2.3.2.0-2950/hadoop
MALLOC_ARENA_MAX=4
HADOOP_OPTS=-Dhdp.version=2.3.2.0-2950 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true  -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.3.2.0-2950 -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
SHELL=/bin/bash
YARN_ROOT_LOGGER=INFO,EWMA,RFA
HADOOP_TOKEN_FILE_LOCATION=/grid/11/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/container_e35_1465495186350_2224_01_000001/container_tokens
CLASSPATH=./*:/usr/hdp/current/hadoop-client/conf:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*
HADOOP_MAPRED_PID_DIR=/var/run/hadoop-mapreduce/yarn
YARN_NODEMANAGER_HEAPSIZE=1024
QTINC=/usr/lib64/qt-3.3/include
USER=mukkamula
HADOOP_CLIENT_OPTS=-Xmx2048m -XX:MaxPermSize=512m -Xmx2048m -XX:MaxPermSize=512m
CONTAINER_ID=container_e35_1465495186350_2224_01_000001
HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop/hdfs
HISTCONTROL=ignoredups
HOME=/home/
HADOOP_NAMENODE_INIT_HEAPSIZE=-Xms8192m
MSM_HOME=/usr/local/MegaRAID Storage Manager
LESSOPEN=||/usr/bin/lesspipe.sh %s
LANG=en_US.UTF-8
YARN_NICENESS=0
YARN_IDENT_STRING=yarn
HADOOP_MAPRED_HOME=/usr/hdp/2.3.2.0-2950/hadoop-mapreduce


Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR)
Sent: 2016, June, 16 8:58 AM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: RE: Multiple directories

Thank you for the inputs.

Regards,
Surya Vamshi
From: Thomas Weise [mailto:thomas.weise@gmail.com]
Sent: 2016, June, 15 5:08 PM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories


On Wed, Jun 15, 2016 at 1:55 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi Ram/Team,

I could create an operator which reads multiple directories and parses each file against its individual configuration file, writing the output files to different directories.

However I have some questions regarding the design.


==>We have 120 directories to scan on HDFS. If we use parallel partitioning with operator memory around 250MB, that comes to roughly 30GB of RAM for this operator; are these figures going to create any problem in production?

You can benchmark this with a single partition. If the downstream operators can keep up with the rate at which the file reader emits, then the memory consumption should be minimal. Keep in mind though that the container memory is not just heap space for the operator, but also memory the JVM requires to run and the memory that the buffer server consumes. You see the allocated memory in the UI if you use the DT community edition (container list in the physical plan).
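
For reference, a hedged sketch of capping a single operator's memory in properties.xml (the operator name "fileReader" is an assumption):

    <property>
        <name>dt.operator.fileReader.attr.MEMORY_MB</name>
        <value>512</value>
    </property>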


==>Should I use a scheduler for running the batch job (or) define next scan time and make the DT job running continuously ? if I run DT job continuously I assume memory will be continuously utilized by the DT Job it is not available to other resources on the cluster, please clarify.
It is possible to set this up elastically as well, so that when no input is available the number of reader partitions is reduced and the memory given back (Apex supports dynamic scaling).
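
A hedged sketch of the hook that drives such scaling; the idle heuristic and threshold below are illustrative, and the shrinking itself would still happen in definePartitions():

    import java.io.Serializable;
    import java.util.List;
    import com.datatorrent.api.Stats.OperatorStats;
    import com.datatorrent.api.StatsListener;

    // request a repartition after the readers have emitted nothing for a while
    public class IdleRepartitionListener implements StatsListener, Serializable {
        private int idleWindows = 0;

        @Override
        public Response processStats(BatchedOperatorStats stats) {
            long emitted = 0;
            List<OperatorStats> windowedStats = stats.getLastWindowedStats();
            if (windowedStats != null) {
                for (OperatorStats ws : windowedStats) {
                    if (ws.outputPorts != null) {
                        for (OperatorStats.PortStats port : ws.outputPorts) {
                            emitted += port.tupleCount;
                        }
                    }
                }
            }
            idleWindows = (emitted > 0) ? 0 : idleWindows + 1;

            Response response = new Response();
            response.repartitionRequired = idleWindows > 120; // ~1 minute at 500ms windows
            if (response.repartitionRequired) {
                idleWindows = 0; // don't re-trigger on every subsequent window
            }
            return response;
        }
    }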


Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, June, 05 10:24 PM

To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories

Some sample code to monitor multiple directories is now available at:
https://github.com/DataTorrent/examples/tree/master/tutorials/fileIO-multiDir

It shows how to use a custom implementation of definePartitions() to create
multiple partitions of the file input operator and group them
into "slices" where each slice monitors a single directory.

Ram

On Wed, May 25, 2016 at 9:55 AM, Munagala Ramanath <ra...@datatorrent.com>> wrote:
I'm hoping to have a sample sometime next week.

Ram

On Wed, May 25, 2016 at 9:30 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Thank you so much, Ram, for your advice; Option (a) would be ideal for my requirement.

Do you have sample usage for partitioning, with individual configuration set-ups for the different partitions?

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, May, 25 12:11 PM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories

You have 2 options: (a) AbstractFileInputOperator (b) FileSplitter/BlockReader

For (a), each partition (i.e. replica of the operator) can scan only a single directory, so if you have 100
directories, you can simply start with 100 partitions; since each partition is scanning its own directory
you don't need to worry about which files the lines came from. This approach however needs a custom
definePartition() implementation in your subclass to assign the appropriate directory and XML parsing
config file to each partition; it also needs adequate cluster resources to be able to spin up the required
number of partitions.
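
A hedged sketch of that definePartitions(), loosely after the fileIO-multiDir example linked above; MyFileReader, FeedConfig, feeds, and setParserConfigPath are illustrative stand-ins:

    // inside the AbstractFileInputOperator<String> subclass; uses
    // com.datatorrent.api.DefaultPartition and the Partitioner types
    @Override
    public Collection<Partition<AbstractFileInputOperator<String>>> definePartitions(
            Collection<Partition<AbstractFileInputOperator<String>>> partitions,
            PartitioningContext context)
    {
        List<Partition<AbstractFileInputOperator<String>>> newPartitions = new ArrayList<>();
        for (FeedConfig feed : feeds) {                  // one entry per monitored directory
            MyFileReader reader = new MyFileReader();
            reader.setDirectory(feed.directory);         // each partition scans a single directory
            reader.setParserConfigPath(feed.configFile); // per-feed XML parser config (hypothetical setter)
            newPartitions.add(new DefaultPartition<AbstractFileInputOperator<String>>(reader));
        }
        return newPartitions;
    }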

For (b), there is some documentation in the Operators section at http://docs.datatorrent.com/ including
sample code. These operators support scanning multiple directories out of the box but have more
elaborate configuration options. Check this out and see if it works in your use case.

Ram

On Wed, May 25, 2016 at 8:17 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hello Ram/Team,

My requirement is to read input feeds from different locations on HDFS and parse those files by reading XML configuration files (each input feed has a configuration file which defines the fields inside the input feed).

My approach: I would like to define a mapping file which contains an individual feed identifier, the feed location, and the configuration file location. I would like to read this mapping file at initial load within the setup() method and define my DirectoryScan.acceptFiles. My challenge: when I read the files, I must parse each line according to the corresponding configuration file. How do I know which file a line came from? If I know this, I can read the corresponding configuration file before parsing the line.

Please let me know how do I handle this.

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, May, 24 5:49 PM
To: Mukkamula, Suryavamshivardhan (CWM-NR)
Subject: Multiple directories

One way of addressing the issue is to use some sort of external tool (like a script) to
copy all the input files to a common directory (making sure that the file names are
unique to prevent one file from overwriting another) before the Apex application starts.

The Apex application then starts and processes files from this directory.

If you set the partition count of the file input operator to N, it will create N partitions and
the files will be automatically distributed among the partitions. The partitions will work
in parallel.
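
One hedged way to configure that partition count N in properties.xml (the operator name is an assumption):

    <property>
        <name>dt.operator.fileReader.attr.PARTITIONER</name>
        <value>com.datatorrent.common.partitioner.StatelessPartitioner:4</value>
    </property>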

Ram


RE: Multiple directories

Posted by "Mukkamula, Suryavamshivardhan (CWM-NR)" <su...@rbc.com>.
Hi Ram/Raja,

The HBase dependency was pulling older Hadoop jars into my classpath. I removed the HBase dependency, which I don't need for now, and the issue got resolved.

Thank you for your help.

Regards,
Surya Vamshi

From: Raja.Aravapalli [mailto:Raja.Aravapalli@target.com]
Sent: 2016, June, 17 7:06 AM
To: users@apex.apache.org
Subject: Re: Multiple directories


I also faced a similar problem with Hadoop jars when I used HBase jars in pom.xml!! It could be because the version of the Hadoop jars in your apa is different from the ones in the cluster!!


What I did to solve this:

I included the provided scope in the Maven pom.xml for the HBase jars, and then supplied the HBase jars to the application package during submission using "-libjars" with the launch command, which solved my Invalid ContainerId problem!!

You can type "launch help" for usage details.
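For instance, the pom entry and launch command might look like this (a sketch; the jar names and the .apa name are illustrative):

<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>1.1.2</version>
  <scope>provided</scope>
</dependency>

launch -libjars hbase-client-1.1.2.jar,hbase-common-1.1.2.jar myapp.apa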


Regards,
Raja.

From: Munagala Ramanath <ra...@datatorrent.com>>
Reply-To: "users@apex.apache.org<ma...@apex.apache.org>" <us...@apex.apache.org>>
Date: Thursday, June 16, 2016 at 4:10 PM
To: "users@apex.apache.org<ma...@apex.apache.org>" <us...@apex.apache.org>>
Subject: Re: Multiple directories

Those 6 hadoop jars are definitely a problem.

I didn't see the output of "mvn dependency:tree"; could you post that ?
It will show you why these hadoop jars are being pulled in.
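For example, the output can be narrowed to just the Hadoop artifacts with:

mvn dependency:tree -Dincludes=org.apache.hadoop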

Also, please refer to the section "Hadoop dependencies conflicts" in the troubleshooting guide:
http://docs.datatorrent.com/troubleshooting/
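A common remedy (a sketch, not specific to this pom; wildcard exclusions require Maven 3.2.1 or later) is to exclude the transitive Hadoop artifacts from whichever dependency pulls them in:

<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>1.1.2</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>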

Ram

On Thu, Jun 16, 2016 at 1:56 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi Ram,

Below are the details.


0 Wed Jun 15 16:34:24 EDT 2016 META-INF/
   358 Wed Jun 15 16:34:22 EDT 2016 META-INF/MANIFEST.MF
     0 Wed Jun 15 16:34:24 EDT 2016 app/
52967 Wed Jun 15 16:34:22 EDT 2016 app/countrynamescan-1.0-SNAPSHOT.jar
     0 Wed Jun 15 16:34:22 EDT 2016 lib/
62983 Wed Jun 15 16:34:22 EDT 2016 lib/activation-1.1.jar
1143233 Wed Jun 15 16:34:22 EDT 2016 lib/activemq-client-5.8.0.jar
  4467 Wed Jun 15 16:34:22 EDT 2016 lib/aopalliance-1.0.jar
44925 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-i18n-2.0.0-M15.jar
691479 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-kerberos-codec-2.0.0-M15.jar
16560 Wed Jun 15 16:34:22 EDT 2016 lib/api-asn1-api-1.0.0-M20.jar
79912 Wed Jun 15 16:34:22 EDT 2016 lib/api-util-1.0.0-M20.jar
43033 Wed Jun 15 16:34:22 EDT 2016 lib/asm-3.1.jar
303139 Wed Jun 15 16:34:22 EDT 2016 lib/avro-1.7.4.jar
232019 Wed Jun 15 16:34:22 EDT 2016 lib/commons-beanutils-1.8.3.jar
41123 Wed Jun 15 16:34:22 EDT 2016 lib/commons-cli-1.2.jar
284184 Wed Jun 15 16:34:22 EDT 2016 lib/commons-codec-1.10.jar
575389 Wed Jun 15 16:34:22 EDT 2016 lib/commons-collections-3.2.1.jar
30595 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compiler-2.7.8.jar
241367 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compress-1.4.1.jar
298829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-configuration-1.6.jar
143602 Wed Jun 15 16:34:22 EDT 2016 lib/commons-digester-1.8.jar
112341 Wed Jun 15 16:34:22 EDT 2016 lib/commons-el-1.0.jar
305001 Wed Jun 15 16:34:22 EDT 2016 lib/commons-httpclient-3.1.jar
185140 Wed Jun 15 16:34:22 EDT 2016 lib/commons-io-2.4.jar
284220 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang-2.6.jar
315805 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang3-3.1.jar
61829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-logging-1.2.jar
1599627 Wed Jun 15 16:34:22 EDT 2016 lib/commons-math3-3.1.1.jar
273370 Wed Jun 15 16:34:22 EDT 2016 lib/commons-net-3.1.jar
3608597 Wed Jun 15 16:34:22 EDT 2016 lib/db2jcc-123.jar
313898 Wed Jun 15 16:34:22 EDT 2016 lib/dom4j-1.6.1.jar
17138265 Wed Jun 15 16:34:22 EDT 2016 lib/fastutil-6.6.4.jar
15322 Wed Jun 15 16:34:22 EDT 2016 lib/findbugs-annotations-1.3.9-1.jar
84946 Wed Jun 15 16:34:22 EDT 2016 lib/flatpack-3.4.2.jar
20220 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-j2ee-management_1.1_spec-1.0.1.jar
32359 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-jms_1.1_spec-1.1.1.jar
21817 Wed Jun 15 16:34:22 EDT 2016 lib/gmbal-api-only-3.0.0-b023.jar
690573 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-framework-2.1.2.jar
253086 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-2.1.2.jar
198255 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-server-2.1.2.jar
336904 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-servlet-2.1.2.jar
  8114 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-rcm-2.1.2.jar
1795932 Wed Jun 15 16:34:22 EDT 2016 lib/guava-12.0.1.jar
710492 Wed Jun 15 16:34:22 EDT 2016 lib/guice-3.0.jar
65012 Wed Jun 15 16:34:22 EDT 2016 lib/guice-servlet-3.0.jar
16778 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-annotations-2.2.0.jar
52449 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-auth-2.5.1.jar
2962685 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-common-2.5.1.jar
1498368 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-mapreduce-client-core-2.5.1.jar
1158936 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-api-2.2.0.jar
1301627 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-common-2.2.0.jar
50139 Wed Jun 15 16:34:22 EDT 2016 lib/hawtbuf-1.9.jar
20780 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-annotations-1.1.2.jar
1249004 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-client-1.1.2.jar
530078 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-common-1.1.2.jar
4201685 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-protocol-1.1.2.jar
1475955 Wed Jun 15 16:34:22 EDT 2016 lib/htrace-core-3.1.0-incubating.jar
590533 Wed Jun 15 16:34:22 EDT 2016 lib/httpclient-4.3.5.jar
282269 Wed Jun 15 16:34:22 EDT 2016 lib/httpcore-4.3.2.jar
228286 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-core-asl-1.9.2.jar
765648 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-mapper-asl-1.9.2.jar
  2497 Wed Jun 15 16:34:22 EDT 2016 lib/javax.inject-1.jar
521991 Wed Jun 15 16:34:22 EDT 2016 lib/javax.mail-1.5.0.jar
83945 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-3.1.jar
85353 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-api-3.0.1.jar
105134 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-api-2.2.2.jar
890168 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-impl-2.2.3-1.jar
1291164 Wed Jun 15 16:34:22 EDT 2016 lib/jcodings-1.0.8.jar
151304 Wed Jun 15 16:34:22 EDT 2016 lib/jdom-1.1.3.jar
130458 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-client-1.9.jar
458739 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-core-1.9.jar
17542 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-grizzly2-1.9.jar
14786 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-guice-1.9.jar
147952 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-json-1.9.jar
713089 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-server-1.9.jar
28100 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-core-1.9.jar
12976 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-grizzly2-1.9.jar
67758 Wed Jun 15 16:34:22 EDT 2016 lib/jettison-1.1.jar
21144 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-continuation-8.1.10.v20130312.jar
95709 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-http-8.1.10.v20130312.jar
103622 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-io-8.1.10.v20130312.jar
89691 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-security-8.1.10.v20130312.jar
347020 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-server-8.1.10.v20130312.jar
101052 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-servlet-8.1.10.v20130312.jar
177131 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-6.1.26.jar
284903 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-8.1.10.v20130312.jar
125928 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-websocket-8.1.10.v20130312.jar
39117 Wed Jun 15 16:34:22 EDT 2016 lib/jms-api-1.1-rev-1.jar
187292 Wed Jun 15 16:34:22 EDT 2016 lib/joni-2.1.2.jar
280205 Wed Jun 15 16:34:22 EDT 2016 lib/jsch-0.1.53.jar
33015 Wed Jun 15 16:34:22 EDT 2016 lib/jsr305-1.3.9.jar
489884 Wed Jun 15 16:34:22 EDT 2016 lib/log4j-1.2.17.jar
565387 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-contrib-3.2.0-incubating.jar
1062000 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-library-3.1.1.jar
42212 Wed Jun 15 16:34:22 EDT 2016 lib/management-api-3.0.0-b012.jar
  8798 Wed Jun 15 16:34:22 EDT 2016 lib/named-regexp-0.2.3.jar
1206119 Wed Jun 15 16:34:22 EDT 2016 lib/netty-3.6.6.Final.jar
1779991 Wed Jun 15 16:34:22 EDT 2016 lib/netty-all-4.0.23.Final.jar
29555 Wed Jun 15 16:34:22 EDT 2016 lib/paranamer-2.3.jar
1869113 Wed Jun 15 16:34:22 EDT 2016 lib/poi-3.9.jar
936648 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-3.9.jar
4802621 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-schemas-3.9.jar
533455 Wed Jun 15 16:34:22 EDT 2016 lib/protobuf-java-2.5.0.jar
26084 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-api-1.7.5.jar
  9943 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-log4j12-1.7.19.jar
995968 Wed Jun 15 16:34:22 EDT 2016 lib/snappy-java-1.0.4.1.jar
23346 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0-2.jar
26514 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0.1.jar
  2455 Wed Jun 15 16:34:22 EDT 2016 lib/tdgssconfig-14.00.00.21.jar
991265 Wed Jun 15 16:34:22 EDT 2016 lib/terajdbc4-14.00.00.21.jar
758309 Wed Jun 15 16:34:22 EDT 2016 lib/vertica-jdbc-7.2.1-0.jar
109318 Wed Jun 15 16:34:22 EDT 2016 lib/xml-apis-1.0.b2.jar
2666695 Wed Jun 15 16:34:22 EDT 2016 lib/xmlbeans-2.3.0.jar
15010 Wed Jun 15 16:34:22 EDT 2016 lib/xmlenc-0.52.jar
94672 Wed Jun 15 16:34:22 EDT 2016 lib/xz-1.0.jar
792964 Wed Jun 15 16:34:22 EDT 2016 lib/zookeeper-3.4.6.jar
     0 Wed Jun 15 16:34:28 EDT 2016 conf/
   334 Mon Apr 04 11:18:00 EDT 2016 conf/my-app-conf1.xml
  3432 Wed Jun 15 16:22:20 EDT 2016 META-INF/properties.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
       <modelVersion>4.0.0</modelVersion>
       <groupId>com.rbc.aml.cnscan</groupId>
       <version>1.0-SNAPSHOT</version>
       <artifactId>countrynamescan</artifactId>
       <packaging>jar</packaging>

       <!-- change these to the appropriate values -->
       <name>countrynamescan</name>
       <description>Country and Name Scan project</description>

       <properties>
              <!-- change this if you desire to use a different version of DataTorrent -->
              <datatorrent.version>3.1.1</datatorrent.version>
              <datatorrent.apppackage.classpath>lib/*.jar</datatorrent.apppackage.classpath>
       </properties>

       <!-- repository to provide the DataTorrent artifacts -->
       <!-- <repositories>
              <repository>
                     <snapshots>
                           <enabled>false</enabled>
                     </snapshots>
                     <id>Datatorrent-Releases</id>
                     <name>DataTorrent Release Repository</name>
                     <url>https://www.datatorrent.com/maven/content/repositories/releases/</url>
              </repository>
              <repository>
                     <releases>
                           <enabled>false</enabled>
                     </releases>
                     <id>DataTorrent-Snapshots</id>
                     <name>DataTorrent Early Access Program Snapshot Repository</name>
                     <url>https://www.datatorrent.com/maven/content/repositories/snapshots/</url>
              </repository>
       </repositories> -->


       <build>
              <plugins>
                     <plugin>
                           <groupId>org.apache.maven.plugins</groupId>
                           <artifactId>maven-eclipse-plugin</artifactId>
                           <version>2.9</version>
                           <configuration>
                                  <downloadSources>true</downloadSources>
                           </configuration>
                     </plugin>
                     <plugin>
                           <artifactId>maven-compiler-plugin</artifactId>
                           <version>3.3</version>
                           <configuration>
                                  <encoding>UTF-8</encoding>
                                  <source>1.7</source>
                                  <target>1.7</target>
                                  <debug>true</debug>
                                  <optimize>false</optimize>
                                  <showDeprecation>true</showDeprecation>
                                  <showWarnings>true</showWarnings>
                           </configuration>
                     </plugin>
                     <plugin>
                           <artifactId>maven-dependency-plugin</artifactId>
                           <version>2.8</version>
                           <executions>
                                  <execution>
                                         <id>copy-dependencies</id>
                                         <phase>prepare-package</phase>
                                         <goals>
                                                <goal>copy-dependencies</goal>
                                         </goals>
                                         <configuration>
                                                <outputDirectory>target/deps</outputDirectory>
                                                <includeScope>runtime</includeScope>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <artifactId>maven-assembly-plugin</artifactId>
                           <executions>
                                  <execution>
                                         <id>app-package-assembly</id>
                                         <phase>package</phase>
                                         <goals>
                                                <goal>single</goal>
                                         </goals>
                                         <configuration>
                                                <finalName>${project.artifactId}-${project.version}-apexapp</finalName>
                                                <appendAssemblyId>false</appendAssemblyId>
                                                <descriptors>
                                                       <descriptor>src/assemble/appPackage.xml</descriptor>
                                                </descriptors>
                                                <archiverConfig>
                                                       <defaultDirectoryMode>0755</defaultDirectoryMode>
                                                </archiverConfig>
                                                <archive>
                                                       <manifestEntries>
                                                              <Class-Path>${datatorrent.apppackage.classpath}</Class-Path>
                                                              <DT-Engine-Version>${datatorrent.version}</DT-Engine-Version>
                                                              <DT-App-Package-Name>${project.artifactId}</DT-App-Package-Name>
                                                              <DT-App-Package-Version>${project.version}</DT-App-Package-Version>
                                                               <DT-App-Package-Display-Name>${project.name}</DT-App-Package-Display-Name>
                                                              <DT-App-Package-Description>${project.description}</DT-App-Package-Description>
                                                      </manifestEntries>
                                                </archive>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <artifactId>maven-antrun-plugin</artifactId>
                           <version>1.7</version>
                           <executions>
                                  <execution>
                                         <phase>package</phase>
                                         <configuration>
                                                <target>
                                                       <move
                                                              file="${project.build.directory}/${project.artifactId}-${project.version}-apexapp.jar"
                                                              tofile="${project.build.directory}/${project.artifactId}-${project.version}.apa"/>
                                                </target>
                                         </configuration>
                                         <goals>
                                                <goal>run</goal>
                                         </goals>
                                  </execution>
                                  <execution>
                                         <!-- create resource directory for xml javadoc -->
                                         <id>createJavadocDirectory</id>
                                         <phase>generate-resources</phase>
                                         <configuration>
                                                <tasks>
                                                       <delete
                                                              dir="${project.build.directory}/generated-resources/xml-javadoc"/>
                                                       <mkdir
                                                              dir="${project.build.directory}/generated-resources/xml-javadoc"/>
                                                </tasks>
                                         </configuration>
                                         <goals>
                                                <goal>run</goal>
                                         </goals>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <groupId>org.codehaus.mojo</groupId>
                           <artifactId>build-helper-maven-plugin</artifactId>
                           <version>1.9.1</version>
                           <executions>
                                  <execution>
                                         <id>attach-artifacts</id>
                                         <phase>package</phase>
                                         <goals>
                                                <goal>attach-artifact</goal>
                                         </goals>
                                         <configuration>
                                                <artifacts>
                                                       <artifact>
                                                              <file>target/${project.artifactId}-${project.version}.apa</file>
                                                              <type>apa</type>
                                                       </artifact>
                                                </artifacts>
                                                <skipAttach>false</skipAttach>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <!-- generate javdoc -->
                     <plugin>
                           <groupId>org.apache.maven.plugins</groupId>
                           <artifactId>maven-javadoc-plugin</artifactId>
                           <executions>
                                  <!-- generate xml javadoc -->
                                  <execution>
                                         <id>xml-doclet</id>
                                         <phase>generate-resources</phase>
                                         <goals>
                                                <goal>javadoc</goal>
                                         </goals>
                                         <configuration>
                                                <doclet>com.github.markusbernhardt.xmldoclet.XmlDoclet</doclet>
                                                <additionalparam>-d
                                                       ${project.build.directory}/generated-resources/xml-javadoc
                                                       -filename ${project.artifactId}-${project.version}-javadoc.xml</additionalparam>
                                                <useStandardDocletOptions>false</useStandardDocletOptions>
                                                <docletArtifact>
                                                       <groupId>com.github.markusbernhardt</groupId>
                                                       <artifactId>xml-doclet</artifactId>
                                                       <version>1.0.4</version>
                                                </docletArtifact>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>
                     <!-- Transform xml javadoc to stripped down version containing only class/interface
                           comments and tags -->
                     <plugin>
                           <groupId>org.codehaus.mojo</groupId>
                           <artifactId>xml-maven-plugin</artifactId>
                           <version>1.0</version>
                           <executions>
                                  <execution>
                                         <id>transform-xmljavadoc</id>
                                         <phase>generate-resources</phase>
                                         <goals>
                                                <goal>transform</goal>
                                         </goals>
                                  </execution>
                           </executions>
                           <configuration>
                                  <transformationSets>
                                         <transformationSet>
                                                <dir>${project.build.directory}/generated-resources/xml-javadoc</dir>
                                                <includes>
                                                       <include>${project.artifactId}-${project.version}-javadoc.xml</include>
                                                </includes>
                                                <stylesheet>XmlJavadocCommentsExtractor.xsl</stylesheet>
                                                <outputDir>${project.build.directory}/generated-resources/xml-javadoc</outputDir>
                                         </transformationSet>
                                  </transformationSets>
                           </configuration>
                     </plugin>
                     <!-- copy xml javadoc to class jar -->
                     <plugin>
                           <artifactId>maven-resources-plugin</artifactId>
                           <version>2.6</version>
                           <executions>
                                  <execution>
                                         <id>copy-resources</id>
                                         <phase>process-resources</phase>
                                         <goals>
                                                <goal>copy-resources</goal>
                                         </goals>
                                         <configuration>
                                                <outputDirectory>${basedir}/target/classes</outputDirectory>
                                                <resources>
                                                       <resource>
                                                              <directory>${project.build.directory}/generated-resources/xml-javadoc</directory>
                                                              <includes>
                                                                     <include>${project.artifactId}-${project.version}-javadoc.xml</include>
                                                              </includes>
                                                              <filtering>true</filtering>
                                                       </resource>
                                                </resources>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

              </plugins>

              <pluginManagement>
                     <plugins>
                           <!--This plugin's configuration is used to store Eclipse m2e settings
                                  only. It has no influence on the Maven build itself. -->
                           <plugin>
                                  <groupId>org.eclipse.m2e</groupId>
                                  <artifactId>lifecycle-mapping</artifactId>
                                  <version>1.0.0</version>
                                  <configuration>
                                         <lifecycleMappingMetadata>
                                                <pluginExecutions>
                                                       <pluginExecution>
                                                              <pluginExecutionFilter>
                                                                     <groupId>org.codehaus.mojo</groupId>
                                                                     <artifactId>
                                                                           xml-maven-plugin
                                                                     </artifactId>
                                                                     <versionRange>[1.0,)</versionRange>
                                                                     <goals>
                                                                           <goal>transform</goal>
                                                                     </goals>
                                                              </pluginExecutionFilter>
                                                              <action>
                                                                     <ignore></ignore>
                                                              </action>
                                                       </pluginExecution>
                                                       <pluginExecution>
                                                              <pluginExecutionFilter>
                                                                     <groupId></groupId>
                                                                     <artifactId></artifactId>
                                                                     <versionRange>[,)</versionRange>
                                                                     <goals>
                                                                           <goal></goal>
                                                                     </goals>
                                                              </pluginExecutionFilter>
                                                              <action>
                                                                     <ignore></ignore>
                                                              </action>
                                                       </pluginExecution>
                                                </pluginExecutions>
                                         </lifecycleMappingMetadata>
                                  </configuration>
                           </plugin>
                     </plugins>
              </pluginManagement>
       </build>

       <dependencies>
              <!-- add your dependencies here -->
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>malhar-library</artifactId>
                     <version>${datatorrent.version}</version>
                     <!-- If you know that your application does not need transitive dependencies
                           pulled in by malhar-library, uncomment the following to reduce the size of
                           your app package. -->
                     <!-- <exclusions> <exclusion> <groupId>*</groupId> <artifactId>*</artifactId>
                           </exclusion> </exclusions> -->
              </dependency>
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>dt-common</artifactId>
                     <version>${datatorrent.version}</version>
                     <scope>provided</scope>
              </dependency>
               <dependency>
                      <groupId>com.teradata.jdbc</groupId>
                      <artifactId>terajdbc4</artifactId>
                      <version>14.00.00.21</version>
               </dependency>
               <dependency>
                      <groupId>com.teradata.jdbc</groupId>
                      <artifactId>tdgssconfig</artifactId>
                      <version>14.00.00.21</version>
               </dependency>
               <dependency>
                      <groupId>com.ibm.db2</groupId>
                      <artifactId>db2jcc</artifactId>
                      <version>123</version>
               </dependency>
              <dependency>
                     <groupId>jdk.tools</groupId>
                     <artifactId>jdk.tools</artifactId>
                     <version>1.7</version>
                     <scope>system</scope>
                     <systemPath>C:/Program Files/Java/jdk1.7.0_79/lib/tools.jar</systemPath>
              </dependency>
              <dependency>
                     <groupId>org.apache.apex</groupId>
                     <artifactId>malhar-contrib</artifactId>
                     <version>3.2.0-incubating</version>
                     <!--<scope>provided</scope> -->
                     <exclusions>
                           <exclusion>
                                  <groupId>*</groupId>
                                  <artifactId>*</artifactId>
                           </exclusion>
                     </exclusions>
              </dependency>
              <dependency>
                     <groupId>junit</groupId>
                     <artifactId>junit</artifactId>
                     <version>4.10</version>
                     <scope>test</scope>
              </dependency>
              <dependency>
                     <groupId>com.vertica</groupId>
                     <artifactId>vertica-jdbc</artifactId>
                     <version>7.2.1-0</version>
              </dependency>
               <dependency>
                      <groupId>org.apache.hbase</groupId>
                      <artifactId>hbase-client</artifactId>
                      <version>1.1.2</version>
               </dependency>
              <dependency>
                     <groupId>org.slf4j</groupId>
                     <artifactId>slf4j-log4j12</artifactId>
                     <version>1.7.19</version>
              </dependency>
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>dt-engine</artifactId>
                     <version>${datatorrent.version}</version>
                     <scope>test</scope>
              </dependency>

              <dependency>
                     <groupId>net.sf.flatpack</groupId>
                     <artifactId>flatpack</artifactId>
                     <version>3.4.2</version>
              </dependency>

              <dependency>
                     <groupId>org.jdom</groupId>
                     <artifactId>jdom</artifactId>
                     <version>1.1.3</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi-ooxml</artifactId>
                     <version>3.9</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.xmlbeans</groupId>
                     <artifactId>xmlbeans</artifactId>
                     <version>2.3.0</version>
              </dependency>

              <dependency>
                     <groupId>dom4j</groupId>
                     <artifactId>dom4j</artifactId>
                     <version>1.6.1</version>
              </dependency>

              <dependency>
                     <groupId>javax.xml.stream</groupId>
                     <artifactId>stax-api</artifactId>
                     <version>1.0-2</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi</artifactId>
                     <version>3.9</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi-ooxml-schemas</artifactId>
                     <version>3.9</version>
              </dependency>
              <dependency>
                     <groupId>com.jcraft</groupId>
                     <artifactId>jsch</artifactId>
                     <version>0.1.53</version>
              </dependency>
       </dependencies>

</project>

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, June, 16 4:37 PM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories

It looks like you may be including old Hadoop jars in your apa package since the stack trace
shows ConverterUtils.toContainerId calling ConverterUtils.toApplicationAttemptId but recent versions
don't have that call sequence. In 2.7.1 (which is what your cluster has) the function looks like this:
  public static ContainerId toContainerId(String containerIdStr) {
    return ContainerId.fromString(containerIdStr);
  }

Could you post the output of "jar tvf {your-apa-file}" as well as: "mvn dependency:tree"

Ram

On Thu, Jun 16, 2016 at 12:38 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi Ram,

Below is the information.


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   712    0   712    0     0   3807      0 --:--:-- --:--:-- --:--:--  3807
{
    "clusterInfo": {
        "haState": "ACTIVE",
        "haZooKeeperConnectionState": "CONNECTED",
        "hadoopBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bccaeaf36cb193c1c by jenkins source checksum 69a3bf8c667267c2c252a54fbbf23d",
        "hadoopVersion": "2.7.1.2.3.2.0-2950",
        "hadoopVersionBuiltOn": "2015-09-30T18:08Z",
        "id": 1465495186350,
        "resourceManagerBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bccaeaf36cb193c1c by jenkins source checksum 48db4b572827c2e9c2da66982d147626",
        "resourceManagerVersion": "2.7.1.2.3.2.0-2950",
        "resourceManagerVersionBuiltOn": "2015-09-30T18:20Z",
        "rmStateStoreName": "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",
        "startedOn": 1465495186350,
        "state": "STARTED"
    }
}

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, June, 16 2:57 PM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories

Can you ssh to one of the cluster nodes ? If so, can you run this command and show the output
(where {rm} is the host:port running the resource manager, aka YARN):

curl http://{rm}/ws/v1/cluster | python -mjson.tool

Ram
ps. You can determine the node running YARN with:

hdfs getconf -confKey yarn.resourcemanager.webapp.address
hdfs getconf -confKey yarn.resourcemanager.webapp.https.address



On Thu, Jun 16, 2016 at 11:15 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi,

I am facing a weird issue and the logs are not clear to me!!

I have created an apa file which works fine within my local sandbox, but I am facing problems when I upload it on the enterprise Hadoop cluster using the DT Console.

Below is the error message from yarn logs. Please help in understanding the issue.

###################### Error Logs ########################################################

Log Type: AppMaster.stderr
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 1259
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/grid/06/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/filecache/36/slf4j-log4j12-1.7.19.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.IllegalArgumentException: Invalid ContainerId: container_e35_1465495186350_2224_01_000001
        at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182)
        at com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:90)
Caused by: java.lang.NumberFormatException: For input string: "e35"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:441)
        at java.lang.Long.parseLong(Long.java:483)
        at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
        at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
        ... 1 more

Log Type: AppMaster.stdout
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 0

Log Type: dt.log
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 29715
Showing 4096 bytes of 29715 total. Click here <http://guedlpdhdp001.saifg.rbc.com:19888/jobhistory/logs/guedlpdhdp012.saifg.rbc.com:45454/container_e35_1465495186350_2224_01_000001/container_e35_1465495186350_2224_01_000001/mukkamula/dt.log/?start=0> for the full log.
56m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms4096m -Xmx4096m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT
SHLVL=3
HADOOP_SSH_OPTS=-o ConnectTimeout=5 -o SendEnv=HADOOP_CONF_DIR
HADOOP_USER_NAME=datatorrent/gueulvahal003.saifg.rbc.com@SAIFG.RBC.COM<ma...@SAIFG.RBC.COM>
HADOOP_NAMENODE_OPTS=-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1
HADOOP_IDENT_STRING=yarn
HADOOP_MAPRED_LOG_DIR=/var/log/hadoop-mapreduce/yarn
NM_HOST=guedlpdhdp012.saifg.rbc.com<http://guedlpdhdp012.saifg.rbc.com>
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
HADOOP_SECURE_DN_LOG_DIR=/var/log/hadoop/hdfs
YARN_HISTORYSERVER_HEAPSIZE=1024
JVM_PID=2638
YARN_PID_DIR=/var/run/hadoop-yarn/yarn
HADOOP_HOME_WARN_SUPPRESS=1
NM_PORT=45454
LOGNAME=mukkamula
YARN_CONF_DIR=/usr/hdp/current/hadoop-client/conf
HADOOP_YARN_USER=yarn
QTDIR=/usr/lib64/qt-3.3
_=/usr/lib/jvm/java-1.7.0/bin/java
MSM_PRODUCT=MSM
HADOOP_HOME=/usr/hdp/2.3.2.0-2950/hadoop
MALLOC_ARENA_MAX=4
HADOOP_OPTS=-Dhdp.version=2.3.2.0-2950 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true  -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.3.2.0-2950 -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
SHELL=/bin/bash
YARN_ROOT_LOGGER=INFO,EWMA,RFA
HADOOP_TOKEN_FILE_LOCATION=/grid/11/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/container_e35_1465495186350_2224_01_000001/container_tokens
CLASSPATH=./*:/usr/hdp/current/hadoop-client/conf:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*
HADOOP_MAPRED_PID_DIR=/var/run/hadoop-mapreduce/yarn
YARN_NODEMANAGER_HEAPSIZE=1024
QTINC=/usr/lib64/qt-3.3/include
USER=mukkamula
HADOOP_CLIENT_OPTS=-Xmx2048m -XX:MaxPermSize=512m -Xmx2048m -XX:MaxPermSize=512m
CONTAINER_ID=container_e35_1465495186350_2224_01_000001
HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop/hdfs
HISTCONTROL=ignoredups
HOME=/home/
HADOOP_NAMENODE_INIT_HEAPSIZE=-Xms8192m
MSM_HOME=/usr/local/MegaRAID Storage Manager
LESSOPEN=||/usr/bin/lesspipe.sh %s
LANG=en_US.UTF-8
YARN_NICENESS=0
YARN_IDENT_STRING=yarn
HADOOP_MAPRED_HOME=/usr/hdp/2.3.2.0-2950/hadoop-mapreduce


Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR)
Sent: 2016, June, 16 8:58 AM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: RE: Multiple directories

Thank you for the inputs.

Regards,
Surya Vamshi
From: Thomas Weise [mailto:thomas.weise@gmail.com]
Sent: 2016, June, 15 5:08 PM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories


On Wed, Jun 15, 2016 at 1:55 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi Ram/Team,

I could create an operator which reads multiple directories, parses each file with respect to an individual configuration file, and generates output files to different directories.

However I have some questions regarding the design.


==> We have 120 directories to scan on HDFS. If we use parallel partitioning with operator memory around 250MB, it might be around 30GB of RAM for this operator. Are these figures going to create any problem in production?

You can benchmark this with a single partition. If the downstream operators can keep up with the rate at which the file reader emits, then the memory consumption should be minimal. Keep in mind though that the container memory is not just heap space for the operator, but also memory the JVM requires to run and the memory that the buffer server consumes. You see the allocated memory in the UI if you use the DT community edition (container list in the physical plan).


==> Should I use a scheduler for running the batch job, or define the next scan time and keep the DT job running continuously? If I run the DT job continuously, I assume memory will be continuously utilized by the DT job and not available to other applications on the cluster; please clarify.
It is possible to set this up elastically as well, so that when there is no input available the number of reader partitions is reduced and the memory given back (Apex supports dynamic scaling).
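For reference, per-operator memory can be capped via the MEMORY_MB attribute in properties.xml, along these lines (the operator name and value are illustrative):

<property>
  <name>dt.operator.reader.attr.MEMORY_MB</name>
  <value>256</value>
</property>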


Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, June, 05 10:24 PM

To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories

Some sample code to monitor multiple directories is now available at:
https://github.com/DataTorrent/examples/tree/master/tutorials/fileIO-multiDir

It shows how to use a custom implementation of definePartitions() to create
multiple partitions of the file input operator and group them
into "slices" where each slice monitors a single directory.

Ram

On Wed, May 25, 2016 at 9:55 AM, Munagala Ramanath <ra...@datatorrent.com>> wrote:
I'm hoping to have a sample sometime next week.

Ram

On Wed, May 25, 2016 at 9:30 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Thank you so much ram, for your advice , Option (a) would be ideal for my requirement.

Do you have sample usage for partitioning, with individual configuration set-ups for different partitions?

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, May, 25 12:11 PM
To: users@apex.apache.org<ma...@apex.apache.org>
Subject: Re: Multiple directories

You have 2 options: (a) AbstractFileInputOperator (b) FileSplitter/BlockReader

For (a), each partition (i.e. replica of the operator) can scan only a single directory, so if you have 100
directories, you can simply start with 100 partitions; since each partition is scanning its own directory
you don't need to worry about which files the lines came from. This approach however needs a custom
definePartitions() implementation in your subclass to assign the appropriate directory and XML parsing
config file to each partition; it also needs adequate cluster resources to be able to spin up the required
number of partitions.

For (b), there is some documentation in the Operators section at http://docs.datatorrent.com/ including
sample code. These operators support scanning multiple directories out of the box but have more
elaborate configuration options. Check this out and see if it works in your use case.

Ram

On Wed, May 25, 2016 at 8:17 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hello Ram/Team,

My requirement is to read input feeds from different locations on HDFS and parse those files by reading XML configuration files (each input feed has a configuration file which defines the fields inside the input feed).

My approach: I would like to define a mapping file which contains the individual feed identifier, feed location, and configuration file location. I would like to read this mapping file at initial load within the setup() method and define my DirectoryScan.acceptFiles. My challenge is that when I read the files, I should parse the lines by reading the individual configuration files. How do I know which file a line came from? If I know this, I can read the corresponding configuration file before parsing the line.

Please let me know how do I handle this.

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com<ma...@datatorrent.com>]
Sent: 2016, May, 24 5:49 PM
To: Mukkamula, Suryavamshivardhan (CWM-NR)
Subject: Multiple directories

One way of addressing the issue is to use some sort of external tool (like a script) to
copy all the input files to a common directory (making sure that the file names are
unique to prevent one file from overwriting another) before the Apex application starts.

The Apex application then starts and processes files from this directory.

If you set the partition count of the file input operator to N, it will create N partitions and
the files will be automatically distributed among the partitions. The partitions will work
in parallel.

Ram


Re: Multiple directories

Posted by "Raja.Aravapalli" <Ra...@target.com>.
I also faced a similar problem with Hadoop jars when I used HBase jars in pom.xml!! It could be because the version of the Hadoop jars in your apa is different from the ones in the cluster!!


What I did to solve this:

I included the provided scope in the Maven pom.xml for the HBase jars, and then supplied the HBase jars to the application package during submission using "-libjars" with the launch command, which solved my Invalid ContainerId problem!!

You can type "launch help" for usage details.


Regards,
Raja.

From: Munagala Ramanath <ra...@datatorrent.com>>
Reply-To: "users@apex.apache.org<ma...@apex.apache.org>" <us...@apex.apache.org>>
Date: Thursday, June 16, 2016 at 4:10 PM
To: "users@apex.apache.org<ma...@apex.apache.org>" <us...@apex.apache.org>>
Subject: Re: Multiple directories

Those 6 hadoop jars are definitely a problem.

I didn't see the output of "mvn dependency:tree"; could you post that ?
It will show you why these hadoop jars are being pulled in.

Also, please refer to the section "Hadoop dependencies conflicts" in the troubleshooting guide:
http://docs.datatorrent.com/troubleshooting/

Ram

On Thu, Jun 16, 2016 at 1:56 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com>> wrote:
Hi Ram,

Below are the details.


0 Wed Jun 15 16:34:24 EDT 2016 META-INF/
   358 Wed Jun 15 16:34:22 EDT 2016 META-INF/MANIFEST.MF
     0 Wed Jun 15 16:34:24 EDT 2016 app/
52967 Wed Jun 15 16:34:22 EDT 2016 app/countrynamescan-1.0-SNAPSHOT.jar
     0 Wed Jun 15 16:34:22 EDT 2016 lib/
62983 Wed Jun 15 16:34:22 EDT 2016 lib/activation-1.1.jar
1143233 Wed Jun 15 16:34:22 EDT 2016 lib/activemq-client-5.8.0.jar
  4467 Wed Jun 15 16:34:22 EDT 2016 lib/aopalliance-1.0.jar
44925 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-i18n-2.0.0-M15.jar
691479 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-kerberos-codec-2.0.0-M15.jar
16560 Wed Jun 15 16:34:22 EDT 2016 lib/api-asn1-api-1.0.0-M20.jar
79912 Wed Jun 15 16:34:22 EDT 2016 lib/api-util-1.0.0-M20.jar
43033 Wed Jun 15 16:34:22 EDT 2016 lib/asm-3.1.jar
303139 Wed Jun 15 16:34:22 EDT 2016 lib/avro-1.7.4.jar
232019 Wed Jun 15 16:34:22 EDT 2016 lib/commons-beanutils-1.8.3.jar
41123 Wed Jun 15 16:34:22 EDT 2016 lib/commons-cli-1.2.jar
284184 Wed Jun 15 16:34:22 EDT 2016 lib/commons-codec-1.10.jar
575389 Wed Jun 15 16:34:22 EDT 2016 lib/commons-collections-3.2.1.jar
30595 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compiler-2.7.8.jar
241367 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compress-1.4.1.jar
298829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-configuration-1.6.jar
143602 Wed Jun 15 16:34:22 EDT 2016 lib/commons-digester-1.8.jar
112341 Wed Jun 15 16:34:22 EDT 2016 lib/commons-el-1.0.jar
305001 Wed Jun 15 16:34:22 EDT 2016 lib/commons-httpclient-3.1.jar
185140 Wed Jun 15 16:34:22 EDT 2016 lib/commons-io-2.4.jar
284220 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang-2.6.jar
315805 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang3-3.1.jar
61829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-logging-1.2.jar
1599627 Wed Jun 15 16:34:22 EDT 2016 lib/commons-math3-3.1.1.jar
273370 Wed Jun 15 16:34:22 EDT 2016 lib/commons-net-3.1.jar
3608597 Wed Jun 15 16:34:22 EDT 2016 lib/db2jcc-123.jar
313898 Wed Jun 15 16:34:22 EDT 2016 lib/dom4j-1.6.1.jar
17138265 Wed Jun 15 16:34:22 EDT 2016 lib/fastutil-6.6.4.jar
15322 Wed Jun 15 16:34:22 EDT 2016 lib/findbugs-annotations-1.3.9-1.jar
84946 Wed Jun 15 16:34:22 EDT 2016 lib/flatpack-3.4.2.jar
20220 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-j2ee-management_1.1_spec-1.0.1.jar
32359 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-jms_1.1_spec-1.1.1.jar
21817 Wed Jun 15 16:34:22 EDT 2016 lib/gmbal-api-only-3.0.0-b023.jar
690573 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-framework-2.1.2.jar
253086 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-2.1.2.jar
198255 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-server-2.1.2.jar
336904 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-servlet-2.1.2.jar
  8114 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-rcm-2.1.2.jar
1795932 Wed Jun 15 16:34:22 EDT 2016 lib/guava-12.0.1.jar
710492 Wed Jun 15 16:34:22 EDT 2016 lib/guice-3.0.jar
65012 Wed Jun 15 16:34:22 EDT 2016 lib/guice-servlet-3.0.jar
16778 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-annotations-2.2.0.jar
52449 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-auth-2.5.1.jar
2962685 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-common-2.5.1.jar
1498368 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-mapreduce-client-core-2.5.1.jar
1158936 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-api-2.2.0.jar
1301627 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-common-2.2.0.jar
50139 Wed Jun 15 16:34:22 EDT 2016 lib/hawtbuf-1.9.jar
20780 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-annotations-1.1.2.jar
1249004 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-client-1.1.2.jar
530078 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-common-1.1.2.jar
4201685 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-protocol-1.1.2.jar
1475955 Wed Jun 15 16:34:22 EDT 2016 lib/htrace-core-3.1.0-incubating.jar
590533 Wed Jun 15 16:34:22 EDT 2016 lib/httpclient-4.3.5.jar
282269 Wed Jun 15 16:34:22 EDT 2016 lib/httpcore-4.3.2.jar
228286 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-core-asl-1.9.2.jar
765648 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-mapper-asl-1.9.2.jar
  2497 Wed Jun 15 16:34:22 EDT 2016 lib/javax.inject-1.jar
521991 Wed Jun 15 16:34:22 EDT 2016 lib/javax.mail-1.5.0.jar
83945 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-3.1.jar
85353 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-api-3.0.1.jar
105134 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-api-2.2.2.jar
890168 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-impl-2.2.3-1.jar
1291164 Wed Jun 15 16:34:22 EDT 2016 lib/jcodings-1.0.8.jar
151304 Wed Jun 15 16:34:22 EDT 2016 lib/jdom-1.1.3.jar
130458 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-client-1.9.jar
458739 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-core-1.9.jar
17542 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-grizzly2-1.9.jar
14786 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-guice-1.9.jar
147952 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-json-1.9.jar
713089 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-server-1.9.jar
28100 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-core-1.9.jar
12976 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-grizzly2-1.9.jar
67758 Wed Jun 15 16:34:22 EDT 2016 lib/jettison-1.1.jar
21144 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-continuation-8.1.10.v20130312.jar
95709 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-http-8.1.10.v20130312.jar
103622 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-io-8.1.10.v20130312.jar
89691 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-security-8.1.10.v20130312.jar
347020 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-server-8.1.10.v20130312.jar
101052 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-servlet-8.1.10.v20130312.jar
177131 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-6.1.26.jar
284903 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-8.1.10.v20130312.jar
125928 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-websocket-8.1.10.v20130312.jar
39117 Wed Jun 15 16:34:22 EDT 2016 lib/jms-api-1.1-rev-1.jar
187292 Wed Jun 15 16:34:22 EDT 2016 lib/joni-2.1.2.jar
280205 Wed Jun 15 16:34:22 EDT 2016 lib/jsch-0.1.53.jar
33015 Wed Jun 15 16:34:22 EDT 2016 lib/jsr305-1.3.9.jar
489884 Wed Jun 15 16:34:22 EDT 2016 lib/log4j-1.2.17.jar
565387 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-contrib-3.2.0-incubating.jar
1062000 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-library-3.1.1.jar
42212 Wed Jun 15 16:34:22 EDT 2016 lib/management-api-3.0.0-b012.jar
  8798 Wed Jun 15 16:34:22 EDT 2016 lib/named-regexp-0.2.3.jar
1206119 Wed Jun 15 16:34:22 EDT 2016 lib/netty-3.6.6.Final.jar
1779991 Wed Jun 15 16:34:22 EDT 2016 lib/netty-all-4.0.23.Final.jar
29555 Wed Jun 15 16:34:22 EDT 2016 lib/paranamer-2.3.jar
1869113 Wed Jun 15 16:34:22 EDT 2016 lib/poi-3.9.jar
936648 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-3.9.jar
4802621 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-schemas-3.9.jar
533455 Wed Jun 15 16:34:22 EDT 2016 lib/protobuf-java-2.5.0.jar
26084 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-api-1.7.5.jar
  9943 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-log4j12-1.7.19.jar
995968 Wed Jun 15 16:34:22 EDT 2016 lib/snappy-java-1.0.4.1.jar
23346 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0-2.jar
26514 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0.1.jar
  2455 Wed Jun 15 16:34:22 EDT 2016 lib/tdgssconfig-14.00.00.21.jar
991265 Wed Jun 15 16:34:22 EDT 2016 lib/terajdbc4-14.00.00.21.jar
758309 Wed Jun 15 16:34:22 EDT 2016 lib/vertica-jdbc-7.2.1-0.jar
109318 Wed Jun 15 16:34:22 EDT 2016 lib/xml-apis-1.0.b2.jar
2666695 Wed Jun 15 16:34:22 EDT 2016 lib/xmlbeans-2.3.0.jar
15010 Wed Jun 15 16:34:22 EDT 2016 lib/xmlenc-0.52.jar
94672 Wed Jun 15 16:34:22 EDT 2016 lib/xz-1.0.jar
792964 Wed Jun 15 16:34:22 EDT 2016 lib/zookeeper-3.4.6.jar
     0 Wed Jun 15 16:34:28 EDT 2016 conf/
   334 Mon Apr 04 11:18:00 EDT 2016 conf/my-app-conf1.xml
  3432 Wed Jun 15 16:22:20 EDT 2016 META-INF/properties.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
       <modelVersion>4.0.0</modelVersion>
       <groupId>com.rbc.aml.cnscan</groupId>
       <version>1.0-SNAPSHOT</version>
       <artifactId>countrynamescan</artifactId>
       <packaging>jar</packaging>

       <!-- change these to the appropriate values -->
       <name>countrynamescan</name>
       <description>Country and Name Scan project</description>

       <properties>
              <!-- change this if you desire to use a different version of DataTorrent -->
              <datatorrent.version>3.1.1</datatorrent.version>
              <datatorrent.apppackage.classpath>lib/*.jar</datatorrent.apppackage.classpath>
       </properties>

       <!-- repository to provide the DataTorrent artifacts -->
       <!-- <repositories>
              <repository>
                     <snapshots>
                           <enabled>false</enabled>
                     </snapshots>
                     <id>Datatorrent-Releases</id>
                     <name>DataTorrent Release Repository</name>
                     <url>https://www.datatorrent.com/maven/content/repositories/releases/</url>
              </repository>
              <repository>
                     <releases>
                           <enabled>false</enabled>
                     </releases>
                     <id>DataTorrent-Snapshots</id>
                     <name>DataTorrent Early Access Program Snapshot Repository</name>
                     <url>https://www.datatorrent.com/maven/content/repositories/snapshots/</url>
              </repository>
       </repositories> -->


       <build>
              <plugins>
                     <plugin>
                           <groupId>org.apache.maven.plugins</groupId>
                           <artifactId>maven-eclipse-plugin</artifactId>
                           <version>2.9</version>
                           <configuration>
                                  <downloadSources>true</downloadSources>
                           </configuration>
                     </plugin>
                     <plugin>
                           <artifactId>maven-compiler-plugin</artifactId>
                           <version>3.3</version>
                           <configuration>
                                  <encoding>UTF-8</encoding>
                                  <source>1.7</source>
                                  <target>1.7</target>
                                  <debug>true</debug>
                                  <optimize>false</optimize>
                                  <showDeprecation>true</showDeprecation>
                                  <showWarnings>true</showWarnings>
                           </configuration>
                     </plugin>
                     <plugin>
                           <artifactId>maven-dependency-plugin</artifactId>
                           <version>2.8</version>
                           <executions>
                                  <execution>
                                         <id>copy-dependencies</id>
                                         <phase>prepare-package</phase>
                                         <goals>
                                                <goal>copy-dependencies</goal>
                                         </goals>
                                         <configuration>
                                                <outputDirectory>target/deps</outputDirectory>
                                                <includeScope>runtime</includeScope>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <artifactId>maven-assembly-plugin</artifactId>
                           <executions>
                                  <execution>
                                         <id>app-package-assembly</id>
                                         <phase>package</phase>
                                         <goals>
                                                <goal>single</goal>
                                         </goals>
                                         <configuration>
                                                <finalName>${project.artifactId}-${project.version}-apexapp</finalName>
                                                <appendAssemblyId>false</appendAssemblyId>
                                                <descriptors>
                                                       <descriptor>src/assemble/appPackage.xml</descriptor>
                                                </descriptors>
                                                <archiverConfig>
                                                       <defaultDirectoryMode>0755</defaultDirectoryMode>
                                                </archiverConfig>
                                                <archive>
                                                       <manifestEntries>
                                                              <Class-Path>${datatorrent.apppackage.classpath}</Class-Path>
                                                              <DT-Engine-Version>${datatorrent.version}</DT-Engine-Version>
                                                              <DT-App-Package-Name>${project.artifactId}</DT-App-Package-Name>
                                                              <DT-App-Package-Version>${project.version}</DT-App-Package-Version>
                                                               <DT-App-Package-Display-Name>${project.name}</DT-App-Package-Display-Name>
                                                              <DT-App-Package-Description>${project.description}</DT-App-Package-Description>
                                                      </manifestEntries>
                                                </archive>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <artifactId>maven-antrun-plugin</artifactId>
                           <version>1.7</version>
                           <executions>
                                  <execution>
                                         <phase>package</phase>
                                         <configuration>
                                                <target>
                                                       <move
                                                              file="${project.build.directory}/${project.artifactId}-${project.version}-apexapp.jar"
                                                              tofile="${project.build.directory}/${project.artifactId}-${project.version}.apa"/>
                                                </target>
                                         </configuration>
                                         <goals>
                                                <goal>run</goal>
                                         </goals>
                                  </execution>
                                  <execution>
                                         <!-- create resource directory for xml javadoc -->
                                         <id>createJavadocDirectory</id>
                                         <phase>generate-resources</phase>
                                         <configuration>
                                                <tasks>
                                                       <delete
                                                              dir="${project.build.directory}/generated-resources/xml-javadoc"/>
                                                       <mkdir
                                                              dir="${project.build.directory}/generated-resources/xml-javadoc"/>
                                                </tasks>
                                         </configuration>
                                         <goals>
                                                <goal>run</goal>
                                         </goals>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <groupId>org.codehaus.mojo</groupId>
                           <artifactId>build-helper-maven-plugin</artifactId>
                           <version>1.9.1</version>
                           <executions>
                                  <execution>
                                         <id>attach-artifacts</id>
                                         <phase>package</phase>
                                         <goals>
                                                <goal>attach-artifact</goal>
                                         </goals>
                                         <configuration>
                                                <artifacts>
                                                       <artifact>
                                                              <file>target/${project.artifactId}-${project.version}.apa</file>
                                                              <type>apa</type>
                                                       </artifact>
                                                </artifacts>
                                                <skipAttach>false</skipAttach>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <!-- generate javdoc -->
                     <plugin>
                           <groupId>org.apache.maven.plugins</groupId>
                           <artifactId>maven-javadoc-plugin</artifactId>
                           <executions>
                                  <!-- generate xml javadoc -->
                                  <execution>
                                         <id>xml-doclet</id>
                                         <phase>generate-resources</phase>
                                         <goals>
                                                <goal>javadoc</goal>
                                         </goals>
                                         <configuration>
                                                <doclet>com.github.markusbernhardt.xmldoclet.XmlDoclet</doclet>
                                                <additionalparam>-d
                                                       ${project.build.directory}/generated-resources/xml-javadoc
                                                       -filename ${project.artifactId}-${project.version}-javadoc.xml</additionalparam>
                                                <useStandardDocletOptions>false</useStandardDocletOptions>
                                                <docletArtifact>
                                                       <groupId>com.github.markusbernhardt</groupId>
                                                       <artifactId>xml-doclet</artifactId>
                                                       <version>1.0.4</version>
                                                </docletArtifact>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>
                     <!-- Transform xml javadoc to stripped down version containing only class/interface
                           comments and tags -->
                     <plugin>
                           <groupId>org.codehaus.mojo</groupId>
                           <artifactId>xml-maven-plugin</artifactId>
                           <version>1.0</version>
                           <executions>
                                  <execution>
                                         <id>transform-xmljavadoc</id>
                                         <phase>generate-resources</phase>
                                         <goals>
                                                <goal>transform</goal>
                                         </goals>
                                  </execution>
                           </executions>
                           <configuration>
                                  <transformationSets>
                                         <transformationSet>
                                                <dir>${project.build.directory}/generated-resources/xml-javadoc</dir>
                                                <includes>
                                                       <include>${project.artifactId}-${project.version}-javadoc.xml</include>
                                                </includes>
                                                <stylesheet>XmlJavadocCommentsExtractor.xsl</stylesheet>
                                                <outputDir>${project.build.directory}/generated-resources/xml-javadoc</outputDir>
                                         </transformationSet>
                                  </transformationSets>
                           </configuration>
                     </plugin>
                     <!-- copy xml javadoc to class jar -->
                     <plugin>
                           <artifactId>maven-resources-plugin</artifactId>
                           <version>2.6</version>
                           <executions>
                                  <execution>
                                         <id>copy-resources</id>
                                         <phase>process-resources</phase>
                                         <goals>
                                                <goal>copy-resources</goal>
                                         </goals>
                                         <configuration>
                                                <outputDirectory>${basedir}/target/classes</outputDirectory>
                                                <resources>
                                                       <resource>
                                                              <directory>${project.build.directory}/generated-resources/xml-javadoc</directory>
                                                              <includes>
                                                                     <include>${project.artifactId}-${project.version}-javadoc.xml</include>
                                                              </includes>
                                                              <filtering>true</filtering>
                                                       </resource>
                                                </resources>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

              </plugins>

              <pluginManagement>
                     <plugins>
                           <!--This plugin's configuration is used to store Eclipse m2e settings
                                  only. It has no influence on the Maven build itself. -->
                           <plugin>
                                  <groupId>org.eclipse.m2e</groupId>
                                  <artifactId>lifecycle-mapping</artifactId>
                                  <version>1.0.0</version>
                                  <configuration>
                                         <lifecycleMappingMetadata>
                                                <pluginExecutions>
                                                       <pluginExecution>
                                                              <pluginExecutionFilter>
                                                                     <groupId>org.codehaus.mojo</groupId>
                                                                     <artifactId>
                                                                           xml-maven-plugin
                                                                     </artifactId>
                                                                     <versionRange>[1.0,)</versionRange>
                                                                     <goals>
                                                                           <goal>transform</goal>
                                                                     </goals>
                                                              </pluginExecutionFilter>
                                                              <action>
                                                                     <ignore></ignore>
                                                              </action>
                                                       </pluginExecution>
                                                       <pluginExecution>
                                                              <pluginExecutionFilter>
                                                                     <groupId></groupId>
                                                                     <artifactId></artifactId>
                                                                     <versionRange>[,)</versionRange>
                                                                     <goals>
                                                                           <goal></goal>
                                                                     </goals>
                                                              </pluginExecutionFilter>
                                                              <action>
                                                                     <ignore></ignore>
                                                              </action>
                                                       </pluginExecution>
                                                </pluginExecutions>
                                         </lifecycleMappingMetadata>
                                  </configuration>
                           </plugin>
                     </plugins>
              </pluginManagement>
       </build>

       <dependencies>
              <!-- add your dependencies here -->
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>malhar-library</artifactId>
                     <version>${datatorrent.version}</version>
                     <!-- If you know that your application does not need transitive dependencies
                           pulled in by malhar-library, uncomment the following to reduce the size of
                           your app package. -->
                     <!-- <exclusions> <exclusion> <groupId>*</groupId> <artifactId>*</artifactId>
                           </exclusion> </exclusions> -->
              </dependency>
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>dt-common</artifactId>
                     <version>${datatorrent.version}</version>
                     <scope>provided</scope>
              </dependency>
               <dependency>
                      <groupId>com.teradata.jdbc</groupId>
                      <artifactId>terajdbc4</artifactId>
                      <version>14.00.00.21</version>
               </dependency>
               <dependency>
                      <groupId>com.teradata.jdbc</groupId>
                      <artifactId>tdgssconfig</artifactId>
                      <version>14.00.00.21</version>
               </dependency>
               <dependency>
                      <groupId>com.ibm.db2</groupId>
                      <artifactId>db2jcc</artifactId>
                      <version>123</version>
               </dependency>
              <dependency>
                     <groupId>jdk.tools</groupId>
                     <artifactId>jdk.tools</artifactId>
                     <version>1.7</version>
                     <scope>system</scope>
                     <systemPath>C:/Program Files/Java/jdk1.7.0_79/lib/tools.jar</systemPath>
              </dependency>
              <dependency>
                     <groupId>org.apache.apex</groupId>
                     <artifactId>malhar-contrib</artifactId>
                     <version>3.2.0-incubating</version>
                     <!--<scope>provided</scope> -->
                     <exclusions>
                           <exclusion>
                                  <groupId>*</groupId>
                                  <artifactId>*</artifactId>
                           </exclusion>
                     </exclusions>
              </dependency>
              <dependency>
                     <groupId>junit</groupId>
                     <artifactId>junit</artifactId>
                     <version>4.10</version>
                     <scope>test</scope>
              </dependency>
              <dependency>
                     <groupId>com.vertica</groupId>
                     <artifactId>vertica-jdbc</artifactId>
                     <version>7.2.1-0</version>
              </dependency>
               <dependency>
                      <groupId>org.apache.hbase</groupId>
                      <artifactId>hbase-client</artifactId>
                      <version>1.1.2</version>
               </dependency>
              <dependency>
                     <groupId>org.slf4j</groupId>
                     <artifactId>slf4j-log4j12</artifactId>
                     <version>1.7.19</version>
              </dependency>
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>dt-engine</artifactId>
                     <version>${datatorrent.version}</version>
                     <scope>test</scope>
              </dependency>

              <dependency>
                     <groupId>net.sf.flatpack</groupId>
                     <artifactId>flatpack</artifactId>
                     <version>3.4.2</version>
              </dependency>

              <dependency>
                     <groupId>org.jdom</groupId>
                     <artifactId>jdom</artifactId>
                     <version>1.1.3</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi-ooxml</artifactId>
                     <version>3.9</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.xmlbeans</groupId>
                     <artifactId>xmlbeans</artifactId>
                     <version>2.3.0</version>
              </dependency>

              <dependency>
                     <groupId>dom4j</groupId>
                     <artifactId>dom4j</artifactId>
                     <version>1.6.1</version>
              </dependency>

              <dependency>
                     <groupId>javax.xml.stream</groupId>
                     <artifactId>stax-api</artifactId>
                     <version>1.0-2</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi</artifactId>
                     <version>3.9</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi-ooxml-schemas</artifactId>
                     <version>3.9</version>
              </dependency>
              <dependency>
                     <groupId>com.jcraft</groupId>
                     <artifactId>jsch</artifactId>
                     <version>0.1.53</version>
              </dependency>
       </dependencies>

</project>

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, June, 16 4:37 PM
To: users@apex.apache.org
Subject: Re: Multiple directories

It looks like you may be including old Hadoop jars in your apa package since the stack trace
shows ConverterUtils.toContainerId calling ConverterUtils.toApplicationAttemptId but recent versions
don't have that call sequence. In 2.7.1 (which is what your cluster has) the function looks like this:
  public static ContainerId toContainerId(String containerIdStr) {
    return ContainerId.fromString(containerIdStr);
  }

Could you post the output of "jar tvf {your-apa-file}" as well as: "mvn dependency:tree"
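If the tree shows, for example, that hbase-client is what pulls in the Hadoop 2.5.x jars, one common fix is an exclusion like the sketch below (the right exclusions depend on your actual tree; the "*" wildcard needs Maven 3.2.1 or newer):

       <dependency>
              <groupId>org.apache.hbase</groupId>
              <artifactId>hbase-client</artifactId>
              <version>1.1.2</version>
              <exclusions>
                     <exclusion>
                            <groupId>org.apache.hadoop</groupId>
                            <artifactId>*</artifactId>
                     </exclusion>
              </exclusions>
       </dependency>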

Ram

On Thu, Jun 16, 2016 at 12:38 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <suryavamshivardhan.mukkamula@rbc.com> wrote:
Hi Ram,

Below is the information.


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   712    0   712    0     0   3807      0 --:--:-- --:--:-- --:--:--  3807
{
    "clusterInfo": {
        "haState": "ACTIVE",
        "haZooKeeperConnectionState": "CONNECTED",
        "hadoopBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bccaeaf36cb193c1c by jenkins source checksum 69a3bf8c667267c2c252a54fbbf23d",
        "hadoopVersion": "2.7.1.2.3.2.0-2950",
        "hadoopVersionBuiltOn": "2015-09-30T18:08Z",
        "id": 1465495186350,
        "resourceManagerBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bccaeaf36cb193c1c by jenkins source checksum 48db4b572827c2e9c2da66982d147626",
        "resourceManagerVersion": "2.7.1.2.3.2.0-2950",
        "resourceManagerVersionBuiltOn": "2015-09-30T18:20Z",
        "rmStateStoreName": "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",
        "startedOn": 1465495186350,
        "state": "STARTED"
    }
}

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, June, 16 2:57 PM
To: users@apex.apache.org
Subject: Re: Multiple directories

Can you ssh to one of the cluster nodes ? If so, can you run this command and show the output
(where {rm} is the host:port running the resource manager, aka YARN):

curl http://{rm}/ws/v1/cluster | python -mjson.tool

Ram
ps. You can determine the node running YARN with:

hdfs getconf -confKey yarn.resourcemanager.webapp.address
hdfs getconf -confKey yarn.resourcemanager.webapp.https.address



On Thu, Jun 16, 2016 at 11:15 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <suryavamshivardhan.mukkamula@rbc.com> wrote:
Hi,

I am facing a weird issue and the logs are not clear to me!!

I have created an apa file which works fine within my local sandbox, but I am facing problems when I upload it on the enterprise Hadoop cluster using the DT Console.

Below is the error message from the yarn logs. Please help in understanding the issue.

###################### Error Logs ########################################################

Log Type: AppMaster.stderr
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 1259
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/grid/06/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/filecache/36/slf4j-log4j12-1.7.19.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.IllegalArgumentException: Invalid ContainerId: container_e35_1465495186350_2224_01_000001
        at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182)
        at com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:90)
Caused by: java.lang.NumberFormatException: For input string: "e35"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:441)
        at java.lang.Long.parseLong(Long.java:483)
        at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
        at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
        ... 1 more

Log Type: AppMaster.stdout
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 0

Log Type: dt.log
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 29715
Showing 4096 bytes of 29715 total. Click here<http://guedlpdhdp001.saifg.rbc.com:19888/jobhistory/logs/guedlpdhdp012.saifg.rbc.com:45454/container_e35_1465495186350_2224_01_000001/container_e35_1465495186350_2224_01_000001/mukkamula/dt.log/?start=0> for the full log.
56m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms4096m -Xmx4096m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT
SHLVL=3
HADOOP_SSH_OPTS=-o ConnectTimeout=5 -o SendEnv=HADOOP_CONF_DIR
HADOOP_USER_NAME=datatorrent/gueulvahal003.saifg.rbc.com@SAIFG.RBC.COM
HADOOP_NAMENODE_OPTS=-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1
HADOOP_IDENT_STRING=yarn
HADOOP_MAPRED_LOG_DIR=/var/log/hadoop-mapreduce/yarn
NM_HOST=guedlpdhdp012.saifg.rbc.com
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
HADOOP_SECURE_DN_LOG_DIR=/var/log/hadoop/hdfs
YARN_HISTORYSERVER_HEAPSIZE=1024
JVM_PID=2638
YARN_PID_DIR=/var/run/hadoop-yarn/yarn
HADOOP_HOME_WARN_SUPPRESS=1
NM_PORT=45454
LOGNAME=mukkamula
YARN_CONF_DIR=/usr/hdp/current/hadoop-client/conf
HADOOP_YARN_USER=yarn
QTDIR=/usr/lib64/qt-3.3
_=/usr/lib/jvm/java-1.7.0/bin/java
MSM_PRODUCT=MSM
HADOOP_HOME=/usr/hdp/2.3.2.0-2950/hadoop
MALLOC_ARENA_MAX=4
HADOOP_OPTS=-Dhdp.version=2.3.2.0-2950 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true  -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.3.2.0-2950 -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
SHELL=/bin/bash
YARN_ROOT_LOGGER=INFO,EWMA,RFA
HADOOP_TOKEN_FILE_LOCATION=/grid/11/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/container_e35_1465495186350_2224_01_000001/container_tokens
CLASSPATH=./*:/usr/hdp/current/hadoop-client/conf:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*
HADOOP_MAPRED_PID_DIR=/var/run/hadoop-mapreduce/yarn
YARN_NODEMANAGER_HEAPSIZE=1024
QTINC=/usr/lib64/qt-3.3/include
USER=mukkamula
HADOOP_CLIENT_OPTS=-Xmx2048m -XX:MaxPermSize=512m -Xmx2048m -XX:MaxPermSize=512m
CONTAINER_ID=container_e35_1465495186350_2224_01_000001
HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop/hdfs
HISTCONTROL=ignoredups
HOME=/home/
HADOOP_NAMENODE_INIT_HEAPSIZE=-Xms8192m
MSM_HOME=/usr/local/MegaRAID Storage Manager
LESSOPEN=||/usr/bin/lesspipe.sh %s
LANG=en_US.UTF-8
YARN_NICENESS=0
YARN_IDENT_STRING=yarn
HADOOP_MAPRED_HOME=/usr/hdp/2.3.2.0-2950/hadoop-mapreduce


Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR)
Sent: 2016, June, 16 8:58 AM
To: users@apex.apache.org
Subject: RE: Multiple directories

Thank you for the inputs.

Regards,
Surya Vamshi
From: Thomas Weise [mailto:thomas.weise@gmail.com]
Sent: 2016, June, 15 5:08 PM
To: users@apex.apache.org
Subject: Re: Multiple directories


On Wed, Jun 15, 2016 at 1:55 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <suryavamshivardhan.mukkamula@rbc.com> wrote:
Hi Ram/Team,

I could create an operator which reads multiple directories, parses each file with respect to its individual configuration file, and generates output files to different directories.

However I have some questions regarding the design.


==>We have 120 directories to scan on HDFS. If we use parallel partitioning with operator memory around 250MB, that comes to around 30GB of RAM for this operator. Are these figures going to create any problem in production?

You can benchmark this with a single partition. If the downstream operators can keep up with the rate at which the file reader emits, then the memory consumption should be minimal. Keep in mind though that the container memory is not just heap space for the operator, but also memory the JVM requires to run and the memory that the buffer server consumes. You see the allocated memory in the UI if you use the DT community edition (container list in the physical plan).


==>Should I use a scheduler for running the batch job, or define the next scan time and keep the DT job running continuously? If I run the DT job continuously, I assume memory will be continuously utilized by the DT job and not be available to other resources on the cluster; please clarify.
It is possible to set this up elastically also, so that when there is no input available, the number of reader partitions is reduced and the memory given back (Apex supports dynamic scaling).
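A minimal sketch of the idle-detection half of that, assuming the reader operator also implements com.datatorrent.api.StatsListener (the zero-tuples test is a deliberately simplistic idle signal):

       @Override
       public Response processStats(BatchedOperatorStats stats)
       {
         Response rsp = new Response();
         // ask the platform to call definePartitions() again when this reader
         // has emitted nothing recently, so the partition count can be reduced
         rsp.repartitionRequired = (stats.getTuplesEmittedPSMA() == 0);
         return rsp;
       }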


Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, June, 05 10:24 PM

To: users@apex.apache.org
Subject: Re: Multiple directories

Some sample code to monitor multiple directories is now available at:
https://github.com/DataTorrent/examples/tree/master/tutorials/fileIO-multiDir

It shows how to use a custom implementation of definePartitions() to create
multiple partitions of the file input operator and group them
into "slices" where each slice monitors a single directory.
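A bare-bones sketch of that idea (this is not the code at the link above; the class name, the directories property, and the copy-fields shortcut are all assumptions):

       import java.util.ArrayList;
       import java.util.Collection;
       import java.util.List;

       import com.datatorrent.api.DefaultPartition;
       import com.datatorrent.api.Partitioner.Partition;
       import com.datatorrent.api.Partitioner.PartitioningContext;
       import com.datatorrent.lib.io.fs.AbstractFileInputOperator;

       public class SingleDirReader extends AbstractFileInputOperator.FileLineInputOperator
       {
         private List<String> directories = new ArrayList<>();  // one entry per desired partition

         @Override
         public Collection<Partition<AbstractFileInputOperator<String>>> definePartitions(
             Collection<Partition<AbstractFileInputOperator<String>>> partitions,
             PartitioningContext context)
         {
           List<Partition<AbstractFileInputOperator<String>>> result = new ArrayList<>();
           for (String dir : directories) {
             SingleDirReader op = new SingleDirReader();
             op.directories = directories;  // carry configuration over to the replica
             op.setDirectory(dir);          // each replica scans exactly one directory
             result.add(new DefaultPartition<AbstractFileInputOperator<String>>(op));
           }
           return result;
         }
       }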

Ram

On Wed, May 25, 2016 at 9:55 AM, Munagala Ramanath <ram@datatorrent.com> wrote:
I'm hoping to have a sample sometime next week.

Ram

On Wed, May 25, 2016 at 9:30 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <suryavamshivardhan.mukkamula@rbc.com> wrote:
Thank you so much, Ram, for your advice. Option (a) would be ideal for my requirement.

Do you have sample usage for partitioning, with individual configuration set up for different partitions?

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, May, 25 12:11 PM
To: users@apex.apache.org
Subject: Re: Multiple directories

You have 2 options: (a) AbstractFileInputOperator (b) FileSplitter/BlockReader

For (a), each partition (i.e. replica of the operator) can scan only a single directory, so if you have 100
directories, you can simply start with 100 partitions; since each partition is scanning its own directory
you don't need to worry about which files the lines came from. This approach however needs a custom
definePartitions() implementation in your subclass to assign the appropriate directory and XML parsing
config file to each partition; it also needs adequate cluster resources to be able to spin up the required
number of partitions.

For (b), there is some documentation in the Operators section at http://docs.datatorrent.com/ including
sample code. These operators support scanning multiple directories out of the box but have more
elaborate configuration options. Check this out and see if it works in your use case.
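As a sketch of the kind of configuration involved for (b), assuming the Malhar FileSplitterInput operator and a DAG operator named "fileSplitter" (verify the scanner.files property name against the docs above; it takes a comma-separated list of paths):

       <property>
         <name>dt.operator.fileSplitter.prop.scanner.files</name>
         <value>hdfs:///data/feeds/dir1,hdfs:///data/feeds/dir2</value>
       </property>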

Ram

On Wed, May 25, 2016 at 8:17 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <suryavamshivardhan.mukkamula@rbc.com> wrote:
Hello Ram/Team,

My requirement is to read input feeds from different locations on HDFS and parse those files by reading XML configuration files (each input feed has a configuration file which defines the fields inside the input feed).

My approach: I would like to define a mapping file which contains an individual feed identifier, feed location, and configuration file location. I would like to read this mapping file at initial load within the setup() method and define my DirectoryScan.acceptFiles. My challenge here is that when I read the files, I should parse the lines by reading the individual configuration files. How do I know which particular file a line came from? If I know this, I can read the corresponding configuration file before parsing the line.

Please let me know how do I handle this.

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, May, 24 5:49 PM
To: Mukkamula, Suryavamshivardhan (CWM-NR)
Subject: Multiple directories

One way of addressing the issue is to use some sort of external tool (like a script) to
copy all the input files to a common directory (making sure that the file names are
unique to prevent one file from overwriting another) before the Apex application starts.

The Apex application then starts and processes files from this directory.

If you set the partition count of the file input operator to N, it will create N partitions and
the files will be automatically distributed among the partitions. The partitions will work
in parallel.
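In properties.xml that might look like the following sketch (assuming the Malhar AbstractFileInputOperator, which exposes a partitionCount property; the operator name "reader" is a placeholder for whatever your DAG calls the file input operator):

       <property>
         <name>dt.operator.reader.prop.partitionCount</name>
         <value>4</value>
       </property>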

Ram

_______________________________________________________________________

This [email] may be privileged and/or confidential, and the sender does not waive any related rights and obligations. Any distribution, use or copying of this [email] or the information it contains by other than an intended recipient is unauthorized. If you received this [email] in error, please advise the sender (by return [email] or otherwise) immediately. You have consented to receive the attached electronically at the above-noted address; please retain a copy of this confirmation for future reference.


>
>        xsi:schemaLocation=*"http://maven.apache.org/POM/4.0.0
> <http://maven.apache.org/POM/4.0.0>
> http://maven.apache.org/xsd/maven-4.0.0.xsd
> <http://maven.apache.org/xsd/maven-4.0.0.xsd>"*>
>
>        <modelVersion>4.0.0</modelVersion>
>
>        <groupId>com.rbc.aml.cnscan</groupId>
>
>        <version>1.0-SNAPSHOT</version>
>
>        <artifactId>*countrynamescan*</artifactId>
>
>        <packaging>jar</packaging>
>
>
>
>        <!-- change these to the appropriate values -->
>
>        <name>*countrynamescan*</name>
>
>        <description>Country and Name Scan project</description>
>
>
>
>        <properties>
>
>               <!-- change this if you desire to use a different version
> of DataTorrent -->
>
>               <datatorrent.version>3.1.1</datatorrent.version>
>
>               <datatorrent.apppackage.classpath>lib/*.jar</
> datatorrent.apppackage.classpath>
>
>        </properties>
>
>
>
>        <!-- repository to provide the DataTorrent artifacts -->
>
>        <!-- <repositories>
>
>               <repository>
>
>                      <snapshots>
>
>                            <enabled>false</enabled>
>
>                      </snapshots>
>
>                      <id>*Datatorrent*-Releases</id>
>
>                      <name>DataTorrent Release Repository</name>
>
>                      *<url>*
> https://www.datatorrent.com/maven/content/repositories/releases/</url>
>
>               </repository>
>
>               <repository>
>
>                      <releases>
>
>                            <enabled>false</enabled>
>
>                      </releases>
>
>                      <id>DataTorrent-Snapshots</id>
>
>                      <name>DataTorrent Early Access Program Snapshot
> Repository</name>
>
>                      *<url>*
> https://www.datatorrent.com/maven/content/repositories/snapshots/</url>
>
>               </repository>
>
>        </repositories> -->
>
>
>
>
>
>        <build>
>
>               <plugins>
>
>                      <plugin>
>
>                            <groupId>org.apache.maven.plugins</groupId>
>
>                            <artifactId>*maven*-eclipse-*plugin*</
> artifactId>
>
>                            <version>2.9</version>
>
>                            <configuration>
>
>                                   <downloadSources>true</downloadSources>
>
>                            </configuration>
>
>                      </plugin>
>
>                      <plugin>
>
>                            <artifactId>*maven*-compiler-*plugin*</
> artifactId>
>
>                            <version>3.3</version>
>
>                            <configuration>
>
>                                   <encoding>UTF-8</encoding>
>
>                                   <source>1.7</source>
>
>                                   <target>1.7</target>
>
>                                   <debug>true</debug>
>
>                                   <optimize>false</optimize>
>
>                                   <showDeprecation>true</showDeprecation>
>
>                                   <showWarnings>true</showWarnings>
>
>                            </configuration>
>
>                      </plugin>
>
>                      <plugin>
>
>                            <artifactId>*maven*-dependency-*plugin*</
> artifactId>
>
>                            <version>2.8</version>
>
>                            <executions>
>
>                                   <execution>
>
>                                          <id>copy-dependencies</id>
>
>                                          <phase>prepare-package</phase>
>
>                                          <goals>
>
>                                                 <goal>copy-dependencies</
> goal>
>
>                                          </goals>
>
>                                          <configuration>
>
>                                                 <outputDirectory>
> target/deps</outputDirectory>
>
>                                                 <includeScope>runtime</
> includeScope>
>
>                                          </configuration>
>
>                                   </execution>
>
>                            </executions>
>
>                      </plugin>
>
>
>
>                      <plugin>
>
>                            <artifactId>*maven*-assembly-*plugin*</
> artifactId>
>
>                            <executions>
>
>                                   <execution>
>
>                                          <id>*app*-package-assembly</id>
>
>                                          <phase>package</phase>
>
>                                          <goals>
>
>                                                 <goal>single</goal>
>
>                                          </goals>
>
>                                          <configuration>
>
>                                                 <finalName>
> ${project.artifactId}-${project.version}-apexapp</finalName>
>
>                                                 <appendAssemblyId>false</
> appendAssemblyId>
>
>                                                 <descriptors>
>
>                                                        <descriptor>
> src/assemble/appPackage.xml</descriptor>
>
>                                                 </descriptors>
>
>                                                 <archiverConfig>
>
>                                                        <
> defaultDirectoryMode>0755</defaultDirectoryMode>
>
>                                                 </archiverConfig>
>
>                                                 <archive>
>
>                                                        <manifestEntries>
>
>                                                               <Class-Path>
> ${datatorrent.apppackage.classpath}</Class-Path>
>
>                                                               <
> DT-Engine-Version>${datatorrent.version}</DT-Engine-Version>
>
>                                                               <
> DT-App-Package-Name>${project.artifactId}</DT-App-Package-Name>
>
>                                                               <
> DT-App-Package-Version>${project.version}</DT-App-Package-Version>
>
>                                                               <
> DT-App-Package-Display-Name>${project.name}</DT-App-Package-Display-Name>
>
>                                                               <
> DT-App-Package-Description>${project.description}</
> DT-App-Package-Description>
>
>                                                       </manifestEntries>
>
>                                                 </archive>
>
>                                          </configuration>
>
>                                   </execution>
>
>                            </executions>
>
>                      </plugin>
>
>
>
>                      <plugin>
>
>                            <artifactId>*maven*-*antrun*-*plugin*</
> artifactId>
>
>                            <version>1.7</version>
>
>                            <executions>
>
>                                   <execution>
>
>                                          <phase>package</phase>
>
>                                          <configuration>
>
>                                                 <target>
>
>                                                        <move
>
>                                                               file=
> *"${project.build.directory}/${project.artifactId}-${project.version}-apexapp.jar"*
>
>                                                               tofile=
> *"${project.build.directory}/${project.artifactId}-${project.version}.apa"*
> />
>
>                                                 </target>
>
>                                          </configuration>
>
>                                          <goals>
>
>                                                 <goal>run</goal>
>
>                                          </goals>
>
>                                   </execution>
>
>                                   <execution>
>
>                                          <!-- create resource directory
> for *xml* *javadoc* -->
>
>                                          <id>createJavadocDirectory</id>
>
>                                          <phase>generate-resources</phase>
>
>                                          <configuration>
>
>                                                 <tasks>
>
>                                                        <delete
>
>                                                               dir=
> *"${project.build.directory}/generated-resources/xml-javadoc"* />
>
>                                                        <mkdir
>
>                                                               dir=
> *"${project.build.directory}/generated-resources/xml-javadoc"* />
>
>                                                 </tasks>
>
>                                          </configuration>
>
>                                          <goals>
>
>                                                 <goal>run</goal>
>
>                                          </goals>
>
>                                   </execution>
>
>                            </executions>
>
>                      </plugin>
>
>
>
>                      <plugin>
>
>                            <groupId>org.codehaus.mojo</groupId>
>
>                            <artifactId>build-helper-*maven*-*plugin*</
> artifactId>
>
>                            <version>1.9.1</version>
>
>                            <executions>
>
>                                   <execution>
>
>                                          <id>attach-artifacts</id>
>
>                                          <phase>package</phase>
>
>                                          <goals>
>
>                                                 <goal>attach-artifact</
> goal>
>
>                                          </goals>
>
>                                          <configuration>
>
>                                                 <artifacts>
>
>                                                        <artifact>
>
>                                                               <file>
> target/${project.artifactId}-${project.version}.apa</file>
>
>                                                               <type>apa</
> type>
>
>                                                        </artifact>
>
>                                                 </artifacts>
>
>                                                 <skipAttach>false</
> skipAttach>
>
>                                          </configuration>
>
>                                   </execution>
>
>                            </executions>
>
>                      </plugin>
>
>
>
>                      <!-- generate *javdoc* -->
>
>                      <plugin>
>
>                            <groupId>org.apache.maven.plugins</groupId>
>
>                            <artifactId>*maven*-*javadoc*-*plugin*</
> artifactId>
>
>                            <executions>
>
>                                   <!-- generate *xml* *javadoc* -->
>
>                                   <execution>
>
>                                          <id>*xml*-*doclet*</id>
>
>                                          <phase>generate-resources</phase>
>
>                                          <goals>
>
>                                                 <goal>*javadoc*</goal>
>
>                                          </goals>
>
>                                          <configuration>
>
>                                                 <doclet>
> com.github.markusbernhardt.xmldoclet.XmlDoclet</doclet>
>
>                                                 <additionalparam>-d
>
>
> ${project.build.directory}/generated-resources/xml-javadoc
>
>                                                        -filename
> ${project.artifactId}-${project.version}-javadoc.xml</additionalparam>
>
>                                                 <useStandardDocletOptions>
> false</useStandardDocletOptions>
>
>                                                 <docletArtifact>
>
>                                                        <groupId>
> com.github.markusbernhardt</groupId>
>
>                                                        <artifactId>
> xml-doclet</artifactId>
>
>                                                        <version>1.0.4</
> version>
>
>                                                 </docletArtifact>
>
>                                          </configuration>
>
>                                   </execution>
>
>                            </executions>
>
>                      </plugin>
>
>                      <!-- Transform *xml* *javadoc* to stripped down
> version containing only class/interface
>
>                            comments and tags -->
>
>                      <plugin>
>
>                            <groupId>org.codehaus.mojo</groupId>
>
>                            <artifactId>*xml*-*maven*-*plugin*</artifactId>
>
>                            <version>1.0</version>
>
>                            <executions>
>
>                                   <execution>
>
>                                          <id>transform-*xmljavadoc*</id>
>
>                                          <phase>generate-resources</phase>
>
>                                          <goals>
>
>                                                 <goal>transform</goal>
>
>                                          </goals>
>
>                                   </execution>
>
>                            </executions>
>
>                            <configuration>
>
>                                   <transformationSets>
>
>                                          <transformationSet>
>
>                                                 <dir>
> ${project.build.directory}/generated-resources/xml-javadoc</dir>
>
>                                                 <includes>
>
>                                                        <include>
> ${project.artifactId}-${project.version}-javadoc.xml</include>
>
>                                                 </includes>
>
>                                                 <stylesheet>
> XmlJavadocCommentsExtractor.xsl</stylesheet>
>
>                                                 <outputDir>
> ${project.build.directory}/generated-resources/xml-javadoc</outputDir>
>
>                                          </transformationSet>
>
>                                   </transformationSets>
>
>                            </configuration>
>
>                      </plugin>
>
>                      <!-- copy *xml* *javadoc* to class jar -->
>
>                      <plugin>
>
>                            <artifactId>*maven*-resources-*plugin*</
> artifactId>
>
>                            <version>2.6</version>
>
>                            <executions>
>
>                                   <execution>
>
>                                          <id>copy-resources</id>
>
>                                          <phase>process-resources</phase>
>
>                                          <goals>
>
>                                                 <goal>copy-resources</goal
> >
>
>                                          </goals>
>
>                                          <configuration>
>
>                                                 <outputDirectory>
> ${basedir}/target/classes</outputDirectory>
>
>                                                 <resources>
>
>                                                        <resource>
>
>                                                               <directory>
> ${project.build.directory}/generated-resources/xml-javadoc</directory>
>
>                                                               <includes>
>
>                                                                      <
> include>${project.artifactId}-${project.version}-javadoc.xml</include>
>
>                                                               </includes>
>
>                                                               <filtering>
> true</filtering>
>
>                                                        </resource>
>
>                                                 </resources>
>
>                                          </configuration>
>
>                                   </execution>
>
>                            </executions>
>
>                      </plugin>
>
>
>
>               </plugins>
>
>
>
>               <pluginManagement>
>
>                      <plugins>
>
>                            <!--This plugin's configuration is used to
> store Eclipse m2e settings
>
>                                   only. It has no influence on the *Maven*
> build itself. -->
>
>                            <plugin>
>
>                                   <groupId>org.eclipse.m2e</groupId>
>
>                                   <artifactId>*lifecycle*-mapping</
> artifactId>
>
>                                   <version>1.0.0</version>
>
>                                   <configuration>
>
>                                          <lifecycleMappingMetadata>
>
>                                                 <pluginExecutions>
>
>                                                        <pluginExecution>
>
>                                                               <
> pluginExecutionFilter>
>
>                                                                      <
> groupId>org.codehaus.mojo</groupId>
>
>                                                                      <
> artifactId>
>
>
> xml-maven-plugin
>
>                                                                      </
> artifactId>
>
>                                                                      <
> versionRange>[1.0,)</versionRange>
>
>                                                                      <
> goals>
>
>
> <goal>transform</goal>
>
>                                                                      </
> goals>
>
>                                                               </
> pluginExecutionFilter>
>
>                                                               <action>
>
>                                                                      <
> ignore></ignore>
>
>                                                               </action>
>
>                                                        </pluginExecution>
>
>                                                        <pluginExecution>
>
>                                                               <
> pluginExecutionFilter>
>
>                                                                      <
> groupId></groupId>
>
>                                                                      <
> artifactId></artifactId>
>
>                                                                      <
> versionRange>[,)</versionRange>
>
>                                                                      <
> goals>
>
>
> <goal></goal>
>
>                                                                      </
> goals>
>
>                                                               </
> pluginExecutionFilter>
>
>                                                               <action>
>
>                                                                      <
> ignore></ignore>
>
>                                                               </action>
>
>                                                        </pluginExecution>
>
>                                                 </pluginExecutions>
>
>                                          </lifecycleMappingMetadata>
>
>                                   </configuration>
>
>                            </plugin>
>
>                      </plugins>
>
>               </pluginManagement>
>
>        </build>
>
>
>
>        <dependencies>
>
>               <!-- add your dependencies here -->
>
>               <dependency>
>
>                      <groupId>com.datatorrent</groupId>
>
>                      <artifactId>*malhar*-library</artifactId>
>
>                      <version>${datatorrent.version}</version>
>
>                      <!-- If you know that your application does not need
> transitive dependencies
>
>                            pulled in by *malhar*-library, uncomment the
> following to reduce the size of
>
>                            your *app* package. -->
>
>                      <!-- <exclusions> <exclusion> <groupId>*</groupId>
> <artifactId>*</artifactId>
>
>                            </exclusion> </exclusions> -->
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>com.datatorrent</groupId>
>
>                      <artifactId>*dt*-common</artifactId>
>
>                      <version>${datatorrent.version}</version>
>
>                      <scope>provided</scope>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>com.datatorrent</groupId>
>
>                      <artifactId>*malhar*-library</artifactId>
>
>                      <version>${datatorrent.version}</version>
>
>                      <!-- If you know that your application does not need
> transitive dependencies
>
>                            pulled in by *malhar*-library, uncomment the
> following to reduce the size of
>
>                            your *app* package. -->
>
>                      <!-- <exclusions> <exclusion> <groupId>*</groupId>
> <artifactId>*</artifactId>
>
>                            </exclusion> </exclusions> -->
>
>               </dependency>
>
>               <dependency>
>
>        <groupId>com.datatorrent</groupId>
>
>        <artifactId>*dt*-engine</artifactId>
>
>        <version>${datatorrent.version}</version>
>
>        <scope>test</scope>
>
>               </dependency>
>
>               <dependency>
>
>               <groupId>com.datatorrent</groupId>
>
>               <artifactId>*dt*-common</artifactId>
>
>               <version>${datatorrent.version}</version>
>
>               <scope>provided</scope>
>
>               </dependency>
>
>               <dependency>
>
>               <groupId>com.teradata.jdbc</groupId>
>
>               <artifactId>terajdbc4</artifactId>
>
>               <version>14.00.00.21</version>
>
>               </dependency>
>
>               <dependency>
>
>               <groupId>com.teradata.jdbc</groupId>
>
>               <artifactId>*tdgssconfig*</artifactId>
>
>               <version>14.00.00.21</version>
>
>               </dependency>
>
>               <dependency>
>
>               <groupId>com.ibm.db2</groupId>
>
>               <artifactId>db2jcc</artifactId>
>
>               <version>123</version>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>jdk.tools</groupId>
>
>                      <artifactId>jdk.tools</artifactId>
>
>                      <version>1.7</version>
>
>                      <scope>system</scope>
>
>                      <systemPath>C:/Program Files/Java/jdk1.7.0_79/*lib*
> /tools.jar</systemPath>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>org.apache.apex</groupId>
>
>                      <artifactId>*malhar*-*contrib*</artifactId>
>
>                      <version>3.2.0-incubating</version>
>
>                      <!--<scope>provided</scope> -->
>
>                      <exclusions>
>
>                            <exclusion>
>
>                                   <groupId>*</groupId>
>
>                                   <artifactId>*</artifactId>
>
>                            </exclusion>
>
>                      </exclusions>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>*junit*</groupId>
>
>                      <artifactId>*junit*</artifactId>
>
>                      <version>4.10</version>
>
>                      <scope>test</scope>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>com.vertica</groupId>
>
>                      <artifactId>*vertica*-*jdbc*</artifactId>
>
>                      <version>7.2.1-0</version>
>
>               </dependency>
>
>               <dependency>
>
>               <groupId>org.apache.hbase</groupId>
>
>               <artifactId>*hbase*-client</artifactId>
>
>               <version>1.1.2</version>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>org.slf4j</groupId>
>
>                      <artifactId>slf4j-log4j12</artifactId>
>
>                      <version>1.7.19</version>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>com.datatorrent</groupId>
>
>                      <artifactId>*dt*-engine</artifactId>
>
>                      <version>${datatorrent.version}</version>
>
>                      <scope>test</scope>
>
>               </dependency>
>
>
>
>               <dependency>
>
>                      <groupId>net.sf.flatpack</groupId>
>
>                      <artifactId>*flatpack*</artifactId>
>
>                      <version>3.4.2</version>
>
>               </dependency>
>
>
>
>               <dependency>
>
>                      <groupId>org.jdom</groupId>
>
>                      <artifactId>*jdom*</artifactId>
>
>                      <version>1.1.3</version>
>
>               </dependency>
>
>
>
>               <dependency>
>
>                      <groupId>org.apache.poi</groupId>
>
>                      <artifactId>*poi*-*ooxml*</artifactId>
>
>                      <version>3.9</version>
>
>               </dependency>
>
>
>
>               <dependency>
>
>                      <groupId>org.apache.xmlbeans</groupId>
>
>                      <artifactId>*xmlbeans*</artifactId>
>
>                      <version>2.3.0</version>
>
>               </dependency>
>
>
>
>               <dependency>
>
>                      <groupId>dom4j</groupId>
>
>                      <artifactId>dom4j</artifactId>
>
>                      <version>1.6.1</version>
>
>               </dependency>
>
>
>
>               <dependency>
>
>                      <groupId>javax.xml.stream</groupId>
>
>                      <artifactId>*stax*-*api*</artifactId>
>
>                      <version>1.0-2</version>
>
>               </dependency>
>
>
>
>               <dependency>
>
>                      <groupId>org.apache.poi</groupId>
>
>                      <artifactId>*poi*</artifactId>
>
>                      <version>3.9</version>
>
>               </dependency>
>
>
>
>               <dependency>
>
>                      <groupId>org.apache.poi</groupId>
>
>                      <artifactId>*poi*-*ooxml*-schemas</artifactId>
>
>                      <version>3.9</version>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>com.jcraft</groupId>
>
>                      <artifactId>*jsch*</artifactId>
>
>                      <version>0.1.53</version>
>
>               </dependency>
>
>               <dependency>
>
>                      <groupId>com.jcraft</groupId>
>
>                      <artifactId>*jsch*</artifactId>
>
>                      <version>0.1.53</version>
>
>               </dependency>
>
>        </dependencies>
>
>
>
> </project>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, June, 16 4:37 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> It looks like you may be including old Hadoop jars in your apa package
> since the stack trace
>
> shows *ConverterUtils.toContainerId* calling
> *ConverterUtils.toApplicationAttemptId* but recent versions
>
> don't have that call sequence. In 2.7.1 (which is what your cluster has)
> the function looks like this:
>
>  * public static ContainerId toContainerId(String containerIdStr) {*
>
> *    return ContainerId.fromString(containerIdStr);*
>
> *  }*
>
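> A minimal, self-contained sketch of that failure mode (the class name is
> invented for illustration; it only mimics how a pre-2.6-style parser treats
> the second underscore-delimited field of the id from the stack trace):
>
> public class ContainerIdParseDemo {
>   public static void main(String[] args) {
>     String id = "container_e35_1465495186350_2224_01_000001";
>     // Older parsers expect container_<clusterTs>_<appId>_<attempt>_<container>
>     // and feed the second field to Long.parseLong(); the epoch marker "e35"
>     // in that position triggers the NumberFormatException from the log.
>     String[] parts = id.split("_");
>     try {
>       long clusterTs = Long.parseLong(parts[1]); // parts[1] == "e35"
>       System.out.println("parsed cluster timestamp: " + clusterTs);
>     } catch (NumberFormatException e) {
>       System.out.println("old-style parsing fails: " + e.getMessage());
>     }
>   }
> }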
>
>
> Could you post the output of "*jar tvf {your-apa-file}*" as well as: "*mvn
> dependency:tree"*
>
>
>
> Ram
>
>
>
> On Thu, Jun 16, 2016 at 12:38 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi Ram,
>
>
>
> Below is the information.
>
>
>
>
>
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100   712    0   712    0     0   3807      0 --:--:-- --:--:-- --:--:--  3807
>
> {
>     "clusterInfo": {
>         "haState": "ACTIVE",
>         "haZooKeeperConnectionState": "CONNECTED",
>         "hadoopBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bccaeaf36cb193c1c by jenkins source checksum 69a3bf8c667267c2c252a54fbbf23d",
>         "hadoopVersion": "2.7.1.2.3.2.0-2950",
>         "hadoopVersionBuiltOn": "2015-09-30T18:08Z",
>         "id": 1465495186350,
>         "resourceManagerBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bccaeaf36cb193c1c by jenkins source checksum 48db4b572827c2e9c2da66982d147626",
>         "resourceManagerVersion": "2.7.1.2.3.2.0-2950",
>         "resourceManagerVersionBuiltOn": "2015-09-30T18:20Z",
>         "rmStateStoreName": "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",
>         "startedOn": 1465495186350,
>         "state": "STARTED"
>     }
> }
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, June, 16 2:57 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> Can you ssh to one of the cluster nodes ? If so, can you run this command
> and show the output
>
> (where *{rm} *is the *host:port* running the resource manager, aka YARN):
>
>
>
> *curl http://{rm}/ws/v1/cluster <http://%7brm%7d/ws/v1/cluster> | python
> -mjson.tool*
>
>
>
> Ram
>
> ps. You can determine the node running YARN with:
>
>
>
> *hdfs getconf -confKey yarn.resourcemanager.webapp.address*
>
> *hdfs getconf -confKey yarn.resourcemanager.webapp.https.address*
>
>
>
>
>
>
>
> On Thu, Jun 16, 2016 at 11:15 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi,
>
>
>
> I am facing a weird issue, and the logs are not clear to me!
>
>
>
> I have created an apa file which works fine within my local sandbox, but I am
> facing problems when I upload it to the enterprise Hadoop cluster using the DT
> Console.
>
>
>
> Below is the error message from yarn logs. Please help in understanding
> the issue.
>
>
>
> ###################### Error Logs
> ########################################################
>
>
>
> Log Type: AppMaster.stderr
>
> Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
>
> Log Length: 1259
>
> SLF4J: Class path contains multiple SLF4J bindings.
>
> SLF4J: Found binding in
> [jar:file:/grid/06/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/filecache/36/slf4j-log4j12-1.7.19.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>
> SLF4J: Found binding in
> [jar:file:/usr/hdp/2.3.2.0-2950/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
>
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>
> Exception in thread "main" java.lang.IllegalArgumentException: Invalid
> ContainerId: container_e35_1465495186350_2224_01_000001
>
>         at
> org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182)
>
>         at
> com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:90)
>
> Caused by: java.lang.NumberFormatException: For input string: "e35"
>
>         at
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>
>         at java.lang.Long.parseLong(Long.java:441)
>
>         at java.lang.Long.parseLong(Long.java:483)
>
>         at
> org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
>
>         at
> org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
>
>         ... 1 more
>
>
>
> Log Type: AppMaster.stdout
>
> Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
>
> Log Length: 0
>
>
>
> Log Type: dt.log
>
> Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
>
> Log Length: 29715
>
> Showing 4096 bytes of 29715 total. Click here
> <http://guedlpdhdp001.saifg.rbc.com:19888/jobhistory/logs/guedlpdhdp012.saifg.rbc.com:45454/container_e35_1465495186350_2224_01_000001/container_e35_1465495186350_2224_01_000001/mukkamula/dt.log/?start=0> for
> the full log.
>
> 56m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms4096m
> -Xmx4096m -Dhadoop.security.logger=INFO,DRFAS
> -Dhdfs.audit.logger=INFO,DRFAAUDIT
>
> SHLVL=3
>
> HADOOP_SSH_OPTS=-o ConnectTimeout=5 -o SendEnv=HADOOP_CONF_DIR
>
> HADOOP_USER_NAME=datatorrent/gueulvahal003.saifg.rbc.com@SAIFG.RBC.COM
>
> HADOOP_NAMENODE_OPTS=-server -XX:ParallelGCThreads=8
> -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log
> -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m
> -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m
> -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS
> -Dhdfs.audit.logger=INFO,DRFAAUDIT
> -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node"
> -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server
> -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC
> -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m
> -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m
> -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m
> -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS
> -Dhdfs.audit.logger=INFO,DRFAAUDIT
> -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node"
> -Dorg.mortbay.jetty.Request.maxFormContentSize=-1
>
> HADOOP_IDENT_STRING=yarn
>
> HADOOP_MAPRED_LOG_DIR=/var/log/hadoop-mapreduce/yarn
>
> NM_HOST=guedlpdhdp012.saifg.rbc.com
>
> XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
>
> HADOOP_SECURE_DN_LOG_DIR=/var/log/hadoop/hdfs
>
> YARN_HISTORYSERVER_HEAPSIZE=1024
>
> JVM_PID=2638
>
> YARN_PID_DIR=/var/run/hadoop-yarn/yarn
>
> HADOOP_HOME_WARN_SUPPRESS=1
>
> NM_PORT=45454
>
> LOGNAME=mukkamula
>
> YARN_CONF_DIR=/usr/hdp/current/hadoop-client/conf
>
> HADOOP_YARN_USER=yarn
>
> QTDIR=/usr/lib64/qt-3.3
>
> _=/usr/lib/jvm/java-1.7.0/bin/java
>
> MSM_PRODUCT=MSM
>
> HADOOP_HOME=/usr/hdp/2.3.2.0-2950/hadoop
>
> MALLOC_ARENA_MAX=4
>
> HADOOP_OPTS=-Dhdp.version=2.3.2.0-2950 -Djava.net.preferIPv4Stack=true
> -Dhdp.version= -Djava.net.preferIPv4Stack=true
> -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log
> -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn
> -Dhadoop.root.logger=INFO,console
> -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native
> -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
> -Dhdp.version=2.3.2.0-2950 -Dhadoop.log.dir=/var/log/hadoop/yarn
> -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop
> -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console
> -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native
> -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
>
> SHELL=/bin/bash
>
> YARN_ROOT_LOGGER=INFO,EWMA,RFA
>
>
> HADOOP_TOKEN_FILE_LOCATION=/grid/11/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/container_e35_1465495186350_2224_01_000001/container_tokens
>
>
> CLASSPATH=./*:/usr/hdp/current/hadoop-client/conf:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*
>
> HADOOP_MAPRED_PID_DIR=/var/run/hadoop-mapreduce/yarn
>
> YARN_NODEMANAGER_HEAPSIZE=1024
>
> QTINC=/usr/lib64/qt-3.3/include
>
> USER=mukkamula
>
> HADOOP_CLIENT_OPTS=-Xmx2048m -XX:MaxPermSize=512m -Xmx2048m
> -XX:MaxPermSize=512m
>
> CONTAINER_ID=container_e35_1465495186350_2224_01_000001
>
> HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop/hdfs
>
> HISTCONTROL=ignoredups
>
> HOME=/home/
>
> HADOOP_NAMENODE_INIT_HEAPSIZE=-Xms8192m
>
> MSM_HOME=/usr/local/MegaRAID Storage Manager
>
> LESSOPEN=||/usr/bin/lesspipe.sh %s
>
> LANG=en_US.UTF-8
>
> YARN_NICENESS=0
>
> YARN_IDENT_STRING=yarn
>
> HADOOP_MAPRED_HOME=/usr/hdp/2.3.2.0-2950/hadoop-mapreduce
>
>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Mukkamula, Suryavamshivardhan (CWM-NR)
> *Sent:* 2016, June, 16 8:58 AM
> *To:* users@apex.apache.org
> *Subject:* RE: Multiple directories
>
>
>
> Thank you for the inputs.
>
>
>
> Regards,
>
> Surya Vamshi
>
> *From:* Thomas Weise [mailto:thomas.weise@gmail.com
> <th...@gmail.com>]
> *Sent:* 2016, June, 15 5:08 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
>
>
> On Wed, Jun 15, 2016 at 1:55 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hi Ram/Team,
>
>
>
> I could create an operator which reads multiple directories, parses each file
> with respect to an individual configuration file, and generates output files
> to different directories.
>
>
>
> However, I have some questions regarding the design.
>
>
>
> -> We have 120 directories to scan on HDFS. If we use parallel
> partitioning with operator memory around 250MB, it might come to around 30GB of
> RAM for this operator's processing. Are these figures going to create
> any problem in production?
>
>
>
> You can benchmark this with a single partition. If the downstream
> operators can keep up with the rate at which the file reader emits, then
> the memory consumption should be minimal. Keep in mind though that the
> container memory is not just heap space for the operator, but also memory
> the JVM requires to run and the memory that the buffer server consumes. You
> see the allocated memory in the UI if you use the DT community edition
> (container list in the physical plan).
>
>
>
> -> Should I use a scheduler for running the batch job, or define the next
> scan time and keep the DT job running continuously? If I run the DT job
> continuously, I assume memory will be continuously utilized by the DT job and
> not be available to other resources on the cluster; please clarify.
>
> It is possible to set this up elastically also, so that when there is no
> input available, the number of reader partitions is reduced and the memory
> given back (Apex supports dynamic scaling).
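>
> A minimal sketch of that hook (the idleness test below is an assumed
> heuristic, not a built-in policy): a StatsListener can ask the platform to
> repartition, and the partitioner can then hand back fewer partitions.
>
> import com.datatorrent.api.StatsListener;
>
> public class ScaleDownWhenIdle implements StatsListener
> {
>   @Override
>   public Response processStats(BatchedOperatorStats stats)
>   {
>     Response rsp = new Response();
>     // Request another definePartitions() pass when the reader goes idle;
>     // the partitioner can then return fewer partitions and release
>     // containers, giving their memory back to the cluster.
>     rsp.repartitionRequired = (stats.getTuplesEmittedPSMA() == 0);
>     return rsp;
>   }
> }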
>
>
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, June, 05 10:24 PM
>
>
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> Some sample code to monitor multiple directories is now available at:
>
>
> https://github.com/DataTorrent/examples/tree/master/tutorials/fileIO-multiDir
>
>
>
> It shows how to use a custom implementation of definePartitions() to create
>
> multiple partitions of the file input operator and group them
>
> into "slices" where each slice monitors a single directory.
>
>
>
> Ram
>
>
>
> On Wed, May 25, 2016 at 9:55 AM, Munagala Ramanath <ra...@datatorrent.com>
> wrote:
>
> I'm hoping to have a sample sometime next week.
>
>
>
> Ram
>
>
>
> On Wed, May 25, 2016 at 9:30 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Thank you so much ram, for your advice , Option (a) would be ideal for my
> requirement.
>
>
>
> Do you have sample usage for partitioning with individual configuration
> set ups different partitions?
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, May, 25 12:11 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Multiple directories
>
>
>
> You have 2 options: (a) AbstractFileInputOperator (b)
> FileSplitter/BlockReader
>
>
>
> For (a), each partition (i.e. replica of the operator) can scan only a
> single directory, so if you have 100
>
> directories, you can simply start with 100 partitions; since each
> partition is scanning its own directory
>
> you don't need to worry about which files the lines came from. This
> approach however needs a custom
>
> definePartitions() implementation in your subclass to assign the
> appropriate directory and XML parsing
>
> config file to each partition; it also needs adequate cluster resources to
> be able to spin up the required
>
> number of partitions.
>
>
>
> For (b), there is some documentation in the Operators section at
> http://docs.datatorrent.com/ including
>
> sample code. These operators support scanning multiple directories out of
> the box but have more
>
> elaborate configuration options. Check this out and see if it works in
> your use case.
>
>
>
> Ram
>
>
>
> On Wed, May 25, 2016 at 8:17 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> suryavamshivardhan.mukkamula@rbc.com> wrote:
>
> Hello Ram/Team,
>
>
>
> My requirement is to read input feeds from different locations on HDFS and
> parse those files by reading XML configuration files (each input feed has
> configuration file which defines the fields inside the input feeds).
>
>
>
> My approach : I would like to define a mapping file which contains
> individual feed identifier, feed location , configuration file location. I
> would like to read this mapping file at initial load within setup() method
> and define my DirectoryScan.acceptFiles. Here my challenge is when I read
> the files , I should parse the lines by reading the individual
> configuration files. How do I know the line is from particular file , if I
> know this I can read the corresponding configuration file before parsing
> the line.
>
>
>
> Please let me know how do I handle this.
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
> *Sent:* 2016, May, 24 5:49 PM
> *To:* Mukkamula, Suryavamshivardhan (CWM-NR)
> *Subject:* Multiple directories
>
>
>
> One way of addressing the issue is to use some sort of external tool (like
> a script) to
>
> copy all the input files to a common directory (making sure that the file
> names are
>
> unique to prevent one file from overwriting another) before the Apex
> application starts.
>
>
>
> The Apex application then starts and processes files from this directory.
>
>
>
> If you set the partition count of the file input operator to N, it will
> create N partitions and
>
> the files will be automatically distributed among the partitions. The
> partitions will work
>
> in parallel.
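>
> A minimal sketch of that wiring (names are illustrative; SingleDirReader is
> the line reader sketched earlier in this thread, assumed here without its
> custom definePartitions() so the platform's default partitioner honors the
> partition count):
>
> import org.apache.hadoop.conf.Configuration;
>
> import com.datatorrent.api.DAG;
> import com.datatorrent.api.StreamingApplication;
> import com.datatorrent.lib.io.ConsoleOutputOperator;
>
> public class CommonDirApp implements StreamingApplication
> {
>   @Override
>   public void populateDAG(DAG dag, Configuration conf)
>   {
>     SingleDirReader reader = dag.addOperator("reader", new SingleDirReader());
>     reader.setDirectory("/data/common");  // the shared directory the copy script fills
>     reader.setPartitionCount(4);          // N partitions; files are distributed among them
>     ConsoleOutputOperator console = dag.addOperator("console", new ConsoleOutputOperator());
>     dag.addStream("lines", reader.output, console.input);
>   }
> }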
>
>
>
> Ram
>
> _______________________________________________________________________
>
> This [email] may be privileged and/or confidential, and the sender does
> not waive any related rights and obligations. Any distribution, use or
> copying of this [email] or the information it contains by other than an
> intended recipient is unauthorized. If you received this [email] in error,
> please advise the sender (by return [email] or otherwise) immediately. You
> have consented to receive the attached electronically at the above-noted
> address; please retain a copy of this confirmation for future reference.
>

RE: Multiple directories

Posted by "Mukkamula, Suryavamshivardhan (CWM-NR)" <su...@rbc.com>.
Hi Ram,

Below are the details.


0 Wed Jun 15 16:34:24 EDT 2016 META-INF/
   358 Wed Jun 15 16:34:22 EDT 2016 META-INF/MANIFEST.MF
     0 Wed Jun 15 16:34:24 EDT 2016 app/
52967 Wed Jun 15 16:34:22 EDT 2016 app/countrynamescan-1.0-SNAPSHOT.jar
     0 Wed Jun 15 16:34:22 EDT 2016 lib/
62983 Wed Jun 15 16:34:22 EDT 2016 lib/activation-1.1.jar
1143233 Wed Jun 15 16:34:22 EDT 2016 lib/activemq-client-5.8.0.jar
  4467 Wed Jun 15 16:34:22 EDT 2016 lib/aopalliance-1.0.jar
44925 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-i18n-2.0.0-M15.jar
691479 Wed Jun 15 16:34:22 EDT 2016 lib/apacheds-kerberos-codec-2.0.0-M15.jar
16560 Wed Jun 15 16:34:22 EDT 2016 lib/api-asn1-api-1.0.0-M20.jar
79912 Wed Jun 15 16:34:22 EDT 2016 lib/api-util-1.0.0-M20.jar
43033 Wed Jun 15 16:34:22 EDT 2016 lib/asm-3.1.jar
303139 Wed Jun 15 16:34:22 EDT 2016 lib/avro-1.7.4.jar
232019 Wed Jun 15 16:34:22 EDT 2016 lib/commons-beanutils-1.8.3.jar
41123 Wed Jun 15 16:34:22 EDT 2016 lib/commons-cli-1.2.jar
284184 Wed Jun 15 16:34:22 EDT 2016 lib/commons-codec-1.10.jar
575389 Wed Jun 15 16:34:22 EDT 2016 lib/commons-collections-3.2.1.jar
30595 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compiler-2.7.8.jar
241367 Wed Jun 15 16:34:22 EDT 2016 lib/commons-compress-1.4.1.jar
298829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-configuration-1.6.jar
143602 Wed Jun 15 16:34:22 EDT 2016 lib/commons-digester-1.8.jar
112341 Wed Jun 15 16:34:22 EDT 2016 lib/commons-el-1.0.jar
305001 Wed Jun 15 16:34:22 EDT 2016 lib/commons-httpclient-3.1.jar
185140 Wed Jun 15 16:34:22 EDT 2016 lib/commons-io-2.4.jar
284220 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang-2.6.jar
315805 Wed Jun 15 16:34:22 EDT 2016 lib/commons-lang3-3.1.jar
61829 Wed Jun 15 16:34:22 EDT 2016 lib/commons-logging-1.2.jar
1599627 Wed Jun 15 16:34:22 EDT 2016 lib/commons-math3-3.1.1.jar
273370 Wed Jun 15 16:34:22 EDT 2016 lib/commons-net-3.1.jar
3608597 Wed Jun 15 16:34:22 EDT 2016 lib/db2jcc-123.jar
313898 Wed Jun 15 16:34:22 EDT 2016 lib/dom4j-1.6.1.jar
17138265 Wed Jun 15 16:34:22 EDT 2016 lib/fastutil-6.6.4.jar
15322 Wed Jun 15 16:34:22 EDT 2016 lib/findbugs-annotations-1.3.9-1.jar
84946 Wed Jun 15 16:34:22 EDT 2016 lib/flatpack-3.4.2.jar
20220 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-j2ee-management_1.1_spec-1.0.1.jar
32359 Wed Jun 15 16:34:22 EDT 2016 lib/geronimo-jms_1.1_spec-1.1.1.jar
21817 Wed Jun 15 16:34:22 EDT 2016 lib/gmbal-api-only-3.0.0-b023.jar
690573 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-framework-2.1.2.jar
253086 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-2.1.2.jar
198255 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-server-2.1.2.jar
336904 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-http-servlet-2.1.2.jar
  8114 Wed Jun 15 16:34:22 EDT 2016 lib/grizzly-rcm-2.1.2.jar
1795932 Wed Jun 15 16:34:22 EDT 2016 lib/guava-12.0.1.jar
710492 Wed Jun 15 16:34:22 EDT 2016 lib/guice-3.0.jar
65012 Wed Jun 15 16:34:22 EDT 2016 lib/guice-servlet-3.0.jar
16778 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-annotations-2.2.0.jar
52449 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-auth-2.5.1.jar
2962685 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-common-2.5.1.jar
1498368 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-mapreduce-client-core-2.5.1.jar
1158936 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-api-2.2.0.jar
1301627 Wed Jun 15 16:34:22 EDT 2016 lib/hadoop-yarn-common-2.2.0.jar
50139 Wed Jun 15 16:34:22 EDT 2016 lib/hawtbuf-1.9.jar
20780 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-annotations-1.1.2.jar
1249004 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-client-1.1.2.jar
530078 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-common-1.1.2.jar
4201685 Wed Jun 15 16:34:22 EDT 2016 lib/hbase-protocol-1.1.2.jar
1475955 Wed Jun 15 16:34:22 EDT 2016 lib/htrace-core-3.1.0-incubating.jar
590533 Wed Jun 15 16:34:22 EDT 2016 lib/httpclient-4.3.5.jar
282269 Wed Jun 15 16:34:22 EDT 2016 lib/httpcore-4.3.2.jar
228286 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-core-asl-1.9.2.jar
765648 Wed Jun 15 16:34:22 EDT 2016 lib/jackson-mapper-asl-1.9.2.jar
  2497 Wed Jun 15 16:34:22 EDT 2016 lib/javax.inject-1.jar
521991 Wed Jun 15 16:34:22 EDT 2016 lib/javax.mail-1.5.0.jar
83945 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-3.1.jar
85353 Wed Jun 15 16:34:22 EDT 2016 lib/javax.servlet-api-3.0.1.jar
105134 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-api-2.2.2.jar
890168 Wed Jun 15 16:34:22 EDT 2016 lib/jaxb-impl-2.2.3-1.jar
1291164 Wed Jun 15 16:34:22 EDT 2016 lib/jcodings-1.0.8.jar
151304 Wed Jun 15 16:34:22 EDT 2016 lib/jdom-1.1.3.jar
130458 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-client-1.9.jar
458739 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-core-1.9.jar
17542 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-grizzly2-1.9.jar
14786 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-guice-1.9.jar
147952 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-json-1.9.jar
713089 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-server-1.9.jar
28100 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-core-1.9.jar
12976 Wed Jun 15 16:34:22 EDT 2016 lib/jersey-test-framework-grizzly2-1.9.jar
67758 Wed Jun 15 16:34:22 EDT 2016 lib/jettison-1.1.jar
21144 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-continuation-8.1.10.v20130312.jar
95709 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-http-8.1.10.v20130312.jar
103622 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-io-8.1.10.v20130312.jar
89691 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-security-8.1.10.v20130312.jar
347020 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-server-8.1.10.v20130312.jar
101052 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-servlet-8.1.10.v20130312.jar
177131 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-6.1.26.jar
284903 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-util-8.1.10.v20130312.jar
125928 Wed Jun 15 16:34:22 EDT 2016 lib/jetty-websocket-8.1.10.v20130312.jar
39117 Wed Jun 15 16:34:22 EDT 2016 lib/jms-api-1.1-rev-1.jar
187292 Wed Jun 15 16:34:22 EDT 2016 lib/joni-2.1.2.jar
280205 Wed Jun 15 16:34:22 EDT 2016 lib/jsch-0.1.53.jar
33015 Wed Jun 15 16:34:22 EDT 2016 lib/jsr305-1.3.9.jar
489884 Wed Jun 15 16:34:22 EDT 2016 lib/log4j-1.2.17.jar
565387 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-contrib-3.2.0-incubating.jar
1062000 Wed Jun 15 16:34:22 EDT 2016 lib/malhar-library-3.1.1.jar
42212 Wed Jun 15 16:34:22 EDT 2016 lib/management-api-3.0.0-b012.jar
  8798 Wed Jun 15 16:34:22 EDT 2016 lib/named-regexp-0.2.3.jar
1206119 Wed Jun 15 16:34:22 EDT 2016 lib/netty-3.6.6.Final.jar
1779991 Wed Jun 15 16:34:22 EDT 2016 lib/netty-all-4.0.23.Final.jar
29555 Wed Jun 15 16:34:22 EDT 2016 lib/paranamer-2.3.jar
1869113 Wed Jun 15 16:34:22 EDT 2016 lib/poi-3.9.jar
936648 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-3.9.jar
4802621 Wed Jun 15 16:34:22 EDT 2016 lib/poi-ooxml-schemas-3.9.jar
533455 Wed Jun 15 16:34:22 EDT 2016 lib/protobuf-java-2.5.0.jar
26084 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-api-1.7.5.jar
  9943 Wed Jun 15 16:34:22 EDT 2016 lib/slf4j-log4j12-1.7.19.jar
995968 Wed Jun 15 16:34:22 EDT 2016 lib/snappy-java-1.0.4.1.jar
23346 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0-2.jar
26514 Wed Jun 15 16:34:22 EDT 2016 lib/stax-api-1.0.1.jar
  2455 Wed Jun 15 16:34:22 EDT 2016 lib/tdgssconfig-14.00.00.21.jar
991265 Wed Jun 15 16:34:22 EDT 2016 lib/terajdbc4-14.00.00.21.jar
758309 Wed Jun 15 16:34:22 EDT 2016 lib/vertica-jdbc-7.2.1-0.jar
109318 Wed Jun 15 16:34:22 EDT 2016 lib/xml-apis-1.0.b2.jar
2666695 Wed Jun 15 16:34:22 EDT 2016 lib/xmlbeans-2.3.0.jar
15010 Wed Jun 15 16:34:22 EDT 2016 lib/xmlenc-0.52.jar
94672 Wed Jun 15 16:34:22 EDT 2016 lib/xz-1.0.jar
792964 Wed Jun 15 16:34:22 EDT 2016 lib/zookeeper-3.4.6.jar
     0 Wed Jun 15 16:34:28 EDT 2016 conf/
   334 Mon Apr 04 11:18:00 EDT 2016 conf/my-app-conf1.xml
  3432 Wed Jun 15 16:22:20 EDT 2016 META-INF/properties.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
       <modelVersion>4.0.0</modelVersion>
       <groupId>com.rbc.aml.cnscan</groupId>
       <version>1.0-SNAPSHOT</version>
       <artifactId>countrynamescan</artifactId>
       <packaging>jar</packaging>

       <!-- change these to the appropriate values -->
       <name>countrynamescan</name>
       <description>Country and Name Scan project</description>

       <properties>
              <!-- change this if you desire to use a different version of DataTorrent -->
              <datatorrent.version>3.1.1</datatorrent.version>
              <datatorrent.apppackage.classpath>lib/*.jar</datatorrent.apppackage.classpath>
       </properties>

       <!-- repository to provide the DataTorrent artifacts -->
       <!-- <repositories>
              <repository>
                     <snapshots>
                           <enabled>false</enabled>
                     </snapshots>
                     <id>Datatorrent-Releases</id>
                     <name>DataTorrent Release Repository</name>
                     <url>https://www.datatorrent.com/maven/content/repositories/releases/</url>
              </repository>
              <repository>
                     <releases>
                           <enabled>false</enabled>
                     </releases>
                     <id>DataTorrent-Snapshots</id>
                     <name>DataTorrent Early Access Program Snapshot Repository</name>
                     <url>https://www.datatorrent.com/maven/content/repositories/snapshots/</url>
              </repository>
       </repositories> -->


       <build>
              <plugins>
                     <plugin>
                           <groupId>org.apache.maven.plugins</groupId>
                           <artifactId>maven-eclipse-plugin</artifactId>
                           <version>2.9</version>
                           <configuration>
                                  <downloadSources>true</downloadSources>
                           </configuration>
                     </plugin>
                     <plugin>
                           <artifactId>maven-compiler-plugin</artifactId>
                           <version>3.3</version>
                           <configuration>
                                  <encoding>UTF-8</encoding>
                                  <source>1.7</source>
                                  <target>1.7</target>
                                  <debug>true</debug>
                                  <optimize>false</optimize>
                                  <showDeprecation>true</showDeprecation>
                                  <showWarnings>true</showWarnings>
                           </configuration>
                     </plugin>
                     <plugin>
                           <artifactId>maven-dependency-plugin</artifactId>
                           <version>2.8</version>
                           <executions>
                                  <execution>
                                         <id>copy-dependencies</id>
                                         <phase>prepare-package</phase>
                                         <goals>
                                                <goal>copy-dependencies</goal>
                                         </goals>
                                         <configuration>
                                                <outputDirectory>target/deps</outputDirectory>
                                                <includeScope>runtime</includeScope>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <artifactId>maven-assembly-plugin</artifactId>
                           <executions>
                                  <execution>
                                         <id>app-package-assembly</id>
                                         <phase>package</phase>
                                         <goals>
                                                <goal>single</goal>
                                         </goals>
                                         <configuration>
                                                <finalName>${project.artifactId}-${project.version}-apexapp</finalName>
                                                <appendAssemblyId>false</appendAssemblyId>
                                                <descriptors>
                                                       <descriptor>src/assemble/appPackage.xml</descriptor>
                                                </descriptors>
                                                <archiverConfig>
                                                       <defaultDirectoryMode>0755</defaultDirectoryMode>
                                                </archiverConfig>
                                                <archive>
                                                       <manifestEntries>
                                                              <Class-Path>${datatorrent.apppackage.classpath}</Class-Path>
                                                              <DT-Engine-Version>${datatorrent.version}</DT-Engine-Version>
                                                              <DT-App-Package-Name>${project.artifactId}</DT-App-Package-Name>
                                                              <DT-App-Package-Version>${project.version}</DT-App-Package-Version>
                                                              <DT-App-Package-Display-Name>${project.name}</DT-App-Package-Display-Name>
                                                              <DT-App-Package-Description>${project.description}</DT-App-Package-Description>
                                                      </manifestEntries>
                                                </archive>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <artifactId>maven-antrun-plugin</artifactId>
                           <version>1.7</version>
                           <executions>
                                  <execution>
                                         <phase>package</phase>
                                         <configuration>
                                                <target>
                                                       <move
                                                              file="${project.build.directory}/${project.artifactId}-${project.version}-apexapp.jar"
                                                              tofile="${project.build.directory}/${project.artifactId}-${project.version}.apa" />
                                                </target>
                                         </configuration>
                                         <goals>
                                                <goal>run</goal>
                                         </goals>
                                  </execution>
                                  <execution>
                                         <!-- create resource directory for xml javadoc -->
                                         <id>createJavadocDirectory</id>
                                         <phase>generate-resources</phase>
                                         <configuration>
                                                <tasks>
                                                       <delete
                                                              dir="${project.build.directory}/generated-resources/xml-javadoc" />
                                                       <mkdir
                                                              dir="${project.build.directory}/generated-resources/xml-javadoc" />
                                                </tasks>
                                         </configuration>
                                         <goals>
                                                <goal>run</goal>
                                         </goals>
                                  </execution>
                           </executions>
                     </plugin>

                     <plugin>
                           <groupId>org.codehaus.mojo</groupId>
                           <artifactId>build-helper-maven-plugin</artifactId>
                           <version>1.9.1</version>
                           <executions>
                                  <execution>
                                         <id>attach-artifacts</id>
                                         <phase>package</phase>
                                         <goals>
                                                <goal>attach-artifact</goal>
                                         </goals>
                                         <configuration>
                                                <artifacts>
                                                       <artifact>
                                                              <file>target/${project.artifactId}-${project.version}.apa</file>
                                                              <type>apa</type>
                                                       </artifact>
                                                </artifacts>
                                                <skipAttach>false</skipAttach>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

                     <!-- generate javdoc -->
                     <plugin>
                           <groupId>org.apache.maven.plugins</groupId>
                           <artifactId>maven-javadoc-plugin</artifactId>
                           <executions>
                                  <!-- generate xml javadoc -->
                                  <execution>
                                         <id>xml-doclet</id>
                                         <phase>generate-resources</phase>
                                         <goals>
                                                <goal>javadoc</goal>
                                         </goals>
                                         <configuration>
                                                <doclet>com.github.markusbernhardt.xmldoclet.XmlDoclet</doclet>
                                                <additionalparam>-d
                                                       ${project.build.directory}/generated-resources/xml-javadoc
                                                       -filename ${project.artifactId}-${project.version}-javadoc.xml</additionalparam>
                                                <useStandardDocletOptions>false</useStandardDocletOptions>
                                                <docletArtifact>
                                                       <groupId>com.github.markusbernhardt</groupId>
                                                       <artifactId>xml-doclet</artifactId>
                                                       <version>1.0.4</version>
                                                </docletArtifact>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>
                     <!-- Transform xml javadoc to stripped down version containing only class/interface
                           comments and tags -->
                     <plugin>
                           <groupId>org.codehaus.mojo</groupId>
                           <artifactId>xml-maven-plugin</artifactId>
                           <version>1.0</version>
                           <executions>
                                  <execution>
                                         <id>transform-xmljavadoc</id>
                                         <phase>generate-resources</phase>
                                         <goals>
                                                <goal>transform</goal>
                                         </goals>
                                  </execution>
                           </executions>
                           <configuration>
                                  <transformationSets>
                                         <transformationSet>
                                                <dir>${project.build.directory}/generated-resources/xml-javadoc</dir>
                                                <includes>
                                                       <include>${project.artifactId}-${project.version}-javadoc.xml</include>
                                                </includes>
                                                <stylesheet>XmlJavadocCommentsExtractor.xsl</stylesheet>
                                                <outputDir>${project.build.directory}/generated-resources/xml-javadoc</outputDir>
                                         </transformationSet>
                                  </transformationSets>
                           </configuration>
                     </plugin>
                     <!-- copy xml javadoc to class jar -->
                     <plugin>
                           <artifactId>maven-resources-plugin</artifactId>
                           <version>2.6</version>
                           <executions>
                                  <execution>
                                         <id>copy-resources</id>
                                         <phase>process-resources</phase>
                                         <goals>
                                                <goal>copy-resources</goal>
                                         </goals>
                                         <configuration>
                                                <outputDirectory>${basedir}/target/classes</outputDirectory>
                                                <resources>
                                                       <resource>
                                                              <directory>${project.build.directory}/generated-resources/xml-javadoc</directory>
                                                              <includes>
                                                                     <include>${project.artifactId}-${project.version}-javadoc.xml</include>
                                                              </includes>
                                                              <filtering>true</filtering>
                                                       </resource>
                                                </resources>
                                         </configuration>
                                  </execution>
                           </executions>
                     </plugin>

              </plugins>

              <pluginManagement>
                     <plugins>
                           <!--This plugin's configuration is used to store Eclipse m2e settings
                                  only. It has no influence on the Maven build itself. -->
                           <plugin>
                                  <groupId>org.eclipse.m2e</groupId>
                                  <artifactId>lifecycle-mapping</artifactId>
                                  <version>1.0.0</version>
                                  <configuration>
                                         <lifecycleMappingMetadata>
                                                <pluginExecutions>
                                                       <pluginExecution>
                                                              <pluginExecutionFilter>
                                                                     <groupId>org.codehaus.mojo</groupId>
                                                                     <artifactId>
                                                                           xml-maven-plugin
                                                                     </artifactId>
                                                                     <versionRange>[1.0,)</versionRange>
                                                                     <goals>
                                                                           <goal>transform</goal>
                                                                     </goals>
                                                              </pluginExecutionFilter>
                                                              <action>
                                                                     <ignore></ignore>
                                                              </action>
                                                       </pluginExecution>
                                                       <pluginExecution>
                                                              <pluginExecutionFilter>
                                                                     <groupId></groupId>
                                                                     <artifactId></artifactId>
                                                                     <versionRange>[,)</versionRange>
                                                                     <goals>
                                                                           <goal></goal>
                                                                     </goals>
                                                              </pluginExecutionFilter>
                                                              <action>
                                                                     <ignore></ignore>
                                                              </action>
                                                       </pluginExecution>
                                                </pluginExecutions>
                                         </lifecycleMappingMetadata>
                                  </configuration>
                           </plugin>
                     </plugins>
              </pluginManagement>
       </build>

       <dependencies>
              <!-- add your dependencies here -->
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>malhar-library</artifactId>
                     <version>${datatorrent.version}</version>
                     <!-- If you know that your application does not need transitive dependencies
                           pulled in by malhar-library, uncomment the following to reduce the size of
                           your app package. -->
                     <!-- <exclusions> <exclusion> <groupId>*</groupId> <artifactId>*</artifactId>
                           </exclusion> </exclusions> -->
              </dependency>
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>dt-common</artifactId>
                     <version>${datatorrent.version}</version>
                     <scope>provided</scope>
              </dependency>
              <dependency>
                     <groupId>com.datatorrent</groupId>
                     <artifactId>dt-engine</artifactId>
                     <version>${datatorrent.version}</version>
                     <scope>test</scope>
              </dependency>
              <dependency>
                     <groupId>com.teradata.jdbc</groupId>
                     <artifactId>terajdbc4</artifactId>
                     <version>14.00.00.21</version>
              </dependency>
              <dependency>
                     <groupId>com.teradata.jdbc</groupId>
                     <artifactId>tdgssconfig</artifactId>
                     <version>14.00.00.21</version>
              </dependency>
              <dependency>
                     <groupId>com.ibm.db2</groupId>
                     <artifactId>db2jcc</artifactId>
                     <version>123</version>
              </dependency>
              <dependency>
                     <groupId>jdk.tools</groupId>
                     <artifactId>jdk.tools</artifactId>
                     <version>1.7</version>
                     <scope>system</scope>
                     <systemPath>C:/Program Files/Java/jdk1.7.0_79/lib/tools.jar</systemPath>
              </dependency>
              <dependency>
                     <groupId>org.apache.apex</groupId>
                     <artifactId>malhar-contrib</artifactId>
                     <version>3.2.0-incubating</version>
                     <!--<scope>provided</scope> -->
                     <exclusions>
                           <exclusion>
                                  <groupId>*</groupId>
                                  <artifactId>*</artifactId>
                           </exclusion>
                     </exclusions>
              </dependency>
              <dependency>
                     <groupId>junit</groupId>
                     <artifactId>junit</artifactId>
                     <version>4.10</version>
                     <scope>test</scope>
              </dependency>
              <dependency>
                     <groupId>com.vertica</groupId>
                     <artifactId>vertica-jdbc</artifactId>
                     <version>7.2.1-0</version>
              </dependency>
              <dependency>
                     <groupId>org.apache.hbase</groupId>
                     <artifactId>hbase-client</artifactId>
                     <version>1.1.2</version>
              </dependency>
              <dependency>
                     <groupId>org.slf4j</groupId>
                     <artifactId>slf4j-log4j12</artifactId>
                     <version>1.7.19</version>
              </dependency>

              <dependency>
                     <groupId>net.sf.flatpack</groupId>
                     <artifactId>flatpack</artifactId>
                     <version>3.4.2</version>
              </dependency>

              <dependency>
                     <groupId>org.jdom</groupId>
                     <artifactId>jdom</artifactId>
                     <version>1.1.3</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi-ooxml</artifactId>
                     <version>3.9</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.xmlbeans</groupId>
                     <artifactId>xmlbeans</artifactId>
                     <version>2.3.0</version>
              </dependency>

              <dependency>
                     <groupId>dom4j</groupId>
                     <artifactId>dom4j</artifactId>
                     <version>1.6.1</version>
              </dependency>

              <dependency>
                     <groupId>javax.xml.stream</groupId>
                     <artifactId>stax-api</artifactId>
                     <version>1.0-2</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi</artifactId>
                     <version>3.9</version>
              </dependency>

              <dependency>
                     <groupId>org.apache.poi</groupId>
                     <artifactId>poi-ooxml-schemas</artifactId>
                     <version>3.9</version>
              </dependency>
              <dependency>
                     <groupId>com.jcraft</groupId>
                     <artifactId>jsch</artifactId>
                     <version>0.1.53</version>
              </dependency>
       </dependencies>

</project>

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, June, 16 4:37 PM
To: users@apex.apache.org
Subject: Re: Multiple directories

It looks like you may be including old Hadoop jars in your apa package since the stack trace
shows ConverterUtils.toContainerId calling ConverterUtils.toApplicationAttemptId but recent versions
don't have that call sequence. In 2.7.1 (which is what your cluster has) the function looks like this:
  public static ContainerId toContainerId(String containerIdStr) {
    return ContainerId.fromString(containerIdStr);
  }
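
Incidentally, the "jar tvf" listing earlier in this thread shows hadoop-common-2.5.1.jar, hadoop-yarn-api-2.2.0.jar and friends bundled under lib/, which would explain the mismatch. A minimal sketch of one way to keep them out of the .apa, assuming they are pulled in transitively (for example via hbase-client, as in the posted pom), is a wildcard exclusion so that the cluster's own 2.7.1 jars are used at runtime:

  <!-- sketch only: exclude transitive Hadoop artifacts from the app package;
       the cluster supplies its own Hadoop jars at runtime -->
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.1.2</version>
    <exclusions>
      <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>*</artifactId>
      </exclusion>
    </exclusions>
  </dependency>

The output of "mvn dependency:tree" would confirm which dependency drags them in.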

Could you post the output of "jar tvf {your-apa-file}" as well as: "mvn dependency:tree"

Ram

On Thu, Jun 16, 2016 at 12:38 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Hi Ram,

Below is the information.


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   712    0   712    0     0   3807      0 --:--:-- --:--:-- --:--:--  3807
{
    "clusterInfo": {
        "haState": "ACTIVE",
        "haZooKeeperConnectionState": "CONNECTED",
        "hadoopBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bccaeaf36cb193c1c by jenkins source checksum 69a3bf8c667267c2c252a54fbbf23d",
        "hadoopVersion": "2.7.1.2.3.2.0-2950",
        "hadoopVersionBuiltOn": "2015-09-30T18:08Z",
        "id": 1465495186350,
        "resourceManagerBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bccaeaf36cb193c1c by jenkins source checksum 48db4b572827c2e9c2da66982d147626",
        "resourceManagerVersion": "2.7.1.2.3.2.0-2950",
        "resourceManagerVersionBuiltOn": "2015-09-30T18:20Z",
        "rmStateStoreName": "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",
        "startedOn": 1465495186350,
        "state": "STARTED"
    }
}

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, June, 16 2:57 PM
To: users@apex.apache.org
Subject: Re: Multiple directories

Can you ssh to one of the cluster nodes ? If so, can you run this command and show the output
(where {rm} is the host:port running the resource manager, aka YARN):

curl http://{rm}/ws/v1/cluster | python -mjson.tool

Ram
ps. You can determine the node running YARN with:

hdfs getconf -confKey yarn.resourcemanager.webapp.address
hdfs getconf -confKey yarn.resourcemanager.webapp.https.address



On Thu, Jun 16, 2016 at 11:15 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Hi,

I am facing a weird issue, and the logs are not clear to me.

I have created an .apa file which works fine in my local sandbox, but I am facing problems when I upload it to the enterprise Hadoop cluster using the DT Console.

Below is the error message from yarn logs. Please help in understanding the issue.

###################### Error Logs ########################################################

Log Type: AppMaster.stderr
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 1259
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/grid/06/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/filecache/36/slf4j-log4j12-1.7.19.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.IllegalArgumentException: Invalid ContainerId: container_e35_1465495186350_2224_01_000001
        at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182)
        at com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:90)
Caused by: java.lang.NumberFormatException: For input string: "e35"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:441)
        at java.lang.Long.parseLong(Long.java:483)
        at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
        at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
        ... 1 more

Log Type: AppMaster.stdout
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 0

Log Type: dt.log
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 29715
Showing 4096 bytes of 29715 total. Click here<http://guedlpdhdp001.saifg.rbc.com:19888/jobhistory/logs/guedlpdhdp012.saifg.rbc.com:45454/container_e35_1465495186350_2224_01_000001/container_e35_1465495186350_2224_01_000001/mukkamula/dt.log/?start=0> for the full log.
56m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms4096m -Xmx4096m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT
SHLVL=3
HADOOP_SSH_OPTS=-o ConnectTimeout=5 -o SendEnv=HADOOP_CONF_DIR
HADOOP_USER_NAME=datatorrent/gueulvahal003.saifg.rbc.com@SAIFG.RBC.COM
HADOOP_NAMENODE_OPTS=-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1
HADOOP_IDENT_STRING=yarn
HADOOP_MAPRED_LOG_DIR=/var/log/hadoop-mapreduce/yarn
NM_HOST=guedlpdhdp012.saifg.rbc.com
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
HADOOP_SECURE_DN_LOG_DIR=/var/log/hadoop/hdfs
YARN_HISTORYSERVER_HEAPSIZE=1024
JVM_PID=2638
YARN_PID_DIR=/var/run/hadoop-yarn/yarn
HADOOP_HOME_WARN_SUPPRESS=1
NM_PORT=45454
LOGNAME=mukkamula
YARN_CONF_DIR=/usr/hdp/current/hadoop-client/conf
HADOOP_YARN_USER=yarn
QTDIR=/usr/lib64/qt-3.3
_=/usr/lib/jvm/java-1.7.0/bin/java
MSM_PRODUCT=MSM
HADOOP_HOME=/usr/hdp/2.3.2.0-2950/hadoop
MALLOC_ARENA_MAX=4
HADOOP_OPTS=-Dhdp.version=2.3.2.0-2950 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true  -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.3.2.0-2950 -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
SHELL=/bin/bash
YARN_ROOT_LOGGER=INFO,EWMA,RFA
HADOOP_TOKEN_FILE_LOCATION=/grid/11/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/container_e35_1465495186350_2224_01_000001/container_tokens
CLASSPATH=./*:/usr/hdp/current/hadoop-client/conf:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*
HADOOP_MAPRED_PID_DIR=/var/run/hadoop-mapreduce/yarn
YARN_NODEMANAGER_HEAPSIZE=1024
QTINC=/usr/lib64/qt-3.3/include
USER=mukkamula
HADOOP_CLIENT_OPTS=-Xmx2048m -XX:MaxPermSize=512m -Xmx2048m -XX:MaxPermSize=512m
CONTAINER_ID=container_e35_1465495186350_2224_01_000001
HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop/hdfs
HISTCONTROL=ignoredups
HOME=/home/
HADOOP_NAMENODE_INIT_HEAPSIZE=-Xms8192m
MSM_HOME=/usr/local/MegaRAID Storage Manager
LESSOPEN=||/usr/bin/lesspipe.sh %s
LANG=en_US.UTF-8
YARN_NICENESS=0
YARN_IDENT_STRING=yarn
HADOOP_MAPRED_HOME=/usr/hdp/2.3.2.0-2950/hadoop-mapreduce


Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR)
Sent: 2016, June, 16 8:58 AM
To: users@apex.apache.org
Subject: RE: Multiple directories

Thank you for the inputs.

Regards,
Surya Vamshi
From: Thomas Weise [mailto:thomas.weise@gmail.com]
Sent: 2016, June, 15 5:08 PM
To: users@apex.apache.org
Subject: Re: Multiple directories


On Wed, Jun 15, 2016 at 1:55 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Hi Ram/Team,

I could create an operator which reads multiple directories, parses each file against its individual configuration file, and generates output files to different directories.

However I have some questions regarding the design.


==> We have 120 directories to scan on HDFS. If we use parallel partitioning with operator memory around 250MB, that works out to around 30GB of RAM for this operator alone. Are these figures going to create any problem in production?

You can benchmark this with a single partition. If the downstream operators can keep up with the rate at which the file reader emits, then the memory consumption should be minimal. Keep in mind though that the container memory is not just heap space for the operator, but also memory the JVM requires to run and the memory that the buffer server consumes. You see the allocated memory in the UI if you use the DT community edition (container list in the physical plan).


==> Should I use a scheduler to run the batch job, or define the next scan time and keep the DT job running continuously? If I run the DT job continuously, I assume its memory stays allocated and is not available to other applications on the cluster. Please clarify.
It is possible to set this up elastically as well, so that when there is no input available, the number of reader partitions is reduced and the memory is given back (Apex supports dynamic scaling).
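
A minimal sketch of that elastic idea, assuming the reader operator implements Apex's StatsListener (the class name and the zero-rate test are illustrative, not code from this thread):

  import com.datatorrent.api.StatsListener;

  // Sketch: flag a repartition when the reader goes idle; a custom
  // definePartitions() can then return fewer partitions so that their
  // containers (and memory) are handed back to YARN.
  public class IdleAwareRepartitioner implements StatsListener
  {
    @Override
    public Response processStats(BatchedOperatorStats stats)
    {
      Response response = new Response();
      // Zero per-second moving average of emitted tuples => no input pending.
      response.repartitionRequired = (stats.getTuplesEmittedPSMA() == 0);
      return response;
    }
  }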


Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, June, 05 10:24 PM

To: users@apex.apache.org
Subject: Re: Multiple directories

Some sample code to monitor multiple directories is now available at:
https://github.com/DataTorrent/examples/tree/master/tutorials/fileIO-multiDir

It shows how to use a custom implementation of definePartitions() to create
multiple partitions of the file input operator and group them
into "slices" where each slice monitors a single directory.

Ram

On Wed, May 25, 2016 at 9:55 AM, Munagala Ramanath <ra...@datatorrent.com> wrote:
I'm hoping to have a sample sometime next week.

Ram

On Wed, May 25, 2016 at 9:30 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Thank you so much ram, for your advice , Option (a) would be ideal for my requirement.

Do you have sample usage for partitioning with individual configuration set ups different partitions?

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, May, 25 12:11 PM
To: users@apex.apache.org
Subject: Re: Multiple directories

You have 2 options: (a) AbstractFileInputOperator (b) FileSplitter/BlockReader

For (a), each partition (i.e. replica of the operator) can scan only a single directory, so if you have 100
directories, you can simply start with 100 partitions; since each partition is scanning its own directory
you don't need to worry about which files the lines came from. This approach however needs a custom
definePartition() implementation in your subclass to assign the appropriate directory and XML parsing
config file to each partition; it also needs adequate cluster resources to be able to spin up the required
number of partitions.

For (b), there is some documentation in the Operators section at http://docs.datatorrent.com/ including
sample code. These operators support scanning multiple directories out of the box but have more
elaborate configuration options. Check this out and see if it works in your use case.

Ram

On Wed, May 25, 2016 at 8:17 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Hello Ram/Team,

My requirement is to read input feeds from different locations on HDFS and parse those files by reading XML configuration files (each input feed has a configuration file which defines the fields inside the feed).

My approach: I would like to define a mapping file which contains each feed's identifier, feed location, and configuration file location. I would like to read this mapping file at initial load within the setup() method and define my DirectoryScan.acceptFiles. My challenge is that when I read the files, I need to parse the lines using the individual configuration files. How do I know which file a line came from? If I know this, I can read the corresponding configuration file before parsing the line.
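
A minimal sketch of one way to track the source file, assuming a subclass of AbstractFileInputOperator<String>; activeConfig and loadConfigFor() are hypothetical names for the mapping-file lookup, not an existing API:

  // Sketch: remember which feed's file is being opened so that its lines
  // can later be parsed with the matching XML configuration.
  @Override
  protected InputStream openFile(Path path) throws IOException
  {
    activeConfig = loadConfigFor(path.getName()); // hypothetical lookup in the mapping
    return super.openFile(path);                  // base class opens the stream
  }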

Please let me know how do I handle this.

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, May, 24 5:49 PM
To: Mukkamula, Suryavamshivardhan (CWM-NR)
Subject: Multiple directories

One way of addressing the issue is to use some sort of external tool (like a script) to
copy all the input files to a common directory (making sure that the file names are
unique to prevent one file from overwriting another) before the Apex application starts.

The Apex application then starts and processes files from this directory.

If you set the partition count of the file input operator to N, it will create N partitions and
the files will be automatically distributed among the partitions. The partitions will work
in parallel.
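
For instance, a sketch of setting the partition count in the application's properties.xml, assuming the file input operator is named "reader" in the DAG (both the name and the count are illustrative):

  <property>
    <name>dt.operator.reader.prop.partitionCount</name>
    <value>4</value>
  </property>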

Ram

_______________________________________________________________________

This [email] may be privileged and/or confidential, and the sender does not waive any related rights and obligations. Any distribution, use or copying of this [email] or the information it contains by other than an intended recipient is unauthorized. If you received this [email] in error, please advise the sender (by return [email] or otherwise) immediately. You have consented to receive the attached electronically at the above-noted address; please retain a copy of this confirmation for future reference.


> names are
>
> unique to prevent one file from overwriting another) before the Apex
> application starts.
>
>
>
> The Apex application then starts and processes files from this directory.
>
>
>
> If you set the partition count of the file input operator to N, it will
> create N partitions and
>
> the files will be automatically distributed among the partitions. The
> partitions will work
>
> in parallel.
>
>
>
> Ram
>
> _______________________________________________________________________
>
> This [email] may be privileged and/or confidential, and the sender does
> not waive any related rights and obligations. Any distribution, use or
> copying of this [email] or the information it contains by other than an
> intended recipient is unauthorized. If you received this [email] in error,
> please advise the sender (by return [email] or otherwise) immediately. You
> have consented to receive the attached electronically at the above-noted
> address; please retain a copy of this confirmation for future reference.
>
>
>
> _______________________________________________________________________
>
> This [email] may be privileged and/or confidential, and the sender does
> not waive any related rights and obligations. Any distribution, use or
> copying of this [email] or the information it contains by other than an
> intended recipient is unauthorized. If you received this [email] in error,
> please advise the sender (by return [email] or otherwise) immediately. You
> have consented to receive the attached electronically at the above-noted
> address; please retain a copy of this confirmation for future reference.
>
>
>
>
>
> _______________________________________________________________________
>
> This [email] may be privileged and/or confidential, and the sender does
> not waive any related rights and obligations. Any distribution, use or
> copying of this [email] or the information it contains by other than an
> intended recipient is unauthorized. If you received this [email] in error,
> please advise the sender (by return [email] or otherwise) immediately. You
> have consented to receive the attached electronically at the above-noted
> address; please retain a copy of this confirmation for future reference.
>
>
>
> _______________________________________________________________________
>
> This [email] may be privileged and/or confidential, and the sender does
> not waive any related rights and obligations. Any distribution, use or
> copying of this [email] or the information it contains by other than an
> intended recipient is unauthorized. If you received this [email] in error,
> please advise the sender (by return [email] or otherwise) immediately. You
> have consented to receive the attached electronically at the above-noted
> address; please retain a copy of this confirmation for future reference.
>
>
>
> _______________________________________________________________________
>
> This [email] may be privileged and/or confidential, and the sender does
> not waive any related rights and obligations. Any distribution, use or
> copying of this [email] or the information it contains by other than an
> intended recipient is unauthorized. If you received this [email] in error,
> please advise the sender (by return [email] or otherwise) immediately. You
> have consented to receive the attached electronically at the above-noted
> address; please retain a copy of this confirmation for future reference.
>
>

RE: Multiple directories

Posted by "Mukkamula, Suryavamshivardhan (CWM-NR)" <su...@rbc.com>.
Hi Ram,

Below is the information.


{
    "clusterInfo": {
        "haState": "ACTIVE",
        "haZooKeeperConnectionState": "CONNECTED",
        "hadoopBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bccaeaf36cb193c1c by jenkins source checksum 69a3bf8c667267c2c252a54fbbf23d",
        "hadoopVersion": "2.7.1.2.3.2.0-2950",
        "hadoopVersionBuiltOn": "2015-09-30T18:08Z",
        "id": 1465495186350,
        "resourceManagerBuildVersion": "2.7.1.2.3.2.0-2950 from 5cc60e0003e33aa98205f18bccaeaf36cb193c1c by jenkins source checksum 48db4b572827c2e9c2da66982d147626",
        "resourceManagerVersion": "2.7.1.2.3.2.0-2950",
        "resourceManagerVersionBuiltOn": "2015-09-30T18:20Z",
        "rmStateStoreName": "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",
        "startedOn": 1465495186350,
        "state": "STARTED"
    }
}

Regards,
Surya Vamshi


Re: Multiple directories

Posted by Munagala Ramanath <ra...@datatorrent.com>.
Can you ssh to one of the cluster nodes? If so, can you run this command
and show the output
(where {rm} is the host:port running the resource manager, aka YARN):

curl http://{rm}/ws/v1/cluster | python -mjson.tool

Ram
ps. You can determine the node running YARN with:

hdfs getconf -confKey yarn.resourcemanager.webapp.address
hdfs getconf -confKey yarn.resourcemanager.webapp.https.address
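
If shell access is awkward, the same query can also be issued from plain
Java. A minimal sketch (the host and port below are placeholders; substitute
the RM webapp address reported by getconf):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ClusterInfo
{
  public static void main(String[] args) throws Exception
  {
    // Placeholder address; use the value from 'hdfs getconf' above.
    URL url = new URL("http://rmhost:8088/ws/v1/cluster");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);  // raw JSON; pretty-print separately if needed
      }
    }
  }
}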




RE: Multiple directories

Posted by "Mukkamula, Suryavamshivardhan (CWM-NR)" <su...@rbc.com>.
Hi,

I am facing a weird issue and the logs are not clear to me.

I have created an .apa file which works fine within my local sandbox, but I am facing problems when I upload it to the enterprise Hadoop cluster using the DT Console.

Below is the error message from the YARN logs. Please help me understand the issue.

###################### Error Logs ########################################################

Log Type: AppMaster.stderr
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 1259
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/grid/06/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/filecache/36/slf4j-log4j12-1.7.19.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.IllegalArgumentException: Invalid ContainerId: container_e35_1465495186350_2224_01_000001
        at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182)
        at com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:90)
Caused by: java.lang.NumberFormatException: For input string: "e35"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:441)
        at java.lang.Long.parseLong(Long.java:483)
        at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
        at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
        ... 1 more

Log Type: AppMaster.stdout
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 0

Log Type: dt.log
Log Upload Time: Thu Jun 16 14:07:46 -0400 2016
Log Length: 29715
Showing 4096 bytes of 29715 total. Full log: http://guedlpdhdp001.saifg.rbc.com:19888/jobhistory/logs/guedlpdhdp012.saifg.rbc.com:45454/container_e35_1465495186350_2224_01_000001/container_e35_1465495186350_2224_01_000001/mukkamula/dt.log/?start=0
56m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms4096m -Xmx4096m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT
SHLVL=3
HADOOP_SSH_OPTS=-o ConnectTimeout=5 -o SendEnv=HADOOP_CONF_DIR
HADOOP_USER_NAME=datatorrent/gueulvahal003.saifg.rbc.com@SAIFG.RBC.COM
HADOOP_NAMENODE_OPTS=-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/yarn/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/yarn/gc.log-201606140038 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms8192m -Xmx8192m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1
HADOOP_IDENT_STRING=yarn
HADOOP_MAPRED_LOG_DIR=/var/log/hadoop-mapreduce/yarn
NM_HOST=guedlpdhdp012.saifg.rbc.com
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
HADOOP_SECURE_DN_LOG_DIR=/var/log/hadoop/hdfs
YARN_HISTORYSERVER_HEAPSIZE=1024
JVM_PID=2638
YARN_PID_DIR=/var/run/hadoop-yarn/yarn
HADOOP_HOME_WARN_SUPPRESS=1
NM_PORT=45454
LOGNAME=mukkamula
YARN_CONF_DIR=/usr/hdp/current/hadoop-client/conf
HADOOP_YARN_USER=yarn
QTDIR=/usr/lib64/qt-3.3
_=/usr/lib/jvm/java-1.7.0/bin/java
MSM_PRODUCT=MSM
HADOOP_HOME=/usr/hdp/2.3.2.0-2950/hadoop
MALLOC_ARENA_MAX=4
HADOOP_OPTS=-Dhdp.version=2.3.2.0-2950 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true  -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.3.2.0-2950 -Dhadoop.log.dir=/var/log/hadoop/yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
SHELL=/bin/bash
YARN_ROOT_LOGGER=INFO,EWMA,RFA
HADOOP_TOKEN_FILE_LOCATION=/grid/11/hadoop/yarn/local/usercache/mukkamula/appcache/application_1465495186350_2224/container_e35_1465495186350_2224_01_000001/container_tokens
CLASSPATH=./*:/usr/hdp/current/hadoop-client/conf:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*
HADOOP_MAPRED_PID_DIR=/var/run/hadoop-mapreduce/yarn
YARN_NODEMANAGER_HEAPSIZE=1024
QTINC=/usr/lib64/qt-3.3/include
USER=mukkamula
HADOOP_CLIENT_OPTS=-Xmx2048m -XX:MaxPermSize=512m -Xmx2048m -XX:MaxPermSize=512m
CONTAINER_ID=container_e35_1465495186350_2224_01_000001
HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop/hdfs
HISTCONTROL=ignoredups
HOME=/home/
HADOOP_NAMENODE_INIT_HEAPSIZE=-Xms8192m
MSM_HOME=/usr/local/MegaRAID Storage Manager
LESSOPEN=||/usr/bin/lesspipe.sh %s
LANG=en_US.UTF-8
YARN_NICENESS=0
YARN_IDENT_STRING=yarn
HADOOP_MAPRED_HOME=/usr/hdp/2.3.2.0-2950/hadoop-mapreduce


Regards,
Surya Vamshi
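
A likely reading of the AppMaster.stderr trace above: the container ID
carries an epoch segment ("e35") that YARN inserts once the ResourceManager
has restarted under a recovery store such as the ZKRMStateStore shown in the
cluster info, and Hadoop client libraries older than roughly 2.6 cannot
parse that segment. An old hadoop-yarn jar bundled inside the .apa would
fail in exactly this way. A minimal sketch of the failing parse, as an
illustration only and not the actual Hadoop source:

public class ContainerIdParseDemo
{
  public static void main(String[] args)
  {
    String id = "container_e35_1465495186350_2224_01_000001";
    String[] parts = id.split("_");
    // An epoch-aware client (Hadoop 2.6+) recognizes the "e35" epoch prefix;
    // an older client assumes parts[1] is the cluster timestamp:
    long ts = Long.parseLong(parts[1]);  // NumberFormatException: "e35"
    System.out.println(ts);
  }
}

If that is the cause, rebuilding the application against the cluster's
Hadoop version (2.7.1 here), with the Hadoop dependencies in provided scope,
should avoid the mismatch.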


Re: Multiple directories

Posted by Thomas Weise <th...@gmail.com>.
On Wed, Jun 15, 2016 at 1:55 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <
suryavamshivardhan.mukkamula@rbc.com> wrote:

> Hi Ram/Team,
>
>
>
> I could create an operator which reads multiple directories and parses
> each file with respect to an individual configuration file, and generates
> output files to different directories.
>
>
>
> However I have some questions regarding the design.
>
>
>
> ==> We have 120 directories to scan on HDFS; if we use parallel
> partitioning with operator memory around 250MB, it might be around 30GB of
> RAM for the processing of this operator. Are these figures going to create
> any problem in production?
>

You can benchmark this with a single partition. If the downstream operators
can keep up with the rate at which the file reader emits, then the memory
consumption should be minimal. Keep in mind though that the container
memory is not just heap space for the operator, but also memory the JVM
requires to run and the memory that the buffer server consumes. You see the
allocated memory in the UI if you use the DT community edition (container
list in the physical plan).
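
For reference, both allocations can be set per operator in the application
configuration. A sketch, where the operator name "dirReader", the port name
"output", and the values are assumptions for illustration only:

<property>
  <name>dt.operator.dirReader.attr.MEMORY_MB</name>
  <value>256</value>
</property>
<property>
  <name>dt.operator.dirReader.port.output.attr.BUFFER_MEMORY_MB</name>
  <value>128</value>
</property>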


> ==> Should I use a scheduler for running the batch job, or define the next
> scan time and keep the DT job running continuously? If I run the DT job
> continuously, I assume memory will be continuously utilized by the DT job
> and not available to other resources on the cluster; please clarify.
>
It is possible to set this up elastically as well, so that when there is no
input available, the number of reader partitions is reduced and the memory is
given back (Apex supports dynamic scaling).
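
To make the elastic option concrete, here is a rough sketch of a combined
Partitioner and StatsListener for such readers. It is an illustration under
assumptions (the LineByLineFileInputOperator reader, the zero-tuple idle
test, and collapsing to a single partition are all simplifications), not
code from the sample linked earlier in the thread:

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

import com.datatorrent.api.DefaultPartition;
import com.datatorrent.api.Partitioner;
import com.datatorrent.api.StatsListener;
import com.datatorrent.lib.io.fs.LineByLineFileInputOperator;

public class ElasticDirReaderPartitioner
    implements Partitioner<LineByLineFileInputOperator>, StatsListener
{
  private final List<String> directories;   // one HDFS directory per slice
  private transient volatile boolean idle;  // simplification: one global flag

  public ElasticDirReaderPartitioner(List<String> directories)
  {
    this.directories = directories;
  }

  @Override
  public Response processStats(BatchedOperatorStats stats)
  {
    // If nothing was processed recently, ask the platform to repartition;
    // definePartitions() below then decides the new partition count.
    Response response = new Response();
    idle = stats.getTuplesProcessedPSMA() == 0;
    response.repartitionRequired = idle;
    return response;
  }

  @Override
  public Collection<Partition<LineByLineFileInputOperator>> definePartitions(
      Collection<Partition<LineByLineFileInputOperator>> partitions,
      PartitioningContext context)
  {
    // When idle, collapse to one lightweight partition (this sketch simply
    // keeps the first directory); otherwise create one partition per
    // directory, in the spirit of the fileIO-multiDir "slices".
    int count = idle ? 1 : directories.size();
    List<Partition<LineByLineFileInputOperator>> result = new ArrayList<>(count);
    for (int i = 0; i < count; i++) {
      LineByLineFileInputOperator reader = new LineByLineFileInputOperator();
      reader.setDirectory(directories.get(i));
      result.add(new DefaultPartition<>(reader));
    }
    return result;
  }

  @Override
  public void partitioned(Map<Integer, Partition<LineByLineFileInputOperator>> partitions)
  {
    // No per-partition bookkeeping needed in this sketch.
  }
}

The class would be attached to the reader via the OperatorContext.PARTITIONER
attribute (and the stats side via STATS_LISTENERS, if the platform version in
use does not consult the partitioner for stats automatically); growing back
to the full partition count works the same way once data shows up again.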



> Regards,
>
> Surya Vamshi

RE: Multiple directories

Posted by "Mukkamula, Suryavamshivardhan (CWM-NR)" <su...@rbc.com>.
Hi Ram/Team,

I was able to create an operator that reads multiple directories, parses each file against its individual configuration file, and writes the output files to different directories.

However, I have some questions regarding the design.


1. We have 120 directories to scan on HDFS. If we use parallel partitioning with about 250MB of operator memory per partition, that comes to roughly 30GB of RAM for this operator alone. Are these figures going to create any problems in production?

2. Should I use a scheduler to run the batch job, or define the next scan time and keep the DT job running continuously? If I run the DT job continuously, I assume its memory stays allocated and is not available to other applications on the cluster; please clarify.

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, June, 05 10:24 PM
To: users@apex.apache.org
Subject: Re: Multiple directories

Some sample code to monitor multiple directories is now available at:
https://github.com/DataTorrent/examples/tree/master/tutorials/fileIO-multiDir

It shows how to use a custom implementation of definePartitions() to create
multiple partitions of the file input operator and group them
into "slices" where each slice monitors a single directory.

Ram

On Wed, May 25, 2016 at 9:55 AM, Munagala Ramanath <ra...@datatorrent.com> wrote:
I'm hoping to have a sample sometime next week.

Ram

On Wed, May 25, 2016 at 9:30 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Thank you so much, Ram, for your advice. Option (a) would be ideal for my requirement.

Do you have a sample showing partitioning where each partition is set up with its own configuration?

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, May, 25 12:11 PM
To: users@apex.apache.org
Subject: Re: Multiple directories

You have 2 options: (a) AbstractFileInputOperator (b) FileSplitter/BlockReader

For (a), each partition (i.e. replica of the operator) can scan only a single directory, so if you have 100
directories, you can simply start with 100 partitions; since each partition scans its own directory,
you don't need to worry about which files the lines came from. This approach, however, needs a custom
definePartitions() implementation in your subclass to assign the appropriate directory and XML parsing
config file to each partition; it also needs adequate cluster resources to be able to spin up the required
number of partitions.
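
A minimal sketch of such a subclass, using illustrative names (LineReader,
feedDirs, configFileFor) and omitting the carry-over of checkpointed state
(processed-file lists) from existing partitions, which a production
definePartitions() must handle:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.apache.hadoop.fs.Path;

import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.api.DefaultPartition;
import com.datatorrent.lib.io.fs.AbstractFileInputOperator;

// Illustrative subclass: one partition per feed directory, each carrying its
// own XML parsing config. feedDirs and configFileFor() are assumptions.
public class LineReader extends AbstractFileInputOperator<String> {

  public final transient DefaultOutputPort<String> output = new DefaultOutputPort<>();

  private List<String> feedDirs = new ArrayList<>();  // populated from the mapping file
  private String xmlConfigPath;                       // config for this partition's feed

  // AbstractFileInputOperator already implements Partitioner, so we only
  // override definePartitions to hand each replica one directory + config.
  @Override
  public Collection<Partition<AbstractFileInputOperator<String>>> definePartitions(
      Collection<Partition<AbstractFileInputOperator<String>>> partitions,
      PartitioningContext context) {
    List<Partition<AbstractFileInputOperator<String>>> result = new ArrayList<>();
    for (String dir : feedDirs) {
      LineReader op = new LineReader();        // fresh replica per directory
      op.feedDirs = this.feedDirs;
      op.setDirectory(dir);                    // the one directory this replica scans
      op.xmlConfigPath = configFileFor(dir);   // hypothetical per-feed lookup
      result.add(new DefaultPartition<AbstractFileInputOperator<String>>(op));
    }
    return result;
  }

  private String configFileFor(String dir) {
    return dir + "/feed-config.xml";           // placeholder convention
  }

  // standard line-reader plumbing
  private transient BufferedReader br;

  @Override
  protected InputStream openFile(Path path) throws IOException {
    InputStream is = super.openFile(path);
    br = new BufferedReader(new InputStreamReader(is));
    return is;
  }

  @Override
  protected void closeFile(InputStream is) throws IOException {
    super.closeFile(is);
    br.close();
    br = null;
  }

  @Override
  protected String readEntity() throws IOException {
    return br.readLine();                      // one line per tuple; null => end of file
  }

  @Override
  protected void emit(String line) {
    output.emit(line);                         // per-feed parsing via xmlConfigPath goes here
  }
}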

For (b), there is some documentation in the Operators section at http://docs.datatorrent.com/ including
sample code. These operators support scanning multiple directories out of the box but have more
elaborate configuration options. Check this out and see if it works in your use case.

Ram

On Wed, May 25, 2016 at 8:17 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <su...@rbc.com> wrote:
Hello Ram/Team,

My requirement is to read input feeds from different locations on HDFS and parse those files by reading XML configuration files (each input feed has a configuration file that defines the fields inside the feed).

My approach: I would like to define a mapping file that contains, for each feed, an identifier, the feed location, and the configuration file location. I would read this mapping file at initial load within the setup() method and define my DirectoryScan.acceptFiles accordingly. My challenge is that when I read the files, I must parse each line using the corresponding configuration file. How do I know which file a line came from? If I know this, I can read the corresponding configuration file before parsing the line.

Please let me know how do I handle this.

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:ram@datatorrent.com]
Sent: 2016, May, 24 5:49 PM
To: Mukkamula, Suryavamshivardhan (CWM-NR)
Subject: Multiple directories

One way of addressing the issue is to use some sort of external tool (like a script) to
copy all the input files to a common directory (making sure that the file names are
unique to prevent one file from overwriting another) before the Apex application starts.

The Apex application then starts and processes files from this directory.

If you set the partition count of the file input operator to N, it will create N partitions and
the files will be automatically distributed among the partitions. The partitions will work
in parallel.

Ram
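
For the partition-count point above, a minimal sketch (partitionCount is the
stock property on the Malhar file input operator; 'reader' and LineReader are
illustrative):

// inside populateDAG(): request N static partitions of the file reader;
// the equivalent app property is dt.operator.reader.prop.partitionCount
LineReader reader = dag.addOperator("reader", new LineReader());
reader.setPartitionCount(4);  // 4 replicas; files are distributed among them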
