Posted to dev@apex.apache.org by "Ilya Ganelin (JIRA)" <ji...@apache.org> on 2016/03/16 21:54:33 UTC
[jira] [Updated] (APEXCORE-392) Stack Overflow when launching jobs
[ https://issues.apache.org/jira/browse/APEXCORE-392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ilya Ganelin updated APEXCORE-392:
----------------------------------
Description:
I’m running into a very frustrating issue where certain DAG configurations produce the attached error log. When this happens, my application fails to even launch. This does not seem to be a YARN issue, since it occurs even with a relatively small number of partitions and a relatively small amount of memory.
This issue DOES appear to be related to HDFS input/output operations since the specific parameter that appears to affect things is the number of physical partitions for the HDFS input/output operators.
I’ve also attached the input and output operators in question:
https://gist.github.com/ilganeli/7f770374113b40ffa18a
I can get this to occur predictably by:
1. Increasing the partition count on my input operator (reads from HDFS) - values above 20 cause this error
2. Increasing the partition count on my output operator (writes to HDFS) - values above 20 cause this error
3. Setting stream locality from the default to either thread_local, node_local, or container_local on the output operator
This behavior is very frustrating as it’s preventing me from partitioning my HDFS I/O appropriately, and thus from scaling to higher throughputs.
was:
I’m running into a very frustrating issue where certain DAG configurations produce the attached error log. When this happens, my application fails to even launch. This does not seem to be a YARN issue, since it occurs even with a relatively small number of partitions and a relatively small amount of memory.
This issue DOES appear to be related to HDFS input/output operations since the specific parameter that appears to affect things is the number of physical partitions for the HDFS input/output operators.
I’ve also attached the input and output operators in question:
https://gist.github.com/ilganeli/7f770374113b40ffa18a
I can get this to occur predictably by:
1. Increasing the partition count on my input operator (reads from HDFS) - values above 20 cause this error
2. Increasing the partition count on my output operator (writes to HDFS) - values above 20 cause this error
3. Setting stream locality from the default to either thread_local, node_local, or container_local on the output operator
This behavior is very frustrating as it’s preventing me from partitioning my HDFS I/O appropriately, and thus from scaling to higher throughputs.
Do you have any thoughts on what’s going wrong? I would love your feedback.
> Stack Overflow when launching jobs
> ----------------------------------
>
> Key: APEXCORE-392
> URL: https://issues.apache.org/jira/browse/APEXCORE-392
> Project: Apache Apex Core
> Issue Type: Bug
> Affects Versions: 3.2.0
> Reporter: Ilya Ganelin
> Priority: Blocker
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)