Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2017/08/14 10:47:01 UTC
[jira] [Commented] (HADOOP-14770) S3A http connection in s3a driver not reuse in Spark application
[ https://issues.apache.org/jira/browse/HADOOP-14770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125524#comment-16125524 ]
Steve Loughran commented on HADOOP-14770:
-----------------------------------------
# Please add the Hadoop version to the JIRA, thanks.
# What is the file format: simple or columnar (ORC, Parquet)?
# It looks like the connection is being closed on every seek. That is a sign either of HADOOP-13203 (random IO) not engaging or, on a sequential read, of forward reads aborting and reopening the connection rather than skipping forward.
Make sure you are using the Hadoop 2.8.x JARs, then:
For columnar data, enable random IO:
{code}
spark.hadoop.fs.s3a.experimental.fadvise=random
{code}
For sequential data with big forward skips, set the readahead range:
{code}
spark.hadoop.fs.s3a.readahead.range=768K
{code}
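To illustrate why the readahead range matters, here is a rough model of the forward-seek decision. It is a sketch, not the actual S3AInputStream code: it assumes a forward seek that fits within the readahead range is served by skipping bytes on the open connection, and that anything else (a longer forward jump or a backward seek) closes the connection and reopens it at the new offset. The names parse_size, plan_seek and count_reopens are hypothetical, and the 64K value is only there for comparison.

```python
def parse_size(text):
    """Convert a size such as '768K' to bytes (K/M/G suffixes, base 1024)."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    text = text.strip().upper()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def plan_seek(current_pos, target_pos, readahead_bytes):
    """Return 'noop', 'skip' (reuse the connection) or 'reopen' (new connection)."""
    diff = target_pos - current_pos
    if diff == 0:
        return "noop"
    if 0 < diff <= readahead_bytes:
        return "skip"
    # Backward seeks and long forward jumps both need a new request in this model.
    return "reopen"


def count_reopens(offsets, readahead_bytes):
    """Count how many seeks in a sequence of target offsets force a reopen."""
    pos = 0
    reopens = 0
    for target in offsets:
        if plan_seek(pos, target, readahead_bytes) == "reopen":
            reopens += 1
        pos = target
    return reopens


# Hypothetical seek pattern: sequential read with big forward skips.
seeks = [100_000, 400_000, 1_000_000, 1_500_000]
print("reopens with 64K readahead: ", count_reopens(seeks, parse_size("64K")))
print("reopens with 768K readahead:", count_reopens(seeks, parse_size("768K")))
```

In this model every skip in the sample pattern is larger than 64K, so each one costs a new connection, while all of them fit inside 768K and the connection is reused.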
If this fixes it, close as a duplicate of HADOOP-13203.
If this doesn't fix it, you can print both the input stream and the s3a FS, as their toString() methods print all their stats.
Oh, one more possible cause: split calculation isn't getting it right. Look at your s3a block size, and the format itself.
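As a rough illustration of how the block size reported by the filesystem feeds into split calculation, here is a sketch loosely modelled on the classic Hadoop FileInputFormat logic (split size derived from the block size, with a 1.1 slop factor on the last split). Spark's own data sources may compute partitions differently, so treat it as an approximation. The function name compute_splits and the 32 MB / 128 MB / 1 GB values are illustrative assumptions, not taken from the issue.

```python
SPLIT_SLOP = 1.1  # the final split may be up to 10% larger than split_size


def compute_splits(file_len, block_size, min_size=1, max_size=2 ** 63 - 1):
    """Return a list of (offset, length) splits for one file."""
    split_size = max(min_size, min(max_size, block_size))
    splits = []
    offset = 0
    remaining = file_len
    # Carve off full-size splits while clearly more than one split remains.
    while remaining / split_size > SPLIT_SLOP:
        splits.append((offset, split_size))
        offset += split_size
        remaining -= split_size
    # Whatever is left (possibly slightly over split_size) is the last split.
    if remaining > 0:
        splits.append((offset, remaining))
    return splits


MB = 1024 * 1024
for block in (32 * MB, 128 * MB):
    splits = compute_splits(1024 * MB, block)
    print(block // MB, "MB block size ->", len(splits), "splits")
```

The smaller the reported block size, the more splits a file is cut into, and each split is read by a task that opens its own stream and seeks to its own offset.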
> S3A http connection in s3a driver not reuse in Spark application
> ----------------------------------------------------------------
>
> Key: HADOOP-14770
> URL: https://issues.apache.org/jira/browse/HADOOP-14770
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Yonger
> Assignee: Yonger
>
> I print out connection stats every 2 s when running a Spark application against S3-compatible storage:
> ESTAB 0 0 ::ffff:10.0.2.36:44446 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44454 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44374 ::ffff:10.0.2.254:80
> ESTAB 159724 0 ::ffff:10.0.2.36:44436 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44448 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44338 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44438 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44414 ::ffff:10.0.2.254:80
> ESTAB 0 480 ::ffff:10.0.2.36:44450 ::ffff:10.0.2.254:80 timer:(on,170ms,0)
> ESTAB 0 0 ::ffff:10.0.2.36:44442 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44390 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44326 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44452 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44394 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44444 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44456 ::ffff:10.0.2.254:80
> ======================
> ESTAB 0 0 ::ffff:10.0.2.36:44508 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44476 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44524 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44374 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44500 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44504 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44512 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44506 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44464 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44518 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44510 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44442 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44526 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44472 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44466 ::ffff:10.0.2.254:80
> The connections above and below the "=" line change all the time. But I haven't seen this in an MR application.
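One way to quantify the churn between two such snapshots is to compare the client-side ports. The sketch below assumes the line layout shown in the description (state, two queue counters, local address, peer address) and uses a few lines excerpted from each snapshot; the helper name local_ports is hypothetical.

```python
def local_ports(snapshot):
    """Extract the local (client-side) ports of ESTAB connections from a snapshot."""
    ports = set()
    for line in snapshot.splitlines():
        fields = line.lstrip("> ").split()
        if len(fields) >= 5 and fields[0] == "ESTAB":
            # The local address looks like ::ffff:10.0.2.36:44446; the port is the last part.
            ports.add(int(fields[3].rsplit(":", 1)[1]))
    return ports


# Excerpts from the two snapshots in the description.
before = """
ESTAB 0 0 ::ffff:10.0.2.36:44446 ::ffff:10.0.2.254:80
ESTAB 0 0 ::ffff:10.0.2.36:44374 ::ffff:10.0.2.254:80
ESTAB 0 480 ::ffff:10.0.2.36:44450 ::ffff:10.0.2.254:80 timer:(on,170ms,0)
ESTAB 0 0 ::ffff:10.0.2.36:44442 ::ffff:10.0.2.254:80
"""
after = """
ESTAB 0 0 ::ffff:10.0.2.36:44508 ::ffff:10.0.2.254:80
ESTAB 0 0 ::ffff:10.0.2.36:44374 ::ffff:10.0.2.254:80
ESTAB 0 0 ::ffff:10.0.2.36:44442 ::ffff:10.0.2.254:80
ESTAB 0 0 ::ffff:10.0.2.36:44526 ::ffff:10.0.2.254:80
"""

kept = local_ports(before) & local_ports(after)
new = local_ports(after) - local_ports(before)
print("reused connections:", sorted(kept))
print("new connections:   ", sorted(new))
```

Applied to the full snapshots above, only local ports 44374 and 44442 appear on both sides of the separator; every other connection in the second snapshot is new.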