Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2016/06/17 15:40:05 UTC
[jira] [Updated] (HADOOP-13286) add a scale test to do gunzip and linecount
[ https://issues.apache.org/jira/browse/HADOOP-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran updated HADOOP-13286:
------------------------------------
Attachment: HADOOP-13286-branch-2-001.patch
Patch 001: streams the test data through the (presumably non-native) gz codec and then into LineReader. This simulates a mapper applied to a .csv.gz file.
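For anyone who wants to see the shape of that pipeline without a Hadoop build, here is a minimal sketch. It is not the code in the patch: it assumes the JDK's GZIPInputStream and BufferedReader as stand-ins for the Hadoop gz codec and LineReader, and the class name GunzipLineCount and its helper methods are hypothetical. The buffer size here only sizes the JDK buffers; it is not the S3A readahead setting.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-in for the test: JDK classes replace the Hadoop codec and LineReader.
public class GunzipLineCount {

    /** Decompress a gzip stream and count the lines that come out of it. */
    public static long countLines(InputStream compressed, int bufferSize) throws IOException {
        long lines = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(
                        new GZIPInputStream(compressed, bufferSize),
                        StandardCharsets.ISO_8859_1),
                bufferSize)) {
            // readLine() returns null once the decompressed stream is exhausted
            while (reader.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    /** Build a small in-memory gzip payload so the sketch needs no external file. */
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = gzip("a,1\nb,2\nc,3\n");
        long start = System.nanoTime();
        long lines = countLines(new ByteArrayInputStream(raw), 128 * 1024);
        long elapsed = System.nanoTime() - start;
        System.out.println(lines + " lines from " + raw.length
                + " raw bytes in " + elapsed + " nS");
    }
}
```

The real test reads from an S3A input stream rather than a byte array, so the interesting cost there is in the stream reads, which this sketch does not model.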
Timings:
{code}
testDecompression128K: Decompress with a 128K readahead
2016-06-17 16:30:42,408 [Thread-0] INFO compress.CodecPool (CodecPool.java:getDecompressor(181)) - Got brand-new decompressor [.gz]
2016-06-17 16:30:47,345 [Thread-0] INFO contract.ContractTestUtils (ContractTestUtils.java:end(1262)) - Duration of Time to read 514690 lines [99896260 bytes expanded, 22633778 raw] with readahead = 131072: 5,107,155,982 nS
2016-06-17 16:30:47,345 [Thread-0] INFO scale.TestS3AInputStreamPerformance (TestS3AInputStreamPerformance.java:logTimePerIOP(144)) - Time per IOP: 9,922 nS
2016-06-17 16:30:47,346 [Thread-0] INFO scale.TestS3AInputStreamPerformance (TestS3AInputStreamPerformance.java:logStreamStatistics(301)) - Stream Statistics
StreamStatistics{OpenOperations=1, CloseOperations=1, Closed=1, Aborted=0, SeekOperations=0, ReadExceptions=0, ForwardSeekOperations=0, BackwardSeekOperations=0, BytesSkippedOnSeek=0, BytesBackwardsOnSeek=0, BytesRead=22633778, BytesRead excluding skipped=22633778, ReadOperations=5708, ReadFullyOperations=0, ReadsIncomplete=243}
{code}
That is roughly 10 microseconds/line (the 9,922 nS "Time per IOP" above); 5.1s for the entire 22.6MB file, which expands to 99.9MB on the way through the pipeline.
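The per-line figure can be rechecked from the numbers in the log above. A small sketch of the arithmetic, with the duration, line count, and byte counts copied from the log; the class name TimingSummary and its method names are hypothetical:

```java
// Hypothetical helper that re-derives the summary figures from the logged values.
public class TimingSummary {
    // Values copied from the log output above
    static final long DURATION_NS = 5_107_155_982L;
    static final long LINES = 514_690L;
    static final long EXPANDED_BYTES = 99_896_260L;
    static final long RAW_BYTES = 22_633_778L;

    /** Nanoseconds spent per line read (integer division, as in the log). */
    public static long nanosPerLine() {
        return DURATION_NS / LINES;
    }

    /** Ratio of decompressed bytes to compressed bytes. */
    public static double expansionRatio() {
        return (double) EXPANDED_BYTES / RAW_BYTES;
    }

    /** Throughput in MB/s (1 MB = 10^6 bytes) for a given byte count. */
    public static double megabytesPerSecond(long bytes) {
        double seconds = DURATION_NS / 1e9;
        return (bytes / 1e6) / seconds;
    }

    public static void main(String[] args) {
        System.out.println("nS per line:        " + nanosPerLine());
        System.out.println("expansion ratio:    " + expansionRatio());
        System.out.println("raw MB/s:           " + megabytesPerSecond(RAW_BYTES));
        System.out.println("decompressed MB/s:  " + megabytesPerSecond(EXPANDED_BYTES));
    }
}
```

The integer division gives 9,922 nS per line, matching the "Time per IOP" line in the log.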
> add a scale test to do gunzip and linecount
> -------------------------------------------
>
> Key: HADOOP-13286
> URL: https://issues.apache.org/jira/browse/HADOOP-13286
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.8.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: HADOOP-13286-branch-2-001.patch
>
>
> The HADOOP-13203 patch proposal showed that there were performance problems downstream which weren't surfacing in the current scale tests.
> Trying to decompress the .gz test file and then go through it with LineReader models a basic use case: parsing a .csv.gz data source.
> Add this, with metric printing.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)