Posted to dev@pig.apache.org by "Cheolsoo Park (JIRA)" <ji...@apache.org> on 2014/06/10 00:03:03 UTC
[jira] [Updated] (PIG-4003) Error is thrown by JobStats.getOutputSize() when storing to a Hive table
[ https://issues.apache.org/jira/browse/PIG-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheolsoo Park updated PIG-4003:
-------------------------------
Attachment: PIG-4003-1.patch
The attached patch addresses two issues:
# MRJobStats.addOutputStatistics() contains redundant code. It handles the case of {{# of stores == 1}} separately, which is both unnecessary and confusing because it adds an extra code path.
# FileBasedOutputSizeReader.supports() now returns false for Hive table names. If the URI has no scheme, it is assumed not to be an HDFS path.
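The second point can be illustrated with a minimal sketch. This is not the code in the attached patch: the class name OutputLocationSketch, the method looksLikeFilePath(), and the scheme set are hypothetical and chosen only for illustration. It assumes that a store location with no URI scheme (such as a Hive table name) should not be treated as a file system path.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class OutputLocationSketch {
    // Hypothetical set of schemes treated as file-based output locations.
    private static final Set<String> FILE_SCHEMES =
            new HashSet<>(Arrays.asList("hdfs", "file", "s3n"));

    // Returns true only when the location carries an explicit, file-like scheme.
    // A bare name such as "db.table" has no scheme and is rejected.
    static boolean looksLikeFilePath(String location) {
        try {
            String scheme = new URI(location).getScheme();
            if (scheme == null) {
                return false; // no scheme: assume it is not an HDFS path
            }
            return FILE_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            return false; // unparseable location: do not try to read its size
        }
    }

    public static void main(String[] args) {
        System.out.println(looksLikeFilePath("hdfs://10.61.10.185:9000/user/cheolsoop/out")); // true
        System.out.println(looksLikeFilePath("prodhive.benchmark.some_table"));              // false
    }
}
```

With a check along these lines, a size reader would decline a Hive table name up front instead of calling listStatus() on a path that does not exist.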
> Error is thrown by JobStats.getOutputSize() when storing to a Hive table
> -------------------------------------------------------------------------
>
> Key: PIG-4003
> URL: https://issues.apache.org/jira/browse/PIG-4003
> Project: Pig
> Issue Type: Bug
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: 0.14.0
>
> Attachments: PIG-4003-1.patch
>
>
> Here is an example of the stack trace printed to the console. Technically, this is a warning message and does not make the job fail, but it is certainly not user-friendly.
> {code}
> 4/06/09 16:20:28 WARN pigstats.JobStats: unable to find the output file
> java.io.FileNotFoundException: File hdfs://10.61.10.185:9000/user/cheolsoop/prodhive.benchmark.unittest_vhs_bitrate_asn_sum_stg_test2 does not exist.
> at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:654)
> at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
> at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712)
> at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.FileBasedOutputSizeReader.getOutputSize(FileBasedOutputSizeReader.java:65)
> at org.apache.pig.tools.pigstats.JobStats.getOutputSize(JobStats.java:352)
> {code}
> The issue is that FileBasedOutputSizeReader misinterprets the Hive table name as an HDFS path.
> {code}
> @Override
> public boolean supports(POStore sto, Configuration conf) {
>     return UriUtil.isHDFSFileOrLocalOrS3N(getLocationUri(sto), conf);
> }
> {code}