Posted to dev@pig.apache.org by "Mohit Sabharwal (JIRA)" <ji...@apache.org> on 2015/06/04 01:00:42 UTC
[jira] [Commented] (PIG-4585) Use newAPIHadoopRDD instead of newAPIHadoopFile
[ https://issues.apache.org/jira/browse/PIG-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571795#comment-14571795 ]
Mohit Sabharwal commented on PIG-4585:
--------------------------------------
FYI: [~kellyzly], [~kexianda], [~xuefuz]
Most of the tests in TestHBaseStorage (27 out of 33) now pass.
The remaining 6 fail because UDFContext (a thread local) is not populated in
Spark executor threads. I am fixing this in a separate patch.
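For anyone unfamiliar with this failure mode, here is a minimal sketch of why a thread-local value set on one thread is not visible on another. It is not Pig's UDFContext code: the class ThreadLocalSketch, the CONTEXT field and the readOnWorker helper are hypothetical names used only for illustration, and a plain single-thread executor stands in for a Spark executor thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadLocalSketch {
    // Hypothetical stand-in for a thread-local context such as UDFContext.
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    // Reads CONTEXT on a separate worker thread. If valueToInstall is
    // non-null, the worker sets it in its own thread-local slot first.
    static String readOnWorker(String valueToInstall) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> result = pool.submit(() -> {
                if (valueToInstall != null) {
                    CONTEXT.set(valueToInstall);
                }
                return CONTEXT.get();
            });
            return result.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Populate the context on the calling ("driver") thread only.
        CONTEXT.set("udf-properties");
        System.out.println("driver thread: " + CONTEXT.get());

        // The worker thread has its own, empty slot.
        System.out.println("worker, not populated: " + readOnWorker(null));

        // Passing the value along and setting it on the worker makes it visible there.
        System.out.println("worker, populated: " + readOnWorker(CONTEXT.get()));
    }
}
```

The second line prints null because each thread gets its own copy of a ThreadLocal. The value has to be carried to the worker explicitly and set there, which is roughly what the separate patch needs to arrange for executor threads.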
> Use newAPIHadoopRDD instead of newAPIHadoopFile
> -----------------------------------------------
>
> Key: PIG-4585
> URL: https://issues.apache.org/jira/browse/PIG-4585
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Affects Versions: spark-branch
> Reporter: Mohit Sabharwal
> Assignee: Mohit Sabharwal
> Fix For: spark-branch
>
> Attachments: PIG-4585.patch
>
>
> LoadConverter currently uses SparkContext.newAPIHadoopFile, which won't work for input sources that are not filesystem-based, such as HBase.
> newAPIHadoopFile assumes a FileInputFormat and attempts to [verify|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L1065] this in the constructor, which fails for HBaseTableInputFormat because it is not a FileInputFormat:
> {code}
> NewFileInputFormat.setInputPaths(job, path)
> {code}