Posted to dev@pig.apache.org by "Mohit Sabharwal (JIRA)" <ji...@apache.org> on 2015/06/04 01:00:42 UTC

[jira] [Commented] (PIG-4585) Use newAPIHadoopRDD instead of newAPIHadoopFile

    [ https://issues.apache.org/jira/browse/PIG-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571795#comment-14571795 ] 

Mohit Sabharwal commented on PIG-4585:
--------------------------------------

FYI: [~kellyzly], [~kexianda], [~xuefuz]

Most tests in TestHBaseStorage (27 out of 33) pass.

The remaining 6 fail because UDFContext (a thread local) is not populated in
Spark executor threads. I will fix this in a separate patch.
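
To illustrate the thread-local problem, here is a minimal, self-contained
sketch in plain Java. It is not Pig or Spark code: ThreadLocalDemo, CONTEXT
and the helper methods are hypothetical stand-ins, and a single-thread
ExecutorService plays the role of an executor thread. It only shows that a
value set in one thread's ThreadLocal is not visible in another thread
unless it is passed along and set there explicitly.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLocalDemo {
    // Hypothetical stand-in for a thread-local context such as UDFContext.
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    // Runs the task on a freshly created worker thread and returns its result.
    static String runOnWorker(Callable<String> task) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return pool.submit(task).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    // The worker thread has its own, empty slot, so this returns null.
    static String readWithoutPropagation() {
        return runOnWorker(() -> CONTEXT.get());
    }

    // Capturing the value and setting it inside the worker makes it visible there.
    static String readWithPropagation(String value) {
        return runOnWorker(() -> {
            CONTEXT.set(value);
            return CONTEXT.get();
        });
    }

    public static void main(String[] args) {
        CONTEXT.set("udf-properties");
        System.out.println("caller: " + CONTEXT.get());
        System.out.println("worker, not propagated: " + readWithoutPropagation());
        System.out.println("worker, propagated: " + readWithPropagation(CONTEXT.get()));
    }
}
```

In this sketch the unpropagated read in the worker yields null, while the
propagated read yields the caller's value. Real Spark executors may also run
in separate JVMs, in which case the state has to be serialized and shipped
with the task rather than merely copied between threads.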

> Use newAPIHadoopRDD instead of newAPIHadoopFile
> -----------------------------------------------
>
>                 Key: PIG-4585
>                 URL: https://issues.apache.org/jira/browse/PIG-4585
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>    Affects Versions: spark-branch
>            Reporter: Mohit Sabharwal
>            Assignee: Mohit Sabharwal
>             Fix For: spark-branch
>
>         Attachments: PIG-4585.patch
>
>
> LoadConverter currently uses SparkContext.newAPIHadoopFile which won't work for non-filesystem based input sources, like HBase.
> newAPIHadoopFile assumes a FileInputFormat and attempts to [verify|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L1065] this in the constructor, which fails for HBaseTableInputFormat (which is not a FileInputFormat):
> {code}
>   NewFileInputFormat.setInputPaths(job, path)
> {code}


