Posted to issues@hawq.apache.org by "Kyle R Dunn (JIRA)" <ji...@apache.org> on 2016/07/18 20:33:20 UTC

[jira] [Commented] (HAWQ-29) Refactor HAWQ InputFormat to support Spark/Scala

    [ https://issues.apache.org/jira/browse/HAWQ-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383015#comment-15383015 ] 

Kyle R Dunn commented on HAWQ-29:
---------------------------------

I'm planning to spend some time looking at this. 

Here is the Spark API for Hadoop-based input formats:
http://spark.apache.org/docs/latest/api/java/org/apache/spark/api/java/JavaSparkContext.html#newAPIHadoopRDD%28org.apache.hadoop.conf.Configuration

And the Parquet Hadoop Input Format as a reference implementation:
https://github.com/Parquet/parquet-mr/blob/master/parquet-hadoop/src/main/java/parquet/hadoop/ParquetInputSplit.java


> Refactor HAWQ InputFormat to support Spark/Scala
> ------------------------------------------------
>
>                 Key: HAWQ-29
>                 URL: https://issues.apache.org/jira/browse/HAWQ-29
>             Project: Apache HAWQ
>          Issue Type: Wish
>          Components: Storage
>            Reporter: Lirong Jian
>            Assignee: Lirong Jian
>            Priority: Minor
>              Labels: features
>             Fix For: 2.0.1.0-incubating
>
>
> Currently the implementation of the HAWQ InputFormat doesn't support Spark/Scala very well. We need to refactor the code to support that use case. More specifically, we need to implement the Serializable interface in some classes.
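
The Serializable requirement above comes from how Spark ships task data: objects such as input splits are serialized with standard Java serialization before being sent to executors, and any class that does not implement java.io.Serializable fails with a NotSerializableException at that point. A minimal sketch of the round-trip Spark performs, using a hypothetical split class (the real class names in hawq-hadoop may differ):

```java
import java.io.*;

public class SerializableSplitDemo {
    // Hypothetical stand-in for a HAWQ split class; illustrative only.
    // Without "implements Serializable", ObjectOutputStream.writeObject
    // throws NotSerializableException -- the failure Spark surfaces when
    // it ships splits to executors.
    static class HAWQFileSplit implements Serializable {
        private static final long serialVersionUID = 1L;
        final String path;
        final long start;
        final long length;

        HAWQFileSplit(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    public static void main(String[] args) throws Exception {
        HAWQFileSplit split = new HAWQFileSplit("/hawq/seg0/file.0", 0L, 4096L);

        // Serialize, as Spark's task serializer would.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(split);
        }

        // Deserialize on the "executor" side and read the fields back.
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            HAWQFileSplit copy = (HAWQFileSplit) in.readObject();
            System.out.println(copy.path + ":" + copy.start + "+" + copy.length);
        }
    }
}
```

Any non-serializable field inside such a class (a Configuration object, an open stream, etc.) would need to be marked transient or rebuilt after deserialization, which is typically where the bulk of a refactor like this lands.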



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)