Posted to issues@hawq.apache.org by "Kyle R Dunn (JIRA)" <ji...@apache.org> on 2016/07/18 20:33:20 UTC
[jira] [Commented] (HAWQ-29) Refactor HAWQ InputFormat to support Spark/Scala
[ https://issues.apache.org/jira/browse/HAWQ-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383015#comment-15383015 ]
Kyle R Dunn commented on HAWQ-29:
---------------------------------
I'm planning to spend some time looking at this.
Here is the Spark API for Hadoop-based input formats:
http://spark.apache.org/docs/latest/api/java/org/apache/spark/api/java/JavaSparkContext.html#newAPIHadoopRDD%28org.apache.hadoop.conf.Configuration
And the Parquet Hadoop Input Format as a reference implementation:
https://github.com/Parquet/parquet-mr/blob/master/parquet-hadoop/src/main/java/parquet/hadoop/ParquetInputSplit.java
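The issue description notes that some classes need to implement the serializable interface before Spark can use them. A minimal sketch of what that involves is below; the class name `HAWQInputSplitSketch` and its fields are hypothetical placeholders, not the actual HAWQ API. Spark ships input splits to executors as part of task dispatch, so any split class must survive a Java serialization round trip:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a HAWQ input split; the real class name and
// fields may differ. Implementing java.io.Serializable is the key change
// the issue calls for.
class HAWQInputSplitSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String filePath;
    private final long start;
    private final long length;

    HAWQInputSplitSketch(String filePath, long start, long length) {
        this.filePath = filePath;
        this.start = start;
        this.length = length;
    }

    String getFilePath() { return filePath; }
    long getStart()      { return start; }
    long getLength()     { return length; }
}

public class SerializationRoundTrip {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        HAWQInputSplitSketch split = new HAWQInputSplitSketch("/hawq/seg0/data", 0L, 4096L);

        // Serialize, then deserialize, as Spark's task dispatch would.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(split);
        }
        HAWQInputSplitSketch copy;
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            copy = (HAWQInputSplitSketch) in.readObject();
        }

        System.out.println(copy.getFilePath());
        System.out.println(copy.getLength());
    }
}
```

A class that fails this round trip (e.g. because it holds a non-serializable Hadoop `Configuration` field) would throw `NotSerializableException` inside Spark's task scheduler, which is the symptom the refactoring is meant to eliminate.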
> Refactor HAWQ InputFormat to support Spark/Scala
> ------------------------------------------------
>
> Key: HAWQ-29
> URL: https://issues.apache.org/jira/browse/HAWQ-29
> Project: Apache HAWQ
> Issue Type: Wish
> Components: Storage
> Reporter: Lirong Jian
> Assignee: Lirong Jian
> Priority: Minor
> Labels: features
> Fix For: 2.0.1.0-incubating
>
>
> Currently, the HAWQ InputFormat implementation doesn't support Spark/Scala very well. We need to refactor the code to support that use case. More specifically, we need to implement the serializable interface for some classes.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)