Posted to dev@predictionio.apache.org by "Sara Asher (JIRA)" <ji...@apache.org> on 2017/09/19 21:48:00 UTC

[jira] [Commented] (PIO-38) add Apache Parquet as a data source

    [ https://issues.apache.org/jira/browse/PIO-38?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172396#comment-16172396 ] 

Sara Asher commented on PIO-38:
-------------------------------

We can close this once PIO-71 is done.

> add Apache Parquet as a data source
> -----------------------------------
>
>                 Key: PIO-38
>                 URL: https://issues.apache.org/jira/browse/PIO-38
>             Project: PredictionIO
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Wojciech Indyk
>              Labels: features
>
> Apache Parquet (https://parquet.apache.org/) is a columnar storage format with native support in Apache Spark, and it is well suited to storing batch data used as input to a PredictionIO engine.
> Parquet is widely used to archive clickstream data, so supporting it as a data source would let PredictionIO read that data directly, without first importing (and duplicating) it into HBase.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)