Posted to dev@predictionio.apache.org by "Marcin Ziemiński (JIRA)" <ji...@apache.org> on 2016/09/19 17:51:20 UTC

[jira] [Commented] (PIO-38) add Apache Parquet as a data source

    [ https://issues.apache.org/jira/browse/PIO-38?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504135#comment-15504135 ] 

Marcin Ziemiński commented on PIO-38:
-------------------------------------

[~woj_in] Could you elaborate on what you would like to see in PredictionIO? Would it simply be another storage system for events, backed by Apache Parquet, or do you mean a different kind of workflow in PredictionIO, based on stream processing and making use of Parquet?

> add Apache Parquet as a data source
> -----------------------------------
>
>                 Key: PIO-38
>                 URL: https://issues.apache.org/jira/browse/PIO-38
>             Project: PredictionIO
>          Issue Type: New Feature
>            Reporter: Wojciech Indyk
>              Labels: features
>
> Apache Parquet (https://parquet.apache.org/) is a columnar storage format that Apache Spark supports natively, and it is very well suited to storing batch data as input for a PredictionIO engine.
> Parquet is very popular for archiving clickstream data, so supporting it would make it possible to use PredictionIO without an additional import (and duplication) of the data into HBase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)