Posted to user@predictionio.apache.org by Georg Heiler <ge...@gmail.com> on 2016/09/30 15:24:18 UTC

event server - apache nifi & spark data set

Hi,

Does the event server of PIO integrate with Apache NiFi?

The examples use the Spark RDD API. Does PIO support Spark 2.0's
Datasets as well?

Regards,
Georg

Re: event server - apache nifi & spark data set

Posted by Marcin Ziemiński <zi...@gmail.com>.
In Spark 2.0 a DataFrame is just a special case of a Dataset (Dataset[Row]),
so every limitation that applies to Datasets also applies to DataFrames.
PredictionIO is built around RDDs, but that doesn't stop you from using
DataFrames internally in your engine. By defining custom types in the DASE
architecture of your engine, you should be able to use DataFrames (or
Datasets, once the Spark 2.0 upgrade from the PR mentioned earlier is in).
However, when you access PEventStore to collect your data you will get
RDDs, which you would have to convert to DataFrames if necessary.
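
For illustration, here is a minimal sketch of that conversion step. It assumes
Spark 2.0 (SparkSession) and uses a hypothetical Rating case class as a
stand-in for whatever your DataSource maps events to. The RDD is built with
parallelize only so the example runs on its own; in an engine it would come
from PEventStore. Nothing here is PIO API, only plain Spark:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type; in a real engine this is whatever your
// DataSource produces after mapping events.
case class Rating(user: String, item: String, rating: Double)

object RddToDataFrameSketch {
  def main(args: Array[String]): Unit = {
    // Local session for the sketch only; inside PIO you would reuse the
    // SparkContext that the framework hands to your engine.
    val spark = SparkSession.builder()
      .appName("rdd-to-dataframe-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // brings toDF()/toDS() and the $"col" syntax into scope

    // Stand-in for the RDD you would obtain from the event store.
    val ratingsRdd: RDD[Rating] = spark.sparkContext.parallelize(Seq(
      Rating("u1", "i1", 4.0),
      Rating("u2", "i1", 2.5)
    ))

    // RDD -> DataFrame (untyped rows) and RDD -> Dataset (typed records).
    val ratingsDf: DataFrame = ratingsRdd.toDF()
    val ratingsDs: Dataset[Rating] = ratingsRdd.toDS()

    ratingsDf.printSchema()
    ratingsDf.filter($"rating" > 3.0).show()
    println(ratingsDs.count())

    spark.stop()
  }
}
```

On Spark 1.x the same idea should work through SQLContext and its implicits
instead of SparkSession, but without the typed toDS() step being as central.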

On Sun, 2 Oct 2016 at 11:12, Georg Heiler <ge...@gmail.com> wrote:

> Thanks.
> After looking around some more, I realized that most engines use RDDs
> and not DataFrames.
> Is there a similar limitation for DataFrames as there is for Datasets?
>
> Regards,
> Georg

Re: event server - apache nifi & spark data set

Posted by Georg Heiler <ge...@gmail.com>.
Thanks.
After looking around some more, I realized that most engines use RDDs
and not DataFrames.
Is there a similar limitation for DataFrames as there is for Datasets?

Regards,
Georg

On Fri, 30 Sep 2016 at 20:14, Marcin Ziemiński <zi...@gmail.com> wrote:

> This is the PR I mentioned:
> https://github.com/apache/incubator-predictionio/pull/295
>
> I am aware that it is not enough on its own, but it is a necessary step
> towards the desired changes.
>
> Best regards,
> Marcin

Re: event server - apache nifi & spark data set

Posted by Marcin Ziemiński <zi...@gmail.com>.
This is the PR I mentioned:
https://github.com/apache/incubator-predictionio/pull/295

I am aware that it is not enough on its own, but it is a necessary step
towards the desired changes.

Best regards,
Marcin

On Fri, 30 Sep 2016 at 19:50, Georg Heiler <ge...@gmail.com> wrote:

> Thanks.
> So a simple recompile for Scala 2.11 and an upgrade of the Spark
> dependencies would not be enough.
>
> Would you mind sharing this pull request? I can't seem to find it via
> Google.
> Thanks again.
> Regards, Georg

Re: event server - apache nifi & spark data set

Posted by Georg Heiler <ge...@gmail.com>.
Thanks.
So a simple recompile for Scala 2.11 and an upgrade of the Spark
dependencies would not be enough.

Would you mind sharing this pull request? I can't seem to find it via
Google.
Thanks again.
Regards, Georg

On Fri, 30 Sep 2016 at 18:05, Marcin Ziemiński <zi...@gmail.com> wrote:

> Hi Georg,
>
> There is currently no support for Apache NiFi integration in the project.
> I have personally been looking more closely at NiFi recently, and
> integrating it with PIO seems like a good idea.
> PredictionIO is now in Apache incubation, and the releases after 0.10
> will bring more new functionality. If you have any ideas about what such
> an integration could look like, please feel free to share them. This is
> actually a very good moment to bring up such issues.
>
> As far as Datasets are concerned, PIO does not currently support them
> in its API. There is an open pull request that updates PIO to Spark 2.0,
> so Datasets could be used internally in engines once it is merged, but
> the API does not reflect such changes yet.
>
> Regards,
> Marcin

Re: event server - apache nifi & spark data set

Posted by Marcin Ziemiński <zi...@gmail.com>.
Hi Georg,

There is currently no support for Apache NiFi integration in the project.
I have personally been looking more closely at NiFi recently, and
integrating it with PIO seems like a good idea.
PredictionIO is now in Apache incubation, and the releases after 0.10
will bring more new functionality. If you have any ideas about what such
an integration could look like, please feel free to share them. This is
actually a very good moment to bring up such issues.

As far as Datasets are concerned, PIO does not currently support them
in its API. There is an open pull request that updates PIO to Spark 2.0,
so Datasets could be used internally in engines once it is merged, but
the API does not reflect such changes yet.

Regards,
Marcin

On Fri, 30 Sep 2016 at 17:24, Georg Heiler <ge...@gmail.com> wrote:

> Hi,
>
> Does the event server of PIO integrate with Apache NiFi?
>
> The examples use the Spark RDD API. Does PIO support Spark 2.0's
> Datasets as well?
>
> Regards,
> Georg
>