Posted to users@zeppelin.apache.org by Mich Talebzadeh <mi...@gmail.com> on 2016/10/27 10:28:00 UTC

Sharing RDDS across applications and users

There was a mention of using Zeppelin to share RDDs among many users. From
the notes on Zeppelin, it appears that what is shared is the UI, and I am not
sure how easy it would be to handle the result set changing as different
users modify, say, SQL queries.

There is also the idea of caching RDDs with something like Apache Ignite.
Has anyone actually tried this? Will it work with multiple applications?

It looks feasible, as RDDs are immutable and so are registered tempTables,
etc.

Thanks


Dr Mich Talebzadeh



LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.

Re: Sharing RDDS across applications and users

Posted by Mich Talebzadeh <mi...@gmail.com>.
Thanks all for your advice.

As I understand it, in layman's terms: if I had two applications where
app 2 depended on app 1, I would let app 1 finish and store its results in
HDFS, and app 2 would then read those results from HDFS and work on them.

Using Alluxio or a similar product replaces HDFS with in-memory storage, so
app 2 can pick up app 1's results from memory, or even SSD, and do its work.
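
The handoff described above can be sketched in a few lines. This is only an
illustration in plain Python, not Spark code: the function names
app1_produce and app2_consume, the file name app1_result.jsonl, and the
temporary directory standing in for an hdfs:// or alluxio:// location are
all assumptions made for the example. The point it tries to show is that the
two applications share nothing except an agreed storage location, so
swapping the storage layer does not change either application's logic.

```python
import json
import tempfile
from pathlib import Path


def app1_produce(shared_dir: Path) -> Path:
    """App 1: compute a result and persist it to the shared location."""
    result = [{"key": k, "total": k * 10} for k in range(3)]
    out = shared_dir / "app1_result.jsonl"
    # Write under a temporary name, then rename, so that app 2 can never
    # observe a half-written file.
    tmp = out.with_name(out.name + ".tmp")
    tmp.write_text("\n".join(json.dumps(row) for row in result))
    tmp.replace(out)
    return out


def app2_consume(shared_dir: Path) -> int:
    """App 2: knows only the agreed location, never app 1's memory."""
    path = shared_dir / "app1_result.jsonl"
    rows = [json.loads(line) for line in path.read_text().splitlines()]
    return sum(row["total"] for row in rows)


with tempfile.TemporaryDirectory() as d:
    shared = Path(d)  # stand-in for an hdfs:// or alluxio:// path
    app1_produce(shared)
    grand_total = app2_consume(shared)
```

In this sketch app 2 only starts after app 1 has finished writing, which
matches the "finish app 1, then start app 2" ordering described above.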

Actually, I am surprised that Spark has not incorporated this type of memory
as temporary storage.





Re: Sharing RDDS across applications and users

Posted by vincent gromakowski <vi...@gmail.com>.
For this you will need to contribute...

On 27 Oct 2016 at 1:35 PM, "Mich Talebzadeh" <mi...@gmail.com> wrote:

> so I assume Ignite will not work with Spark version >=2?


Re: Sharing RDDS across applications and users

Posted by Mich Talebzadeh <mi...@gmail.com>.
so I assume Ignite will not work with Spark version >=2?



Re: Sharing RDDS across applications and users

Posted by vincent gromakowski <vi...@gmail.com>.
Some options:
- Ignite for Spark 1.5; can deep store on Cassandra
- Alluxio for all Spark versions; can deep store on HDFS, Gluster...

==> these are best for sharing between jobs

- shared SparkContext with fair scheduling; seems not to be thread safe
- spark-jobserver and named RDDs: CRUD-style, thread-safe RDD sharing
between Spark jobs
==> these are best for sharing between users
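
To make the "named RDD" idea in the last option concrete, here is a minimal
sketch in plain Python. It is not the spark-jobserver API: the class
NamedDatasets, its methods get_or_create and delete, and the helper
functions are hypothetical names invented for this illustration, and plain
lists stand in for RDDs. It only tries to show the pattern: one long-running
process holds datasets under names, and concurrent user jobs look them up
through a lock, so each dataset is built once and then shared.

```python
import threading
from typing import Callable, Dict, List


class NamedDatasets:
    """Hypothetical registry held by one long-running, shared process."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._items: Dict[str, List[int]] = {}

    def get_or_create(self, name: str, build: Callable[[], List[int]]) -> List[int]:
        # The lock makes check-then-build atomic, so a dataset is built
        # only once even when several users request it at the same time.
        with self._lock:
            if name not in self._items:
                self._items[name] = build()
            return self._items[name]

    def delete(self, name: str) -> None:
        with self._lock:
            self._items.pop(name, None)


registry = NamedDatasets()
build_count = 0


def expensive_build() -> List[int]:
    """Stand-in for an expensive computation; counts how often it runs."""
    global build_count
    build_count += 1
    return [n * n for n in range(5)]


def user_job(results: List[int], slot: int) -> None:
    """One user's job: fetch the shared dataset by name and aggregate it."""
    data = registry.get_or_create("squares", expensive_build)
    results[slot] = sum(data)


results = [0] * 4
threads = [threading.Thread(target=user_job, args=(results, i)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Holding a single lock while building is a simplification: it serialises
builds for every name, which keeps the sketch short but would be too coarse
for long-running builds.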

2016-10-27 12:59 GMT+02:00 vincent gromakowski <
vincent.gromakowski@gmail.com>:

> I would prefer sharing the Spark context and using the FAIR scheduler for
> user concurrency.
>
> On 27 Oct 2016 at 12:48 PM, "Mich Talebzadeh" <mi...@gmail.com> wrote:
>
>> Thanks Vince.
>>
>> So Ignite uses some hash/in-memory indexing.
>>
>> The question is whether, in practice, there is much of a use case for
>> these two fabrics for sharing RDDs.
>>
>> Remember all RDBMSs do this through shared memory.
>>
>> In layman's terms, if I have two independent spark-submit jobs running,
>> can they share a result set? For example, the same tempTable, etc.?
>>
>> Cheers
>>
>>
>>
>>
>> On 27 October 2016 at 11:44, vincent gromakowski <
>> vincent.gromakowski@gmail.com> wrote:
>>
>>> Ignite works only with Spark 1.5
>>> Ignite leverages indexes
>>> Alluxio provides tiering
>>> Alluxio integrates easily with the underlying FS
>>>
>>> On 27 Oct 2016 at 12:39 PM, "Mich Talebzadeh" <mi...@gmail.com> wrote:
>>>
>>>> Thanks Chanh,
>>>>
>>>> Can it share RDDs?
>>>>
>>>> Personally, I have not used either Alluxio or Ignite.
>>>>
>>>>
>>>>    1. Are there major differences between the two?
>>>>    2. Have you tried Alluxio for sharing Spark RDDs, and if so, do you
>>>>    have any experience you can kindly share?
>>>>
>>>> Regards
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 27 October 2016 at 11:29, Chanh Le <gi...@gmail.com> wrote:
>>>>
>>>>> Hi Mich,
>>>>> Alluxio is a good option to go with.
>>>>>
>>>>> Regards,
>>>>> Chanh
>>>>>
>>>>> On Oct 27, 2016, at 5:28 PM, Mich Talebzadeh <
>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>
>>>>>
>>>>> There was a mention of using Zeppelin to share RDDs with many users.
>>>>> From the notes on Zeppelin it appears that this is sharing UI and I am not
>>>>> sure how easy it is going to be changing the result set with different
>>>>> users modifying say sql queries.
>>>>>
>>>>> There is also the idea of caching RDDs with something like Apache
>>>>> Ignite. Has anyone really tried this. Will that work with multiple
>>>>> applications?
>>>>>
>>>>> It looks feasible as RDDs are immutable and so are registered
>>>>> tempTables etc.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>
>>

Re: Sharing RDDS across applications and users

Posted by vincent gromakowski <vi...@gmail.com>.
Some options:
- Ignite for Spark 1.5; can deep-store on Cassandra
- Alluxio for all Spark versions; can deep-store on HDFS, Gluster...

==> these are best for sharing between jobs

- Shared SparkContext with fair scheduling; seems not to be thread safe
- Spark Jobserver with namedRDD: CRUD-style, thread-safe RDD sharing between
Spark jobs

==> these are best for sharing between users
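
To make the shared-context option more concrete, here is a minimal PySpark-style sketch of routing each user's jobs into their own FAIR scheduler pool on one long-lived SparkContext. It assumes the context was started with spark.scheduler.mode=FAIR. The helper name scheduler_pool and the pool name "alice" are hypothetical. Scheduler pools are set per thread, so this also assumes each user's work runs on its own thread (in PySpark, with pinned-thread mode where the version supports it, or with a driver written in a JVM language).

```python
from contextlib import contextmanager


@contextmanager
def scheduler_pool(sc, pool_name):
    """Run the enclosed Spark actions in the given FAIR scheduler pool.

    `sc` is an existing SparkContext shared by all users; `pool_name`
    is a hypothetical per-user pool name.
    """
    # Jobs submitted from this thread are now assigned to `pool_name`.
    sc.setLocalProperty("spark.scheduler.pool", pool_name)
    try:
        yield sc
    finally:
        # Clearing the property sends later jobs from this thread
        # back to the default pool.
        sc.setLocalProperty("spark.scheduler.pool", None)
```

A user's request handler would then wrap its actions, for example `with scheduler_pool(sc, "alice"): shared_rdd.count()`, where shared_rdd is whatever cached RDD the long-lived application holds.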

2016-10-27 12:59 GMT+02:00 vincent gromakowski <
vincent.gromakowski@gmail.com>:

> I would prefer sharing the spark context  and using FAIR scheduler for
> user concurrency
>
> On 27 Oct 2016 at 12:48 PM, "Mich Talebzadeh" <mi...@gmail.com>
> wrote:
>
>> thanks Vince.
>>
>> So Ignite uses some hash/in-memory indexing.
>>
>> The question is in practice is there much use case to use these two
>> fabrics for sharing RDDs.
>>
>> Remember all RDBMSs do this through shared memory.
>>
>> In layman's term if I have two independent spark-submit running, can they
>> share result set. For example the same tempTable etc?
>>
>> Cheers
>>
>> Dr Mich Talebzadeh
>>
>> On 27 October 2016 at 11:44, vincent gromakowski <
>> vincent.gromakowski@gmail.com> wrote:
>>
>>> Ignite works only with spark 1.5
>>> Ignite leverage indexes
>>> Alluxio provides tiering
>>> Alluxio easily integrates with underlying FS
>>>
>>> On 27 Oct 2016 at 12:39 PM, "Mich Talebzadeh" <mi...@gmail.com>
>>> wrote:
>>>
>>>> Thanks Chanh,
>>>>
>>>> Can it share RDDs.
>>>>
>>>> Personally I have not used either Alluxio or Ignite.
>>>>
>>>>
>>>>    1. Are there major differences between these two
>>>>    2. Have you tried Alluxio for sharing Spark RDDs and if so do you
>>>>    have any experience you can kindly share
>>>>
>>>> Regards
>>>>
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>> On 27 October 2016 at 11:29, Chanh Le <gi...@gmail.com> wrote:
>>>>
>>>>> Hi Mich,
>>>>> Alluxio is the good option to go.
>>>>>
>>>>> Regards,
>>>>> Chanh
>>>>>
>>>>> On Oct 27, 2016, at 5:28 PM, Mich Talebzadeh <
>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>
>>>>>
>>>>> There was a mention of using Zeppelin to share RDDs with many users.
>>>>> From the notes on Zeppelin it appears that this is sharing UI and I am not
>>>>> sure how easy it is going to be changing the result set with different
>>>>> users modifying say sql queries.
>>>>>
>>>>> There is also the idea of caching RDDs with something like Apache
>>>>> Ignite. Has anyone really tried this. Will that work with multiple
>>>>> applications?
>>>>>
>>>>> It looks feasible as RDDs are immutable and so are registered
>>>>> tempTables etc.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>
>>

Re: Sharing RDDS across applications and users

Posted by vincent gromakowski <vi...@gmail.com>.
I would prefer sharing the Spark context and using the FAIR scheduler for user
concurrency
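
As a rough illustration of what the FAIR scheduler setup involves, the sketch below generates a pool allocation file in the format Spark's job-scheduling documentation describes, plus the two properties that point a shared context at it. The function names, pool names, weights and file path are all hypothetical; only the property names and XML element names are taken from Spark itself.

```python
import xml.etree.ElementTree as ET


def build_allocations(pools):
    """Return fairscheduler.xml content for {pool_name: (weight, min_share)}."""
    root = ET.Element("allocations")
    for name, (weight, min_share) in pools.items():
        pool = ET.SubElement(root, "pool", name=name)
        # Jobs inside one pool also share resources fairly.
        ET.SubElement(pool, "schedulingMode").text = "FAIR"
        ET.SubElement(pool, "weight").text = str(weight)
        ET.SubElement(pool, "minShare").text = str(min_share)
    return ET.tostring(root, encoding="unicode")


def shared_context_conf(allocation_file):
    """Spark properties that enable FAIR scheduling for one long-lived context."""
    return {
        "spark.scheduler.mode": "FAIR",
        "spark.scheduler.allocation.file": allocation_file,
    }
```

The returned dictionary can be passed as --conf key=value pairs to spark-submit or set on the session builder. One pool per user is only one possible layout; pools per team or per workload type work the same way.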

On 27 Oct 2016 at 12:48 PM, "Mich Talebzadeh" <mi...@gmail.com>
wrote:

> thanks Vince.
>
> So Ignite uses some hash/in-memory indexing.
>
> The question is in practice is there much use case to use these two
> fabrics for sharing RDDs.
>
> Remember all RDBMSs do this through shared memory.
>
> In layman's term if I have two independent spark-submit running, can they
> share result set. For example the same tempTable etc?
>
> Cheers
>
> Dr Mich Talebzadeh
>
> On 27 October 2016 at 11:44, vincent gromakowski <
> vincent.gromakowski@gmail.com> wrote:
>
>> Ignite works only with spark 1.5
>> Ignite leverage indexes
>> Alluxio provides tiering
>> Alluxio easily integrates with underlying FS
>>
>> On 27 Oct 2016 at 12:39 PM, "Mich Talebzadeh" <mi...@gmail.com>
>> wrote:
>>
>>> Thanks Chanh,
>>>
>>> Can it share RDDs.
>>>
>>> Personally I have not used either Alluxio or Ignite.
>>>
>>>
>>>    1. Are there major differences between these two
>>>    2. Have you tried Alluxio for sharing Spark RDDs and if so do you
>>>    have any experience you can kindly share
>>>
>>> Regards
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>> On 27 October 2016 at 11:29, Chanh Le <gi...@gmail.com> wrote:
>>>
>>>> Hi Mich,
>>>> Alluxio is the good option to go.
>>>>
>>>> Regards,
>>>> Chanh
>>>>
>>>> On Oct 27, 2016, at 5:28 PM, Mich Talebzadeh <mi...@gmail.com>
>>>> wrote:
>>>>
>>>>
>>>> There was a mention of using Zeppelin to share RDDs with many users.
>>>> From the notes on Zeppelin it appears that this is sharing UI and I am not
>>>> sure how easy it is going to be changing the result set with different
>>>> users modifying say sql queries.
>>>>
>>>> There is also the idea of caching RDDs with something like Apache
>>>> Ignite. Has anyone really tried this. Will that work with multiple
>>>> applications?
>>>>
>>>> It looks feasible as RDDs are immutable and so are registered
>>>> tempTables etc.
>>>>
>>>> Thanks
>>>>
>>>>
>>>> Dr Mich Talebzadeh
>>>
>

Re: Sharing RDDS across applications and users

Posted by Mich Talebzadeh <mi...@gmail.com>.
Thanks Vince.

So Ignite uses some hash/in-memory indexing.

The question is whether, in practice, there is much of a use case for these
two fabrics for sharing RDDs.

Remember all RDBMSs do this through shared memory.

In layman's terms, if I have two independent spark-submit jobs running, can
they share a result set? For example, the same tempTable etc.?

Cheers

Dr Mich Talebzadeh

On 27 October 2016 at 11:44, vincent gromakowski <
vincent.gromakowski@gmail.com> wrote:

> Ignite works only with spark 1.5
> Ignite leverage indexes
> Alluxio provides tiering
> Alluxio easily integrates with underlying FS
>
> On 27 Oct 2016 at 12:39 PM, "Mich Talebzadeh" <mi...@gmail.com>
> wrote:
>
>> Thanks Chanh,
>>
>> Can it share RDDs.
>>
>> Personally I have not used either Alluxio or Ignite.
>>
>>
>>    1. Are there major differences between these two
>>    2. Have you tried Alluxio for sharing Spark RDDs and if so do you
>>    have any experience you can kindly share
>>
>> Regards
>>
>>
>> Dr Mich Talebzadeh
>>
>> On 27 October 2016 at 11:29, Chanh Le <gi...@gmail.com> wrote:
>>
>>> Hi Mich,
>>> Alluxio is the good option to go.
>>>
>>> Regards,
>>> Chanh
>>>
>>> On Oct 27, 2016, at 5:28 PM, Mich Talebzadeh <mi...@gmail.com>
>>> wrote:
>>>
>>>
>>> There was a mention of using Zeppelin to share RDDs with many users.
>>> From the notes on Zeppelin it appears that this is sharing UI and I am not
>>> sure how easy it is going to be changing the result set with different
>>> users modifying say sql queries.
>>>
>>> There is also the idea of caching RDDs with something like Apache
>>> Ignite. Has anyone really tried this. Will that work with multiple
>>> applications?
>>>
>>> It looks feasible as RDDs are immutable and so are registered tempTables
>>> etc.
>>>
>>> Thanks
>>>
>>>
>>> Dr Mich Talebzadeh
>>

Re: Sharing RDDS across applications and users

Posted by vincent gromakowski <vi...@gmail.com>.
Ignite works only with Spark 1.5
Ignite leverages indexes
Alluxio provides tiering
Alluxio easily integrates with the underlying FS

On 27 Oct 2016 at 12:39 PM, "Mich Talebzadeh" <mi...@gmail.com>
wrote:

> Thanks Chanh,
>
> Can it share RDDs.
>
> Personally I have not used either Alluxio or Ignite.
>
>
>    1. Are there major differences between these two
>    2. Have you tried Alluxio for sharing Spark RDDs and if so do you have
>    any experience you can kindly share
>
> Regards
>
>
> Dr Mich Talebzadeh
>
> On 27 October 2016 at 11:29, Chanh Le <gi...@gmail.com> wrote:
>
>> Hi Mich,
>> Alluxio is the good option to go.
>>
>> Regards,
>> Chanh
>>
>> On Oct 27, 2016, at 5:28 PM, Mich Talebzadeh <mi...@gmail.com>
>> wrote:
>>
>>
>> There was a mention of using Zeppelin to share RDDs with many users. From
>> the notes on Zeppelin it appears that this is sharing UI and I am not sure
>> how easy it is going to be changing the result set with different users
>> modifying say sql queries.
>>
>> There is also the idea of caching RDDs with something like Apache Ignite.
>> Has anyone really tried this. Will that work with multiple applications?
>>
>> It looks feasible as RDDs are immutable and so are registered tempTables
>> etc.
>>
>> Thanks
>>
>>
>> Dr Mich Talebzadeh
>

Re: Sharing RDDS across applications and users

Posted by Chanh Le <gi...@gmail.com>.
Hi Mich,
I have only tried Alluxio, so I can't give you a comparison.
In my experience, I use Alluxio for big data sets (50GB - 100GB) that are the input of pipeline jobs, so each job can reuse the result of the previous job.
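
A minimal sketch of that hand-off between pipeline jobs, assuming the Alluxio client jar is on the Spark classpath and the Alluxio master listens on its default RPC port 19998. The helper names, host and path are hypothetical; `df` and `spark` stand for a DataFrame and a SparkSession created by the calling job.

```python
def alluxio_uri(path, host="localhost", port=19998):
    """Build an alluxio:// URI for a path in the Alluxio namespace."""
    return "alluxio://{}:{}/{}".format(host, port, path.lstrip("/"))


def publish(df, uri):
    """Producer job: write a DataFrame where other applications can read it."""
    df.write.mode("overwrite").parquet(uri)


def consume(spark, uri):
    """Consumer job (a separate Spark application): load the shared result."""
    return spark.read.parquet(uri)
```

The first job would call publish(result_df, alluxio_uri("/shared/stage1")) and the next job consume(spark, alluxio_uri("/shared/stage1")). Note that this shares the data, not the RDD object itself: each application builds its own RDD or DataFrame over the same stored files.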


> On Oct 27, 2016, at 5:39 PM, Mich Talebzadeh <mi...@gmail.com> wrote:
> 
> Thanks Chanh,
> 
> Can it share RDDs.
> 
> Personally I have not used either Alluxio or Ignite.
> 
> Are there major differences between these two
> Have you tried Alluxio for sharing Spark RDDs and if so do you have any experience you can kindly share
> Regards
> 
> 
> Dr Mich Talebzadeh
> 
> On 27 October 2016 at 11:29, Chanh Le <giaosudau@gmail.com> wrote:
> Hi Mich,
> Alluxio is the good option to go. 
> 
> Regards,
> Chanh
> 
>> On Oct 27, 2016, at 5:28 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>> 
>> 
>> There was a mention of using Zeppelin to share RDDs with many users. From the notes on Zeppelin it appears that this is sharing UI and I am not sure how easy it is going to be changing the result set with different users modifying say sql queries.
>> 
>> There is also the idea of caching RDDs with something like Apache Ignite. Has anyone really tried this. Will that work with multiple applications?
>> 
>> It looks feasible as RDDs are immutable and so are registered tempTables etc.
>> 
>> Thanks
>> 
>> 
>> Dr Mich Talebzadeh
> 
> 


Re: Sharing RDDS across applications and users

Posted by vincent gromakowski <vi...@gmail.com>.
Bad idea. No caching, cluster over-consumption...
Have a look at instantiating a custom Thrift server on temp tables with the
fair scheduler to allow concurrent SQL requests. It's not a public API, but
you can find some examples.
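
One way this is sometimes done is sketched below. It is only an illustration under several assumptions: a Spark 2.x or later Hive-enabled SparkSession started with spark.scheduler.mode=FAIR, the hive-thriftserver module on the classpath, and access to the JVM class HiveThriftServer2 (a developer API) through PySpark's internal py4j handles, which are themselves not public. The function and view names are hypothetical.

```python
def expose_temp_view(spark, df, view_name):
    """Register `df` as a cached temp view and serve it over JDBC/ODBC.

    The Thrift server is started inside this application, so SQL clients
    see the temp views of this session instead of an empty catalog.
    """
    df.createOrReplaceTempView(view_name)
    # Keep the data in memory so concurrent queries reuse one copy.
    spark.catalog.cacheTable(view_name)
    # Internal handles: `_jvm` and `_jsparkSession` are not public PySpark API.
    jvm = spark.sparkContext._jvm
    jvm.org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.startWithContext(
        spark._jsparkSession.sqlContext()
    )
```

After expose_temp_view(spark, results_df, "shared_results") the application has to stay alive. Users can then connect with any Hive JDBC client (the Thrift server listens on port 10000 unless configured otherwise) and query shared_results concurrently.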

On 28 Oct 2016 at 11:12 AM, "Mich Talebzadeh" <mi...@gmail.com>
wrote:

> Hi,
>
> I think tempTable is private to the session that creates it. In Hive temp
> tables created by "CREATE TEMPORARY TABLE" are all private to the session.
> Spark is no different.
>
> The alternative may be everyone creates tempTable from the same DF?
>
> HTH
>
> Dr Mich Talebzadeh
>
> On 28 October 2016 at 10:03, Chanh Le <gi...@gmail.com> wrote:
>
>> Can you elaborate on how to implement "shared sparkcontext and fair
>> scheduling" option?
>>
>>
>> It just reuses one Spark context by not letting it stop when the
>> application is done. You should check out Livy and spark-jobserver.
>> FAIR (https://spark.apache.org/docs/1.2.0/job-scheduling.html) only
>> controls how your jobs are scheduled in the pool, but it lets jobs run in
>> parallel, whereas FIFO (the default) runs one job at a time.
>>
>>
>> My approach was to use  sparkSession.getOrCreate() method and register
>> temp table in one application. However, I was not able to access this
>> tempTable in another application.
>>
>>
>> Storing metadata in Hive may help, but I am not sure about this.
>> I use the Spark Thrift Server, create tables on it, then let Zeppelin query
>> them.
>>
>> Regards,
>> Chanh
>>
>>
>>
>>
>>
>> On Oct 27, 2016, at 9:01 PM, Victor Shafran <vi...@equalum.io>
>> wrote:
>>
>> Hi Vincent,
>> Can you elaborate on how to implement "shared sparkcontext and fair
>> scheduling" option?
>>
>> My approach was to use  sparkSession.getOrCreate() method and register
>> temp table in one application. However, I was not able to access this
>> tempTable in another application.
>> Your help is highly appreciated
>> Victor
>>
>> On Thu, Oct 27, 2016 at 4:31 PM, Gene Pang <ge...@gmail.com> wrote:
>>
>>> Hi Mich,
>>>
>>> Yes, Alluxio is commonly used to cache and share Spark RDDs and
>>> DataFrames among different applications and contexts. The data typically
>>> stays in memory, but with Alluxio's tiered storage, the "colder" data can
>>> be evicted to other media, like SSDs and HDDs. Here is a blog post
>>> discussing Spark RDDs and Alluxio:
>>> https://www.alluxio.com/blog/effective-spark-rdds-with-alluxio
>>>
>>> Alluxio also has the concept of an "Under filesystem", which can
>>> help you access your existing data across different storage systems. Here
>>> is more information about the unified namespace abilities:
>>> http://www.alluxio.org/docs/master/en/Unified-and-Transparent-Namespace.html
>>>
>>> Hope that helps,
>>> Gene
>>>
>>> On Thu, Oct 27, 2016 at 3:39 AM, Mich Talebzadeh <
>>> mich.talebzadeh@gmail.com> wrote:
>>>
>>>> Thanks Chanh,
>>>>
>>>> Can it share RDDs.
>>>>
>>>> Personally I have not used either Alluxio or Ignite.
>>>>
>>>>
>>>>    1. Are there major differences between these two
>>>>    2. Have you tried Alluxio for sharing Spark RDDs and if so do you
>>>>    have any experience you can kindly share
>>>>
>>>> Regards
>>>>
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>> On 27 October 2016 at 11:29, Chanh Le <gi...@gmail.com> wrote:
>>>>
>>>>> Hi Mich,
>>>>> Alluxio is the good option to go.
>>>>>
>>>>> Regards,
>>>>> Chanh
>>>>>
>>>>> On Oct 27, 2016, at 5:28 PM, Mich Talebzadeh <
>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>
>>>>>
>>>>> There was a mention of using Zeppelin to share RDDs with many users.
>>>>> From the notes on Zeppelin it appears that this is sharing UI and I am not
>>>>> sure how easy it is going to be changing the result set with different
>>>>> users modifying say sql queries.
>>>>>
>>>>> There is also the idea of caching RDDs with something like Apache
>>>>> Ignite. Has anyone really tried this. Will that work with multiple
>>>>> applications?
>>>>>
>>>>> It looks feasible as RDDs are immutable and so are registered
>>>>> tempTables etc.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>
>>>
>>
>>
>> --
>>
>> Victor Shafran
>>
>> VP R&D| Equalum
>>
>> Mobile: +972-523854883 | Email: victor.shafran@equalum.io
>>
>>
>>
>

Re: Sharing RDDS across applications and users

Posted by vincent gromakowski <vi...@gmail.com>.
Bad idea: there is no caching, and the cluster gets over-consumed.
Have a look at instantiating a custom Thrift server on the temp tables, with
the fair scheduler, to allow concurrent SQL requests. It's not a public API,
but you can find some examples.
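
For anyone following along, the fair scheduler part of this is plain configuration: Spark reads `spark.scheduler.mode` and, optionally, a pool file named by `spark.scheduler.allocation.file`. Below is a minimal sketch in Python that builds such a pool file. The pool names, weights and file path are made up for illustration, and starting the Thrift server itself is not shown.

```python
import xml.etree.ElementTree as ET


def build_fair_pools(pools):
    """Return fairscheduler.xml content for the given pools.

    `pools` maps a pool name to (weight, min_share). The names and
    numbers used below are illustrative assumptions, not values
    taken from this thread.
    """
    root = ET.Element("allocations")
    for name, (weight, min_share) in pools.items():
        pool = ET.SubElement(root, "pool", name=name)
        ET.SubElement(pool, "schedulingMode").text = "FAIR"
        ET.SubElement(pool, "weight").text = str(weight)
        ET.SubElement(pool, "minShare").text = str(min_share)
    return ET.tostring(root, encoding="unicode")


# Settings the long-running application would be started with.
spark_conf = {
    "spark.scheduler.mode": "FAIR",
    # Hypothetical location of the generated pool file.
    "spark.scheduler.allocation.file": "/path/to/fairscheduler.xml",
}

xml_text = build_fair_pools({"dashboards": (2, 1), "adhoc": (1, 0)})
```

The generated text would be saved to the path named in `spark_conf` before the shared application starts.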

On 28 Oct 2016 at 11:12 AM, "Mich Talebzadeh" <mi...@gmail.com> wrote:

> Hi,
>
> I think tempTable is private to the session that creates it. In Hive temp
> tables created by "CREATE TEMPORARY TABLE" are all private to the session.
> Spark is no different.
>
> The alternative may be everyone creates tempTable from the same DF?
>
> HTH

Re: Sharing RDDS across applications and users

Posted by Mich Talebzadeh <mi...@gmail.com>.
Hi,

I think a tempTable is private to the session that creates it. In Hive, temp
tables created by "CREATE TEMPORARY TABLE" are all private to the session.
Spark is no different.

The alternative may be for everyone to create a tempTable from the same DF?

HTH

Dr Mich Talebzadeh




On 28 October 2016 at 10:03, Chanh Le <gi...@gmail.com> wrote:

> Can you elaborate on how to implement "shared sparkcontext and fair
> scheduling" option?
>
>
> It just reuses one Spark Context by not letting it stop when the application
> is done. You should check out Livy and spark-jobserver.
> FAIR (https://spark.apache.org/docs/1.2.0/job-scheduling.html) only controls
> how your jobs are scheduled within the pool, but it lets jobs run in
> parallel, whereas FIFO (the default) effectively runs one job at a time.
>
>
> My approach was to use  sparkSession.getOrCreate() method and register
> temp table in one application. However, I was not able to access this
> tempTable in another application.
>
>
> Storing the metadata in Hive may help, but I am not sure about this.
> I use the Spark Thrift Server: I create the table there and then let
> Zeppelin query it.
>
> Regards,
> Chanh

Re: Sharing RDDS across applications and users

Posted by Chanh Le <gi...@gmail.com>.
> Can you elaborate on how to implement "shared sparkcontext and fair scheduling" option? 


It just reuses one Spark Context by not letting it stop when the application is done. You should check out Livy and spark-jobserver.
FAIR (https://spark.apache.org/docs/1.2.0/job-scheduling.html) only controls how your jobs are scheduled within the pool, but it lets jobs run in parallel, whereas FIFO (the default) effectively runs one job at a time.
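
To make the FIFO/FAIR difference concrete, here is a toy model (not Spark code, just an illustration): a fixed number of task slots per tick, with FIFO letting earlier jobs take slots first and FAIR handing slots out round-robin. The job names and sizes are invented.

```python
def simulate(jobs, slots, mode):
    """Return {job name: tick at which it finishes} for a toy cluster.

    jobs  : list of (name, task_count) in submission order
    slots : number of tasks that can run per tick
    mode  : "FIFO" or "FAIR"
    """
    remaining = dict(jobs)
    order = [name for name, _ in jobs]
    # Jobs with no tasks are done before the first tick.
    finished = {name: 0 for name, count in jobs if count == 0}
    tick = 0
    while len(finished) < len(order):
        tick += 1
        free = slots
        active = [n for n in order if remaining[n] > 0]
        if mode == "FIFO":
            # Earlier jobs take every slot they can use first;
            # later jobs only get what is left over.
            for name in active:
                used = min(free, remaining[name])
                remaining[name] -= used
                free -= used
        else:
            # FAIR: hand out slots one at a time, round-robin.
            while free > 0 and any(remaining[n] > 0 for n in active):
                for name in active:
                    if free > 0 and remaining[name] > 0:
                        remaining[name] -= 1
                        free -= 1
        for name in active:
            if remaining[name] == 0:
                finished[name] = tick
    return finished


jobs = [("long", 8), ("short", 2)]
fifo = simulate(jobs, slots=2, mode="FIFO")
fair = simulate(jobs, slots=2, mode="FAIR")
```

In this toy run the short job finishes at tick 5 under FIFO but at tick 2 under FAIR, while the long job slips from tick 4 to tick 5.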


> My approach was to use  sparkSession.getOrCreate() method and register temp table in one application. However, I was not able to access this tempTable in another application. 


Storing the metadata in Hive may help, but I am not sure about this.
I use the Spark Thrift Server: I create the table there and then let Zeppelin query it.

Regards,
Chanh





> On Oct 27, 2016, at 9:01 PM, Victor Shafran <vi...@equalum.io> wrote:
> 
> Hi Vincent,
> Can you elaborate on how to implement "shared sparkcontext and fair scheduling" option? 
> 
> My approach was to use  sparkSession.getOrCreate() method and register temp table in one application. However, I was not able to access this tempTable in another application. 
> Your help is highly appreciated.
> Victor

Re: Sharing RDDS across applications and users

Posted by vincent gromakowski <vi...@gmail.com>.
Hi,
Just point all users at the same app with a common Spark context.
For instance, Akka HTTP receives queries from users and launches concurrent
Spark SQL queries in different actor threads. The only prerequisite is to
launch the different jobs in different threads (as with actors).
Be careful: it is not safe for CRUD, where one of the jobs modifies the
dataset; it's OK for read-only use.
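
The "different jobs in different threads" idea can be sketched roughly as follows. This is a minimal sketch in Python rather than Akka, and `fake_sql` is a stand-in for something like `lambda q: spark.sql(q).collect()` on the one long-lived session; both the helper name and the toy queries are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(run_query, queries, max_workers=4):
    """Run each query on its own worker thread against one shared engine.

    `run_query` is whatever callable executes a single query on the
    shared application (an assumption of this sketch).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {q: pool.submit(run_query, q) for q in queries}
        return {q: f.result() for q, f in futures.items()}


# Stand-in for the shared dataset registered once in the application.
shared_table = [("a", 1), ("b", 2), ("c", 3)]


def fake_sql(query):
    # Only reads shared_table, which is why concurrent use is safe here.
    if query == "count":
        return len(shared_table)
    if query == "sum":
        return sum(value for _, value in shared_table)
    raise ValueError(f"unknown query: {query}")


results = run_concurrently(fake_sql, ["count", "sum"])
```

With a real shared context, the Spark job scheduling docs describe calling `setLocalProperty("spark.scheduler.pool", ...)` in each thread to choose the fair scheduler pool for the jobs that thread submits.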

On 27 Oct 2016 at 4:02 PM, "Victor Shafran" <vi...@equalum.io> wrote:

> Hi Vincent,
> Can you elaborate on how to implement "shared sparkcontext and fair
> scheduling" option?
>
> My approach was to use  sparkSession.getOrCreate() method and register
> temp table in one application. However, I was not able to access this
> tempTable in another application.
> Your help is highly appreciated.
> Victor

Re: Sharing RDDS across applications and users

Posted by Victor Shafran <vi...@equalum.io>.
Hi Vincent,
Can you elaborate on how to implement the "shared sparkcontext and fair
scheduling" option?

My approach was to use the sparkSession.getOrCreate() method and register a
temp table in one application. However, I was not able to access this
tempTable in another application.
Your help is highly appreciated.
Victor

On Thu, Oct 27, 2016 at 4:31 PM, Gene Pang <ge...@gmail.com> wrote:

> Hi Mich,
>
> Yes, Alluxio is commonly used to cache and share Spark RDDs and DataFrames
> among different applications and contexts. The data typically stays in
> memory, but with Alluxio's tiered storage, the "colder" data can be evicted
> out to other media, like SSDs and HDDs. Here is a blog post discussing
> Spark RDDs and Alluxio:
> https://www.alluxio.com/blog/effective-spark-rdds-with-alluxio
>
> Also, Alluxio has the concept of an "Under filesystem", which can
> help you access your existing data across different storage systems. Here
> is more information about the unified namespace abilities:
> http://www.alluxio.org/docs/master/en/Unified-and-Transparent-Namespace.html
>
> Hope that helps,
> Gene
>
> On Thu, Oct 27, 2016 at 3:39 AM, Mich Talebzadeh <
> mich.talebzadeh@gmail.com> wrote:
>
>> Thanks Chanh,
>>
>> Can it share RDDs.
>>
>> Personally I have not used either Alluxio or Ignite.
>>
>>
>>    1. Are there major differences between these two
>>    2. Have you tried Alluxio for sharing Spark RDDs and if so do you
>>    have any experience you can kindly share
>>
>> Regards
>>
>>
>> Dr Mich Talebzadeh
>>
>> On 27 October 2016 at 11:29, Chanh Le <gi...@gmail.com> wrote:
>>
>>> Hi Mich,
>>> Alluxio is the good option to go.
>>>
>>> Regards,
>>> Chanh
>>>
>>> On Oct 27, 2016, at 5:28 PM, Mich Talebzadeh <mi...@gmail.com>
>>> wrote:
>>>
>>>
>>> There was a mention of using Zeppelin to share RDDs with many users.
>>> From the notes on Zeppelin it appears that this is sharing UI and I am not
>>> sure how easy it is going to be changing the result set with different
>>> users modifying say sql queries.
>>>
>>> There is also the idea of caching RDDs with something like Apache
>>> Ignite. Has anyone really tried this. Will that work with multiple
>>> applications?
>>>
>>> It looks feasible as RDDs are immutable and so are registered tempTables
>>> etc.
>>>
>>> Thanks
>>>
>>>
>>> Dr Mich Talebzadeh
>>
>


-- 

Victor Shafran

VP R&D| Equalum

Mobile: +972-523854883 | Email: victor.shafran@equalum.io

Re: Sharing RDDS across applications and users

Posted by Gene Pang <ge...@gmail.com>.
Hi Mich,

Yes, Alluxio is commonly used to cache and share Spark RDDs and DataFrames
among different applications and contexts. The data typically stays in
memory, but with Alluxio's tiered storage, "colder" data can be evicted
to other media, such as SSDs and HDDs. Here is a blog post discussing
Spark RDDs and Alluxio:
https://www.alluxio.com/blog/effective-spark-rdds-with-alluxio

Alluxio also has the concept of an "Under filesystem", which can help
you access your existing data across different storage systems. Here is
more information about the unified namespace abilities:
http://www.alluxio.org/docs/master/en/Unified-and-Transparent-Namespace.html

Hope that helps,
Gene
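
As a rough illustration of the pattern described above, one application can write a DataFrame to an Alluxio path and a second application can read it back. This is a minimal PySpark sketch, assuming the Alluxio client is on Spark's classpath and that the Alluxio master listens on port 19998 (its usual default). The host name, the /shared directory and the function names are hypothetical.

```python
def alluxio_uri(master_host, dataset, port=19998):
    """Build the path both applications agree on (layout is an assumption)."""
    return "alluxio://{}:{}/shared/{}".format(master_host, port, dataset)


def publish(df, uri):
    """Application 1: write the DataFrame so other applications can read it."""
    df.write.mode("overwrite").parquet(uri)


def consume(spark, uri):
    """Application 2: load the shared data through its own SparkSession."""
    return spark.read.parquet(uri)
```

Application 1 would call publish(df, alluxio_uri("alluxio-master", "orders")) and application 2 would call consume(spark, alluxio_uri("alluxio-master", "orders")). What is shared this way is the data, not the RDD or DataFrame object itself: each application builds its own DataFrame over the same files.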

On Thu, Oct 27, 2016 at 3:39 AM, Mich Talebzadeh <mi...@gmail.com>
wrote:

> Thanks Chanh,
>
> Can it share RDDs.
>
> Personally I have not used either Alluxio or Ignite.
>
>
>    1. Are there major differences between these two
>    2. Have you tried Alluxio for sharing Spark RDDs and if so do you have
>    any experience you can kindly share
>
> Regards
>
>
> Dr Mich Talebzadeh
>
> On 27 October 2016 at 11:29, Chanh Le <gi...@gmail.com> wrote:
>
>> Hi Mich,
>> Alluxio is the good option to go.
>>
>> Regards,
>> Chanh
>>
>> On Oct 27, 2016, at 5:28 PM, Mich Talebzadeh <mi...@gmail.com>
>> wrote:
>>
>>
>> There was a mention of using Zeppelin to share RDDs with many users. From
>> the notes on Zeppelin it appears that this is sharing UI and I am not sure
>> how easy it is going to be changing the result set with different users
>> modifying say sql queries.
>>
>> There is also the idea of caching RDDs with something like Apache Ignite.
>> Has anyone really tried this. Will that work with multiple applications?
>>
>> It looks feasible as RDDs are immutable and so are registered tempTables
>> etc.
>>
>> Thanks
>>
>>
>> Dr Mich Talebzadeh
>

Re: Sharing RDDS across applications and users

Posted by vincent gromakowski <vi...@gmail.com>.
Ignite works only with Spark 1.5.
Ignite leverages indexes.
Alluxio provides tiering.
Alluxio easily integrates with the underlying FS.

On 27 Oct 2016 at 12:39 PM, "Mich Talebzadeh" <mi...@gmail.com>
wrote:

> Thanks Chanh,
>
> Can it share RDDs.
>
> Personally I have not used either Alluxio or Ignite.
>
>
>    1. Are there major differences between these two
>    2. Have you tried Alluxio for sharing Spark RDDs and if so do you have
>    any experience you can kindly share
>
> Regards
>
>
> Dr Mich Talebzadeh
>
> On 27 October 2016 at 11:29, Chanh Le <gi...@gmail.com> wrote:
>
>> Hi Mich,
>> Alluxio is the good option to go.
>>
>> Regards,
>> Chanh
>>
>> On Oct 27, 2016, at 5:28 PM, Mich Talebzadeh <mi...@gmail.com>
>> wrote:
>>
>>
>> There was a mention of using Zeppelin to share RDDs with many users. From
>> the notes on Zeppelin it appears that this is sharing UI and I am not sure
>> how easy it is going to be changing the result set with different users
>> modifying say sql queries.
>>
>> There is also the idea of caching RDDs with something like Apache Ignite.
>> Has anyone really tried this. Will that work with multiple applications?
>>
>> It looks feasible as RDDs are immutable and so are registered tempTables
>> etc.
>>
>> Thanks
>>
>>
>> Dr Mich Talebzadeh
>

Re: Sharing RDDS across applications and users

Posted by Mich Talebzadeh <mi...@gmail.com>.
Thanks Chanh,

Can it share RDDs?

Personally I have not used either Alluxio or Ignite.


   1. Are there major differences between these two?
   2. Have you tried Alluxio for sharing Spark RDDs, and if so, do you have
   any experience you can kindly share?

Regards


Dr Mich Talebzadeh




On 27 October 2016 at 11:29, Chanh Le <gi...@gmail.com> wrote:

> Hi Mich,
> Alluxio is the good option to go.
>
> Regards,
> Chanh
>
> On Oct 27, 2016, at 5:28 PM, Mich Talebzadeh <mi...@gmail.com>
> wrote:
>
>
> There was a mention of using Zeppelin to share RDDs with many users. From
> the notes on Zeppelin it appears that this is sharing UI and I am not sure
> how easy it is going to be changing the result set with different users
> modifying say sql queries.
>
> There is also the idea of caching RDDs with something like Apache Ignite.
> Has anyone really tried this. Will that work with multiple applications?
>
> It looks feasible as RDDs are immutable and so are registered tempTables
> etc.
>
> Thanks
>
>
> Dr Mich Talebzadeh

Re: Sharing RDDS across applications and users

Posted by Chanh Le <gi...@gmail.com>.
Hi Mich,
Alluxio is a good option to go with.

Regards,
Chanh

> On Oct 27, 2016, at 5:28 PM, Mich Talebzadeh <mi...@gmail.com> wrote:
> 
> 
> There was a mention of using Zeppelin to share RDDs with many users. From the notes on Zeppelin it appears that this is sharing UI and I am not sure how easy it is going to be changing the result set with different users modifying say sql queries.
> 
> There is also the idea of caching RDDs with something like Apache Ignite. Has anyone really tried this. Will that work with multiple applications?
> 
> It looks feasible as RDDs are immutable and so are registered tempTables etc.
> 
> Thanks
> 
> 
> Dr Mich Talebzadeh