Posted to user@spark.apache.org by Chanh Le <gi...@gmail.com> on 2016/07/08 03:19:13 UTC

Any ways to connect BI tool to Spark without Hive

Hi everyone,
Currently we use Zeppelin to analyze our data, but because it requires writing SQL, it is hard to roll out to business users. Those users already use some kind of Oracle BI tool for analytics, because it supports drag and drop and lets us set permissions for each user.
Our architecture is Spark, Alluxio, and Zeppelin. We want to share what we have done in Zeppelin with business users.

Is there any way to do that?

Thanks.
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

Re: Any ways to connect BI tool to Spark without Hive

Posted by Mich Talebzadeh <mi...@gmail.com>.

Tableau has its own Tableau Server, which stores prepared reports, not data.
It does not cache data. Users access Tableau Server from their Tableau
client and use the reports. You still need to get data out of the
persistent store. I have not heard of Tableau having its own storage
layer. As far as I know it still accesses the database. I don't know what
you mean by caching data. Some of these reports are quite heavy.

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




Re: Any ways to connect BI tool to Spark without Hive

Posted by ayan guha <gu...@gmail.com>.

Yes, absolutely. You need to "save" the table using the saveAsTable function.
The underlying storage is HDFS or any other storage, and you are basically
using Spark's embedded Hive services (when you do not have Hadoop in the setup).

Think of STS as a JDBC server in front of your datastore. STS runs as a Spark
application, so you can also monitor it using the Spark master UI, assuming
you are using a standalone cluster.
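
As a rough illustration of the saveAsTable step (a sketch, not the setup used in this thread): the helper below takes whatever SQL context the notebook provides, reads Parquet files, and saves them as a persistent table. The function name, the Alluxio path, and the table name are hypothetical. It also assumes that Zeppelin and STS are configured against the same metastore and warehouse location; otherwise STS will not see the table.

```python
def register_parquet_table(sql_context, parquet_path, table_name):
    """Save Parquet data as a persistent table via saveAsTable.

    sql_context is assumed to be a HiveContext (Spark 1.x) or a
    SparkSession (Spark 2.x); both expose `read`, and the resulting
    DataFrame exposes `write`.
    """
    df = sql_context.read.parquet(parquet_path)
    # "overwrite" replaces any existing table with the same name.
    df.write.mode("overwrite").saveAsTable(table_name)
    return table_name


# Hypothetical usage in a Zeppelin %pyspark paragraph, where `sqlContext`
# is provided by the interpreter:
#
#   register_parquet_table(sqlContext,
#                          "alluxio://alluxio-master:19998/data/events",
#                          "events")
```

Passing the context in as an argument keeps the helper independent of how the notebook or application created it.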


-- 
Best Regards,
Ayan Guha

Re: Any ways to connect BI tool to Spark without Hive

Posted by Chanh Le <gi...@gmail.com>.

Hi Ayan,

Thanks for replying. It sounds great. Let me check.
One thing confuses me: is there any way to share things between the two, I mean Zeppelin and the Thrift Server? For example, if I update or insert data into a table in Zeppelin, can an external tool connected through STS see it?

Thanks & regards,
Chanh


Re: Any ways to connect BI tool to Spark without Hive

Posted by ayan guha <gu...@gmail.com>.

Hi

Spark Thrift Server (STS) does not need Hive/Hadoop. STS should be your first
choice if you are planning to integrate BI tools with Spark. It works with
Zeppelin as well. We do all our development using Zeppelin and STS.

One thing to note: many BI tools like Qlik Sense and Tableau (not sure about
the Oracle BI tool) query the data and then cache it on the client side. This
works really well in real life.
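
For reference, a minimal sketch of how a client would address STS. STS speaks the HiveServer2 protocol, so clients use a jdbc:hive2:// URL; 10000 is the usual default port, and the server is started with sbin/start-thriftserver.sh from the Spark distribution. The host name, user name, and helper names below are hypothetical.

```python
def thrift_jdbc_url(host, port=10000, database="default"):
    """Build the JDBC URL a BI tool or beeline would use to reach STS."""
    return "jdbc:hive2://{0}:{1}/{2}".format(host, port, database)


def beeline_command(host, port=10000, database="default", user=None):
    """Build a beeline invocation (as an argument list) for a quick test."""
    cmd = ["beeline", "-u", thrift_jdbc_url(host, port, database)]
    if user is not None:
        # -n passes the user name to connect as.
        cmd.extend(["-n", user])
    return cmd


if __name__ == "__main__":
    # Hypothetical host and user; prints the command to run in a shell.
    print(" ".join(beeline_command("spark-master", user="bi_user")))
```

If beeline can list tables through this URL, the same URL is what a JDBC-capable BI tool would be configured with.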




-- 
Best Regards,
Ayan Guha

Re: Any ways to connect BI tool to Spark without Hive

Posted by Chanh Le <gi...@gmail.com>.

Hi Mich,

Actually, technical users may write some kind of complex machine learning code in the future too, which is why Zeppelin is promising.

> Those business users: do they use Oracle BI (OBI) to connect to a DW like Oracle now?
Yes, they do. Our data is still stored in Oracle, but it is getting bigger every day and some queries can no longer execute in Oracle, so we are moving to another storage layer and using Spark to query it.

> I also have Hive running on the Spark engine, which makes such a solution easier by allowing users to connect to Hive and execute their queries. You want to provide a fast retrieval system for your users. Your case is interesting as you have two parallel stacks here.


So does that mean I still need to set up Hive and Hadoop? Our resources are limited; we need to spend almost all of our memory and CPU on Spark and Alluxio.

Thanks & regards,
Chanh





Re: Any ways to connect BI tool to Spark without Hive

Posted by Mich Talebzadeh <mi...@gmail.com>.

Interesting, Chanh.

Those business users: do they use Oracle BI (OBI) to connect to a DW like
Oracle now?

Certainly power users can use Zeppelin to write code that will be executed
through Spark, but I very much doubt Zeppelin can do what the OBI tool provides.

What you need is to investigate whether the OBI tool can connect to the Spark
Thrift Server to use Spark to access your Parquet files. Your Parquet files
are already on HDFS (part of Hadoop).
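
One possible way to expose existing Parquet files to anything that connects through STS is a data source table, sketched below. The helper only builds the Spark SQL statement; you would run the result through beeline or another JDBC client. The helper name, table name, and Alluxio path are hypothetical, and whether the OBI tool accepts the resulting table still has to be checked.

```python
import re


def create_parquet_table_ddl(table_name, parquet_path):
    """Build a Spark SQL statement that registers Parquet files as a table."""
    # Only allow simple identifiers so the name cannot smuggle in extra SQL.
    if not re.match(r"^[A-Za-z_][A-Za-z0-9_]*$", table_name):
        raise ValueError("unexpected table name: {0!r}".format(table_name))
    # The path is embedded in a single-quoted SQL string literal.
    if "'" in parquet_path:
        raise ValueError("path must not contain a single quote")
    return "CREATE TABLE {0} USING parquet OPTIONS (path '{1}')".format(
        table_name, parquet_path)


if __name__ == "__main__":
    # Hypothetical table name and path.
    print(create_parquet_table_ddl(
        "events", "alluxio://alluxio-master:19998/data/events"))
```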

Hive has ODBC interfaces to Tableau, and I am sure it can also work with OBI.

I also have Hive running on the Spark engine, which makes such a solution
easier by allowing users to connect to Hive and execute their queries. You
want to provide a fast retrieval system for your users. Your case is
interesting as you have two parallel stacks here.

HTH


Dr Mich Talebzadeh




Re: Any ways to connect BI tool to Spark without Hive

Posted by Chanh Le <gi...@gmail.com>.

Hi Mich,
Thanks for replying. Currently we think we need to separate two groups of users:
1. Technical: can write SQL.
2. Business: can drag and drop fields or metrics and see the result.
Our stack uses Zeppelin and Spark SQL to query data from Alluxio. Our data is currently stored in Parquet files. Zeppelin is using HiveContext, but we haven't set up Hive and Hadoop yet.

I am a little bit confused about the Spark Thrift Server: it allows external tools to connect, but does it require setting up Hive and Hadoop?

Thanks and regards,
Chanh



> On Jul 8, 2016, at 10:49 AM, Mich Talebzadeh <mi...@gmail.com> wrote:
> 
> Hi,
> 
> I have not used Alluxio, but it is a distributed file system, much like an in-memory database (IMDB) such as Oracle TimesTen. Spark is your query tool and Zeppelin is the GUI interface to Spark, which basically lets you build graphs from Spark queries.
> 
> You mentioned Hive, so I assume your persistent storage is Hive?
> 
> Your business users are using the Oracle BI tool. It is like Tableau. I assume the Oracle BI tool accesses a database of some sort, say an Oracle DW, using native connectivity, and it may also have ODBC and JDBC connections to Hive etc.
> 
> The issue I see here is that your GUI tool, Zeppelin, does the same thing as the Oracle BI tool. Can you please clarify the points below:
> 
>    1. Do you use Hive as your database/persistent storage, with Alluxio on top of Hive?
>    2. Are users accessing Hive or a data warehouse like Oracle?
>    3. Oracle BI tools are pretty mature. Zeppelin is not in the same league, so you have to decide which technology stack to follow.
>    4. Spark should work with the Oracle BI tool as well (need to check this) as a fast query tool. In that case the users can use the Oracle BI tool with Spark as well.
> 
> It seems to me that the issue is that users don't want to move away from the Oracle BI tool. We had the same issue with Tableau. So you really need to make the Oracle BI tool use Spark and Alluxio and leave Zeppelin to one side.
> 
> Zeppelin, as I used it a while back, may not do what the Oracle BI tool does. So the presentation layer has to be the Oracle BI tool.
> 
> HTH
> 
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>
>  
> http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/>
> 
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>  
> 
> On 8 July 2016 at 04:19, Chanh Le <giaosudau@gmail.com <ma...@gmail.com>> wrote:
> Hi everyone,
> Currently we use Zeppelin to analytics our data and because of using SQL it’s hard to distribute for users use. But users are using some kind of Oracle BI tools to analytic because it support some kinds of drag and drop and we can do some kind of permitted for each user.
> Our architecture is Spark, Alluxio, Zeppelin. Because We want to share what we have done in Zeppelin to business users.
> 
> Is there any way to do that?
> 
> Thanks.
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org <ma...@spark.apache.org>
> 
>

Re: Any ways to connect BI tool to Spark without Hive

Posted by Mich Talebzadeh <mi...@gmail.com>.

hi,

I have not used Alluxio, but it is a distributed file system much like an
in-memory database (IMDB), say Oracle TimesTen. Spark is your query tool and
Zeppelin is the GUI interface to Spark, which basically lets you build graphs
from Spark queries.

You mentioned Hive so I assume your persistent storage is Hive?

Your business users are using an Oracle BI tool, which is like Tableau. I
assume the Oracle BI tool accesses a database of some sort, say an Oracle DW,
using native connectivity, and it may also have ODBC and JDBC connections to
Hive etc.

The issue I see here is that your GUI tool, Zeppelin, does the same thing as
the Oracle BI tool. Can you please clarify the points below:

   1. Do you use Hive as your database/persistent storage, with Alluxio on
   top of Hive?
   2. Are users accessing Hive or a data warehouse like Oracle?
   3. Oracle BI tools are pretty mature. Zeppelin is not in the same league,
   so you have to decide which technology stack to follow.
   4. Spark should work with the Oracle BI tool as well (need to check this)
   as a fast query tool. In that case the users can use the Oracle BI tool
   with Spark as well.

It seems to me that the issue is that users don't want to move away from the
Oracle BI tool. We had the same issue with Tableau. So you really need to make
the Oracle BI tool use Spark and Alluxio and leave Zeppelin to one side.
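
Before pointing the Oracle BI tool at Spark, it may be worth checking from a
plain client that the endpoint answers SQL at all. Below is a minimal sketch,
assuming a Python DB-API 2.0 driver is available for the endpoint (PyHive
would be one candidate, which is my assumption and needs checking). Here
sqlite3 stands in for the real connection so that the sketch runs without a
cluster, and the table name "events" is hypothetical.

```python
import sqlite3


def smoke_test(cursor, table):
    """Run a trivial aggregate through any DB-API 2.0 cursor and return the row count."""
    # The table name is interpolated directly, so pass only trusted identifiers.
    cursor.execute("SELECT COUNT(*) FROM {}".format(table))
    return cursor.fetchone()[0]


def demo():
    # Stand-in database: sqlite3 replaces the real Spark endpoint here.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE events (user_id INTEGER, clicks INTEGER)")
    cur.executemany("INSERT INTO events VALUES (?, ?)", [(1, 3), (2, 5), (3, 8)])
    count = smoke_test(cur, "events")
    conn.close()
    return count


if __name__ == "__main__":
    print(demo())
```

If a check like this works against the real endpoint, the remaining work is on
the BI tool side, i.e. configuring its ODBC/JDBC data source.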

Zeppelin, as I used it a while back, may not do what the Oracle BI tool does.
So the presentation layer has to be the Oracle BI tool.

HTH

Dr Mich Talebzadeh

