Posted to common-user@hadoop.apache.org by ka...@barclays.com on 2012/05/10 09:47:03 UTC

SQL analysis

We are looking at doing some initial analysis of the SQL text captured from query runs, to produce some kind of path output depicting how the various tables are linked to each other. For example, a 'view' might be a join of two tables at the top of the hierarchy, and might in turn be used to create some new tables, etc.

Any suggestions or thoughts on how we can approach this within the Hadoop space?
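
To make the kind of path output described above concrete, here is a minimal sketch in plain Python (not Hadoop-specific). Everything in it is an assumption for illustration: the sample statements, the table names, and the function names extract_edges, build_graph and paths_from are hypothetical. The regular expressions are deliberately naive and will miss subqueries, CTEs, comma-separated FROM lists and quoted identifiers; a real SQL parser would be needed for production query logs.

```python
import re
from collections import defaultdict

# Hypothetical sample statements; real input would be the logged query text.
STATEMENTS = [
    "CREATE VIEW v_orders AS SELECT * FROM orders o JOIN customers c ON o.cid = c.id",
    "CREATE TABLE t_summary AS SELECT region, COUNT(*) FROM v_orders GROUP BY region",
    "INSERT INTO t_report SELECT * FROM t_summary",
]

# Object being written: CREATE [OR REPLACE] TABLE/VIEW <name> or INSERT INTO <name>.
TARGET_RE = re.compile(
    r"\b(?:CREATE\s+(?:OR\s+REPLACE\s+)?(?:TABLE|VIEW)|INSERT\s+INTO)\s+([\w.]+)",
    re.IGNORECASE,
)
# Objects being read: anything directly following FROM or JOIN.
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def extract_edges(sql):
    """Return (source, target) pairs for one statement using naive regex matching."""
    target = TARGET_RE.search(sql)
    if not target:
        return []  # plain SELECTs create nothing, so they add no edges
    tgt = target.group(1).lower()
    return [(src.lower(), tgt) for src in SOURCE_RE.findall(sql) if src.lower() != tgt]


def build_graph(statements):
    """Build an adjacency map: source table -> set of objects derived from it."""
    graph = defaultdict(set)
    for sql in statements:
        for src, tgt in extract_edges(sql):
            graph[src].add(tgt)
    return graph


def paths_from(graph, node, prefix=()):
    """Yield every dependency path that starts at node and ends at a leaf."""
    path = prefix + (node,)
    children = sorted(graph.get(node, ()))
    if not children:
        yield path
        return
    for child in children:
        if child in path:  # guard against cycles in the dependency graph
            yield path
            continue
        yield from paths_from(graph, child, path)


if __name__ == "__main__":
    graph = build_graph(STATEMENTS)
    for path in paths_from(graph, "orders"):
        print(" -> ".join(path))
```

On a large query log, the per-statement extract_edges step is independent for each statement, so it could plausibly run as a map step, with the edges grouped by source table afterwards.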



Regards,
Karan





This e-mail and any attachments are confidential and intended
solely for the addressee and may also be privileged or exempt from
disclosure under applicable law. If you are not the addressee, or
have received this e-mail in error, please notify the sender
immediately, delete it from your system and do not copy, disclose
or otherwise act upon any part of this e-mail or its attachments.

Internet communications are not guaranteed to be secure or
virus-free.
The Barclays Group does not accept responsibility for any loss
arising from unauthorised access to, or interference with, any
Internet communications by any third party, or from the
transmission of any viruses. Replies to this e-mail may be
monitored by the Barclays Group for operational or business
reasons.

Any opinion or other information in this e-mail or its attachments
that does not relate to the business of the Barclays Group is
personal to the sender and is not given or endorsed by the Barclays
Group.

Barclays Bank PLC. Registered in England and Wales (registered no.
1026167).
Registered Office: 1 Churchill Place, London, E14 5HP, United
Kingdom.

Barclays Bank PLC is authorised and regulated by the Financial
Services Authority.

Re: SQL analysis

Posted by Shi Yu <sh...@uchicago.edu>.
If the analysis you mention is to create a "view" of multiple tables, then
once your data is sorted by key in HDFS you could try a map-side join or a
reduce-side join in Hadoop to generate the "view" of your data (records
with the same key across multiple data sets are combined). There are many
code samples on the web; playing around with them might help.
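
To illustrate the reduce-side join idea, here is a minimal sketch that simulates the map, shuffle and reduce steps in memory with plain Python. It does not use the Hadoop API; the function names, the "L"/"R" tags and the sample records are all hypothetical, and a real job would implement the same logic in a Mapper and Reducer (or via Hadoop Streaming).

```python
from collections import defaultdict


def map_phase(records, tag):
    """Emit (join_key, (tag, value)) so the reducer can tell the sources apart."""
    for key, value in records:
        yield key, (tag, value)


def shuffle(pairs):
    """Group mapped values by key, standing in for Hadoop's shuffle/sort."""
    groups = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return groups


def reduce_phase(key, tagged_values):
    """Inner join: pair every left value with every right value for this key."""
    left = [value for tag, value in tagged_values if tag == "L"]
    right = [value for tag, value in tagged_values if tag == "R"]
    for l_value in left:
        for r_value in right:
            yield key, l_value, r_value


def reduce_side_join(left_records, right_records):
    """Run the simulated map, shuffle and reduce steps over two data sets."""
    pairs = list(map_phase(left_records, "L")) + list(map_phase(right_records, "R"))
    groups = shuffle(pairs)
    joined = []
    for key in sorted(groups):  # process keys in sorted order, as a reducer would
        joined.extend(reduce_phase(key, groups[key]))
    return joined


if __name__ == "__main__":
    # Hypothetical sample data: (customer_id, name) and (customer_id, item).
    customers = [(1, "alice"), (2, "bob")]
    orders = [(1, "book"), (1, "pen"), (3, "lamp")]
    for row in reduce_side_join(customers, orders):
        print(row)
```

Keys that appear in only one data set (2 and 3 in the sample data) produce no output, which is what makes this an inner join.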

If you want further analysis like Business Intelligence, then you need 
to train various models.



On 5/10/2012 8:30 AM, karanveer.singh@barclays.com wrote:
> I am more worried about the analysis assuming this data is in HDFS.
>
>
> -----Original Message-----
> From: Shi Yu [mailto:shiyu@uchicago.edu]
> Sent: 10 May 2012 18:58
> To: common-user@hadoop.apache.org
> Subject: RE: SQL analysis
>
> Flume might be suitable for your case.
>
> https://cwiki.apache.org/FLUME/
>
> Shi


RE: SQL analysis

Posted by ka...@barclays.com.
I am more worried about the analysis assuming this data is in HDFS.


-----Original Message-----
From: Shi Yu [mailto:shiyu@uchicago.edu] 
Sent: 10 May 2012 18:58
To: common-user@hadoop.apache.org
Subject: RE: SQL analysis 

Flume might be suitable for your case.

https://cwiki.apache.org/FLUME/

Shi 


RE: SQL analysis

Posted by Shi Yu <sh...@uchicago.edu>.
Flume might be suitable for your case.

https://cwiki.apache.org/FLUME/

Shi 

RE: SQL analysis

Posted by ka...@barclays.com.
Our focus as of now is on batch queries, and we are keen to explore approaches for getting to the path analysis.

Regards,
Karanveer

-----Original Message-----
From: Shi Yu [mailto:shiyu@uchicago.edu] 
Sent: 10 May 2012 17:02
To: common-user@hadoop.apache.org
Subject: Re: SQL analysis 

It depends on your use case: for example, whether it is query-only or
whether you require real-time inserts and updates. The solutions can
be different.

You might need to consider HBase, Cassandra, or tools like Flume.


Re: SQL analysis

Posted by Shi Yu <sh...@uchicago.edu>.
It depends on your use case: for example, whether it is query-only or
whether you require real-time inserts and updates. The solutions can
be different.

You might need to consider HBase, Cassandra, or tools like Flume.