Posted to user@sqoop.apache.org by Charles Earl <ch...@me.com> on 2012/12/18 18:26:06 UTC

Limits of Free-form Query Imports

Hi,
Are there any best practices or caveats for including nested joins in free-form query imports?
I have noted that the documentation says "Use of complex queries such as queries that have sub-queries or joins leading to ambiguous projections can lead to unexpected results." I'm relatively new to Sqoop and have not encountered any problems, but I imagine that multi-mapper imports combined with complex joins might produce inconsistent results, since the parallelism seems to depend on range partitioning based on the splitting column. Or perhaps this is overthinking it.

Charles 
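
To make the range-partitioning concern concrete, here is a rough sketch of how a free-form query might be fanned out across mappers. It is a simplified illustration, not Sqoop's actual splitter: the tables a and b, the split column a.id, the min/max values, and the mapper count are all hypothetical. The point it shows is that every mapper runs the whole join, restricted only by the range substituted for $CONDITIONS.

```shell
#!/bin/sh
# Simplified illustration only; this is NOT Sqoop's real split logic.
# All values below are hypothetical.
MIN=0            # assumed minimum of the split column
MAX=1000         # assumed maximum of the split column
MAPPERS=4        # assumed number of parallel mappers
SPLIT_COL="a.id" # qualified with the table name to avoid an ambiguous column
QUERY='SELECT a.*, b.* FROM a JOIN b ON (a.id = b.id) WHERE $CONDITIONS'

# Print one query per mapper, each with its own range in place of $CONDITIONS.
print_split_queries() {
  step=$(( (MAX - MIN) / MAPPERS ))
  i=0
  while [ "$i" -lt "$MAPPERS" ]; do
    lo=$(( MIN + i * step ))
    if [ "$i" -eq $(( MAPPERS - 1 )) ]; then
      # The last range is closed so the maximum value is included.
      cond="$SPLIT_COL >= $lo AND $SPLIT_COL <= $MAX"
    else
      cond="$SPLIT_COL >= $lo AND $SPLIT_COL < $(( lo + step ))"
    fi
    printf '%s\n' "$QUERY" | sed "s/\\\$CONDITIONS/$cond/"
    i=$(( i + 1 ))
  done
}

print_split_queries
```

If the split column is unevenly distributed, or is not unique to one side of the join, the ranges can be badly unbalanced or the column reference ambiguous, which is one way the "unexpected results" in the documentation could arise.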



Re: Limits of Free-form Query Imports

Posted by Jarek Jarcec Cecho <ja...@apache.org>.
Hi Charles,
Unfortunately, non-trivial joins might lead to unexpected results and issues. One caveat is that Sqoop will run your expensive query in parallel, which might cause an undesirable performance hit on the database side. One way to overcome this issue is to run your expensive non-trivial query before the Sqoop import and store its output as a table. For example, in MySQL you can do:

CREATE TABLE sqoop_tmp_table AS SELECT ... JOIN ... JOIN ... JOIN ... JOIN ... JOIN ... (query that you've used originally)

Jarcec
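
Once the join result is stored in sqoop_tmp_table, the import itself can be an ordinary table import. The following is a minimal sketch of what that command might look like; the JDBC URL, user name, split column, mapper count, and target directory are placeholders that do not come from this thread. The script only prints the command so it can be reviewed before it is run.

```shell
#!/bin/sh
# Sketch: import the pre-materialized join result instead of the join itself.
# Every value below is a placeholder (assumption), not taken from the thread.
JDBC_URL="jdbc:mysql://db.example.com/mydb"      # hypothetical database
DB_USER="sqoop_user"                             # hypothetical account
SPLIT_COL="id"                                   # assumes a numeric, evenly distributed column
MAPPERS=4                                        # hypothetical degree of parallelism
TARGET_DIR="/user/hypothetical/sqoop_tmp_table"  # hypothetical HDFS directory

# Print the command rather than executing it.
build_import_cmd() {
  printf '%s ' sqoop import \
    --connect "$JDBC_URL" \
    --username "$DB_USER" -P \
    --table sqoop_tmp_table \
    --split-by "$SPLIT_COL" \
    --num-mappers "$MAPPERS" \
    --target-dir "$TARGET_DIR"
  printf '\n'
}

build_import_cmd
```

With this approach the expensive join runs once on the database, and the mappers only issue simple range scans against the staging table. The staging table can be dropped after the import finishes.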
