Posted to issues@spark.apache.org by "Hendy Irawan (JIRA)" <ji...@apache.org> on 2014/11/10 15:49:34 UTC

[jira] [Created] (SPARK-4317) Error querying Avro files imported by Sqoop: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes

Hendy Irawan created SPARK-4317:
-----------------------------------

             Summary: Error querying Avro files imported by Sqoop: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes
                 Key: SPARK-4317
                 URL: https://issues.apache.org/jira/browse/SPARK-4317
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.1.0
         Environment: Spark 1.1.0, Sqoop 1.4.5, PostgreSQL 9.3
            Reporter: Hendy Irawan


After importing a table from PostgreSQL 9.3 into an Avro file using Sqoop 1.4.5, Spark SQL 1.1.0 is unable to query individual columns from it:

(Note that Hive 0.13 can query the same Avro file just fine.)

{code}
spark-sql> select city from place;
14/11/10 10:15:08 INFO ParseDriver: Parsing command: select city from place
14/11/10 10:15:08 INFO ParseDriver: Parse Completed
14/11/10 10:15:08 INFO HiveMetaStore: 0: get_table : db=default tbl=place
14/11/10 10:15:08 INFO audit: ugi=ceefour       ip=unknown-ip-addr      cmd=get_table : db=default tbl=place
14/11/10 10:15:08 ERROR SparkSQLDriver: Failed in [select city from place]
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: 'city, tree:
Project ['city]
 LowerCaseSchema 
  MetastoreRelation default, place, None

        at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
        at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
        at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
        at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
        at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:397)
        at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:397)
        at org.apache.spark.sql.hive.HiveContext$QueryExecution.optimizedPlan$lzycompute(HiveContext.scala:358)
        at org.apache.spark.sql.hive.HiveContext$QueryExecution.optimizedPlan(HiveContext.scala:357)
        at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:402)
        at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:400)
        at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:406)
        at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:406)
        at org.apache.spark.sql.hive.HiveContext$QueryExecution.stringResult(HiveContext.scala:406)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:59)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:291)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:226)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

14/11/10 10:15:08 ERROR CliDriver: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: 'city, tree:
Project ['city]
 LowerCaseSchema 
  MetastoreRelation default, place, None

        ... (same stack trace as above)
{code}

However, a simple `COUNT(*)`, which references no column, works fine:

{code}
> SELECT COUNT(*) FROM place;
14/11/10 10:17:44 INFO ParseDriver: Parsing command: SELECT COUNT(*) FROM place
14/11/10 10:17:44 INFO ParseDriver: Parse Completed
14/11/10 10:17:44 INFO HiveMetaStore: 0: get_table : db=default tbl=place
14/11/10 10:17:44 INFO audit: ugi=ceefour       ip=unknown-ip-addr      cmd=get_table : db=default tbl=place
14/11/10 10:17:44 INFO MemoryStore: ensureFreeSpace(450038) called with curMem=1354834, maxMem=278019440
14/11/10 10:17:44 INFO MemoryStore: Block broadcast_9 stored as values in memory (estimated size 439.5 KB, free 263.4 MB)
14/11/10 10:17:44 INFO AvroSerDe: Configuration null, not inserting schema
14/11/10 10:17:44 INFO SparkContext: Starting job: collect at HiveContext.scala:415
14/11/10 10:17:44 INFO FileInputFormat: Total input paths to process : 1
14/11/10 10:17:44 INFO DAGScheduler: Registering RDD 37 (mapPartitions at Exchange.scala:86)
14/11/10 10:17:44 INFO DAGScheduler: Got job 3 (collect at HiveContext.scala:415) with 1 output partitions (allowLocal=false)
14/11/10 10:17:44 INFO DAGScheduler: Final stage: Stage 6(collect at HiveContext.scala:415)
14/11/10 10:17:44 INFO DAGScheduler: Parents of final stage: List(Stage 7)
14/11/10 10:17:44 INFO DAGScheduler: Missing parents: List(Stage 7)
14/11/10 10:17:44 INFO DAGScheduler: Submitting Stage 7 (MapPartitionsRDD[37] at mapPartitions at Exchange.scala:86), which has no missing parents
14/11/10 10:17:44 INFO MemoryStore: ensureFreeSpace(10880) called with curMem=1804872, maxMem=278019440
14/11/10 10:17:44 INFO MemoryStore: Block broadcast_10 stored as values in memory (estimated size 10.6 KB, free 263.4 MB)
14/11/10 10:17:44 INFO DAGScheduler: Submitting 2 missing tasks from Stage 7 (MapPartitionsRDD[37] at mapPartitions at Exchange.scala:86)
14/11/10 10:17:44 INFO TaskSchedulerImpl: Adding task set 7.0 with 2 tasks
14/11/10 10:17:44 INFO TaskSetManager: Starting task 0.0 in stage 7.0 (TID 9, localhost, PROCESS_LOCAL, 1220 bytes)
14/11/10 10:17:44 INFO TaskSetManager: Starting task 1.0 in stage 7.0 (TID 10, localhost, PROCESS_LOCAL, 1220 bytes)
14/11/10 10:17:44 INFO Executor: Running task 0.0 in stage 7.0 (TID 9)
14/11/10 10:17:44 INFO Executor: Running task 1.0 in stage 7.0 (TID 10)
14/11/10 10:17:44 INFO HadoopRDD: Input split: file:/media/ceefour/passport/databank/culinary/hdfs/place/part-m-00000.avro:0+381526
14/11/10 10:17:44 INFO HadoopRDD: Input split: file:/media/ceefour/passport/databank/culinary/hdfs/place/part-m-00000.avro:381526+381527
14/11/10 10:17:44 INFO AvroGenericRecordReader: Found the avro schema in the job: {"type":"record","name":"QueryResult","doc":"Sqoop import of QueryResult","fields":[{"name":"id","type":["string","null"],"columnName":"id","sqlType":"12"},{"name":"city","type":["string","null"],"columnName":"city","sqlType":"12"},{"name":"description","type":["string","null"],"columnName":"description","sqlType":"12"},{"name":"lat","type":["double","null"],"columnName":"lat","sqlType":"8"},{"name":"lng","type":["double","null"],"columnName":"lng","sqlType":"8"},{"name":"mapimagefile","type":["string","null"],"columnName":"mapimagefile","sqlType":"12"},{"name":"menu","type":["string","null"],"columnName":"menu","sqlType":"12"},{"name":"menuphotofile","type":["string","null"],"columnName":"menuphotofile","sqlType":"12"},{"name":"name","type":["string","null"],"columnName":"name","sqlType":"12"},{"name":"openinghours","type":["string","null"],"columnName":"openinghours","sqlType":"12"},{"name":"phonenumber","type":["string","null"],"columnName":"phonenumber","sqlType":"12"},{"name":"photofile","type":["string","null"],"columnName":"photofile","sqlType":"12"},{"name":"pricerange","type":["string","null"],"columnName":"pricerange","sqlType":"12"},{"name":"sourceuri","type":["string","null"],"columnName":"sourceuri","sqlType":"12"},{"name":"street","type":["string","null"],"columnName":"street","sqlType":"12"},{"name":"foursquareid","type":["string","null"],"columnName":"foursquareid","sqlType":"12"}],"tableName":"QueryResult"}
14/11/10 10:17:44 INFO AvroGenericRecordReader: Found the avro schema in the job: {"type":"record","name":"QueryResult","doc":"Sqoop import of QueryResult","fields":[{"name":"id","type":["string","null"],"columnName":"id","sqlType":"12"},{"name":"city","type":["string","null"],"columnName":"city","sqlType":"12"},{"name":"description","type":["string","null"],"columnName":"description","sqlType":"12"},{"name":"lat","type":["double","null"],"columnName":"lat","sqlType":"8"},{"name":"lng","type":["double","null"],"columnName":"lng","sqlType":"8"},{"name":"mapimagefile","type":["string","null"],"columnName":"mapimagefile","sqlType":"12"},{"name":"menu","type":["string","null"],"columnName":"menu","sqlType":"12"},{"name":"menuphotofile","type":["string","null"],"columnName":"menuphotofile","sqlType":"12"},{"name":"name","type":["string","null"],"columnName":"name","sqlType":"12"},{"name":"openinghours","type":["string","null"],"columnName":"openinghours","sqlType":"12"},{"name":"phonenumber","type":["string","null"],"columnName":"phonenumber","sqlType":"12"},{"name":"photofile","type":["string","null"],"columnName":"photofile","sqlType":"12"},{"name":"pricerange","type":["string","null"],"columnName":"pricerange","sqlType":"12"},{"name":"sourceuri","type":["string","null"],"columnName":"sourceuri","sqlType":"12"},{"name":"street","type":["string","null"],"columnName":"street","sqlType":"12"},{"name":"foursquareid","type":["string","null"],"columnName":"foursquareid","sqlType":"12"}],"tableName":"QueryResult"}
14/11/10 10:17:44 INFO Executor: Finished task 0.0 in stage 7.0 (TID 9). 1865 bytes result sent to driver
14/11/10 10:17:44 INFO TaskSetManager: Finished task 0.0 in stage 7.0 (TID 9) in 45 ms on localhost (1/2)
14/11/10 10:17:44 INFO Executor: Finished task 1.0 in stage 7.0 (TID 10). 1865 bytes result sent to driver
14/11/10 10:17:44 INFO TaskSetManager: Finished task 1.0 in stage 7.0 (TID 10) in 53 ms on localhost (2/2)
14/11/10 10:17:44 INFO TaskSchedulerImpl: Removed TaskSet 7.0, whose tasks have all completed, from pool 
14/11/10 10:17:44 INFO DAGScheduler: Stage 7 (mapPartitions at Exchange.scala:86) finished in 0.054 s
14/11/10 10:17:44 INFO DAGScheduler: looking for newly runnable stages
14/11/10 10:17:44 INFO DAGScheduler: running: Set()
14/11/10 10:17:44 INFO DAGScheduler: waiting: Set(Stage 6)
14/11/10 10:17:44 INFO DAGScheduler: failed: Set()
14/11/10 10:17:44 INFO StatsReportListener: Finished stage: org.apache.spark.scheduler.StageInfo@40069cb0
14/11/10 10:17:44 INFO StatsReportListener: task runtime:(count: 2, mean: 49.000000, stdev: 4.000000, max: 53.000000, min: 45.000000)
14/11/10 10:17:44 INFO StatsReportListener:     0%      5%      10%     25%     50%     75%     90%     95%     100%
14/11/10 10:17:44 INFO StatsReportListener:     45.0 ms 45.0 ms 45.0 ms 45.0 ms 53.0 ms 53.0 ms 53.0 ms 53.0 ms 53.0 ms
14/11/10 10:17:44 INFO DAGScheduler: Missing parents for Stage 6: List()
14/11/10 10:17:44 INFO StatsReportListener: shuffle bytes written:(count: 2, mean: 50.000000, stdev: 0.000000, max: 50.000000, min: 50.000000)
14/11/10 10:17:44 INFO StatsReportListener:     0%      5%      10%     25%     50%     75%     90%     95%     100%
14/11/10 10:17:44 INFO StatsReportListener:     50.0 B  50.0 B  50.0 B  50.0 B  50.0 B  50.0 B  50.0 B  50.0 B  50.0 B
14/11/10 10:17:44 INFO DAGScheduler: Submitting Stage 6 (MappedRDD[41] at map at HiveContext.scala:360), which is now runnable
14/11/10 10:17:44 INFO StatsReportListener: task result size:(count: 2, mean: 1865.000000, stdev: 0.000000, max: 1865.000000, min: 1865.000000)
14/11/10 10:17:44 INFO StatsReportListener:     0%      5%      10%     25%     50%     75%     90%     95%     100%
14/11/10 10:17:44 INFO StatsReportListener:     1865.0 B        1865.0 B        1865.0 B        1865.0 B        1865.0 B        1865.0 B        1865.0 B        1865.0 B      1865.0 B
14/11/10 10:17:44 INFO StatsReportListener: executor (non-fetch) time pct: (count: 2, mean: 94.779874, stdev: 1.446541, max: 96.226415, min: 93.333333)
14/11/10 10:17:44 INFO StatsReportListener:     0%      5%      10%     25%     50%     75%     90%     95%     100%
14/11/10 10:17:44 INFO StatsReportListener:     93 %    93 %    93 %    93 %    96 %    96 %    96 %    96 %    96 %
14/11/10 10:17:44 INFO StatsReportListener: other time pct: (count: 2, mean: 5.220126, stdev: 1.446541, max: 6.666667, min: 3.773585)
14/11/10 10:17:44 INFO StatsReportListener:     0%      5%      10%     25%     50%     75%     90%     95%     100%
14/11/10 10:17:44 INFO StatsReportListener:      4 %     4 %     4 %     4 %     7 %     7 %     7 %     7 %     7 %
14/11/10 10:17:44 INFO MemoryStore: ensureFreeSpace(9616) called with curMem=1815752, maxMem=278019440
14/11/10 10:17:44 INFO MemoryStore: Block broadcast_11 stored as values in memory (estimated size 9.4 KB, free 263.4 MB)
14/11/10 10:17:44 INFO DAGScheduler: Submitting 1 missing tasks from Stage 6 (MappedRDD[41] at map at HiveContext.scala:360)
14/11/10 10:17:44 INFO TaskSchedulerImpl: Adding task set 6.0 with 1 tasks
14/11/10 10:17:44 INFO TaskSetManager: Starting task 0.0 in stage 6.0 (TID 11, localhost, PROCESS_LOCAL, 948 bytes)
14/11/10 10:17:44 INFO Executor: Running task 0.0 in stage 6.0 (TID 11)
14/11/10 10:17:44 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/11/10 10:17:44 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/11/10 10:17:44 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 1 ms
14/11/10 10:17:44 INFO Executor: Finished task 0.0 in stage 6.0 (TID 11). 1076 bytes result sent to driver
14/11/10 10:17:44 INFO DAGScheduler: Stage 6 (collect at HiveContext.scala:415) finished in 0.008 s
14/11/10 10:17:44 INFO StatsReportListener: Finished stage: org.apache.spark.scheduler.StageInfo@209037df
14/11/10 10:17:44 INFO SparkContext: Job finished: collect at HiveContext.scala:415, took 0.113842844 s
6771
Time taken: 0.146 seconds
14/11/10 10:17:44 INFO StatsReportListener: task runtime:(count: 1, mean: 8.000000, stdev: 0.000000, max: 8.000000, min: 8.000000)
14/11/10 10:17:44 INFO StatsReportListener:     0%      5%      10%     25%     50%     75%     90%     95%     100%
14/11/10 10:17:44 INFO StatsReportListener:     8.0 ms  8.0 ms  8.0 ms  8.0 ms  8.0 ms  8.0 ms  8.0 ms  8.0 ms  8.0 ms
14/11/10 10:17:44 INFO StatsReportListener: fetch wait time:(count: 1, mean: 0.000000, stdev: 0.000000, max: 0.000000, min: 0.000000)
14/11/10 10:17:44 INFO StatsReportListener:     0%      5%      10%     25%     50%     75%     90%     95%     100%
14/11/10 10:17:44 INFO StatsReportListener:     0.0 ms  0.0 ms  0.0 ms  0.0 ms  0.0 ms  0.0 ms  0.0 ms  0.0 ms  0.0 ms
14/11/10 10:17:44 INFO StatsReportListener: remote bytes read:(count: 1, mean: 0.000000, stdev: 0.000000, max: 0.000000, min: 0.000000)
14/11/10 10:17:44 INFO StatsReportListener:     0%      5%      10%     25%     50%     75%     90%     95%     100%
14/11/10 10:17:44 INFO StatsReportListener:     0.0 B   0.0 B   0.0 B   0.0 B   0.0 B   0.0 B   0.0 B   0.0 B   0.0 B
14/11/10 10:17:44 INFO StatsReportListener: task result size:(count: 1, mean: 1076.000000, stdev: 0.000000, max: 1076.000000, min: 1076.000000)
14/11/10 10:17:44 INFO StatsReportListener:     0%      5%      10%     25%     50%     75%     90%     95%     100%
14/11/10 10:17:44 INFO StatsReportListener:     1076.0 B        1076.0 B        1076.0 B        1076.0 B        1076.0 B        1076.0 B        1076.0 B        1076.0 B      1076.0 B
14/11/10 10:17:44 INFO StatsReportListener: executor (non-fetch) time pct: (count: 1, mean: 75.000000, stdev: 0.000000, max: 75.000000, min: 75.000000)
14/11/10 10:17:44 INFO StatsReportListener:     0%      5%      10%     25%     50%     75%     90%     95%     100%
14/11/10 10:17:44 INFO StatsReportListener:     75 %    75 %    75 %    75 %    75 %    75 %    75 %    75 %    75 %
14/11/10 10:17:44 INFO StatsReportListener: fetch wait time pct: (count: 1, mean: 0.000000, stdev: 0.000000, max: 0.000000, min: 0.000000)
14/11/10 10:17:44 INFO StatsReportListener:     0%      5%      10%     25%     50%     75%     90%     95%     100%
14/11/10 10:17:44 INFO StatsReportListener:      0 %     0 %     0 %     0 %     0 %     0 %     0 %     0 %     0 %
14/11/10 10:17:45 INFO StatsReportListener: other time pct: (count: 1, mean: 25.000000, stdev: 0.000000, max: 25.000000, min: 25.000000)
14/11/10 10:17:45 INFO StatsReportListener:     0%      5%      10%     25%     50%     75%     90%     95%     100%
14/11/10 10:17:45 INFO StatsReportListener:     25 %    25 %    25 %    25 %    25 %    25 %    25 %    25 %    25 %
14/11/10 10:17:45 INFO CliDriver: Time taken: 0.146 seconds
spark-sql> 14/11/10 10:17:44 INFO TaskSetManager: Finished task 0.0 in stage 6.0 (TID 11) in 8 ms on localhost (1/1)
14/11/10 10:17:45 INFO TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool 
{code}
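
Since the spark-sql CLI drives a HiveContext underneath, the same contrast should be reproducible from spark-shell; a minimal sketch (assuming spark-shell is started with the same hive-site.xml / metastore configuration as the CLI session above):

{code}
// Hypothetical spark-shell reproduction (same metastore configuration assumed).
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

// Presumably fails with the same "Unresolved attributes: 'city" analyzer error as above:
hiveContext.sql("SELECT city FROM place").collect()

// ...while an aggregate that references no column succeeds:
hiveContext.sql("SELECT COUNT(*) FROM place").collect().foreach(println)
{code}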

This is probably because Sqoop creates a nested schema, where every column's type is a nullable union such as ["string", "null"]?

{code}
{
  "type" : "record",
  "name" : "QueryResult",
  "doc" : "Sqoop import of QueryResult",
  "fields" : [ {
    "name" : "id",
    "type" : [ "string", "null" ],
    "columnName" : "id",
    "sqlType" : "12"
  }, {
    "name" : "city",
    "type" : [ "string", "null" ],
    "columnName" : "city",
    "sqlType" : "12"
  }, {
    "name" : "description",
    "type" : [ "string", "null" ],
    "columnName" : "description",
    "sqlType" : "12"
  }, {
    "name" : "lat",
    "type" : [ "double", "null" ],
    "columnName" : "lat",
    "sqlType" : "8"
  }, {
    "name" : "lng",
    "type" : [ "double", "null" ],
    "columnName" : "lng",
    "sqlType" : "8"
  }, {
    "name" : "mapimagefile",
    "type" : [ "string", "null" ],
    "columnName" : "mapimagefile",
    "sqlType" : "12"
  }, {
    "name" : "menu",
    "type" : [ "string", "null" ],
    "columnName" : "menu",
    "sqlType" : "12"
  }, {
    "name" : "menuphotofile",
    "type" : [ "string", "null" ],
    "columnName" : "menuphotofile",
    "sqlType" : "12"
  }, {
    "name" : "name",
    "type" : [ "string", "null" ],
    "columnName" : "name",
    "sqlType" : "12"
  }, {
    "name" : "openinghours",
    "type" : [ "string", "null" ],
    "columnName" : "openinghours",
    "sqlType" : "12"
  }, {
    "name" : "phonenumber",
    "type" : [ "string", "null" ],
    "columnName" : "phonenumber",
    "sqlType" : "12"
  }, {
    "name" : "photofile",
    "type" : [ "string", "null" ],
    "columnName" : "photofile",
    "sqlType" : "12"
  }, {
    "name" : "pricerange",
    "type" : [ "string", "null" ],
    "columnName" : "pricerange",
    "sqlType" : "12"
  }, {
    "name" : "sourceuri",
    "type" : [ "string", "null" ],
    "columnName" : "sourceuri",
    "sqlType" : "12"
  }, {
    "name" : "street",
    "type" : [ "string", "null" ],
    "columnName" : "street",
    "sqlType" : "12"
  }, {
    "name" : "foursquareid",
    "type" : [ "string", "null" ],
    "columnName" : "foursquareid",
    "sqlType" : "12"
  } ],
  "tableName" : "QueryResult"
}
{code}
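
For reference, the schema above is the writer schema embedded in the Sqoop output file itself; it can be dumped from the data file, e.g. from spark-shell (a minimal sketch; avro core should already be on the Spark assembly classpath):

{code}
// Sketch: print the Avro writer schema embedded in the Sqoop-generated file.
import org.apache.avro.file.DataFileReader
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}

val reader = DataFileReader.openReader(
  new java.io.File("/media/ceefour/passport/databank/culinary/hdfs/place/part-m-00000.avro"),
  new GenericDatumReader[GenericRecord]())
println(reader.getSchema.toString(true))  // pretty-prints the schema shown above
reader.close()
{code}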

Sample record:

{code}
{"id":{"string":"17d8e71b-5c7f-4c23-8bbe-5af93a0a1847"},"city":null,"description":null,"lat":{"double":-7.00417828399503},"lng":{"double":107.63597989152},"mapimagefile":null,"menu":null,"menuphotofile":null,"name":{"string":"lontong sayur siliwangi"},"openinghours":null,"phonenumber":null,"photofile":null,"pricerange":null,"sourceuri":{"string":"https://id.foursquare.com/v/4f8b6772e4b00597a01917e8"},"street":null,"foursquareid":{"string":"4f8b6772e4b00597a01917e8"}}
{code}
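
As a sanity check (and a possible workaround), the file can be read directly with the Avro MapReduce input format and registered as a temporary table, bypassing the Hive metastore path entirely. A minimal sketch for Spark 1.1, assuming avro-mapred (hadoop2 classifier) is added via --jars; the PlaceRow case class and place_direct table name are only for illustration, and only a few of the 16 columns are mapped:

{code}
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable

// A few of the 16 columns; all of them are nullable unions in the schema above.
case class PlaceRow(id: String, city: String, name: String)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD  // implicit RDD[Product] -> SchemaRDD conversion (Spark 1.1)

val avroRdd = sc.newAPIHadoopFile(
  "file:/media/ceefour/passport/databank/culinary/hdfs/place",
  classOf[AvroKeyInputFormat[GenericRecord]],
  classOf[AvroKey[GenericRecord]],
  classOf[NullWritable])

// Values come back as Utf8 or null because of the ["string", "null"] unions.
val places = avroRdd.map { case (key, _) =>
  val r = key.datum()
  def str(field: String) = Option(r.get(field)).map(_.toString).orNull
  PlaceRow(str("id"), str("city"), str("name"))
}

places.registerTempTable("place_direct")
sqlContext.sql("SELECT city FROM place_direct").take(10).foreach(println)
{code}

If a direct read like this works while the metastore-backed `place` table does not, the problem would seem to be in how Spark SQL 1.1 resolves attributes against the Hive AvroSerDe schema rather than in the data itself.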


