Posted to user@spark.apache.org by Lee Ho Yeung <jo...@gmail.com> on 2016/06/15 01:19:38 UTC

can not show all data for this table

After trying the following commands, I cannot show the data:

https://drive.google.com/file/d/0Bxs_ao6uuBDUVkJYVmNaUGx2ZUE/view?usp=sharing
https://drive.google.com/file/d/0Bxs_ao6uuBDUc3ltMVZqNlBUYVk/view?usp=sharing

/home/martin/Downloads/spark-1.6.1/bin/spark-shell --packages com.databricks:spark-csv_2.11:1.4.0

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/home/martin/result002.csv")
df.printSchema()
df.registerTempTable("sales")
val aggDF = sqlContext.sql("select * from sales where a0 like \"%deep=3%\"")
df.collect.foreach(println)
aggDF.collect.foreach(println)



val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .load("/home/martin/result002.csv")
df.printSchema()
df.registerTempTable("sales")
sqlContext.sql("select * from sales").take(30).foreach(println)
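
For what it's worth, a quick diagnostic before blaming the reader is to print a few raw lines of the file, so the actual delimiter is visible. A minimal sketch, assuming the same shell session and the file path used above:

// Print the first raw lines of the input; if the fields are separated by
// tabs rather than commas, spark-csv needs an explicit delimiter option.
sc.textFile("/home/martin/result002.csv").take(5).foreach(println)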

Re: can not show all data for this table

Posted by Mich Talebzadeh <mi...@gmail.com>.
at last some progress :)

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 15 June 2016 at 10:52, Lee Ho Yeung <jo...@gmail.com> wrote:

> Hi Mich,
>
> I found the cause of my problem now: I missed setting the delimiter,
> which is a tab.
>
> But it gives an error,
>
> and I notice that only LibreOffice opens and reads it properly; even
> Excel on Windows cannot separate the columns correctly.
>
> scala> val df =
> sqlContext.read.format("com.databricks.spark.csv").option("header",
> "true").option("inferSchema", "true").option("delimiter",
> "").load("/home/martin/result002.csv")
> java.lang.StringIndexOutOfBoundsException: String index out of range: 0
>
>
> On Wed, Jun 15, 2016 at 12:14 PM, Mich Talebzadeh <
> mich.talebzadeh@gmail.com> wrote:
>
>> There may be an issue with the data in your csv file, like a blank
>> header line etc.
>>
>> It sounds like you have an issue there. I normally get rid of blank
>> lines before putting the csv file into HDFS.
>>
>> Can you actually select from that temp table? Like:
>>
>> sql("select TransactionDate, TransactionType, Description, Value,
>> Balance, AccountName, AccountNumber from tmp").take(2)
>>
>> Replace those with your column names; they are mapped using a case class.
>>
>>
>> HTH
>>
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 15 June 2016 at 03:02, Lee Ho Yeung <jo...@gmail.com> wrote:
>>
>>> filter also gives an error:
>>>
>>> 16/06/14 19:00:27 WARN Utils: Service 'SparkUI' could not bind on port
>>> 4040. Attempting port 4041.
>>> Spark context available as sc.
>>> SQL context available as sqlContext.
>>>
>>> scala> import org.apache.spark.sql.SQLContext
>>> import org.apache.spark.sql.SQLContext
>>>
>>> scala> val sqlContext = new SQLContext(sc)
>>> sqlContext: org.apache.spark.sql.SQLContext =
>>> org.apache.spark.sql.SQLContext@3114ea
>>>
>>> scala> val df =
>>> sqlContext.read.format("com.databricks.spark.csv").option("header",
>>> "true").option("inferSchema", "true").load("/home/martin/result002.csv")
>>> 16/06/14 19:00:32 WARN SizeEstimator: Failed to check whether
>>> UseCompressedOops is set; assuming yes
>>> Java HotSpot(TM) Client VM warning: You have loaded library
>>> /tmp/libnetty-transport-native-epoll7823347435914767500.so which might have
>>> disabled stack guard. The VM will try to fix the stack guard now.
>>> It's highly recommended that you fix the library with 'execstack -c
>>> <libfile>', or link it with '-z noexecstack'.
>>> df: org.apache.spark.sql.DataFrame = [a0    a1    a2    a3    a4
>>> a5    a6    a7    a8    a9    : string]
>>>
>>> scala> df.printSchema()
>>> root
>>>  |-- a0    a1    a2    a3    a4    a5    a6    a7    a8    a9    :
>>> string (nullable = true)
>>>
>>>
>>> scala> df.registerTempTable("sales")
>>>
>>> scala> df.filter($"a0".contains("found
>>> deep=1")).filter($"a1".contains("found
>>> deep=1")).filter($"a2".contains("found deep=1"))
>>> org.apache.spark.sql.AnalysisException: cannot resolve 'a0' given input
>>> columns: [a0    a1    a2    a3    a4    a5    a6    a7    a8    a9    ];
>>>     at
>>> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Jun 14, 2016 at 6:19 PM, Lee Ho Yeung <jo...@gmail.com>
>>> wrote:
>>>
>>>> After trying the following commands, I cannot show the data:
>>>>
>>>>
>>>> https://drive.google.com/file/d/0Bxs_ao6uuBDUVkJYVmNaUGx2ZUE/view?usp=sharing
>>>>
>>>> https://drive.google.com/file/d/0Bxs_ao6uuBDUc3ltMVZqNlBUYVk/view?usp=sharing
>>>>
>>>> /home/martin/Downloads/spark-1.6.1/bin/spark-shell --packages
>>>> com.databricks:spark-csv_2.11:1.4.0
>>>>
>>>> import org.apache.spark.sql.SQLContext
>>>>
>>>> val sqlContext = new SQLContext(sc)
>>>> val df =
>>>> sqlContext.read.format("com.databricks.spark.csv").option("header",
>>>> "true").option("inferSchema", "true").load("/home/martin/result002.csv")
>>>> df.printSchema()
>>>> df.registerTempTable("sales")
>>>> val aggDF = sqlContext.sql("select * from sales where a0 like
>>>> \"%deep=3%\"")
>>>> df.collect.foreach(println)
>>>> aggDF.collect.foreach(println)
>>>>
>>>>
>>>>
>>>> val df =
>>>> sqlContext.read.format("com.databricks.spark.csv").option("header",
>>>> "true").load("/home/martin/result002.csv")
>>>> df.printSchema()
>>>> df.registerTempTable("sales")
>>>> sqlContext.sql("select * from sales").take(30).foreach(println)
>>>>
>>>
>>>
>>
>

Re: can not show all data for this table

Posted by Lee Ho Yeung <jo...@gmail.com>.
Hi Mich,

I found the cause of my problem now: I missed setting the delimiter,
which is a tab.

But it gives an error,

and I notice that only LibreOffice opens and reads it properly; even
Excel on Windows cannot separate the columns correctly.

scala> val df =
sqlContext.read.format("com.databricks.spark.csv").option("header",
"true").option("inferSchema", "true").option("delimiter",
"").load("/home/martin/result002.csv")
java.lang.StringIndexOutOfBoundsException: String index out of range: 0
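
The exception appears to come from the empty delimiter string: spark-csv reads the first character of the delimiter option, and "" has none. A minimal sketch of the corrected call, assuming the file really is tab-separated ("\t" denotes a tab character):

// Same load as above, but with a real tab as the delimiter instead of "".
val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .option("delimiter", "\t")
  .load("/home/martin/result002.csv")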


On Wed, Jun 15, 2016 at 12:14 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com
> wrote:

> There may be an issue with the data in your csv file, like a blank
> header line etc.
>
> It sounds like you have an issue there. I normally get rid of blank
> lines before putting the csv file into HDFS.
>
> Can you actually select from that temp table? Like:
>
> sql("select TransactionDate, TransactionType, Description, Value, Balance,
> AccountName, AccountNumber from tmp").take(2)
>
> Replace those with your column names; they are mapped using a case class.
>
>
> HTH
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 15 June 2016 at 03:02, Lee Ho Yeung <jo...@gmail.com> wrote:
>
>> filter also gives an error:
>>
>> 16/06/14 19:00:27 WARN Utils: Service 'SparkUI' could not bind on port
>> 4040. Attempting port 4041.
>> Spark context available as sc.
>> SQL context available as sqlContext.
>>
>> scala> import org.apache.spark.sql.SQLContext
>> import org.apache.spark.sql.SQLContext
>>
>> scala> val sqlContext = new SQLContext(sc)
>> sqlContext: org.apache.spark.sql.SQLContext =
>> org.apache.spark.sql.SQLContext@3114ea
>>
>> scala> val df =
>> sqlContext.read.format("com.databricks.spark.csv").option("header",
>> "true").option("inferSchema", "true").load("/home/martin/result002.csv")
>> 16/06/14 19:00:32 WARN SizeEstimator: Failed to check whether
>> UseCompressedOops is set; assuming yes
>> Java HotSpot(TM) Client VM warning: You have loaded library
>> /tmp/libnetty-transport-native-epoll7823347435914767500.so which might have
>> disabled stack guard. The VM will try to fix the stack guard now.
>> It's highly recommended that you fix the library with 'execstack -c
>> <libfile>', or link it with '-z noexecstack'.
>> df: org.apache.spark.sql.DataFrame = [a0    a1    a2    a3    a4    a5
>> a6    a7    a8    a9    : string]
>>
>> scala> df.printSchema()
>> root
>>  |-- a0    a1    a2    a3    a4    a5    a6    a7    a8    a9    : string
>> (nullable = true)
>>
>>
>> scala> df.registerTempTable("sales")
>>
>> scala> df.filter($"a0".contains("found
>> deep=1")).filter($"a1".contains("found
>> deep=1")).filter($"a2".contains("found deep=1"))
>> org.apache.spark.sql.AnalysisException: cannot resolve 'a0' given input
>> columns: [a0    a1    a2    a3    a4    a5    a6    a7    a8    a9    ];
>>     at
>> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>>
>>
>>
>>
>>
>> On Tue, Jun 14, 2016 at 6:19 PM, Lee Ho Yeung <jo...@gmail.com>
>> wrote:
>>
>>> After trying the following commands, I cannot show the data:
>>>
>>>
>>> https://drive.google.com/file/d/0Bxs_ao6uuBDUVkJYVmNaUGx2ZUE/view?usp=sharing
>>>
>>> https://drive.google.com/file/d/0Bxs_ao6uuBDUc3ltMVZqNlBUYVk/view?usp=sharing
>>>
>>> /home/martin/Downloads/spark-1.6.1/bin/spark-shell --packages
>>> com.databricks:spark-csv_2.11:1.4.0
>>>
>>> import org.apache.spark.sql.SQLContext
>>>
>>> val sqlContext = new SQLContext(sc)
>>> val df =
>>> sqlContext.read.format("com.databricks.spark.csv").option("header",
>>> "true").option("inferSchema", "true").load("/home/martin/result002.csv")
>>> df.printSchema()
>>> df.registerTempTable("sales")
>>> val aggDF = sqlContext.sql("select * from sales where a0 like
>>> \"%deep=3%\"")
>>> df.collect.foreach(println)
>>> aggDF.collect.foreach(println)
>>>
>>>
>>>
>>> val df =
>>> sqlContext.read.format("com.databricks.spark.csv").option("header",
>>> "true").load("/home/martin/result002.csv")
>>> df.printSchema()
>>> df.registerTempTable("sales")
>>> sqlContext.sql("select * from sales").take(30).foreach(println)
>>>
>>
>>
>

Re: can not show all data for this table

Posted by Lee Ho Yeung <jo...@gmail.com>.
Hi Mich,

https://drive.google.com/file/d/0Bxs_ao6uuBDUQ2NfYnhvUl9EZXM/view?usp=sharing
https://drive.google.com/file/d/0Bxs_ao6uuBDUS1UzTWd1Q2VJdEk/view?usp=sharing

This time I ensured the headers cover all the data; only some columns
which have headers do not have data.

But it still cannot show all the data the way LibreOffice does when I
open the file.

/home/martin/Downloads/spark-1.6.1/bin/spark-shell --packages com.databricks:spark-csv_2.11:1.4.0

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/home/martin/result002.csv")
df.printSchema()
df.registerTempTable("sales")
df.filter($"a3".contains("found deep=1"))





On Tue, Jun 14, 2016 at 9:14 PM, Mich Talebzadeh <mi...@gmail.com>
wrote:

> There may be an issue with the data in your csv file, like a blank
> header line etc.
>
> It sounds like you have an issue there. I normally get rid of blank
> lines before putting the csv file into HDFS.
>
> Can you actually select from that temp table? Like:
>
> sql("select TransactionDate, TransactionType, Description, Value, Balance,
> AccountName, AccountNumber from tmp").take(2)
>
> Replace those with your column names; they are mapped using a case class.
>
>
> HTH
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 15 June 2016 at 03:02, Lee Ho Yeung <jo...@gmail.com> wrote:
>
>> filter also gives an error:
>>
>> 16/06/14 19:00:27 WARN Utils: Service 'SparkUI' could not bind on port
>> 4040. Attempting port 4041.
>> Spark context available as sc.
>> SQL context available as sqlContext.
>>
>> scala> import org.apache.spark.sql.SQLContext
>> import org.apache.spark.sql.SQLContext
>>
>> scala> val sqlContext = new SQLContext(sc)
>> sqlContext: org.apache.spark.sql.SQLContext =
>> org.apache.spark.sql.SQLContext@3114ea
>>
>> scala> val df =
>> sqlContext.read.format("com.databricks.spark.csv").option("header",
>> "true").option("inferSchema", "true").load("/home/martin/result002.csv")
>> 16/06/14 19:00:32 WARN SizeEstimator: Failed to check whether
>> UseCompressedOops is set; assuming yes
>> Java HotSpot(TM) Client VM warning: You have loaded library
>> /tmp/libnetty-transport-native-epoll7823347435914767500.so which might have
>> disabled stack guard. The VM will try to fix the stack guard now.
>> It's highly recommended that you fix the library with 'execstack -c
>> <libfile>', or link it with '-z noexecstack'.
>> df: org.apache.spark.sql.DataFrame = [a0    a1    a2    a3    a4    a5
>> a6    a7    a8    a9    : string]
>>
>> scala> df.printSchema()
>> root
>>  |-- a0    a1    a2    a3    a4    a5    a6    a7    a8    a9    : string
>> (nullable = true)
>>
>>
>> scala> df.registerTempTable("sales")
>>
>> scala> df.filter($"a0".contains("found
>> deep=1")).filter($"a1".contains("found
>> deep=1")).filter($"a2".contains("found deep=1"))
>> org.apache.spark.sql.AnalysisException: cannot resolve 'a0' given input
>> columns: [a0    a1    a2    a3    a4    a5    a6    a7    a8    a9    ];
>>     at
>> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>>
>>
>>
>>
>>
>> On Tue, Jun 14, 2016 at 6:19 PM, Lee Ho Yeung <jo...@gmail.com>
>> wrote:
>>
>>> After trying the following commands, I cannot show the data:
>>>
>>>
>>> https://drive.google.com/file/d/0Bxs_ao6uuBDUVkJYVmNaUGx2ZUE/view?usp=sharing
>>>
>>> https://drive.google.com/file/d/0Bxs_ao6uuBDUc3ltMVZqNlBUYVk/view?usp=sharing
>>>
>>> /home/martin/Downloads/spark-1.6.1/bin/spark-shell --packages
>>> com.databricks:spark-csv_2.11:1.4.0
>>>
>>> import org.apache.spark.sql.SQLContext
>>>
>>> val sqlContext = new SQLContext(sc)
>>> val df =
>>> sqlContext.read.format("com.databricks.spark.csv").option("header",
>>> "true").option("inferSchema", "true").load("/home/martin/result002.csv")
>>> df.printSchema()
>>> df.registerTempTable("sales")
>>> val aggDF = sqlContext.sql("select * from sales where a0 like
>>> \"%deep=3%\"")
>>> df.collect.foreach(println)
>>> aggDF.collect.foreach(println)
>>>
>>>
>>>
>>> val df =
>>> sqlContext.read.format("com.databricks.spark.csv").option("header",
>>> "true").load("/home/martin/result002.csv")
>>> df.printSchema()
>>> df.registerTempTable("sales")
>>> sqlContext.sql("select * from sales").take(30).foreach(println)
>>>
>>
>>
>

Re: can not show all data for this table

Posted by Mich Talebzadeh <mi...@gmail.com>.
There may be an issue with the data in your csv file, like a blank header
line etc.

It sounds like you have an issue there. I normally get rid of blank lines
before putting the csv file into HDFS.

Can you actually select from that temp table? Like:

sql("select TransactionDate, TransactionType, Description, Value, Balance,
AccountName, AccountNumber from tmp").take(2)

Replace those with your column names; they are mapped using a case class.
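
As an illustration of the blank-line clean-up, a minimal sketch done inside Spark rather than before the HDFS put (file path assumed from this thread): count and drop whitespace-only lines before handing the data to the CSV reader.

val rawLines = sc.textFile("/home/martin/result002.csv")
val nonBlank = rawLines.filter(_.trim.nonEmpty)  // drop blank/whitespace-only lines
println(s"blank lines dropped: ${rawLines.count() - nonBlank.count()}")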


HTH




Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 15 June 2016 at 03:02, Lee Ho Yeung <jo...@gmail.com> wrote:

> filter also gives an error:
>
> 16/06/14 19:00:27 WARN Utils: Service 'SparkUI' could not bind on port
> 4040. Attempting port 4041.
> Spark context available as sc.
> SQL context available as sqlContext.
>
> scala> import org.apache.spark.sql.SQLContext
> import org.apache.spark.sql.SQLContext
>
> scala> val sqlContext = new SQLContext(sc)
> sqlContext: org.apache.spark.sql.SQLContext =
> org.apache.spark.sql.SQLContext@3114ea
>
> scala> val df =
> sqlContext.read.format("com.databricks.spark.csv").option("header",
> "true").option("inferSchema", "true").load("/home/martin/result002.csv")
> 16/06/14 19:00:32 WARN SizeEstimator: Failed to check whether
> UseCompressedOops is set; assuming yes
> Java HotSpot(TM) Client VM warning: You have loaded library
> /tmp/libnetty-transport-native-epoll7823347435914767500.so which might have
> disabled stack guard. The VM will try to fix the stack guard now.
> It's highly recommended that you fix the library with 'execstack -c
> <libfile>', or link it with '-z noexecstack'.
> df: org.apache.spark.sql.DataFrame = [a0    a1    a2    a3    a4    a5
> a6    a7    a8    a9    : string]
>
> scala> df.printSchema()
> root
>  |-- a0    a1    a2    a3    a4    a5    a6    a7    a8    a9    : string
> (nullable = true)
>
>
> scala> df.registerTempTable("sales")
>
> scala> df.filter($"a0".contains("found
> deep=1")).filter($"a1".contains("found
> deep=1")).filter($"a2".contains("found deep=1"))
> org.apache.spark.sql.AnalysisException: cannot resolve 'a0' given input
> columns: [a0    a1    a2    a3    a4    a5    a6    a7    a8    a9    ];
>     at
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>
>
>
>
>
> On Tue, Jun 14, 2016 at 6:19 PM, Lee Ho Yeung <jo...@gmail.com>
> wrote:
>
>> After trying the following commands, I cannot show the data:
>>
>>
>> https://drive.google.com/file/d/0Bxs_ao6uuBDUVkJYVmNaUGx2ZUE/view?usp=sharing
>>
>> https://drive.google.com/file/d/0Bxs_ao6uuBDUc3ltMVZqNlBUYVk/view?usp=sharing
>>
>> /home/martin/Downloads/spark-1.6.1/bin/spark-shell --packages
>> com.databricks:spark-csv_2.11:1.4.0
>>
>> import org.apache.spark.sql.SQLContext
>>
>> val sqlContext = new SQLContext(sc)
>> val df =
>> sqlContext.read.format("com.databricks.spark.csv").option("header",
>> "true").option("inferSchema", "true").load("/home/martin/result002.csv")
>> df.printSchema()
>> df.registerTempTable("sales")
>> val aggDF = sqlContext.sql("select * from sales where a0 like
>> \"%deep=3%\"")
>> df.collect.foreach(println)
>> aggDF.collect.foreach(println)
>>
>>
>>
>> val df =
>> sqlContext.read.format("com.databricks.spark.csv").option("header",
>> "true").load("/home/martin/result002.csv")
>> df.printSchema()
>> df.registerTempTable("sales")
>> sqlContext.sql("select * from sales").take(30).foreach(println)
>>
>
>

Re: can not show all data for this table

Posted by Lee Ho Yeung <jo...@gmail.com>.
filter also gives an error:

16/06/14 19:00:27 WARN Utils: Service 'SparkUI' could not bind on port
4040. Attempting port 4041.
Spark context available as sc.
SQL context available as sqlContext.

scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext

scala> val sqlContext = new SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext =
org.apache.spark.sql.SQLContext@3114ea

scala> val df =
sqlContext.read.format("com.databricks.spark.csv").option("header",
"true").option("inferSchema", "true").load("/home/martin/result002.csv")
16/06/14 19:00:32 WARN SizeEstimator: Failed to check whether
UseCompressedOops is set; assuming yes
Java HotSpot(TM) Client VM warning: You have loaded library
/tmp/libnetty-transport-native-epoll7823347435914767500.so which might have
disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c
<libfile>', or link it with '-z noexecstack'.
df: org.apache.spark.sql.DataFrame = [a0    a1    a2    a3    a4    a5
a6    a7    a8    a9    : string]

scala> df.printSchema()
root
 |-- a0    a1    a2    a3    a4    a5    a6    a7    a8    a9    : string
(nullable = true)


scala> df.registerTempTable("sales")

scala> df.filter($"a0".contains("found
deep=1")).filter($"a1".contains("found
deep=1")).filter($"a2".contains("found deep=1"))
org.apache.spark.sql.AnalysisException: cannot resolve 'a0' given input
columns: [a0    a1    a2    a3    a4    a5    a6    a7    a8    a9    ];
    at
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
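
The schema output above is the clue here: the DataFrame has a single column whose name is the entire tab-joined header line, so a bare a0 cannot resolve. A quick confirmation sketch, assuming the df from this session:

// Print each parsed column name in brackets; with the default comma
// delimiter, the whole header comes back as one tab-containing name.
df.columns.foreach(c => println("column: [" + c + "]"))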




On Tue, Jun 14, 2016 at 6:19 PM, Lee Ho Yeung <jo...@gmail.com> wrote:

> After trying the following commands, I cannot show the data:
>
>
> https://drive.google.com/file/d/0Bxs_ao6uuBDUVkJYVmNaUGx2ZUE/view?usp=sharing
>
> https://drive.google.com/file/d/0Bxs_ao6uuBDUc3ltMVZqNlBUYVk/view?usp=sharing
>
> /home/martin/Downloads/spark-1.6.1/bin/spark-shell --packages
> com.databricks:spark-csv_2.11:1.4.0
>
> import org.apache.spark.sql.SQLContext
>
> val sqlContext = new SQLContext(sc)
> val df =
> sqlContext.read.format("com.databricks.spark.csv").option("header",
> "true").option("inferSchema", "true").load("/home/martin/result002.csv")
> df.printSchema()
> df.registerTempTable("sales")
> val aggDF = sqlContext.sql("select * from sales where a0 like
> \"%deep=3%\"")
> df.collect.foreach(println)
> aggDF.collect.foreach(println)
>
>
>
> val df =
> sqlContext.read.format("com.databricks.spark.csv").option("header",
> "true").load("/home/martin/result002.csv")
> df.printSchema()
> df.registerTempTable("sales")
> sqlContext.sql("select * from sales").take(30).foreach(println)
>