Posted to user@spark.apache.org by raheel-akl <ra...@gmail.com> on 2016/07/22 03:54:24 UTC

Applying a schema to a single-column DataFrame in Java

Hi folks, 

I am reading lines from an Apache web server log file into a Spark DataFrame.
A sample line from the log file is below:

piweba4y.prodigy.com - - [01/Aug/1995:00:00:10 -0400] "GET /images/launchmedium.gif HTTP/1.0" 200 11853

I have split the values into host, timestamp, path, status and content_size,
and applied these as the schema of a new DataFrame:

host: piweba4y.prodigy.com
timestamp: 01/Aug/1995:00:00:10 -0400
path: /images/launchmedium.gif
status: 200
content_size: 11853
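The split described above can be sketched in plain Java with java.util.regex. This is a minimal sketch, assuming every line follows the Apache Common Log Format of the sample line; the class name LogLineParser and the exact pattern are hypothetical, and lines that do not match are returned as null rather than handled further.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogLineParser {
    // Hypothetical pattern for Common Log Format lines:
    // host ident authuser [timestamp] "method path protocol" status content_size
    private static final Pattern LOG_PATTERN = Pattern.compile(
        "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"\\S+ (\\S+)[^\"]*\" (\\d{3}) (\\S+)$");

    // Returns the five fields keyed by column name (in column order),
    // or null if the line does not match the assumed format.
    static Map<String, String> parse(String line) {
        Matcher m = LOG_PATTERN.matcher(line);
        if (!m.matches()) {
            return null;
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("host", m.group(1));
        fields.put("timestamp", m.group(2));
        fields.put("path", m.group(3));
        fields.put("status", m.group(4));
        fields.put("content_size", m.group(5));
        return fields;
    }

    public static void main(String[] args) {
        String line = "piweba4y.prodigy.com - - [01/Aug/1995:00:00:10 -0400] "
            + "\"GET /images/launchmedium.gif HTTP/1.0\" 200 11853";
        // Print each column name with its extracted value.
        parse(line).forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```

Running main prints the five name/value pairs in the same order as the listing above.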

I have done all of the above in Python using regular expressions, including
applying the five-column schema shown above, and now I would like to do the
same in Java, but I have no idea how. I can already split the values with
Java's regular-expression library. The next step is to create columns in my
DataFrame (currently each line is a single column named 'value'). Can someone
show me how to do this in Java? It is much easier in Python, but Java seems
to be a little tougher.
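Spark's Java API offers at least two routes from the single 'value' column to named columns: derive each column with functions.regexp_extract(col("value"), pattern, groupIndex), or map each line to a JavaBean and let createDataFrame(JavaRDD<?>, Class<?>) (on SQLContext in Spark 1.x, SparkSession in 2.x) infer the schema from the bean's getters. The sketch below covers only the plain-Java part of the second route, assuming the values have already been split: a hypothetical LogRecord bean with typed fields and a hypothetical fromFields helper. Those names, and storing a "-" content size as 0, are illustrative assumptions rather than part of the original setup.

```java
import java.io.Serializable;

// Hypothetical JavaBean holding the five log columns with Java types.
// Serializable so instances can be sent between the Spark driver and executors.
class LogRecord implements Serializable {
    private String host;
    private String timestamp;
    private String path;
    private int status;
    private long contentSize;

    // Public no-arg constructor, as the JavaBean convention requires.
    public LogRecord() {
    }

    // Converts the five already-split string values into a typed record.
    // Assumption: a content size of "-" (no body returned) is stored as 0.
    public static LogRecord fromFields(String host, String timestamp, String path,
                                       String status, String contentSize) {
        LogRecord record = new LogRecord();
        record.setHost(host);
        record.setTimestamp(timestamp);
        record.setPath(path);
        record.setStatus(Integer.parseInt(status));
        record.setContentSize("-".equals(contentSize) ? 0L : Long.parseLong(contentSize));
        return record;
    }

    // Getters and setters: bean-based schema inference reads these properties.
    public String getHost() { return host; }
    public void setHost(String host) { this.host = host; }
    public String getTimestamp() { return timestamp; }
    public void setTimestamp(String timestamp) { this.timestamp = timestamp; }
    public String getPath() { return path; }
    public void setPath(String path) { this.path = path; }
    public int getStatus() { return status; }
    public void setStatus(int status) { this.status = status; }
    public long getContentSize() { return contentSize; }
    public void setContentSize(long contentSize) { this.contentSize = contentSize; }
}
```

After mapping each line to a LogRecord (for example with JavaRDD.map), passing LogRecord.class to createDataFrame should produce columns named after the bean properties, so content_size would appear as contentSize unless it is renamed with withColumnRenamed.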

HELP!

----
Raheel - (aspiring DS)
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Applying-schema-on-single-column-dataframe-in-java-tp27393.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
