Posted to user@spark.apache.org by ha...@tutanota.com on 2019/10/01 21:48:26 UTC
Convert a line of String into column
I want to convert a line of String into a table. For instance, I want to convert the following line
<column1> <column2> <column3> ... <column6> # this is a line in a text file, with fields separated by whitespace
into a table:
+------+------+------+-----+------+
| col1 | col2 | col3 | ... | col6 |
+------+------+------+-----+------+
| val1 | val2 | val3 | ... | val6 |
+------+------+------+-----+------+
...
My code is below:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder
  .master("local")
  .appName("MyApp")
  .getOrCreate()

import spark.implicits._

val lines = spark.readStream.textFile("/tmp/data/")

val words = lines.as[String].flatMap(_.split(" "))
words.printSchema()

val query = words
  .writeStream
  .outputMode("append")
  .format("console")
  .start()
query.awaitTermination()
But this code actually turns each line into a single column, with one row per field:
+---------+
| value   |
+---------+
| col1... |
| col2... |
| col3... |
| ...     |
| col6    |
+---------+
How can I achieve the result I want?
Thanks.
Re: Convert a line of String into column
Posted by ayan guha <gu...@gmail.com>.
Do you know how many columns there are?
On Sat, Oct 5, 2019 at 6:39 PM Dhaval Modi <dh...@gmail.com> wrote:
> Hi,
>
> First, convert "lines" to a DataFrame. You will get one column holding the
> original string, one row per line.
>
> Next, use a string split on this column to convert it to an array of strings.
>
> After this, you can select each element of the array by index (for example
> with getItem) to get one column per element. Note that the explode function
> would instead produce one row per element.
>
--
Best Regards,
Ayan Guha
Re: Convert a line of String into column
Posted by Dhaval Modi <dh...@gmail.com>.
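Hi,
First, convert "lines" to a DataFrame. You will get one column holding the
original string, one row per line.
Next, use a string split on this column to convert it to an array of strings.
After this, you can select each element of the array by index (for example
with getItem) to get one column per element. Note that the explode function
would instead produce one row per element.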
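
The steps above could be sketched roughly as follows. This is only an illustration, not a tested solution from the thread: it assumes each line has exactly six whitespace-separated fields, and the names LineToColumns, toColumns, and col1 through col6 are hypothetical. The input path is the one from the original question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, split}

object LineToColumns {
  // Assumed column names; the thread only says there are six fields per line.
  val columnNames: Seq[String] = (1 to 6).map(i => s"col$i")

  // Split the single "value" column on whitespace, then pick each array
  // element by index so that every field becomes its own column.
  def toColumns(lines: DataFrame): DataFrame = {
    val parts = split(col("value"), "\\s+")
    val fields = columnNames.zipWithIndex.map {
      case (name, i) => parts.getItem(i).as(name)
    }
    lines.select(fields: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .master("local")
      .appName("MyApp")
      .getOrCreate()

    // readStream.text yields a DataFrame with one string column named "value".
    val lines = spark.readStream.text("/tmp/data/")

    val query = toColumns(lines)
      .writeStream
      .outputMode("append")
      .format("console")
      .start()
    query.awaitTermination()
  }
}
```

Selecting by index keeps one output row per input line, whereas flatMap or explode emits one row per field. A line with fewer than six fields would normally yield null in the missing columns under Spark's default (non-ANSI) settings.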