Posted to user@spark.apache.org by ha...@tutanota.com on 2019/10/01 21:48:26 UTC

Convert a line of String into column

I want to convert a line of String to a table. For instance, I want to convert the following line

<column1> <column2> <column3> ... <column6> # this is a line in a text file, with fields separated by a space

to a table

+------+------+------+-----+------+
| col1 | col2 | col3 | ... | col6 |
+------+------+------+-----+------+
| val1 | val2 | val3 | ... | val6 |
+------+------+------+-----+------+
...

The code looks like this:

    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession
      .builder
      .master("local")
      .appName("MyApp")
      .getOrCreate()

    import spark.implicits._

    val lines = spark.readStream.textFile("/tmp/data/")

    val words = lines.as[String].flatMap(_.split(" "))
    words.printSchema()

    val query = words.
      writeStream.
      outputMode("append").
      format("console").
      start
    query.awaitTermination()

But in fact this code only turns the line into a single column:

+---------+
| value   |
+---------+
| col1... |
| col2... |
| col3... |
| ...     |
| col6    |
+---------+

How can I achieve the result I want?

Thanks.


Re: Convert a line of String into column

Posted by ayan guha <gu...@gmail.com>.
Do you know how many columns?

On Sat, Oct 5, 2019 at 6:39 PM Dhaval Modi <dh...@gmail.com> wrote:

> Hi,
>
> First, convert "lines" to a DataFrame. You will get one column holding the
> original string, one row per line.
>
> Next, use split on this column to turn it into an array of strings.
>
> After this, select the array elements by index (for example with getItem) so
> that each element becomes its own column. Note that explode would produce one
> row per element, not one column.

-- 
Best Regards,
Ayan Guha

Re: Convert a line of String into column

Posted by Dhaval Modi <dh...@gmail.com>.
Hi,

First, convert "lines" to a DataFrame. You will get one column holding the
original string, one row per line.

Next, use split on this column to turn it into an array of strings.

After this, select the array elements by index (for example with getItem) so
that each element becomes its own column. Note that explode would produce one
row per element, not one column.
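
The steps above can be sketched roughly as follows. This is only a minimal
sketch under stated assumptions, not a tested solution for the original job:
it assumes every line has exactly six whitespace-separated fields, it uses a
small in-memory Dataset in place of the text files, and it picks array
elements by index with getItem rather than explode, because explode yields
one row per element. The names LineToColumns, toColumns and numCols are made
up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, split}

object LineToColumns {

  // Hypothetical helper: turns a Dataset of lines into a DataFrame with
  // numCols columns named col1..colN.
  def toColumns(lines: Dataset[String], numCols: Int): DataFrame = {
    // A Dataset[String] has a single column named "value".
    // Split it once into an array column; the pattern is a regex.
    val parts = lines.select(split(col("value"), "\\s+").as("parts"))

    // Pick each array element by index and give it its own column name.
    val columns = (0 until numCols).map { i =>
      col("parts").getItem(i).as(s"col${i + 1}")
    }
    parts.select(columns: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("LineToColumns")
      .getOrCreate()
    import spark.implicits._

    // Assumption: in-memory sample lines standing in for the text files.
    val lines = Seq("a1 a2 a3 a4 a5 a6", "b1 b2 b3 b4 b5 b6").toDS()

    // Assumption: six fields per line, as in the question.
    val table = toColumns(lines, numCols = 6)
    table.printSchema()
    table.show()

    spark.stop()
  }
}
```

Since split and getItem are plain per-row projections, the same toColumns
call should also work on the streaming Dataset returned by
spark.readStream.textFile, with the result written out through writeStream
as in the original code. The number of columns has to be known up front,
because the schema is fixed when the query is defined.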
