Posted to user@spark.apache.org by Thijs Haarhuis <th...@oranggo.com> on 2019/02/13 14:01:42 UTC

SparkR + binary type + how to get value

Hi all,

Does anybody have experience accessing the data in a binary-type column of a Spark Data Frame in R?
I have a Spark Data Frame with a column of binary type. I want to access this data and process it.
In my case I collect the Spark Data Frame into an R data frame and access the first row.
When I print this row to the console, it prints all the hex values correctly.

However, when I access the column, it is reported as a list of 1. When I print the type of the child element, it is again reported as a list.
I expected this value to be of the raw type.

Does anybody have experience with this?

Thanks
Thijs


Re: SparkR + binary type + how to get value

Posted by Felix Cheung <fe...@hotmail.com>.
From the second image, it looks like there is a protocol mismatch. I’d check whether the SparkR package running on the Livy machine matches the Spark Java release.
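
A minimal sketch of such a version comparison, under stated assumptions: same_release is a hypothetical helper, not part of SparkR. packageVersion is from base R, and sparkR.version() is the SparkR call that reports the Spark version of the running application; it needs an active session, so it appears here only in a comment.

```r
# Hypothetical helper: TRUE when two version strings name the same release.
same_release <- function(pkg_version, spark_version) {
  package_version(pkg_version) == package_version(spark_version)
}

# With SparkR attached and a session running, the inputs could come from:
#   pkg_version   <- as.character(packageVersion("SparkR"))
#   spark_version <- sparkR.version()
same_release("2.4.0", "2.4.0")   # TRUE
same_release("2.3.2", "2.4.0")   # FALSE
```

Version strings with a suffix such as "-SNAPSHOT" would need to be trimmed first, since package_version only accepts numeric components.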

But in any case, this seems more like an issue with the Livy config. I’d suggest checking with the community there.



________________________________
From: Thijs Haarhuis <th...@oranggo.com>
Sent: Tuesday, February 19, 2019 5:28 AM
To: Felix Cheung; user@spark.apache.org
Subject: Re: SparkR + binary type + how to get value

Hi Felix,

Thanks. I got it working now by using the unlist function.

I have another question that maybe you can help me with, since I saw your name come up in connection with the spark.lapply function.
I am using Apache Livy and am having trouble using this function. I even filed a JIRA ticket for it at:
https://jira.apache.org/jira/browse/LIVY-558

When I call the spark.lapply function it reports that SparkR is not initialized.
I have looked into the spark.lapply function and it seems there is no spark context.
Any idea how I can debug this?

I hope you can help.

Regards,
Thijs

________________________________
From: Felix Cheung <fe...@hotmail.com>
Sent: Sunday, February 17, 2019 7:18 PM
To: Thijs Haarhuis; user@spark.apache.org
Subject: Re: SparkR + binary type + how to get value

A byte buffer in R is the raw vector type, so it seems to be working as expected. What do you have in the raw bytes? You could convert them into other types or access individual bytes directly.

https://stat.ethz.ch/R-manual/R-devel/library/base/html/raw.html


________________________________
From: Thijs Haarhuis <th...@oranggo.com>
Sent: Thursday, February 14, 2019 4:01 AM
To: Felix Cheung; user@spark.apache.org
Subject: Re: SparkR + binary type + how to get value

Hi Felix,
Sure..

I have the following code:

      printSchema(results)
      cat("\n\n\n")

      firstRow <- first(results)
      value <- firstRow$value

      cat(paste0("Value Type: '",typeof(value),"'\n\n\n"))
      cat(paste0("Value: '",value,"'\n\n\n"))

results is a Spark Data Frame here.

When I run this code the following is printed to console:

[inline image of the console output; not included in this plain-text archive]

You can see there is only a single column in this SDF, of type binary.
When I collect this value and print the type, it is reported as a list.

Any idea how to get the actual value, or how to process the individual bytes?

Thanks
Thijs

________________________________
From: Felix Cheung <fe...@hotmail.com>
Sent: Thursday, February 14, 2019 5:31 AM
To: Thijs Haarhuis; user@spark.apache.org
Subject: Re: SparkR + binary type + how to get value

Please share your code


________________________________
From: Thijs Haarhuis <th...@oranggo.com>
Sent: Wednesday, February 13, 2019 6:09 AM
To: user@spark.apache.org
Subject: SparkR + binary type + how to get value


Hi all,

Does anybody have experience accessing the data in a binary-type column of a Spark Data Frame in R?
I have a Spark Data Frame with a column of binary type. I want to access this data and process it.
In my case I collect the Spark Data Frame into an R data frame and access the first row.
When I print this row to the console, it prints all the hex values correctly.

However, when I access the column, it is reported as a list of 1. When I print the type of the child element, it is again reported as a list.
I expected this value to be of the raw type.

Does anybody have experience with this?

Thanks
Thijs



Re: SparkR + binary type + how to get value

Posted by Thijs Haarhuis <th...@oranggo.com>.
Hi Felix,

Thanks. I got it working now by using the unlist function.
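
For readers hitting the same thing, here is a small local sketch of why unlist helps. It assumes the collected binary column arrives as a list column whose elements are raw vectors, which is consistent with what this thread reports; first_row and its contents are made-up stand-ins, and no Spark session is needed to run it.

```r
# Local stand-in for a collected row: one list column holding a raw vector.
first_row <- data.frame(id = 1L)
first_row$value <- list(as.raw(c(0x48, 0x69)))

typeof(first_row$value)        # "list": the column itself is a list
typeof(first_row$value[1])     # "list": single brackets keep the list wrapper
typeof(first_row$value[[1]])   # "raw":  double brackets return the element

# unlist() flattens the one-element list into the raw vector itself.
bytes <- unlist(first_row$value)
typeof(bytes)                  # "raw"
```

With more than one row, unlist would concatenate the bytes of every row into a single vector, so indexing with [[i]] is probably the safer choice when rows need to stay separate.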

I have another question that maybe you can help me with, since I saw your name come up in connection with the spark.lapply function.
I am using Apache Livy and am having trouble using this function. I even filed a JIRA ticket for it at:
https://jira.apache.org/jira/browse/LIVY-558

When I call the spark.lapply function it reports that SparkR is not initialized.
I have looked into the spark.lapply function and it seems there is no spark context.
Any idea how I can debug this?

I hope you can help.

Regards,
Thijs




Re: SparkR + binary type + how to get value

Posted by Felix Cheung <fe...@hotmail.com>.
A byte buffer in R is the raw vector type, so it seems to be working as expected. What do you have in the raw bytes? You could convert them into other types or access individual bytes directly.

https://stat.ethz.ch/R-manual/R-devel/library/base/html/raw.html
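
A short sketch of the conversions mentioned above, using only base R. The payload is invented for illustration; what the real bytes mean depends on how the binary column was produced.

```r
bytes <- as.raw(c(0x53, 0x70, 0x61, 0x72, 0x6b))  # hypothetical payload

bytes[1]             # first byte, still of type raw
as.integer(bytes)    # 83 112 97 114 107
rawToChar(bytes)     # "Spark"

# Interpret the first four bytes as one little-endian 32-bit integer.
n <- readBin(bytes[1:4], what = "integer", size = 4, endian = "little")
```

rawToChar only makes sense when the bytes are text, while readBin is the usual route for packed numeric data, where size and endian have to match the format of the writer.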





Re: SparkR + binary type + how to get value

Posted by Thijs Haarhuis <th...@oranggo.com>.
Hi Felix,
Sure..

I have the following code:

      printSchema(results)
      cat("\n\n\n")

      firstRow <- first(results)
      value <- firstRow$value

      cat(paste0("Value Type: '",typeof(value),"'\n\n\n"))
      cat(paste0("Value: '",value,"'\n\n\n"))

results is a Spark Data Frame here.

When I run this code the following is printed to console:

[inline image of the console output; not included in this plain-text archive]

You can see there is only a single column in this SDF, of type binary.
When I collect this value and print the type, it is reported as a list.

Any idea how to get the actual value, or how to process the individual bytes?

Thanks
Thijs




Re: SparkR + binary type + how to get value

Posted by Felix Cheung <fe...@hotmail.com>.
Please share your code

