Posted to user@spark.apache.org by vbegar <ve...@hpe.com> on 2017/02/13 22:54:32 UTC

How to specify default value for StructField?

Hello,

I specified a StructType like this:

import org.apache.spark.sql.types._

val mySchema = StructType(Array(
  StructField("f1", StringType, true),
  StructField("f2", StringType, true)))

I have many ORC files stored in an HDFS location: /user/hos/orc_files_test_together

These files use different schemas: some of them have only the f1 column, and others have both the f1 and f2 columns.

I read the data from these files into a DataFrame:

val df = spark.read.format("orc")
  .schema(mySchema)
  .load("/user/hos/orc_files_test_together")

But now, when I run the following command to see the data, it fails:

df.show

The error message says that the "f2" column doesn't exist.

Since I have specified the nullable attribute as true for the f2 column, why does it fail?

Or, is there any way to specify a default value for a StructField?

Because in an AVRO schema we can specify the default value this way, and can then read AVRO files in a folder that have 2 different schemas (either only the f1 column, or both f1 and f2):

{
   "type": "record",
   "name": "myrecord",
   "fields": 
   [
      {
         "name": "f1",
         "type": "string",
         "default": ""
      },
      {
         "name": "f2",
         "type": "string",
         "default": ""
      }
   ]
}
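
(For illustration, assuming an Avro data source that accepts a reader schema through an "avroSchema" option, as the one later built in to Spark does, the read would look roughly like this sketch, with a hypothetical folder path:)

val avroReaderSchema = """
  { "type": "record", "name": "myrecord", "fields": [
      { "name": "f1", "type": "string", "default": "" },
      { "name": "f2", "type": "string", "default": "" } ] }
"""

val avroDf = spark.read.format("avro")
  .option("avroSchema", avroReaderSchema)       // reader schema: defaults fill fields a file lacks
  .load("/user/hos/avro_files_test_together")   // hypothetical path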

Wondering why it doesn't work with ORC files.

Thanks.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-specify-default-value-for-StructField-tp28386.html


RE: How to specify default value for StructField?

Posted by "Begar, Veena" <ve...@hpe.com>.
Thanks Yan and Yong,

Yes, from Spark, I can access ORC files loaded to Hive tables.

Thanks.


Re: How to specify default value for StructField?

Posted by "颜发才 (Yan Facai)" <fa...@gmail.com>.
I agree with Yong Zhang;
perhaps Spark SQL with Hive could solve the problem:

http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
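
A minimal sketch of that route (the table name and the DDL details here are assumptions, not from the thread):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .enableHiveSupport()   // required so Spark can reach the Hive metastore
  .getOrCreate()

// Declare an external Hive table over the existing ORC folder.
// Hive returns NULL for a declared column that a given file lacks,
// which is the behavior described elsewhere in this thread.
spark.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS orc_files_test (f1 STRING, f2 STRING)
  STORED AS ORC
  LOCATION '/user/hos/orc_files_test_together'
""")

spark.sql("SELECT f1, f2 FROM orc_files_test").show()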





Re: How to specify default value for StructField?

Posted by Yong Zhang <ja...@hotmail.com>.
If it works under Hive, have you tried creating the DataFrame from the Hive table directly in Spark? That should work, right?
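
Something like this sketch (the table name is hypothetical, and a Hive-enabled session is assumed):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .enableHiveSupport()   // needed for spark.table to resolve Hive tables
  .getOrCreate()

val df = spark.table("orc_files_test")   // read via the Hive metastore, not the raw files
df.show()                                // f2 comes back as NULL for files that lack it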


Yong



RE: How to specify default value for StructField?

Posted by "Begar, Veena" <ve...@hpe.com>.
Thanks Yong.

I know about the schema-merging option.
Using Hive, we can read AVRO files having different schemas, and we can do the same in Spark.
Similarly, we can read ORC files having different schemas in Hive, but we can't do the same in Spark using a DataFrame. How can we do it using a DataFrame?
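
One brute-force possibility at the DataFrame level is sketched below, assuming Spark 2.x and that every file in the folder can be listed and read on its own (hidden files such as _SUCCESS would need filtering out):

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.StringType

val dir = "/user/hos/orc_files_test_together"
val fs  = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val files = fs.listStatus(new Path(dir)).map(_.getPath.toString)

// Read each file with whatever schema it carries, add any missing
// column as NULL, line the columns up, then union everything.
val wanted = Seq("f1", "f2")
val df = files.map { file =>
  var d: DataFrame = spark.read.orc(file)
  for (c <- wanted if !d.columns.contains(c))
    d = d.withColumn(c, lit(null).cast(StringType))
  d.select(wanted.map(d.col): _*)
}.reduce(_ union _)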

Thanks.

Re: How to specify default value for StructField?

Posted by Yong Zhang <ja...@hotmail.com>.
You may be looking for something like "spark.sql.parquet.mergeSchema" for ORC. Unfortunately, I don't think it is available, unless someone tells me I am wrong.
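
For comparison, the Parquet side looks like this (a sketch, assuming Spark 2.x and a hypothetical Parquet copy of the data):

// Parquet can reconcile differing file schemas at read time;
// as of this thread there was no equivalent switch for ORC.
val merged = spark.read
  .option("mergeSchema", "true")
  .parquet("/user/hos/parquet_files_test_together")
merged.printSchema()   // shows both f1 and f2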

You can create a JIRA to request this feature, but we all know that Parquet is the first-class citizen format [😊]

Yong



RE: How to specify default value for StructField?

Posted by "Begar, Veena" <ve...@hpe.com>.
Thanks, it didn't work, because the folder has files with 2 different schemas.
It fails with the following exception:
org.apache.spark.sql.AnalysisException: cannot resolve '`f2`' given input columns: [f1];




Re: How to specify default value for StructField?

Posted by smartzjp <zj...@163.com>.
You can try the code below.

val df = spark.read.format("orc").load("/user/hos/orc_files_test_together")
df.select("f1", "f2").show






