Posted to user@spark.apache.org by Eugene Morozov <ev...@gmail.com> on 2015/10/05 13:28:13 UTC

StructType has more rows, than corresponding Row has objects.

Hi,

We're building our own framework on top of Spark, and we give users a pretty
complex schema to work with. That requires us to build DataFrames
ourselves: we transform business objects into rows and struct types and use
these two to create a DataFrame.

Everything was fine until I started upgrading to Spark 1.5.0 (from 1.3.1).
It seems the Catalyst engine has changed, and now, using almost the same
code to produce rows and struct types, I get the following:
http://ibin.co/2HzUsoe9O96l. Some of the rows in the end result have a
different number of values than their corresponding struct types have fields.

I'm almost sure it's my own fault, but there is always a small chance that
something is wrong in the Spark codebase. If you've seen something similar,
or if there is a JIRA for something similar, I'd be glad to know. Thanks.
--
Be well!
Jean Morozov

Re: StructType has more rows, than corresponding Row has objects.

Posted by Eugene Morozov <ev...@gmail.com>.

Davies,

that seems to have been my issue; my colleague helped me resolve it. The
problem was that we build the RDD<Row> and the corresponding StructType
ourselves (no JSON, Parquet, Cassandra, etc.: we take a list of business
objects, convert them to Rows, then infer the struct type), and I missed one
thing.
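
The thread does not say which detail was missed, so the following is only a
minimal sketch of the kind of sanity check that can catch this symptom when
rows and schema are built by hand: compare the number of values in every row
with the number of fields in the schema before creating the DataFrame. It
uses plain Python lists and tuples instead of Spark types, and the function
name and sample data are hypothetical, chosen for illustration.

```python
from typing import Any, List, Sequence, Tuple


def find_arity_mismatches(
    field_names: Sequence[str],
    rows: Sequence[Sequence[Any]],
) -> List[Tuple[int, int]]:
    """Return (row_index, actual_length) for each row whose number of
    values differs from the number of fields in the schema."""
    expected = len(field_names)
    return [
        (index, len(row))
        for index, row in enumerate(rows)
        if len(row) != expected
    ]


# Hypothetical schema and rows; the second and third rows are malformed.
schema = ["id", "name", "price"]
rows = [
    (1, "apple", 0.5),
    (2, "pear"),
    (3, "plum", 0.7, "extra"),
]

mismatches = find_arity_mismatches(schema, rows)
for index, actual in mismatches:
    print(f"row {index}: expected {len(schema)} values, got {actual}")
```

With Spark types, the analogous comparison would presumably be between the
length of each Row and the number of fields in the StructType, run over the
RDD<Row> before it is turned into a DataFrame.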
--
Be well!
Jean Morozov

On Tue, Oct 6, 2015 at 1:58 AM, Davies Liu <da...@databricks.com> wrote:

> Could you tell us a way to reproduce this failure? Reading from JSON or
> Parquet?
>
> On Mon, Oct 5, 2015 at 4:28 AM, Eugene Morozov
> <ev...@gmail.com> wrote:
> > Hi,
> >
> > We're building our own framework on top of Spark, and we give users a
> > pretty complex schema to work with. That requires us to build DataFrames
> > ourselves: we transform business objects into rows and struct types and
> > use these two to create a DataFrame.
> >
> > Everything was fine until I started upgrading to Spark 1.5.0 (from
> > 1.3.1). It seems the Catalyst engine has changed, and now, using almost
> > the same code to produce rows and struct types, I get the following:
> > http://ibin.co/2HzUsoe9O96l. Some of the rows in the end result have a
> > different number of values than their corresponding struct types have
> > fields.
> >
> > I'm almost sure it's my own fault, but there is always a small chance
> > that something is wrong in the Spark codebase. If you've seen something
> > similar, or if there is a JIRA for something similar, I'd be glad to
> > know. Thanks.
> > --
> > Be well!
> > Jean Morozov
>

Re: StructType has more rows, than corresponding Row has objects.

Posted by Davies Liu <da...@databricks.com>.

Could you tell us a way to reproduce this failure? Reading from JSON or Parquet?

On Mon, Oct 5, 2015 at 4:28 AM, Eugene Morozov
<ev...@gmail.com> wrote:
> Hi,
>
> We're building our own framework on top of Spark, and we give users a pretty
> complex schema to work with. That requires us to build DataFrames
> ourselves: we transform business objects into rows and struct types and use
> these two to create a DataFrame.
>
> Everything was fine until I started upgrading to Spark 1.5.0 (from 1.3.1).
> It seems the Catalyst engine has changed, and now, using almost the same
> code to produce rows and struct types, I get the following:
> http://ibin.co/2HzUsoe9O96l. Some of the rows in the end result have a
> different number of values than their corresponding struct types have fields.
>
> I'm almost sure it's my own fault, but there is always a small chance that
> something is wrong in the Spark codebase. If you've seen something similar,
> or if there is a JIRA for something similar, I'd be glad to know. Thanks.
> --
> Be well!
> Jean Morozov

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org
