Posted to user@drill.apache.org by Stefán Baxter <st...@activitystream.com> on 2015/11/10 15:25:45 UTC

Avro deserialization bug - 1.3-SNAPSHOT

Hi,

I have an Avro file that supports the following data/schema:

{"field":"some", "classification":{"variant":"Gæst"}}

When I select 10 rows from this file I get:

+---------------------+
|       EXPR$0        |
+---------------------+
| Gæst                |
| Voksen              |
| Voksen              |
| Invitation KIF KBH  |
| Invitation KIF KBH  |
| Ordinarie pris KBH  |
| Ordinarie pris KBH  |
| Biljetter 200 krBH  |
| Biljetter 200 krBH  |
| Biljetter 200 krBH  |
+---------------------+

The bug is that the field values are incorrectly de-serialized and the
value from the previous row is retained if the subsequent row is shorter.

The sql query:

"select s.classification.variant variant from dfs.<some> as s limit 10;"


That way, "Ordinarie pris" becomes "Ordinarie pris KBH" because the
previous row had the value "Invitation KIF KBH".
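
The pattern in the table is consistent with a byte buffer that is reused
between rows while its valid length is ignored on read. Nothing in this
thread confirms that this is the actual cause in Drill, so the following is
only a minimal Python sketch of that failure mode. `ReusedBuffer`,
`read_buggy`, `read_fixed`, and `replay` are hypothetical names for
illustration, not Drill or Avro APIs, and the source value
"Biljetter 200 kr" is inferred from the output rather than taken from the
actual file.

```python
class ReusedBuffer:
    """Hypothetical decoder buffer that is recycled from row to row."""

    def __init__(self):
        self.data = bytearray()  # backing storage, never shrinks
        self.length = 0          # number of valid bytes for the current row

    def load(self, text):
        raw = text.encode("utf-8")
        if len(raw) > len(self.data):
            # Grow the backing storage only when the new value needs more room.
            self.data.extend(bytes(len(raw) - len(self.data)))
        self.data[:len(raw)] = raw
        self.length = len(raw)

    def read_buggy(self):
        # Ignores self.length, so stale bytes from a longer earlier row leak in.
        return self.data.decode("utf-8")

    def read_fixed(self):
        # Honors the valid length and drops the stale tail.
        return self.data[:self.length].decode("utf-8")


def replay(rows):
    """Return (buggy, fixed) readings for each row, in order."""
    buf = ReusedBuffer()
    buggy, fixed = [], []
    for row in rows:
        buf.load(row)
        buggy.append(buf.read_buggy())
        fixed.append(buf.read_fixed())
    return buggy, fixed


if __name__ == "__main__":
    rows = ["Gæst", "Voksen", "Invitation KIF KBH",
            "Ordinarie pris", "Biljetter 200 kr"]
    for bad, good in zip(*replay(rows)):
        print(f"{bad!r:24} {good!r}")
```

With these inputs the buggy read returns "Ordinarie pris KBH" and
"Biljetter 200 krBH", the same distortions as in the table, while the read
that honors the valid length returns the original values.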

Regards,
  -Stefán

Fwd: Avro deserialization bug - 1.3-SNAPSHOT

Posted by Stefán Baxter <st...@activitystream.com>.
Hi,

Decided to send this to dev* as well.

Can someone please assist me with this problem of Drill distorting string
values that are read from Avro files?

Regards,
 -Stefan

---------- Forwarded message ----------
From: Stefán Baxter <st...@activitystream.com>
Date: Wed, Nov 11, 2015 at 10:14 PM
Subject: Re: Avro deserialization bug - 1.3-SNAPSHOT
To: user <us...@drill.apache.org>


Hi,

Can someone please verify that this is in fact a bug so I can rule out our
own mistakes?

We have recently moved all our logging to Avro to compensate for schema
differences in JSON that were causing various problems, and our latest
release is now impeded by this.
Alternatively, can someone please point me in the right direction if I were
to try to fix this myself?

Regards,
  -Stefán

Re: Avro deserialization bug - 1.3-SNAPSHOT

Posted by Stefán Baxter <st...@activitystream.com>.
Thank you Jason!

I will report back as soon as I have tried this.

Re: Avro deserialization bug - 1.3-SNAPSHOT

Posted by Jason Altekruse <al...@gmail.com>.
Stefan,

I took a look at the issue and I think I have a fix for the corruption you
are seeing. There have been a number of substantial commits to master
including a refactoring of a number of modules, so I applied this change on
top of the 1.3 branch for you to build and try out. I would like to add
some additional test cases, at which point I will open up an official PR
against master and we will likely be able to pull it back onto the 1.3
branch for inclusion in the release.

Please try this out to see if there are remaining issues reading your data.

https://github.com/jaltekruse/incubator-drill/tree/4056-avro-corruption-bug

Thanks,
Jason



Re: Avro deserialization bug - 1.3-SNAPSHOT

Posted by Stefán Baxter <st...@activitystream.com>.
So,

Could someone point me to the appropriate place in the Drill code to start
investigating this? (We would love to contribute, but getting up to speed is
a bit much.)

I realize that there are many good things happening and that v. 1.3 is
around the corner but it seems that I incorrectly assumed that data
corruption issues would get a higher priority or that I would, at the very
least, get someone to confirm such a bug.

We are now impeded by this after having moved all our logging from JSON to
Avro to avoid the schema-related problems we have been running into with
the JSON reader (null interpreted as a double, then failing when a string
eventually comes along).

- Stefan


Re: Avro deserialization bug - 1.3-SNAPSHOT

Posted by Stefán Baxter <st...@activitystream.com>.
Hi,

Can someone please verify that this is in fact a bug so I can rule out our
own mistakes?

We have recently moved all our logging to Avro to compensate for schema
differences in JSON that were causing various problems, and our latest
release is now impeded by this.
Alternatively, can someone please point me in the right direction if I were
to try to fix this myself?

Regards,
  -Stefán

Re: Avro deserialization bug - 1.3-SNAPSHOT

Posted by Stefán Baxter <st...@activitystream.com>.
Thank you Kamesh.

I have created https://issues.apache.org/jira/browse/DRILL-4056 with the
description.
I will send you a confidential test file to your private email.

Regards,
 -Stefan

Re: Avro deserialization bug - 1.3-SNAPSHOT

Posted by Kamesh <ka...@gmail.com>.
Hi Stefán,
Could you please raise a Jira with a sample schema and sample input to
reproduce it? I will look into this.

-- 
Kamesh.