Posted to user@avro.apache.org by SB M <ma...@gmail.com> on 2019/07/26 11:51:41 UTC

Reg: Avrojob schema validation option.

Hi All,

Problem: I need an option to set name validation for schema parsing when
configuring a job with AvroJob and AvroMultipleInputs.

Is there currently any way to set schema name validation to false? I went
through the source code and was not able to find any option like that.

Please suggest a solution.

Regards,
Sree.

Re: Reg: Avrojob schema validation option.

Posted by SB M <ma...@gmail.com>.
Hi,

> Just out of curiosity, does your MapReduce job run correctly if you
> manually just replace the "." with a "_" in your schema?

Yes, it worked.

Regards,
SBM

On Tue, 30 Jul, 2019, 21:22 Ryan Skraba, <ry...@skraba.com> wrote:

> OK, I learned something new -- I have never seen
> setValidate(false) before!  It looks like it was added for Avro 1.5
> (https://issues.apache.org/jira/browse/AVRO-838) only to be compatible
> with files generated by Avro 1.4 implementations that (wrongly)
> permitted invalid names.
>
> It looks like it isn't possible with the existing AvroJob
> implementation in that case, and the static methods that use
> Schema.Parser are widely baked into the Avro MapReduce classes, so I
> can't see an easy workaround.
>
> You might raise a JIRA or discuss on the dev@ mailing list for a new
> feature, but I suspect that the best route would be to try and move
> your schemas to names that meet the specification!
>
> Just out of curiosity, does your MapReduce job run correctly if you
> manually just replace the "." with a "_" in your schema?
>
> All my best, Ryan

Re: Reg: Avrojob schema validation option.

Posted by Ryan Skraba <ry...@skraba.com>.
OK, I learned something new -- I have never seen
setValidate(false) before!  It looks like it was added for Avro 1.5
(https://issues.apache.org/jira/browse/AVRO-838) only to be compatible
with files generated by Avro 1.4 implementations that (wrongly)
permitted invalid names.

It looks like it isn't possible with the existing AvroJob
implementation in that case, and the static methods that use
Schema.Parser are widely baked into the Avro MapReduce classes, so I
can't see an easy workaround.

You might raise a JIRA or discuss on the dev@ mailing list for a new
feature, but I suspect that the best route would be to try and move
your schemas to names that meet the specification!

Just out of curiosity, does your MapReduce job run correctly if you
manually just replace the "." with a "_" in your schema?

All my best, Ryan
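
For anyone who wants to try the "." to "_" substitution without editing the schema by hand, here is a rough sketch in plain Java with no Avro dependency. The class and method names are hypothetical, and it works on the schema text with a regular expression rather than a real JSON parser, so it assumes names contain no escaped quotes. Note that it rewrites every "name" member, including namespaced record names such as "com.example.Rec", so it is only suitable as a quick experiment.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites dots to underscores inside "name" members of a schema's JSON text.
class DottedNameRewriter {
    // Matches a JSON member of the form "name": "value"; "namespace" members do not match.
    private static final Pattern NAME_MEMBER =
        Pattern.compile("(\"name\"\\s*:\\s*\")([^\"]*)(\")");

    static String replaceDotsInNames(String schemaJson) {
        Matcher m = NAME_MEMBER.matcher(schemaJson);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Only the captured value is changed; the key and the quotes are kept as found.
            String fixed = m.group(2).replace('.', '_');
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + fixed + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up schema with a dotted field name.
        String schema = "{\"type\":\"record\",\"name\":\"Rec\","
            + "\"fields\":[{\"name\":\"address.city\",\"type\":\"string\"}]}";
        System.out.println(replaceDotsInNames(schema));
    }
}
```

The rewritten text can then be handed to the job setup in place of the original schema string. Data written with the original field names would still need to be mapped to the new names separately.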


On Tue, Jul 30, 2019 at 4:21 PM SB M <ma...@gmail.com> wrote:
>
> Hi ,
>
> What I mean is that when using Schema.Parser, we can set the validation of names to false using the setValidate method.
>
> But with AvroJob there is no option to set this validation to false.
>
> I want an option that turns off the parser's name validation so that I can use it in my code.
>
>
> What I am trying to achieve: I have an Avro schema with a substructure whose names contain components separated by a dot (.), which is not valid when the schema is parsed. The parser throws an error.
>
> There is an option to turn off this name validation while parsing, by using new Schema.Parser().setValidate(false);
>
> But AvroJob has no option to set validation. I need this feature.
>
> Thanks,
> Sree

Re: Reg: Avrojob schema validation option.

Posted by SB M <ma...@gmail.com>.
Hi ,

What I mean is that when using Schema.Parser, we can set the validation of
names to false using the setValidate method.

But with AvroJob there is no option to set this validation to false.

I want an option that turns off the parser's name validation so that I can
use it in my code.


What I am trying to achieve: I have an Avro schema with a substructure whose
names contain components separated by a dot (.), which is not valid when the
schema is parsed. The parser throws an error.

There is an option to turn off this name validation while parsing, by
using new Schema.Parser().setValidate(false);

But AvroJob has no option to set validation. I need this feature.

Thanks,
Sree
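
To illustrate why such a schema is rejected, the name rule quoted elsewhere in this thread ([A-Za-z_][A-Za-z0-9_]*) can be checked directly. This is only a minimal sketch in plain Java, not Avro's own validation code. The class name and the sample field names are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical checker for the name rule quoted in this thread; not part of Avro.
class AvroNameCheck {
    private static final Pattern NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Returns true if the whole string satisfies the name rule. */
    static boolean isValidName(String name) {
        return name != null && NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        // The dotted name is the kind of field name described above.
        for (String n : new String[] {"address_city", "address.city", "9lives", "_tmp"}) {
            System.out.println(n + " -> " + isValidName(n));
        }
    }
}
```

Running a check like this over the field names before building the job should show which ones a validating parser is likely to refuse.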


On Tue, 30 Jul, 2019, 14:11 Ryan Skraba, <ry...@skraba.com> wrote:

> Hello!  I'm not sure I understand your question.  Some names are
> *required* with a specific format in the Avro specification
> (http://avro.apache.org/docs/1.8.2/spec.html#names)
>
> What are you looking to accomplish?  I can think of two scenarios that
> we've seen in the past: (1) anonymous records where the name has no
> interest, and (2) mapping a structure that supports arbitrary UTF-8
> names (like a database table) to a record with the same field names.
> Neither of those are supported in the Avro specification.
>
> For the first case (where we don't care about the record name), we
> just autogenerated a "safe" but unused record name.
>
> For the second case, we used a custom annotation on the field
> (something like "display.name") to contain the original value and
> generated a "safe" field name.
>
> In both cases, being safe means that it meets the Avro spec
> ([A-Za-z_][A-Za-z0-9_]*) and avoids collisions with other generated
> names.
>
> I hope this helps!  Ryan

Re: Reg: Avrojob schema validation option.

Posted by Ryan Skraba <ry...@skraba.com>.
Hello!  I'm not sure I understand your question.  Some names are
*required* with a specific format in the Avro specification
(http://avro.apache.org/docs/1.8.2/spec.html#names)

What are you looking to accomplish?  I can think of two scenarios that
we've seen in the past: (1) anonymous records where the name has no
interest, and (2) mapping a structure that supports arbitrary UTF-8
names (like a database table) to a record with the same field names.
Neither of those are supported in the Avro specification.

For the first case (where we don't care about the record name), we
just autogenerated a "safe" but unused record name.

For the second case, we used a custom annotation on the field
(something like "display.name") to contain the original value and
generated a "safe" field name.

In both cases, being safe means that it meets the Avro spec
([A-Za-z_][A-Za-z0-9_]*) and avoids collisions with other generated
names.

I hope this helps!  Ryan
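
The second approach above (generate a "safe" field name, keep the original value in an annotation, avoid collisions) might look roughly like the following. This is a sketch under assumptions, not the code we actually used: the class and method names are hypothetical, the numeric-suffix scheme is just one possible way to avoid collisions, and it uses plain Java with no Avro dependency.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that derives spec-conforming field names from arbitrary ones.
class SafeFieldNames {
    /** Replaces every character outside [A-Za-z0-9_] with '_' and fixes an invalid first character. */
    static String sanitize(String original) {
        String s = original.replaceAll("[^A-Za-z0-9_]", "_");
        if (s.isEmpty() || Character.isDigit(s.charAt(0))) {
            s = "_" + s;
        }
        return s;
    }

    /** Maps each generated safe name to its original, adding a numeric suffix on collision. */
    static Map<String, String> safeToOriginal(List<String> originals) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String original : originals) {
            String base = sanitize(original);
            String candidate = base;
            int n = 1;
            while (result.containsKey(candidate)) {
                candidate = base + "_" + n++;
            }
            result.put(candidate, original);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up column names, two of which collide after sanitizing.
        Map<String, String> names = safeToOriginal(Arrays.asList("user.name", "user_name", "2nd col"));
        names.forEach((safe, original) ->
            System.out.println(safe + " (display.name: " + original + ")"));
    }
}
```

The original value from the map would then go into the custom field property (something like "display.name") when the schema is built, so readers of the data can recover the real name.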

On Fri, Jul 26, 2019 at 1:52 PM SB M <ma...@gmail.com> wrote:
>
> Hi All,
>
> Problem: I need an option to set name validation for schema parsing when configuring a job with AvroJob and AvroMultipleInputs.
>
> Is there currently any way to set schema name validation to false? I went through the source code and was not able to find any option like that.
>
> Please suggest a solution.
>
> Regards,
> Sree.
>