Posted to dev@avro.apache.org by "Jeff Hodges (JIRA)" <ji...@apache.org> on 2010/03/15 17:16:27 UTC

[jira] Created: (AVRO-465) C implementation requires you to know a file's schema before reading

C implementation requires you to know a file's schema before reading
--------------------------------------------------------------------

                 Key: AVRO-465
                 URL: https://issues.apache.org/jira/browse/AVRO-465
             Project: Avro
          Issue Type: Bug
          Components: c
    Affects Versions: 1.3.0
            Reporter: Jeff Hodges


The C implementation gives the user no way of reading the objects in a data file without knowing the file's schema ahead of time.

While it does fill in the writers_schema part of the avro_file_reader_t on read, this field is not available to the API as it is left out of avro.h. Two options present themselves: 1) preserve the API as is and add an avro_schema_from_file_reader() function or 2) move the avro_file_reader_t and avro_file_writer_t structs to avro.h.

A third option, which I don't approve of, is providing a function that reads from a datafile but uses the writers_schema in the given reader instead of requiring another schema to be passed into it. This is problematic because anyone using the API would have fewer debugging and testing options when dealing with interop datasets. Any problem that occurs might simply be caused by the schema in the file being off.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-465) C implementation requires you to know a file's schema before reading

Posted by "Matt Massie (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845437#action_12845437 ] 

Matt Massie commented on AVRO-465:
----------------------------------

I think the function in your patch could be a good addition to the C API regardless, but

{code}
avro_schema_t avro_schema_from_file_reader(avro_file_reader_t reader) {
  return reader->writers_schema;
}
{code}

should probably look like

{code}
avro_schema_t avro_schema_from_file_reader(avro_file_reader_t reader) {
  return reader ? reader->writers_schema : NULL;
}
{code}

in case a NULL reader is passed to the function.  Also, please add the new function to {{avro.h}}.
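
For readers following along, here is a minimal self-contained sketch of the NULL-safe accessor pattern discussed above. The demo_ types and function are stand-ins invented for this illustration and are not the real Avro structs. The sketch only assumes what the issue describes: the reader struct is private to the implementation, so callers need an accessor declared in the public header.

```c
#include <stddef.h>

/* Public side (the role avro.h plays): callers see only opaque handles.
 * All demo_ names are hypothetical stand-ins, not the real Avro types. */
typedef struct demo_schema_t_ *demo_schema_t;
typedef struct demo_file_reader_t_ *demo_file_reader_t;
demo_schema_t demo_schema_from_file_reader(demo_file_reader_t reader);

/* Private side (the implementation file): the struct layout lives here. */
struct demo_schema_t_ {
    const char *json; /* schema text, for illustration only */
};
struct demo_file_reader_t_ {
    demo_schema_t writers_schema; /* set when the file header is parsed */
};

/* Return the writer's schema, or NULL if no reader was supplied. */
demo_schema_t demo_schema_from_file_reader(demo_file_reader_t reader)
{
    return reader ? reader->writers_schema : NULL;
}
```

With this split, a caller holds only the opaque handle and can pass the returned schema around without touching the struct fields.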



[jira] Commented: (AVRO-465) C implementation requires you to know a file's schema before reading

Posted by "Matt Massie (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845411#action_12845411 ] 

Matt Massie commented on AVRO-465:
----------------------------------

I just noticed your public tweet...

bq. "@wilhelmbierbaum True. Fuck the Avro C API."

Are you referring to this issue specifically in your tweet or are there other issues you'd like addressed with the API?  I'd love to hear details about how the C API could be improved.  











[jira] Commented: (AVRO-465) C implementation requires you to know a file's schema before reading

Posted by "Bruce Mitchener (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845706#action_12845706 ] 

Bruce Mitchener commented on AVRO-465:
--------------------------------------

I'd suggest this signature:

avro_schema_t avro_file_reader_get_schema(avro_file_reader_t reader)




[jira] Commented: (AVRO-465) C implementation requires you to know a file's schema before reading

Posted by "Matt Massie (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845422#action_12845422 ] 

Matt Massie commented on AVRO-465:
----------------------------------

Did using NULL for the reader's schema work for you?



[jira] Resolved: (AVRO-465) C implementation requires you to know a file's schema before reading

Posted by "Matt Massie (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Massie resolved AVRO-465.
------------------------------

    Resolution: Fixed



[jira] Commented: (AVRO-465) C implementation requires you to know a file's schema before reading

Posted by "Jeff Hodges (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845415#action_12845415 ] 

Jeff Hodges commented on AVRO-465:
----------------------------------

What I meant was: it should be written in OCaml, obviously.

Sorry. Lost my patience.





[jira] Commented: (AVRO-465) C implementation requires you to know a file's schema before reading

Posted by "Jeff Hodges (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846003#action_12846003 ] 

Jeff Hodges commented on AVRO-465:
----------------------------------

Using NULL works great. The accessor function would definitely still be nice to have, so I can pass the schema object around for various uses.



[jira] Commented: (AVRO-465) C implementation requires you to know a file's schema before reading

Posted by "Matt Massie (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845405#action_12845405 ] 

Matt Massie commented on AVRO-465:
----------------------------------

Jeff-

I'd like to understand this problem a little better.  The C implementation shouldn't require you know the file schema ahead of time.

If you pass in NULL for the reader's schema, then the writer's schema will be used.  This is a documentation bug since I don't explicitly explain this anywhere.

Can you please try to read the data file with the reader's schema set to NULL?

Btw, the relevant code is in datum_read.c around line 303

{code}
if (readers_schema == NULL) {
     readers_schema = writers_schema;
} else if (!avro_schema_match(writers_schema, readers_schema)) {
     return EINVAL;
}
{code}
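
The fallback in that excerpt can be tried in isolation with the rough sketch below. Everything prefixed with demo_ is a hypothetical stand-in written for this example: the schema type is reduced to a type name, and the matching function is a simple string comparison rather than the library's real avro_schema_match().

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a schema; real Avro schemas are richer. */
typedef struct {
    const char *type_name;
} demo_schema;

/* Simplified stand-in for schema matching: equal type names match. */
int demo_schema_match(const demo_schema *writer, const demo_schema *reader)
{
    return strcmp(writer->type_name, reader->type_name) == 0;
}

/* Pick the schema used for decoding. With no reader's schema, fall back
 * to the writer's. On a mismatch, return EINVAL and leave *resolved
 * untouched. */
int demo_resolve_schema(const demo_schema *writers_schema,
                        const demo_schema *readers_schema,
                        const demo_schema **resolved)
{
    if (readers_schema == NULL) {
        readers_schema = writers_schema;
    } else if (!demo_schema_match(writers_schema, readers_schema)) {
        return EINVAL;
    }
    *resolved = readers_schema;
    return 0;
}
```

Passing NULL as the reader's schema therefore always succeeds in this sketch, which mirrors the behavior described above for files whose schema is not known in advance.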




[jira] Updated: (AVRO-465) C implementation requires you to know a file's schema before reading

Posted by "Jeff Hodges (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Hodges updated AVRO-465:
-----------------------------

    Attachment: AVRO-465-schema_for_reader.patch

A patch that creates the avro_schema_from_file_reader(avro_file_reader_t reader) function for pulling the writers_schema out of an avro_file_reader_t (a.k.a. option 1).
