Posted to dev@avro.apache.org by "Jeff Hammerbacher (JIRA)" <ji...@apache.org> on 2009/11/19 11:51:39 UTC

[jira] Created: (AVRO-219) Break testio.py into testschema.py, testio.py, and testdatafile.py

Break testio.py into testschema.py, testio.py, and testdatafile.py
------------------------------------------------------------------

                 Key: AVRO-219
                 URL: https://issues.apache.org/jira/browse/AVRO-219
             Project: Avro
          Issue Type: Improvement
          Components: python
            Reporter: Jeff Hammerbacher


Currently, the unit tests for schema.py and datafile.py are grouped in with the unit tests for io.py in testio.py. We should break the tests into individual files so that we have better modularization of tests.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-219) Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796059#action_12796059 ] 

Jeff Hammerbacher commented on AVRO-219:
----------------------------------------

I should have noted: the new patch contains an implementation of the file object container format specified in AVRO-160.

> Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-219
>                 URL: https://issues.apache.org/jira/browse/AVRO-219
>             Project: Avro
>          Issue Type: Improvement
>          Components: python
>            Reporter: Jeff Hammerbacher
>            Assignee: Jeff Hammerbacher
>         Attachments: AVRO-219-schema-io-and-datafile.patch, AVRO-219.patch, AVRO-219.patch.schema, AVRO-219.patch.schema_and_io
>
>
> Currently, the unit tests for schema.py, genericio.py, and datafile.py are grouped in with the unit tests for io.py in testio.py. We should break the tests into individual files so that we have better modularization of tests.



[jira] Commented: (AVRO-219) Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793678#action_12793678 ] 

Philip Zeyliger commented on AVRO-219:
--------------------------------------

Are you doing the "new" object container (AVRO-160) or the "older" one?

> Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-219
>                 URL: https://issues.apache.org/jira/browse/AVRO-219
>             Project: Avro
>          Issue Type: Improvement
>          Components: python
>            Reporter: Jeff Hammerbacher
>            Assignee: Jeff Hammerbacher
>         Attachments: AVRO-219.patch.schema, AVRO-219.patch.schema_and_io
>
>
> Currently, the unit tests for schema.py, genericio.py, and datafile.py are grouped in with the unit tests for io.py in testio.py. We should break the tests into individual files so that we have better modularization of tests.



[jira] Updated: (AVRO-219) Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Hammerbacher updated AVRO-219:
-----------------------------------

    Attachment: AVRO-219-schema-io-and-datafile.patch

Okay, I have the whole IO path working in a new implementation now: schema parsing and printing, binary datum serialization and deserialization, and reading and writing the new (AVRO-160) file object container.

Time for a code review? I'm going to start on the IPC path tomorrow.

Also, given that Sharad has been absent of late, I'd like to propose replacing the current implementation with this implementation once it passes reviews.

> Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-219
>                 URL: https://issues.apache.org/jira/browse/AVRO-219
>             Project: Avro
>          Issue Type: Improvement
>          Components: python
>            Reporter: Jeff Hammerbacher
>            Assignee: Jeff Hammerbacher
>         Attachments: AVRO-219-schema-io-and-datafile.patch, AVRO-219.patch.schema, AVRO-219.patch.schema_and_io
>
>
> Currently, the unit tests for schema.py, genericio.py, and datafile.py are grouped in with the unit tests for io.py in testio.py. We should break the tests into individual files so that we have better modularization of tests.



[jira] Updated: (AVRO-219) Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Hammerbacher updated AVRO-219:
-----------------------------------

    Status: Patch Available  (was: Open)

> Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-219
>                 URL: https://issues.apache.org/jira/browse/AVRO-219
>             Project: Avro
>          Issue Type: Improvement
>          Components: python
>            Reporter: Jeff Hammerbacher
>            Assignee: Jeff Hammerbacher
>         Attachments: AVRO-219-schema-io-and-datafile.patch, AVRO-219.patch, AVRO-219.patch.schema, AVRO-219.patch.schema_and_io
>
>
> Currently, the unit tests for schema.py, genericio.py, and datafile.py are grouped in with the unit tests for io.py in testio.py. We should break the tests into individual files so that we have better modularization of tests.



[jira] Updated: (AVRO-219) Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-219:
------------------------------

       Resolution: Fixed
    Fix Version/s: 1.3.0
           Status: Resolved  (was: Patch Available)

I just committed this.  Thanks, Jeff!

> Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-219
>                 URL: https://issues.apache.org/jira/browse/AVRO-219
>             Project: Avro
>          Issue Type: Improvement
>          Components: python
>            Reporter: Jeff Hammerbacher
>            Assignee: Jeff Hammerbacher
>             Fix For: 1.3.0
>
>         Attachments: AVRO-219-schema-io-and-datafile.patch, AVRO-219.patch, AVRO-219.patch.schema, AVRO-219.patch.schema_and_io
>
>
> Currently, the unit tests for schema.py, genericio.py, and datafile.py are grouped in with the unit tests for io.py in testio.py. We should break the tests into individual files so that we have better modularization of tests.



[jira] Commented: (AVRO-219) Break testio.py into testschema.py, testio.py, testgenericio.py, and testdatafile.py

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789968#action_12789968 ] 

Jeff Hammerbacher commented on AVRO-219:
----------------------------------------

I'm basically rewriting the IO path of the Python implementation for this issue. See http://github.com/hammer/avro for progress.

> Break testio.py into testschema.py, testio.py, testgenericio.py, and testdatafile.py
> ------------------------------------------------------------------------------------
>
>                 Key: AVRO-219
>                 URL: https://issues.apache.org/jira/browse/AVRO-219
>             Project: Avro
>          Issue Type: Improvement
>          Components: python
>            Reporter: Jeff Hammerbacher
>            Assignee: Jeff Hammerbacher
>
> Currently, the unit tests for schema.py, genericio.py, and datafile.py are grouped in with the unit tests for io.py in testio.py. We should break the tests into individual files so that we have better modularization of tests.



[jira] Commented: (AVRO-219) Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795753#action_12795753 ] 

Jeff Hammerbacher commented on AVRO-219:
----------------------------------------

Also holding up this patch: AVRO-160 changed the file object container format between the time I posted the patch and now. I'll find some time Sunday to implement the new file object container format once I've understood its specification.

> Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-219
>                 URL: https://issues.apache.org/jira/browse/AVRO-219
>             Project: Avro
>          Issue Type: Improvement
>          Components: python
>            Reporter: Jeff Hammerbacher
>            Assignee: Jeff Hammerbacher
>         Attachments: AVRO-219-schema-io-and-datafile.patch, AVRO-219.patch.schema, AVRO-219.patch.schema_and_io
>
>
> Currently, the unit tests for schema.py, genericio.py, and datafile.py are grouped in with the unit tests for io.py in testio.py. We should break the tests into individual files so that we have better modularization of tests.



[jira] Commented: (AVRO-219) Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795166#action_12795166 ] 

Doug Cutting commented on AVRO-219:
-----------------------------------

A few random comments from someone who does not program in Python:
 - please package this as a single patch file that replaces the existing implementation.
 - some of the TODO's seem critical, like skip_int.
 - those big if .. elif expressions in read_data, write_data and skip_data look like performance pits.  might something like http://simonwillison.net/2004/May/7/switch/ be better? or should schema itself have read/write/skip methods?
 - validate is overkill for picking the union branch.  a union can only have one branch of each unnamed type, and named types can be distinguished by name.  in python, the only two types that are not distinct are records and maps, since both are represented by python dicts.  so any dict without a name might be considered a map and those with are records whose names can be checked against the schema in the union.
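
To make the dispatch question above concrete, here is a minimal Python 3 sketch contrasting an if/elif chain with a dict of functions. The type names are Avro's, but the reader functions and the READERS table are hypothetical stand-ins and are not taken from the patch.

```python
import struct
from io import BytesIO

# Hypothetical per-type readers; the real decoders cover many more types.
def read_boolean(stream):
    return stream.read(1) == b"\x01"

def read_float(stream):
    # Avro floats are 4 bytes, little-endian IEEE 754.
    return struct.unpack("<f", stream.read(4))[0]

def read_double(stream):
    # Avro doubles are 8 bytes, little-endian IEEE 754.
    return struct.unpack("<d", stream.read(8))[0]

def read_data_if_elif(schema_type, stream):
    """Dispatch with an if/elif chain: types are compared one by one."""
    if schema_type == "boolean":
        return read_boolean(stream)
    elif schema_type == "float":
        return read_float(stream)
    elif schema_type == "double":
        return read_double(stream)
    else:
        raise ValueError("unknown schema type: %r" % schema_type)

# Dispatch with a dict: a single lookup, however many types are registered.
READERS = {
    "boolean": read_boolean,
    "float": read_float,
    "double": read_double,
}

def read_data_dict(schema_type, stream):
    try:
        reader = READERS[schema_type]
    except KeyError:
        raise ValueError("unknown schema type: %r" % schema_type)
    return reader(stream)
```

Both functions return the same values for the same input; whether the lookup cost matters in practice is a question for a benchmark rather than for this sketch.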


> Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-219
>                 URL: https://issues.apache.org/jira/browse/AVRO-219
>             Project: Avro
>          Issue Type: Improvement
>          Components: python
>            Reporter: Jeff Hammerbacher
>            Assignee: Jeff Hammerbacher
>         Attachments: AVRO-219-schema-io-and-datafile.patch, AVRO-219.patch.schema, AVRO-219.patch.schema_and_io
>
>
> Currently, the unit tests for schema.py, genericio.py, and datafile.py are grouped in with the unit tests for io.py in testio.py. We should break the tests into individual files so that we have better modularization of tests.



[jira] Commented: (AVRO-219) Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793947#action_12793947 ] 

Jeff Hammerbacher commented on AVRO-219:
----------------------------------------

I should also mention: if you'd like to see how this implementation proceeded, please check out http://github.com/hammer/avro/commits/trunk.

> Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-219
>                 URL: https://issues.apache.org/jira/browse/AVRO-219
>             Project: Avro
>          Issue Type: Improvement
>          Components: python
>            Reporter: Jeff Hammerbacher
>            Assignee: Jeff Hammerbacher
>         Attachments: AVRO-219-schema-io-and-datafile.patch, AVRO-219.patch.schema, AVRO-219.patch.schema_and_io
>
>
> Currently, the unit tests for schema.py, genericio.py, and datafile.py are grouped in with the unit tests for io.py in testio.py. We should break the tests into individual files so that we have better modularization of tests.



[jira] Updated: (AVRO-219) Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Hammerbacher updated AVRO-219:
-----------------------------------

    Attachment: AVRO-219.patch

Okay, here's a patch that replaces the current Python implementation with just the IO path that I've rewritten. I have not addressed the union validation concern, as I didn't completely grok the point. I'd like to address that issue in a separate JIRA after this patch gets committed, if possible.

Also, I have removed the interop tests for now. Once this patch is approved and committed, I'll clean up AVRO-264 and get that committed, then open a ticket to add the interop tests back in.

Thanks,
Jeff

> Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-219
>                 URL: https://issues.apache.org/jira/browse/AVRO-219
>             Project: Avro
>          Issue Type: Improvement
>          Components: python
>            Reporter: Jeff Hammerbacher
>            Assignee: Jeff Hammerbacher
>         Attachments: AVRO-219-schema-io-and-datafile.patch, AVRO-219.patch, AVRO-219.patch.schema, AVRO-219.patch.schema_and_io
>
>
> Currently, the unit tests for schema.py, genericio.py, and datafile.py are grouped in with the unit tests for io.py in testio.py. We should break the tests into individual files so that we have better modularization of tests.



[jira] Commented: (AVRO-219) Break testio.py into testschema.py, testio.py, testgenericio.py, and testdatafile.py

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789956#action_12789956 ] 

Jeff Hammerbacher commented on AVRO-219:
----------------------------------------

Issues encountered in current schema.py:
* {"type": "string"} is not a valid schema
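
For reference, the Avro specification treats a bare primitive name and its object form as equivalent, so both "string" and {"type": "string"} should parse. A minimal Python 3 sketch of a parser that accepts both spellings; parse_primitive is a hypothetical helper for illustration and is not a function in schema.py.

```python
import json

PRIMITIVE_TYPES = frozenset(
    ["null", "boolean", "int", "long", "float", "double", "bytes", "string"])

def parse_primitive(json_text):
    """Return the primitive type name for either spelling of the schema."""
    parsed = json.loads(json_text)
    # Object form: {"type": "string"}
    if isinstance(parsed, dict):
        parsed = parsed.get("type")
    # Bare form: "string"
    if isinstance(parsed, str) and parsed in PRIMITIVE_TYPES:
        return parsed
    raise ValueError("not a primitive schema: %s" % json_text)
```

With this sketch, parse_primitive('"string"') and parse_primitive('{"type": "string"}') return the same type name.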


> Break testio.py into testschema.py, testio.py, testgenericio.py, and testdatafile.py
> ------------------------------------------------------------------------------------
>
>                 Key: AVRO-219
>                 URL: https://issues.apache.org/jira/browse/AVRO-219
>             Project: Avro
>          Issue Type: Improvement
>          Components: python
>            Reporter: Jeff Hammerbacher
>            Assignee: Jeff Hammerbacher
>
> Currently, the unit tests for schema.py, genericio.py, and datafile.py are grouped in with the unit tests for io.py in testio.py. We should break the tests into individual files so that we have better modularization of tests.



[jira] Commented: (AVRO-219) Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794538#action_12794538 ] 

Jeff Hammerbacher commented on AVRO-219:
----------------------------------------

I should also mention: I've removed the dependency on odict.

> Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-219
>                 URL: https://issues.apache.org/jira/browse/AVRO-219
>             Project: Avro
>          Issue Type: Improvement
>          Components: python
>            Reporter: Jeff Hammerbacher
>            Assignee: Jeff Hammerbacher
>         Attachments: AVRO-219-schema-io-and-datafile.patch, AVRO-219.patch.schema, AVRO-219.patch.schema_and_io
>
>
> Currently, the unit tests for schema.py, genericio.py, and datafile.py are grouped in with the unit tests for io.py in testio.py. We should break the tests into individual files so that we have better modularization of tests.



[jira] Commented: (AVRO-219) Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795751#action_12795751 ] 

Jeff Hammerbacher commented on AVRO-219:
----------------------------------------

bq. please package this as a single patch file that replaces the existing implementation.

Sure, I will address this on Sunday, when I return to the States.

bq. some of the TODO's seem critical, like skip_int.

skip_int and skip_long are copied from the old Python implementation. I believe they are broken, but this patch doesn't introduce the problem. I plan to add tests and sort out that issue soon, but can I address the TODOs in separate JIRAs? Blocking the commit of this patch for TODO scrubbing will mean more work outside of Apache's SVN.
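
For context, Avro encodes int and long values as variable-length zig-zag integers, so skipping one only requires consuming bytes until a byte with its high bit clear is seen. A minimal Python 3 sketch under that assumption; these functions are illustrative only and are not the skip_int or skip_long from either implementation.

```python
from io import BytesIO

def write_long(stream, datum):
    """Encode a 64-bit signed integer as a zig-zag varint (Avro int/long)."""
    # Zig-zag maps small magnitudes, positive or negative, to small codes.
    datum = (datum << 1) ^ (datum >> 63)
    while (datum & ~0x7F) != 0:
        # Low 7 bits plus a continuation flag in the high bit.
        stream.write(bytes([(datum & 0x7F) | 0x80]))
        datum >>= 7
    stream.write(bytes([datum]))

def skip_long(stream):
    """Advance past one encoded int/long without decoding its value."""
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("stream ended inside a varint")
        if byte[0] & 0x80 == 0:
            # High bit clear marks the final byte of the value.
            return
```

A test for such a function can write several values back to back, skip them one at a time, and check the stream position after each skip.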

bq. those big if .. elif expressions in read_data, write_data and skip_data look like performance pits.

The comments on that blog post point out that a big if/(elif)+/else block is the standard way to approximate switch/case in Python. Simon's idiom is less popular in the Python code I've seen. The previous implementation built a dict of function calls, similar to the blog post you point out, and I found that to be unnecessarily complex. My goal with the Python code is to be correct, concise, and easy to understand first, and fast second. Can we keep the current approach and benchmark it in AVRO-217?

bq. validate is overkill for picking the union branch.

Your suggestion sounds like a performance optimization to avoid calling validate() many times, but one that would further obscure what the code does. I don't think it's a good idea at this time, given the above stated aims of the Python implementation. If I've misunderstood your intent, please correct me.

> Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-219
>                 URL: https://issues.apache.org/jira/browse/AVRO-219
>             Project: Avro
>          Issue Type: Improvement
>          Components: python
>            Reporter: Jeff Hammerbacher
>            Assignee: Jeff Hammerbacher
>         Attachments: AVRO-219-schema-io-and-datafile.patch, AVRO-219.patch.schema, AVRO-219.patch.schema_and_io
>
>
> Currently, the unit tests for schema.py, genericio.py, and datafile.py are grouped in with the unit tests for io.py in testio.py. We should break the tests into individual files so that we have better modularization of tests.



[jira] Updated: (AVRO-219) Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Hammerbacher updated AVRO-219:
-----------------------------------

    Attachment: AVRO-219.patch.schema_and_io

Okay, schema parsing as well as round-trip IO works. I will add object file container support by tomorrow night, at which point this will be ready to check in.

Any reviews would be helpful!

> Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-219
>                 URL: https://issues.apache.org/jira/browse/AVRO-219
>             Project: Avro
>          Issue Type: Improvement
>          Components: python
>            Reporter: Jeff Hammerbacher
>            Assignee: Jeff Hammerbacher
>         Attachments: AVRO-219.patch.schema, AVRO-219.patch.schema_and_io
>
>
> Currently, the unit tests for schema.py, genericio.py, and datafile.py are grouped in with the unit tests for io.py in testio.py. We should break the tests into individual files so that we have better modularization of tests.



[jira] Updated: (AVRO-219) Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Hammerbacher updated AVRO-219:
-----------------------------------

    Summary: Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests  (was: Break testio.py into testschema.py, testio.py, testgenericio.py, and testdatafile.py)

> Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-219
>                 URL: https://issues.apache.org/jira/browse/AVRO-219
>             Project: Avro
>          Issue Type: Improvement
>          Components: python
>            Reporter: Jeff Hammerbacher
>            Assignee: Jeff Hammerbacher
>         Attachments: AVRO-219.patch.schema
>
>
> Currently, the unit tests for schema.py, genericio.py, and datafile.py are grouped in with the unit tests for io.py in testio.py. We should break the tests into individual files so that we have better modularization of tests.



[jira] Commented: (AVRO-219) Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793804#action_12793804 ] 

Jeff Hammerbacher commented on AVRO-219:
----------------------------------------

New. My initial attempt last night failed a bit, so I will head over to that ticket for some comments.

> Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-219
>                 URL: https://issues.apache.org/jira/browse/AVRO-219
>             Project: Avro
>          Issue Type: Improvement
>          Components: python
>            Reporter: Jeff Hammerbacher
>            Assignee: Jeff Hammerbacher
>         Attachments: AVRO-219.patch.schema, AVRO-219.patch.schema_and_io
>
>
> Currently, the unit tests for schema.py, genericio.py, and datafile.py are grouped in with the unit tests for io.py in testio.py. We should break the tests into individual files so that we have better modularization of tests.



[jira] Commented: (AVRO-219) Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795869#action_12795869 ] 

Doug Cutting commented on AVRO-219:
-----------------------------------

> I believe they are broken, but this patch doesn't introduce the problem.

Okay, that's fine then.

> Can we keep the current approach and benchmark it in AVRO-217?

Yes, that's a good plan.

> Your suggestion sounds like a performance optimization to avoid calling validate() many times [ .. ]

It is in part performance, but also correctness.  A union can contain two records with different names but which are otherwise identical.  The current definition of validate does not handle this correctly, since it only validates field names and values and not the record name.
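
The correctness problem can be illustrated with a small Python 3 sketch. The validate function below is a simplified stand-in that checks only field names and values, as described above; it is not the validate from the patch, and the record names and the branch_by_name helper are hypothetical.

```python
def validate(schema, datum):
    """Structural check only: field names and value types, not the record name."""
    if schema["type"] == "record":
        return isinstance(datum, dict) and all(
            f["name"] in datum
            and validate({"type": f["type"]}, datum[f["name"]])
            for f in schema["fields"])
    if schema["type"] == "long":
        return isinstance(datum, int)
    if schema["type"] == "string":
        return isinstance(datum, str)
    return False

def matching_branches(union, datum):
    """Indexes of every union branch that the datum validates against."""
    return [i for i, branch in enumerate(union) if validate(branch, datum)]

def branch_by_name(union, record_name):
    """Pick the record branch by its declared name (name supplied by the caller)."""
    for i, branch in enumerate(union):
        if branch.get("type") == "record" and branch.get("name") == record_name:
            return i
    raise ValueError("no record named %r in union" % record_name)

# Two records that differ only in name.
CELSIUS = {"type": "record", "name": "Celsius",
           "fields": [{"name": "value", "type": "long"}]}
FAHRENHEIT = {"type": "record", "name": "Fahrenheit",
              "fields": [{"name": "value", "type": "long"}]}
UNION = [CELSIUS, FAHRENHEIT]
DATUM = {"value": 72}

print(matching_branches(UNION, DATUM))       # [0, 1]: validation alone is ambiguous
print(branch_by_name(UNION, "Fahrenheit"))   # 1: the name disambiguates
```

How a Python dict would carry its record name is left open here; the sketch simply passes the name in separately.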

> Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-219
>                 URL: https://issues.apache.org/jira/browse/AVRO-219
>             Project: Avro
>          Issue Type: Improvement
>          Components: python
>            Reporter: Jeff Hammerbacher
>            Assignee: Jeff Hammerbacher
>         Attachments: AVRO-219-schema-io-and-datafile.patch, AVRO-219.patch.schema, AVRO-219.patch.schema_and_io
>
>
> Currently, the unit tests for schema.py, genericio.py, and datafile.py are grouped in with the unit tests for io.py in testio.py. We should break the tests into individual files so that we have better modularization of tests.



[jira] Updated: (AVRO-219) Break testio.py into testschema.py, testio.py, testgenericio.py, and testdatafile.py

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Hammerbacher updated AVRO-219:
-----------------------------------

    Description: Currently, the unit tests for schema.py, genericio.py, and datafile.py are grouped in with the unit tests for io.py in testio.py. We should break the tests into individual files so that we have better modularization of tests.  (was: Currently, the unit tests for schema.py datafile.py are grouped in with the unit tests for io.py in testio.py. We should break the tests into individual files so that we have better modularization of tests.)
        Summary: Break testio.py into testschema.py, testio.py, testgenericio.py, and testdatafile.py  (was: Break testio.py into testschema.py, testio.py, and testdatafile.py)

> Break testio.py into testschema.py, testio.py, testgenericio.py, and testdatafile.py
> ------------------------------------------------------------------------------------
>
>                 Key: AVRO-219
>                 URL: https://issues.apache.org/jira/browse/AVRO-219
>             Project: Avro
>          Issue Type: Improvement
>          Components: python
>            Reporter: Jeff Hammerbacher
>            Assignee: Jeff Hammerbacher
>
> Currently, the unit tests for schema.py, genericio.py, and datafile.py are grouped in with the unit tests for io.py in testio.py. We should break the tests into individual files so that we have better modularization of tests.



[jira] Commented: (AVRO-219) Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794049#action_12794049 ] 

Jeff Hammerbacher commented on AVRO-219:
----------------------------------------

Started on the IPC path. See AVRO-264.

> Rewrite Python implementation's IO path (schema.py, io.py, genericio.py, datafile.py) and associated tests
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-219
>                 URL: https://issues.apache.org/jira/browse/AVRO-219
>             Project: Avro
>          Issue Type: Improvement
>          Components: python
>            Reporter: Jeff Hammerbacher
>            Assignee: Jeff Hammerbacher
>         Attachments: AVRO-219-schema-io-and-datafile.patch, AVRO-219.patch.schema, AVRO-219.patch.schema_and_io
>
>
> Currently, the unit tests for schema.py, genericio.py, and datafile.py are grouped in with the unit tests for io.py in testio.py. We should break the tests into individual files so that we have better modularization of tests.



[jira] Updated: (AVRO-219) Break testio.py into testschema.py, testio.py, testgenericio.py, and testdatafile.py

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Hammerbacher updated AVRO-219:
-----------------------------------

    Attachment: AVRO-219.patch.schema

Okay, I've got most schemas parsing correctly in my new implementation. See the attached patch or http://github.com/hammer/avro.

I'd really appreciate a review of the direction for new_schema.py; I'm moving on tonight to io.py and genericio.py and should be able to work out any issues with the implementation through that process, but I'd appreciate an extra set of eyes.

One note: I didn't want to maintain state when serializing a schema object to a string or when comparing it to another schema object. Instead, the schema objects that can have children keep private variables that track whether or not each child's name was resolved from the names cache. It's a bit hacky, but I can easily switch to the alternative method if people don't like it.
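
To make the idea concrete, here is a minimal sketch of that bookkeeping. It is not the code in the attached patch: the names parse, RecordSchema, PrimitiveSchema, and from_cache are hypothetical, and only primitives and records are handled. Each record stores, next to every child schema, a flag saying whether the child was looked up by name in the names cache, so serialization can write such a child back as a bare name without carrying any state of its own.

```python
# Simplified illustration only; unions, arrays, maps, enums, and
# namespaces are deliberately left out.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


class PrimitiveSchema:
    def __init__(self, type_name):
        self.type = type_name

    def to_json(self):
        return self.type


class RecordSchema:
    def __init__(self, name):
        self.name = name
        # Each entry is (field_name, child_schema, from_cache).
        self.fields = []

    def to_json(self):
        fields = []
        for field_name, child, from_cache in self.fields:
            # A child resolved from the names cache is written back as a
            # bare name, which also stops recursion on self-references.
            child_json = child.name if from_cache else child.to_json()
            fields.append({"name": field_name, "type": child_json})
        return {"type": "record", "name": self.name, "fields": fields}


def parse(obj, names):
    """Return (schema, from_cache) for a decoded JSON schema."""
    if isinstance(obj, str):
        if obj in PRIMITIVES:
            return PrimitiveSchema(obj), False
        return names[obj], True  # KeyError if the name is undefined
    if obj["type"] in PRIMITIVES:
        return PrimitiveSchema(obj["type"]), False
    if obj["type"] == "record":
        record = RecordSchema(obj["name"])
        # Register before parsing fields so the record can refer to itself.
        names[record.name] = record
        for field in obj["fields"]:
            child, from_cache = parse(field["type"], names)
            record.fields.append((field["name"], child, from_cache))
        return record, False
    raise ValueError("unsupported schema type in this sketch")


source = {
    "type": "record",
    "name": "Node",
    "fields": [
        {"name": "value", "type": "long"},
        {"name": "next", "type": "Node"},
    ],
}
schema, _ = parse(source, {})
# The self-reference round-trips as the bare name "Node".
assert schema.to_json() == source
```

Without the per-child flag, to_json in this sketch would recurse forever on the self-referencing field; the alternative would be to pass a set of already-emitted names through the serializer.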

Also, there are a few things not implemented: I don't check for correct default values, I don't handle the error schema, and I don't actually parse arbitrary properties (though they should not cause a parse failure). On the other hand, I handle a far wider variety of schemas correctly than the existing Python implementation, and the variety of schemas used to test my implementation is wider.

> Break testio.py into testschema.py, testio.py, testgenericio.py, and testdatafile.py
> ------------------------------------------------------------------------------------
>
>                 Key: AVRO-219
>                 URL: https://issues.apache.org/jira/browse/AVRO-219
>             Project: Avro
>          Issue Type: Improvement
>          Components: python
>            Reporter: Jeff Hammerbacher
>            Assignee: Jeff Hammerbacher
>         Attachments: AVRO-219.patch.schema
>
>
> Currently, the unit tests for schema.py, genericio.py, and datafile.py are grouped in with the unit tests for io.py in testio.py. We should break the tests into individual files so that we have better modularization of tests.



[jira] Assigned: (AVRO-219) Break testio.py into testschema.py, testio.py, and testdatafile.py

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Hammerbacher reassigned AVRO-219:
--------------------------------------

    Assignee: Jeff Hammerbacher

> Break testio.py into testschema.py, testio.py, and testdatafile.py
> ------------------------------------------------------------------
>
>                 Key: AVRO-219
>                 URL: https://issues.apache.org/jira/browse/AVRO-219
>             Project: Avro
>          Issue Type: Improvement
>          Components: python
>            Reporter: Jeff Hammerbacher
>            Assignee: Jeff Hammerbacher
>
> Currently, the unit tests for schema.py datafile.py are grouped in with the unit tests for io.py in testio.py. We should break the tests into individual files so that we have better modularization of tests.
