Posted to dev@avro.apache.org by "Grant Rodgers (JIRA)" <ji...@apache.org> on 2010/05/27 19:57:38 UTC

[jira] Created: (AVRO-554) data files created by ruby DataWriter are extremely large

data files created by ruby DataWriter are extremely large
---------------------------------------------------------

                 Key: AVRO-554
                 URL: https://issues.apache.org/jira/browse/AVRO-554
             Project: Avro
          Issue Type: Bug
    Affects Versions: 1.3.0, 1.4.0
         Environment: avro-1.4.0-pre1, ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]

            Reporter: Grant Rodgers


Adding 10000 records of a very simple schema (3 fields) to a DataWriter results in a file that is 317mb.  The same records in JSON are 430k.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (AVRO-554) data files created by ruby DataWriter are extremely large

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting reassigned AVRO-554:
---------------------------------

    Assignee: Grant Rodgers

> data files created by ruby DataWriter are extremely large
> ---------------------------------------------------------
>
>                 Key: AVRO-554
>                 URL: https://issues.apache.org/jira/browse/AVRO-554
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.3.0, 1.4.0
>         Environment: avro-1.4.0-pre1, ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]
>            Reporter: Grant Rodgers
>            Assignee: Grant Rodgers
>         Attachments: AVRO-554-2.patch, AVRO-554.patch, avro_comp.rb, data10.avr, data100.avr, data3000.avr.gz, patched-data3000.avr
>
>
> Adding 10000 records of a very simple schema (3 fields) to a DataWriter results in a file that is 317mb.  The same records in JSON are 430k.



[jira] Updated: (AVRO-554) data files created by ruby DataWriter are extremely large

Posted by "Grant Rodgers (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Rodgers updated AVRO-554:
-------------------------------

    Attachment: AVRO-554.patch

The buffer needs to be rewound in addition to being truncated, because truncating does not reset the buffer position.
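
For anyone following along, here is a minimal sketch of the behavior described above, using only Ruby's standard StringIO. It is not taken from the patch; the variable names are illustrative only.

```ruby
require 'stringio'

buffer = StringIO.new
buffer.write("hello")            # position is now 5

# truncate(0) empties the underlying string but leaves the position at 5
buffer.truncate(0)
pos_after_truncate = buffer.pos

# Writing at position 5 into an empty string pads the gap with null bytes
buffer.write("x")
padded = buffer.string.bytes     # five zero bytes followed by "x"

# Truncating and then rewinding gives a genuinely empty buffer
buffer.truncate(0)
buffer.rewind
buffer.write("x")
clean = buffer.string.bytes      # just "x"
```

The null padding in the second step is consistent with the blocks of nulls reported elsewhere in this thread.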

> data files created by ruby DataWriter are extremely large
> ---------------------------------------------------------
>
>                 Key: AVRO-554
>                 URL: https://issues.apache.org/jira/browse/AVRO-554
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.3.0, 1.4.0
>         Environment: avro-1.4.0-pre1, ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]
>            Reporter: Grant Rodgers
>         Attachments: AVRO-554.patch, avro_comp.rb, data10.avr, data100.avr, data3000.avr.gz
>
>
> Adding 10000 records of a very simple schema (3 fields) to a DataWriter results in a file that is 317mb.  The same records in JSON are 430k.



[jira] Updated: (AVRO-554) data files created by ruby DataWriter are extremely large

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Hammerbacher updated AVRO-554:
-----------------------------------

    Fix Version/s: 1.3.3
                       (was: 1.4.0)

> data files created by ruby DataWriter are extremely large
> ---------------------------------------------------------
>
>                 Key: AVRO-554
>                 URL: https://issues.apache.org/jira/browse/AVRO-554
>             Project: Avro
>          Issue Type: Bug
>          Components: ruby
>    Affects Versions: 1.3.0, 1.4.0
>         Environment: avro-1.4.0-pre1, ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]
>            Reporter: Grant Rodgers
>            Assignee: Grant Rodgers
>             Fix For: 1.3.3
>
>         Attachments: AVRO-554-2.patch, AVRO-554.patch, avro_comp.rb, data10.avr, data100.avr, data3000.avr.gz, patched-data3000.avr
>
>
> Adding 10000 records of a very simple schema (3 fields) to a DataWriter results in a file that is 317mb.  The same records in JSON are 430k.



[jira] Commented: (AVRO-554) data files created by ruby DataWriter are extremely large

Posted by "Jeff Hodges (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873865#action_12873865 ] 

Jeff Hodges commented on AVRO-554:
----------------------------------

Committed as r949917

> data files created by ruby DataWriter are extremely large
> ---------------------------------------------------------
>
>                 Key: AVRO-554
>                 URL: https://issues.apache.org/jira/browse/AVRO-554
>             Project: Avro
>          Issue Type: Bug
>          Components: ruby
>    Affects Versions: 1.3.0, 1.4.0
>         Environment: avro-1.4.0-pre1, ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]
>            Reporter: Grant Rodgers
>            Assignee: Grant Rodgers
>             Fix For: 1.4.0
>
>         Attachments: AVRO-554-2.patch, AVRO-554.patch, avro_comp.rb, data10.avr, data100.avr, data3000.avr.gz, patched-data3000.avr
>
>
> Adding 10000 records of a very simple schema (3 fields) to a DataWriter results in a file that is 317mb.  The same records in JSON are 430k.



[jira] Assigned: (AVRO-554) data files created by ruby DataWriter are extremely large

Posted by "Jeff Hodges (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Hodges reassigned AVRO-554:
--------------------------------

    Assignee: Jeff Hodges  (was: Grant Rodgers)

> data files created by ruby DataWriter are extremely large
> ---------------------------------------------------------
>
>                 Key: AVRO-554
>                 URL: https://issues.apache.org/jira/browse/AVRO-554
>             Project: Avro
>          Issue Type: Bug
>          Components: ruby
>    Affects Versions: 1.3.0, 1.4.0
>         Environment: avro-1.4.0-pre1, ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]
>            Reporter: Grant Rodgers
>            Assignee: Jeff Hodges
>         Attachments: AVRO-554-2.patch, AVRO-554.patch, avro_comp.rb, data10.avr, data100.avr, data3000.avr.gz, patched-data3000.avr
>
>
> Adding 10000 records of a very simple schema (3 fields) to a DataWriter results in a file that is 317mb.  The same records in JSON are 430k.



[jira] Updated: (AVRO-554) data files created by ruby DataWriter are extremely large

Posted by "Grant Rodgers (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Rodgers updated AVRO-554:
-------------------------------

    Attachment:     (was: AVRO-554.patch)

> data files created by ruby DataWriter are extremely large
> ---------------------------------------------------------
>
>                 Key: AVRO-554
>                 URL: https://issues.apache.org/jira/browse/AVRO-554
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.3.0, 1.4.0
>         Environment: avro-1.4.0-pre1, ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]
>            Reporter: Grant Rodgers
>         Attachments: AVRO-554.patch, avro_comp.rb, data10.avr, data100.avr, data3000.avr.gz, patched-data3000.avr
>
>
> Adding 10000 records of a very simple schema (3 fields) to a DataWriter results in a file that is 317mb.  The same records in JSON are 430k.



[jira] Commented: (AVRO-554) data files created by ruby DataWriter are extremely large

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875764#action_12875764 ] 

Jeff Hammerbacher commented on AVRO-554:
----------------------------------------

Hey,

Interesting problem here. It turns out that calling buffer.truncate(0) on a StringIO buffer in Python both clears the contents of the buffer and resets the position to 0. For a file buffer, however, you need to explicitly call buffer.seek(0) after buffer.truncate(0). I think Ruby's behavior is actually more reasonable. For those who'd like to follow along at home, I've opened a question on Quora to discover the source of this inconsistency in the Python buffer API: http://www.quora.com/Why-does-the-behavior-of-the-truncate-method-on-a-StringIO-object-in-Python-differ-from-the-truncate-method-on-a-file.

Later,
Jeff

> data files created by ruby DataWriter are extremely large
> ---------------------------------------------------------
>
>                 Key: AVRO-554
>                 URL: https://issues.apache.org/jira/browse/AVRO-554
>             Project: Avro
>          Issue Type: Bug
>          Components: ruby
>    Affects Versions: 1.3.0, 1.4.0
>         Environment: avro-1.4.0-pre1, ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]
>            Reporter: Grant Rodgers
>            Assignee: Grant Rodgers
>             Fix For: 1.3.3
>
>         Attachments: AVRO-554-2.patch, AVRO-554.patch, avro_comp.rb, data10.avr, data100.avr, data3000.avr.gz, patched-data3000.avr
>
>
> Adding 10000 records of a very simple schema (3 fields) to a DataWriter results in a file that is 317mb.  The same records in JSON are 430k.



[jira] Updated: (AVRO-554) data files created by ruby DataWriter are extremely large

Posted by "Grant Rodgers (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Rodgers updated AVRO-554:
-------------------------------

    Attachment: AVRO-554-2.patch

Patch with test

> data files created by ruby DataWriter are extremely large
> ---------------------------------------------------------
>
>                 Key: AVRO-554
>                 URL: https://issues.apache.org/jira/browse/AVRO-554
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.3.0, 1.4.0
>         Environment: avro-1.4.0-pre1, ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]
>            Reporter: Grant Rodgers
>         Attachments: AVRO-554-2.patch, AVRO-554.patch, avro_comp.rb, data10.avr, data100.avr, data3000.avr.gz, patched-data3000.avr
>
>
> Adding 10000 records of a very simple schema (3 fields) to a DataWriter results in a file that is 317mb.  The same records in JSON are 430k.



[jira] Assigned: (AVRO-554) data files created by ruby DataWriter are extremely large

Posted by "Jeff Hodges (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Hodges reassigned AVRO-554:
--------------------------------

    Assignee: Grant Rodgers  (was: Jeff Hodges)

> data files created by ruby DataWriter are extremely large
> ---------------------------------------------------------
>
>                 Key: AVRO-554
>                 URL: https://issues.apache.org/jira/browse/AVRO-554
>             Project: Avro
>          Issue Type: Bug
>          Components: ruby
>    Affects Versions: 1.3.0, 1.4.0
>         Environment: avro-1.4.0-pre1, ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]
>            Reporter: Grant Rodgers
>            Assignee: Grant Rodgers
>         Attachments: AVRO-554-2.patch, AVRO-554.patch, avro_comp.rb, data10.avr, data100.avr, data3000.avr.gz, patched-data3000.avr
>
>
> Adding 10000 records of a very simple schema (3 fields) to a DataWriter results in a file that is 317mb.  The same records in JSON are 430k.



[jira] Updated: (AVRO-554) data files created by ruby DataWriter are extremely large

Posted by "Jeff Hodges (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Hodges updated AVRO-554:
-----------------------------

           Status: Resolved  (was: Patch Available)
    Fix Version/s: 1.4.0
       Resolution: Fixed

Committed as r949917

> data files created by ruby DataWriter are extremely large
> ---------------------------------------------------------
>
>                 Key: AVRO-554
>                 URL: https://issues.apache.org/jira/browse/AVRO-554
>             Project: Avro
>          Issue Type: Bug
>          Components: ruby
>    Affects Versions: 1.3.0, 1.4.0
>         Environment: avro-1.4.0-pre1, ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]
>            Reporter: Grant Rodgers
>            Assignee: Grant Rodgers
>             Fix For: 1.4.0
>
>         Attachments: AVRO-554-2.patch, AVRO-554.patch, avro_comp.rb, data10.avr, data100.avr, data3000.avr.gz, patched-data3000.avr
>
>
> Adding 10000 records of a very simple schema (3 fields) to a DataWriter results in a file that is 317mb.  The same records in JSON are 430k.



[jira] Commented: (AVRO-554) data files created by ruby DataWriter are extremely large

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872308#action_12872308 ] 

Doug Cutting commented on AVRO-554:
-----------------------------------

The 10- and 100-entry files look fine.  Can you please post one that better illustrates the problem?  Thanks!

> data files created by ruby DataWriter are extremely large
> ---------------------------------------------------------
>
>                 Key: AVRO-554
>                 URL: https://issues.apache.org/jira/browse/AVRO-554
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.3.0, 1.4.0
>         Environment: avro-1.4.0-pre1, ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]
>            Reporter: Grant Rodgers
>         Attachments: avro_comp.rb, data10.avr, data100.avr
>
>
> Adding 10000 records of a very simple schema (3 fields) to a DataWriter results in a file that is 317mb.  The same records in JSON are 430k.



[jira] Commented: (AVRO-554) data files created by ruby DataWriter are extremely large

Posted by "Jeff Hodges (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872723#action_12872723 ] 

Jeff Hodges commented on AVRO-554:
----------------------------------

A test would be awesome for this. Could you write one up?

> data files created by ruby DataWriter are extremely large
> ---------------------------------------------------------
>
>                 Key: AVRO-554
>                 URL: https://issues.apache.org/jira/browse/AVRO-554
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.3.0, 1.4.0
>         Environment: avro-1.4.0-pre1, ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]
>            Reporter: Grant Rodgers
>         Attachments: AVRO-554.patch, avro_comp.rb, data10.avr, data100.avr, data3000.avr.gz, patched-data3000.avr
>
>
> Adding 10000 records of a very simple schema (3 fields) to a DataWriter results in a file that is 317mb.  The same records in JSON are 430k.



[jira] Updated: (AVRO-554) data files created by ruby DataWriter are extremely large

Posted by "Jeff Hodges (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Hodges updated AVRO-554:
-----------------------------

    Component/s: ruby

> data files created by ruby DataWriter are extremely large
> ---------------------------------------------------------
>
>                 Key: AVRO-554
>                 URL: https://issues.apache.org/jira/browse/AVRO-554
>             Project: Avro
>          Issue Type: Bug
>          Components: ruby
>    Affects Versions: 1.3.0, 1.4.0
>         Environment: avro-1.4.0-pre1, ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]
>            Reporter: Grant Rodgers
>            Assignee: Jeff Hodges
>         Attachments: AVRO-554-2.patch, AVRO-554.patch, avro_comp.rb, data10.avr, data100.avr, data3000.avr.gz, patched-data3000.avr
>
>
> Adding 10000 records of a very simple schema (3 fields) to a DataWriter results in a file that is 317mb.  The same records in JSON are 430k.



[jira] Commented: (AVRO-554) data files created by ruby DataWriter are extremely large

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872318#action_12872318 ] 

Doug Cutting commented on AVRO-554:
-----------------------------------

Java fails to parse the 3000-entry file, complaining that it's corrupt, and 'od -v -c' shows big blocks of nulls.

> data files created by ruby DataWriter are extremely large
> ---------------------------------------------------------
>
>                 Key: AVRO-554
>                 URL: https://issues.apache.org/jira/browse/AVRO-554
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.3.0, 1.4.0
>         Environment: avro-1.4.0-pre1, ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]
>            Reporter: Grant Rodgers
>         Attachments: avro_comp.rb, data10.avr, data100.avr, data3000.avr.gz
>
>
> Adding 10000 records of a very simple schema (3 fields) to a DataWriter results in a file that is 317mb.  The same records in JSON are 430k.



[jira] Updated: (AVRO-554) data files created by ruby DataWriter are extremely large

Posted by "Grant Rodgers (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Rodgers updated AVRO-554:
-------------------------------

    Attachment: patched-data3000.avr

Data file created with the patch applied.

> data files created by ruby DataWriter are extremely large
> ---------------------------------------------------------
>
>                 Key: AVRO-554
>                 URL: https://issues.apache.org/jira/browse/AVRO-554
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.3.0, 1.4.0
>         Environment: avro-1.4.0-pre1, ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]
>            Reporter: Grant Rodgers
>         Attachments: AVRO-554.patch, avro_comp.rb, data10.avr, data100.avr, data3000.avr.gz, patched-data3000.avr
>
>
> Adding 10000 records of a very simple schema (3 fields) to a DataWriter results in a file that is 317mb.  The same records in JSON are 430k.



[jira] Updated: (AVRO-554) data files created by ruby DataWriter are extremely large

Posted by "Grant Rodgers (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Rodgers updated AVRO-554:
-------------------------------

    Attachment: AVRO-554.patch

Oops, wrong license on first patch.

> data files created by ruby DataWriter are extremely large
> ---------------------------------------------------------
>
>                 Key: AVRO-554
>                 URL: https://issues.apache.org/jira/browse/AVRO-554
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.3.0, 1.4.0
>         Environment: avro-1.4.0-pre1, ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]
>            Reporter: Grant Rodgers
>         Attachments: AVRO-554.patch, avro_comp.rb, data10.avr, data100.avr, data3000.avr.gz, patched-data3000.avr
>
>
> Adding 10000 records of a very simple schema (3 fields) to a DataWriter results in a file that is 317mb.  The same records in JSON are 430k.



[jira] Updated: (AVRO-554) data files created by ruby DataWriter are extremely large

Posted by "Grant Rodgers (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Rodgers updated AVRO-554:
-------------------------------

    Status: Patch Available  (was: Open)

> data files created by ruby DataWriter are extremely large
> ---------------------------------------------------------
>
>                 Key: AVRO-554
>                 URL: https://issues.apache.org/jira/browse/AVRO-554
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.3.0, 1.4.0
>         Environment: avro-1.4.0-pre1, ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]
>            Reporter: Grant Rodgers
>         Attachments: AVRO-554-2.patch, AVRO-554.patch, avro_comp.rb, data10.avr, data100.avr, data3000.avr.gz, patched-data3000.avr
>
>
> Adding 10000 records of a very simple schema (3 fields) to a DataWriter results in a file that is 317mb.  The same records in JSON are 430k.



[jira] Updated: (AVRO-554) data files created by ruby DataWriter are extremely large

Posted by "Grant Rodgers (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Rodgers updated AVRO-554:
-------------------------------

    Attachment: data3000.avr.gz

Doing some more tests:

1000 records, file size: 7233 bytes
2000 records, file size: 14234 bytes
3000 records, file size: 13242729 bytes

Attached is the file with 3000 records (gzipped).
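
One way such a sudden jump could arise is sketched below. This is a simulation under stated assumptions, not the DataWriter code: the simulate method, the 16,000-byte flush threshold, and the fixed 7-byte record are all hypothetical stand-ins (the record size is only a rough guess from the 1000-record figure above). It models a writer that copies its StringIO buffer to the output whenever the buffer reaches the threshold, with and without a rewind after truncating.

```ruby
require 'stringio'

# Hypothetical stand-in for a block-buffered writer. Records accumulate in a
# StringIO and are "flushed" (counted as output) once the buffer reaches a
# threshold. All sizes and names here are illustrative assumptions.
def simulate(record_count, rewind_after_truncate)
  buffer = StringIO.new
  output_bytes = 0
  record = "x" * 7          # assumed size of one encoded record
  threshold = 16_000        # assumed flush threshold in bytes

  record_count.times do
    buffer.write(record)
    next if buffer.string.bytesize < threshold

    output_bytes += buffer.string.bytesize   # flush the block
    buffer.truncate(0)                       # position is NOT reset here
    buffer.rewind if rewind_after_truncate
  end

  output_bytes + buffer.string.bytesize      # final flush on close
end

without_rewind = simulate(3000, false)
with_rewind    = simulate(3000, true)
puts "without rewind: #{without_rewind} bytes"
puts "with rewind:    #{with_rewind} bytes"
```

In this model nothing unusual happens until the first flush. After it, every write lands at the stale position, so the buffer is immediately over the threshold again and each record triggers a flush of a mostly null block. That would explain why small files look fine while larger ones balloon.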

> data files created by ruby DataWriter are extremely large
> ---------------------------------------------------------
>
>                 Key: AVRO-554
>                 URL: https://issues.apache.org/jira/browse/AVRO-554
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.3.0, 1.4.0
>         Environment: avro-1.4.0-pre1, ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]
>            Reporter: Grant Rodgers
>         Attachments: avro_comp.rb, data10.avr, data100.avr, data3000.avr.gz
>
>
> Adding 10000 records of a very simple schema (3 fields) to a DataWriter results in a file that is 317mb.  The same records in JSON are 430k.



[jira] Updated: (AVRO-554) data files created by ruby DataWriter are extremely large

Posted by "Grant Rodgers (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Rodgers updated AVRO-554:
-------------------------------

    Attachment: avro_comp.rb
                data10.avr
                data100.avr

Attached a Ruby script demonstrating the issue. Run it by executing 'ruby avro_comp.rb 10000' (after installing the avro rubygem).

Also attached are two data files produced by the above script, with 10 and 100 records.

> data files created by ruby DataWriter are extremely large
> ---------------------------------------------------------
>
>                 Key: AVRO-554
>                 URL: https://issues.apache.org/jira/browse/AVRO-554
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.3.0, 1.4.0
>         Environment: avro-1.4.0-pre1, ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]
>            Reporter: Grant Rodgers
>         Attachments: avro_comp.rb, data10.avr, data100.avr
>
>
> Adding 10000 records of a very simple schema (3 fields) to a DataWriter results in a file that is 317mb.  The same records in JSON are 430k.
