Posted to dev@avro.apache.org by "Philip (flip) Kromer (JIRA)" <ji...@apache.org> on 2010/09/03 04:26:32 UTC
[jira] Created: (AVRO-654) Recursive #validate() for union'ed schemas in Ruby cripples performance
Recursive #validate() for union'ed schemas in Ruby cripples performance
-----------------------------------------------------------------------
Key: AVRO-654
URL: https://issues.apache.org/jira/browse/AVRO-654
Project: Avro
Issue Type: Bug
Components: ruby
Affects Versions: 1.3.3
Reporter: Philip (flip) Kromer
The Ruby DatumWriter calls #validate() on each #write(). For a schema with many nested unions (cf. Cassandra's*), this requires a recursive depth-first search to determine which branch to take. In Ruby these operations are very expensive: enough to limit write speeds to 2k/sec on a machine of moderate size.
For repeated writing of the same data structure, one idea would be to create a CompiledDatumWriter. This would walk through the validation once and assemble a tree of the methods to apply to each schema element in turn:
[ [:write_long, 'id'], [:write_bytes, 'name'], [:write_record, 'address', [:write_long, 'street']] ]
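A rough sketch of how such a compiled writer might look, to make the idea concrete. Everything here is hypothetical: the class layout, the plain-Hash schema representation, and the write_* operation names are assumptions for illustration and are not the Avro Ruby API. To stay self-contained it records [operation, value] pairs instead of encoding bytes, and it nests the sub-plan of a record as a list of operations.

```ruby
# Hypothetical sketch: none of these names come from the Avro Ruby library.
class CompiledDatumWriter
  attr_reader :plan

  # schema is assumed to be a plain Hash such as
  #   { type: :record, fields: [{ name: 'id', type: :long }, ...] }
  def initialize(schema)
    @plan = compile(schema)
  end

  # Walk the schema once and return a nested plan of
  # [method, field_name] or [:write_record, field_name, subplan].
  def compile(schema)
    schema[:fields].map do |field|
      type = field[:type]
      if type.is_a?(Hash) && type[:type] == :record
        [:write_record, field[:name], compile(type)]
      else
        [:"write_#{type}", field[:name]]
      end
    end
  end

  # Apply the precomputed plan to a datum; no per-write validation.
  # Returns the list of [operation, value] pairs that would be encoded.
  def write(datum, out = [])
    run(@plan, datum, out)
    out
  end

  private

  def run(plan, datum, out)
    plan.each do |op, name, subplan|
      value = datum[name]
      if op == :write_record
        run(subplan, value, out)
      else
        out << [op, value]
      end
    end
  end
end
```

The schema is walked once in the constructor; each later call to write only iterates the stored plan, which is the saving this proposal is after. A real version would have to handle unions, arrays, and maps, which this sketch leaves out.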
---
* http://github.com/infochimps/cassandra/blob/beta1_plus_patches/interface/avro/cassandra.avpr
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-654) Recursive #validate() for union'ed schemas in Ruby cripples performance
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/AVRO-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905980#action_12905980 ]
Doug Cutting commented on AVRO-654:
-----------------------------------
Note that full, recursive validation is not required for union dispatch.
http://avro.apache.org/docs/1.3.3/spec.html#Unions
So a typical implementation of a union writer might look something like:
{code}
writeUnion(datum, union) {
  int index = -1;
  for (int i = 0; index == -1 && i < union.length; i++) {
    switch (union[i].type) {
      case INT:
        if (datum is int) {
          index = i;
        }
        break;
      case LONG:
        if (datum is long) {
          index = i;
        }
        break;
      ... other unnamed types ...
      case RECORD:
        if (datum is record && datum.name.equals(union[i].name)) {
          index = i;
        }
        break;
      ... other named types ...
    }
  }
  writeInt(index);
  write(datum, union[index]);
}
{code}
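In Ruby, the same shallow dispatch might be sketched as below. This is an illustration under stated assumptions, not the library's code: the UnionDispatch module, the Hash and Symbol schema representation, and the :__name__ key used to tag record datums are all made up for the example. Since Ruby integers are unbounded, int and long are told apart by range.

```ruby
# Hypothetical helper, not part of the Avro Ruby API.
# Schemas are assumed to be bare Symbols (:long, :string) for unnamed types
# or Hashes ({ type: :record, name: 'Address' }) for named types.
module UnionDispatch
  INT_RANGE  = (-(2**31)..(2**31 - 1))
  LONG_RANGE = (-(2**63)..(2**63 - 1))

  # Shallow check only: look at the datum itself, never at its children.
  def self.matches?(schema, datum)
    type = schema.is_a?(Hash) ? schema[:type] : schema
    case type
    when :null    then datum.nil?
    when :boolean then datum == true || datum == false
    when :int     then datum.is_a?(Integer) && INT_RANGE.cover?(datum)
    when :long    then datum.is_a?(Integer) && LONG_RANGE.cover?(datum)
    when :string  then datum.is_a?(String)
    when :record
      # Assumption: record datums carry their schema name under :__name__.
      datum.is_a?(Hash) && datum[:__name__] == schema[:name]
    else
      false
    end
  end

  # Return the index of the first branch that matches, as in the
  # pseudocode above, but raise instead of writing an index of -1.
  def self.branch_index(union, datum)
    index = union.index { |branch| matches?(branch, datum) }
    raise ArgumentError, "no union branch matches #{datum.inspect}" if index.nil?
    index
  end
end
```

For a union of [:null, :int, :long, { type: :record, name: 'Address' }], a nil datum selects branch 0 and a Hash tagged 'Address' selects branch 3, in both cases without descending into the record's fields.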