Posted to user@avro.apache.org by "Farkas, Zoltan" <Zo...@pimco.com> on 2016/04/04 16:58:43 UTC

RE: String Pooling on reader side

I agree this is quite useful…

You should be able to use logical types for this purpose, by implementing your own and using it in your IDL like this:

@logicalType("internedString") string myStringField;

It might even be possible to create a logical type that works with any underlying type… @logicalType("interned") to deduplicate values of any type

--Z

From: Bernardo Bennett [mailto:bernardo.bennett@gmail.com]
Sent: Thursday, March 31, 2016 12:41 PM
To: user@avro.apache.org
Subject: String Pooling on reader side

Are there plans to introduce such a feature? Depending on the nature of the data, memory savings can be quite substantial.

So far I've experimented with modifying the Java-generated IndexedRecord.put() methods to perform lookups in concurrent hash maps when the field type is String. The overhead seems insignificant compared to the savings in GC time and disk spills (Spark) for applications that read Avro records and cache them in memory.
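
For illustration only, the lookup described above might look roughly like the sketch below. The class name StringPool and its methods are hypothetical and not part of Avro; the sketch assumes a single shared, unbounded pool and accepts any CharSequence so that non-String values can be passed in as well.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical helper (not an Avro class): maps equal strings to one shared instance. */
final class StringPool {
    // Key and value are the same object; the map only serves to find the canonical copy.
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    /** Returns the pooled instance equal to value, registering it on first sight. */
    String intern(CharSequence value) {
        if (value == null) {
            return null; // nullable fields pass through unchanged
        }
        String s = value.toString();
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    /** Number of distinct strings currently held. */
    int size() {
        return pool.size();
    }
}

// Assumed usage inside a generated put(int, Object); the field name is made up:
//   case 0: myStringField = POOL.intern((CharSequence) value); break;
```

Note that a pool like this never releases entries, so it only pays off when the set of distinct values is small relative to the number of records; a bounded or weak-reference cache would be an alternative for high-cardinality fields.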