Posted to dev@lucene.apache.org by "Noble Paul (JIRA)" <ji...@apache.org> on 2018/12/06 00:02:00 UTC

[jira] [Comment Edited] (SOLR-12885) BinaryResponseWriter (javabin format) should directly copy from Bytesref to output

    [ https://issues.apache.org/jira/browse/SOLR-12885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709641#comment-16709641 ] 

Noble Paul edited comment on SOLR-12885 at 12/6/18 12:01 AM:
-------------------------------------------------------------

Some numbers:

Test docs
 10 docs, each with 6 String fields

Test 1: document cache is disabled.
 * Every query reads from stored fields
 * Each field value is created as either a String or a Utf8CharSequence, depending on whether UTF8 is enabled

Test 2: document cache is enabled.
 * Every query reads from the document cache
 * Only serialization performance is tested

{code:java}
Test:1
Total queries:  10K
NO docs from document cache
------------------------------------------------------------------------
Using UTF8 : false
time taken : 10890
Total Strings created from stored fields : 600000
Total UTF8 created from storedfields : 0
Total java Strings serialized : 780000
Total UTF8 serialized  : 0

----------------------------------------------------------------------------

Using UTF8 : true
Using DOC cache : false
time taken : 6550
Total Strings created from stored fields : 0
Total UTF8 created from storedfields : 600000
Total java Strings serialized : 180000
Total UTF8 serialized  : 600000

===============================================
Test:2
Total queries:  10K
ALL docs from document cache
===============================================
Using UTF8 : false
Using DOC cache : true
time taken : 10335
Total Strings created from stored fields : 0
Total UTF8 created from storedfields : 0
Total java Strings serialized : 180000
Total UTF8 serialized  : 600000

------------------------------------------------------------------------
Using UTF8 : true
Using DOC cache : true
time taken : 5551
Total Strings created from stored fields : 0
Total UTF8 created from storedfields : 0
Total java Strings serialized : 180000
Total UTF8 serialized  : 600000
------------------------------------------------------------------------
{code}
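
As a rough cross-check of the counters above, the arithmetic can be sketched in a few lines of Java. This assumes every one of the 10K queries returns all 10 test docs, which the numbers suggest but the comment does not state; the class and method names are made up for illustration. Under that assumption, 600000 is simply queries x docs x fields, the remaining 180000 strings are serialized on both paths, and the time taken drops by roughly 40% in Test 1 and 46% in Test 2.

```java
public class BenchmarkArithmetic {

    // Number of stored field values touched, ASSUMING every query returns
    // every test doc (an assumption, not stated in the comment).
    public static long fieldValues(long queries, long docsPerQuery, long fieldsPerDoc) {
        return queries * docsPerQuery * fieldsPerDoc;
    }

    // Relative reduction in time taken, in percent.
    public static double percentReduction(long before, long after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        long values = fieldValues(10_000, 10, 6);
        System.out.println("field values per run: " + values);

        // Strings serialized with UTF8 disabled, minus the field values:
        // what is left is serialized as java Strings on both paths.
        System.out.println("other strings: " + (780_000 - values));

        // "time taken" figures copied from the output above.
        System.out.println("Test 1 reduction %: " + percentReduction(10_890, 6_550));
        System.out.println("Test 2 reduction %: " + percentReduction(10_335, 5_551));
    }
}
```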


was (Author: noble.paul):
some numbers

Test docs
10 docs, each with 6 String fields

Test 1: document cache is disabled.
* Every query reads from stored fields
* Each field value is created as either a String or a Utf8CharSequence, depending on which one is in use

Test 2: document cache is enabled.
* Every query reads from the document cache
* Only serialization performance is tested

{code}
Test:1
Total queries:  10K
NO docs from document cache
------------------------------------------------------------------------
use : String
time taken : 12932
Total Strings created : 600000
Total UTF8 created : 0
JavaBinCodec.utf16_to_utf8 :  780000
FastOutputStream.writeUtf8CharSeqs : 0

--------------------------------------------------------------------------------
use : Utf8CharSequence
time taken : 7801
Total Strings created: 0
Total UTF8 created: 600000
JavaBinCodec.utf16_to_utf8 :  180000
FastOutputStream.writeUtf8CharSeqs : 600000

===============================================
Test:2
Total queries:  10K
ALL docs from document cache
===============================================
use : String
time taken : 10362
Total Strings created : 600000
Total UTF8 created : 0
JavaBinCodec.utf16_to_utf8 : 780000
FastOutputStream.writeUtf8CharSeqs : 0
------------------------------------------------------------------------
use : Utf8CharSequence
time taken : 6104
Total Strings created : 0
Total UTF8 created : 600000
JavaBinCodec.utf16_to_utf8 : 180000
FastOutputStream.writeUtf8CharSeqs : 600000
{code}



> BinaryResponseWriter (javabin format) should directly copy from Bytesref to output
> ----------------------------------------------------------------------------------
>
>                 Key: SOLR-12885
>                 URL: https://issues.apache.org/jira/browse/SOLR-12885
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>            Priority: Major
>         Attachments: SOLR-12885.patch, SOLR-12885.patch, SOLR-12885.patch, SOLR-12885.patch
>
>
> The format in which bytes are stored in {{BytesRef}} and the javabin string format are the same, so we don't need to convert string/text fields from {{BytesRef}} to String and back to UTF8.
> {{Now a String/Text field is read and written out as follows}}
> {{Lucene index (UTF8 bytes) --> UTF16 (char[]) --> new String() (a copy of the UTF16 char[]) --> UTF8 bytes (javabin format)}}
> This does not add a new type to javabin. The value is still encoded as a String in the serialized data, so when it is deserialized you get a String back.
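
The two paths described in the issue can be illustrated with a minimal, JDK-only sketch. This is not the patch and uses no Lucene or Solr classes: a plain byte[] with an offset and length stands in for a {{BytesRef}}, a ByteArrayOutputStream stands in for the response stream, and the javabin tag and length prefix are left out. The class and method names are hypothetical. For valid UTF8 input both paths produce the same bytes; the direct copy just skips the intermediate char[] and String.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8CopySketch {

    // Current path: decode the stored UTF8 bytes into a String (UTF16),
    // then encode that String back to UTF8 for the response.
    public static byte[] viaString(byte[] stored, int offset, int length) {
        String s = new String(stored, offset, length, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Proposed path: copy the stored UTF8 bytes straight to the output,
    // with no intermediate String.
    public static byte[] directCopy(byte[] stored, int offset, int length) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(length);
        out.write(stored, offset, length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in for a stored field value; includes non-ASCII characters.
        byte[] stored = "h\u00e9llo w\u00f6rld".getBytes(StandardCharsets.UTF_8);

        byte[] a = viaString(stored, 0, stored.length);
        byte[] b = directCopy(stored, 0, stored.length);

        // Both paths are expected to yield identical bytes for valid UTF8.
        System.out.println("same bytes: " + Arrays.equals(a, b));
    }
}
```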


