You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/05/13 22:57:38 UTC

[GitHub] [lucene] uschindler opened a new pull request, #888: LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests)

uschindler opened a new pull request, #888:
URL: https://github.com/apache/lucene/pull/888

   see https://issues.apache.org/jira/browse/LUCENE-10572


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

Posted by "uschindler (via GitHub)" <gi...@apache.org>.
uschindler commented on PR #888:
URL: https://github.com/apache/lucene/pull/888#issuecomment-1792528482

   Hi @mikemccand,
   I reset the branch to the initial commit (without BytesRefHash & Co. changes ). Then I merged and pushed.
   I will now try to redo the changes.
   
   In fact, on x86 machines it makes no sense to benchmark it, as the LE byte order is already native :-) This PR only helps with architectures like s390x that have big endian, as the internals of BytesRefHash would never make it into a file format so they can encode their "private data" in native endianness. We still randomize the endianness on testing, so we make sure both variants work.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

Posted by "uschindler (via GitHub)" <gi...@apache.org>.
uschindler commented on PR #888:
URL: https://github.com/apache/lucene/pull/888#issuecomment-1927340132

   I forgot about this PR, we should really apply it. #13076 is another candidate that could make use of this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

Posted by "uschindler (via GitHub)" <gi...@apache.org>.
uschindler merged PR #888:
URL: https://github.com/apache/lucene/pull/888


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

Posted by "mikemccand (via GitHub)" <gi...@apache.org>.
mikemccand commented on PR #888:
URL: https://github.com/apache/lucene/pull/888#issuecomment-1792606649

   Thanks @uschindler!  Removing vShort and switching to LE (or native -- I didn't understand the problem with that -- this is never (directly) serialized to a Lucene index) short seems good?  I guess we lose a bit of RAM efficiency, sometimes taking two bytes instead of one.  But we get faster CPU decode.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

Posted by "mikemccand (via GitHub)" <gi...@apache.org>.
mikemccand commented on PR #888:
URL: https://github.com/apache/lucene/pull/888#issuecomment-1790470336

   Oooh I missed this @uschindler -- it looks like a nice possible opto for the costly `BytesRefHash` methods, and it looks like (on the issue) you and @rmuir came to agreement on approach (this PR).
   
   I can benchmark this, but could you maybe modernize it to resolve the conflicts?  Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

Posted by "mikemccand (via GitHub)" <gi...@apache.org>.
mikemccand commented on PR #888:
URL: https://github.com/apache/lucene/pull/888#issuecomment-1792595250

   > @uschindler <https://github.com/uschindler> pushed 0 commits.
   
   Huh, how do you do that?
   
   Mike McCandless
   
   http://blog.mikemccandless.com
   
   
   On Fri, Nov 3, 2023 at 10:42 AM Uwe Schindler ***@***.***>
   wrote:
   
   > @mikemccand <https://github.com/mikemccand>: If you want to see the
   > changes I reverted, see the above comparison:
   > https://github.com/apache/lucene/compare/36de2bb7fa7a0587a102cf5c4d35ac8f94976bbd..c1b626c0636821f4d7c085895359489e7dfa330f
   >
   > Those changes need to be re-applied to the repo in correct files (not sure
   > where this code now lives, looks like BytesRefBlockPool, but no idea, sorry)
   >
   > I think I know after looking into those changes what the problem was.
   > Internally BytesRefHash uses BIG ENDIAN, because some parts in the byte
   > array are "UTF-8 like" encoded (if highest bit is set another byte
   > follows). As this is stupid to do and requires only a few bytes more
   > storage, I removed that encoding to always use shorts instead of "byte or
   > BE short". When the encoding no longer matters and must not be "UTF-8
   > encoding like", it can use native order. But for safety you could also use
   > LE encoding to make use of actual CPUs (ARM is also LE now).
   >
   > So we have 2 posisbilities:
   >
   >    - Change the internal encoding of bytesrefhash and remove the Big
   >    Endian UTF-8 like encoding (or call it vShort) and switch to Little Endian
   >    shorts
   >    - Use native encoding to also help CPUs like s390 and use native
   >    encoding (which also works). This PR supports this, but it is questionable
   >    for the reasons Robert said.
   >
   > —
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/lucene/pull/888#issuecomment-1792570482>, or
   > unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/AAGCOXAUIXXARYWAF4PRGQLYCT7GXAVCNFSM5V4VYZVKU5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCNZZGI2TOMBUHAZA>
   > .
   > You are receiving this because you were mentioned.Message ID:
   > ***@***.***>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #888: LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests)

Posted by GitBox <gi...@apache.org>.
uschindler commented on PR #888:
URL: https://github.com/apache/lucene/pull/888#issuecomment-1126569898

   Anybody may play and commit ideas to this PR. @rmuir @mikemccand 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

Posted by "uschindler (via GitHub)" <gi...@apache.org>.
uschindler commented on PR #888:
URL: https://github.com/apache/lucene/pull/888#issuecomment-1790521486

   Ohhhh, I forgot about this PR. When looking at the conflicts it looks like I need to redo at least the BytesRefHash/Pool code.
   
   We can use native order at all places where it is only used in memory and not persisted to disk (BytesRefHash) and where it does not matter (LZ4).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

Posted by "uschindler (via GitHub)" <gi...@apache.org>.
uschindler commented on PR #888:
URL: https://github.com/apache/lucene/pull/888#issuecomment-1792537252

   @mikemccand,
   I checked in main branch, it no longer uses any varhandles in BytesRefHash and ByteBlockPool. No idea where the code moved to.
   
   It now uses BytesRefBlockPool, but this one uses BIG ENDIAN byte order (for whatever reason). As I no longer know whcih of those ByteFoobar classes in Util are used internally and not serialized to disk and which ones are serialized to disk I won't change anything for now.
   
   So I restored the PR into the "known state" (it adds native varhandles) and changes LZ4 compression to use the native order (which is documented by LZ4 to not matter). So this one only improves LZ4.
   
   I have no time to look into the changes in BytesRefHash, so I give it to you to figure out where it is "ok" to change from fixed BE or LE order to native order, but care must be taken that those byte arrays are never persisted/serialized onto index file formats,


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #888: LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests)

Posted by GitBox <gi...@apache.org>.
uschindler commented on PR #888:
URL: https://github.com/apache/lucene/pull/888#issuecomment-1126578739

   I removed the vInt-like encoding in ByteBlockPool and BytesRefHash. After that I was able to switch to native shorts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

Posted by "uschindler (via GitHub)" <gi...@apache.org>.
uschindler commented on PR #888:
URL: https://github.com/apache/lucene/pull/888#issuecomment-1792570482

   > @mikemccand: If you want to see the changes I reverted, see the above comparison: https://github.com/apache/lucene/compare/36de2bb7fa7a0587a102cf5c4d35ac8f94976bbd..c1b626c0636821f4d7c085895359489e7dfa330f
   > 
   > Those changes need to be re-applied to the repo in correct files (not sure where this code now lives, looks like BytesRefBlockPool, but no idea, sorry)
   
   I think I know after looking into those changes what the problem was. Internally BytesRefHash uses BIG ENDIAN, because some parts in the byte array are "UTF-8 like" encoded (if highest bit is set another byte follows). As this is stupid to do and requires only a few bytes more storage, I removed that encoding to always use shorts instead of "byte or BE short". When the encoding no longer matters and must not be "UTF-8 encoding like", it can use native order. But for safety you could also use LE encoding to make use of actual CPUs (ARM is also LE now).
   
   So we have 2 posisbilities:
   - Change the internal encoding of bytesrefhash and remove the Big Endian UTF-8 like encoding (or call it vShort) and switch to Little Endian shorts
   - Use native encoding to also help CPUs like s390 and use native encoding (which also works). This PR supports this, but it is questionable for the reasons Robert said.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

Posted by "uschindler (via GitHub)" <gi...@apache.org>.
uschindler commented on PR #888:
URL: https://github.com/apache/lucene/pull/888#issuecomment-1792545702

   @mikemccand: If you want to see the changes I reverted, see the above comparison: https://github.com/apache/lucene/compare/c1b626c0636821f4d7c085895359489e7dfa330f..36de2bb7fa7a0587a102cf5c4d35ac8f94976bbd


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org