Posted to oak-issues@jackrabbit.apache.org by "Timothee Maret (Jira)" <ji...@apache.org> on 2023/03/15 08:34:00 UTC

[jira] [Commented] (OAK-10116) Performance problem when importing nodes with many binary properties and remote blobstore

    [ https://issues.apache.org/jira/browse/OAK-10116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700549#comment-17700549 ] 

Timothee Maret commented on OAK-10116:
--------------------------------------

Assuming that checking for blob presence is required, it may be possible to check in batches and/or asynchronously.
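
A minimal sketch of the asynchronous variant: presence checks are fanned out over a small thread pool instead of issuing one blocking HTTP round-trip per binary property. The existsCheck predicate is a hypothetical stand-in for the real backend call (e.g. AzureBlobStoreBackend.getRecord); pool size and error handling would need tuning for a real import.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class AsyncBlobCheck {

    // Returns the subset of blobIds that are NOT present, checking
    // concurrently so that network latency overlaps across checks
    // instead of accumulating serially.
    static List<String> missingBlobs(List<String> blobIds,
                                     Predicate<String> existsCheck,
                                     int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<CompletableFuture<String>> futures = blobIds.stream()
                .map(id -> CompletableFuture.supplyAsync(
                        // null marks "present"; the id itself marks "missing"
                        () -> existsCheck.test(id) ? null : id, pool))
                .collect(Collectors.toList());
            return futures.stream()
                .map(CompletableFuture::join)
                .filter(id -> id != null)
                .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Simulated presence check: only "blob-2" is missing.
        List<String> missing = missingBlobs(
            List.of("blob-1", "blob-2", "blob-3"),
            id -> !id.equals("blob-2"), 4);
        System.out.println(missing); // prints [blob-2]
    }
}
```

With N binaries and a per-check latency of L, the serial cost is roughly N*L, while the pooled variant approaches (N/parallelism)*L; a batch API on the backend, if available, would reduce the number of round-trips further.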

> Performance problem when importing nodes with many binary properties and remote blobstore
> -----------------------------------------------------------------------------------------
>
>                 Key: OAK-10116
>                 URL: https://issues.apache.org/jira/browse/OAK-10116
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: blob-cloud, blob-plugins, jcr
>    Affects Versions: 1.48.0
>            Reporter: Joerg Hoh
>            Priority: Major
>
> We often import binaryless packages (using JR filevault) into our Oak instances, which use a remote blobstore.
> We observe bad performance when we import nodes with binary properties. In these cases, stack traces often look like this:
> {noformat}
> "Queue Processor for Subscriber agent publishSubscriber" #311 daemon prio=5 os_prio=0 cpu=298928.76ms elapsed=576.04s tid=0x0000563f968c6800 nid=0x1644 runnable  [0x00007f2a609e3000]
>    java.lang.Thread.State: RUNNABLE
>         at java.net.SocketInputStream.socketRead0(java.base@11.0.15/Native Method)
>         at java.net.SocketInputStream.socketRead(java.base@11.0.15/SocketInputStream.java:115)
>         at java.net.SocketInputStream.read(java.base@11.0.15/SocketInputStream.java:168)
>         at java.net.SocketInputStream.read(java.base@11.0.15/SocketInputStream.java:140)
>         at sun.security.ssl.SSLSocketInputRecord.read(java.base@11.0.15/SSLSocketInputRecord.java:478)
>         at sun.security.ssl.SSLSocketInputRecord.readHeader(java.base@11.0.15/SSLSocketInputRecord.java:472)
>         at sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(java.base@11.0.15/SSLSocketInputRecord.java:70)
>         at sun.security.ssl.SSLSocketImpl.readApplicationRecord(java.base@11.0.15/SSLSocketImpl.java:1328)
>         at sun.security.ssl.SSLSocketImpl$AppInputStream.read(java.base@11.0.15/SSLSocketImpl.java:971)
>         at java.io.BufferedInputStream.fill(java.base@11.0.15/BufferedInputStream.java:252)
>         at java.io.BufferedInputStream.read1(java.base@11.0.15/BufferedInputStream.java:292)
>         at java.io.BufferedInputStream.read(java.base@11.0.15/BufferedInputStream.java:351)
>         - locked <0x00000007d98d0ca8> (a java.io.BufferedInputStream)
>         at sun.net.www.http.HttpClient.parseHTTPHeader(java.base@11.0.15/HttpClient.java:746)
>         at sun.net.www.http.HttpClient.parseHTTP(java.base@11.0.15/HttpClient.java:689)
>         at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(java.base@11.0.15/HttpURLConnection.java:1615)
>         - locked <0x00000007d98cb480> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
>         at sun.net.www.protocol.http.HttpURLConnection.getInputStream(java.base@11.0.15/HttpURLConnection.java:1520)
>         - locked <0x00000007d98cb480> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
>         at java.net.HttpURLConnection.getResponseCode(java.base@11.0.15/HttpURLConnection.java:527)
>         at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(java.base@11.0.15/HttpsURLConnectionImpl.java:334)
>         at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:115)
>         at com.microsoft.azure.storage.blob.CloudBlob.downloadAttributes(CloudBlob.java:1414)
>         at com.microsoft.azure.storage.blob.CloudBlob.downloadAttributes(CloudBlob.java:1381)
>         at org.apache.jackrabbit.oak.blob.cloud.azure.blobstorage.AzureBlobStoreBackend.getRecord(AzureBlobStoreBackend.java:408)
>         at org.apache.jackrabbit.oak.plugins.blob.AbstractSharedCachingDataStore.getRecordIfStored(AbstractSharedCachingDataStore.java:210)
>         at org.apache.jackrabbit.core.data.AbstractDataStore.getRecordFromReference(AbstractDataStore.java:72)
>         at org.apache.jackrabbit.oak.plugins.blob.datastore.DataStoreBlobStore.getBlobId(DataStoreBlobStore.java:402)
>         at org.apache.jackrabbit.oak.segment.SegmentNodeStore.getBlob(SegmentNodeStore.java:257)
>         at org.apache.jackrabbit.oak.composite.CompositeNodeStore.getBlob(CompositeNodeStore.java:202)
>         at org.apache.jackrabbit.oak.core.MutableRoot.getBlob(MutableRoot.java:342)
>         at org.apache.jackrabbit.oak.plugins.value.jcr.ValueFactoryImpl.createValue(ValueFactoryImpl.java:111)
>         at org.apache.jackrabbit.vault.util.DocViewProperty.apply(DocViewProperty.java:413)
>         at org.apache.jackrabbit.vault.fs.impl.io.DocViewSAXImporter.createNode(DocViewSAXImporter.java:1131)
>         at org.apache.jackrabbit.vault.fs.impl.io.DocViewSAXImporter.addNode(DocViewSAXImporter.java:891)
>         at org.apache.jackrabbit.vault.fs.impl.io.DocViewSAXImporter.startElement(DocViewSAXImporter.java:681)
>         at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(java.xml@11.0.15/AbstractSAXParser.java:510)
>         at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(java.xml@11.0.15/XMLNSDocumentScannerImpl.java:374)
>         at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(java.xml@11.0.15/XMLDocumentFragmentScannerImpl.java:2710)
>         at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(java.xml@11.0.15/XMLDocumentScannerImpl.java:605)
>         at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(java.xml@11.0.15/XMLNSDocumentScannerImpl.java:112)
>         at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(java.xml@11.0.15/XMLDocumentFragmentScannerImpl.java:534)
>         at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(java.xml@11.0.15/XML11Configuration.java:888)
>         at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(java.xml@11.0.15/XML11Configuration.java:824)
>         at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(java.xml@11.0.15/XMLParser.java:141)
>         at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(java.xml@11.0.15/AbstractSAXParser.java:1216)
>         at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(java.xml@11.0.15/SAXParserImpl.java:635)
>         at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(java.xml@11.0.15/SAXParserImpl.java:324)
>         at org.apache.jackrabbit.vault.fs.impl.io.GenericArtifactHandler.accept(GenericArtifactHandler.java:100)
>         at org.apache.jackrabbit.vault.fs.io.Importer.commit(Importer.java:932)
>         at org.apache.jackrabbit.vault.fs.io.Importer.commit(Importer.java:799)
> {noformat}
> In this context we can ensure that all binaries are available on the remote blobstore, so a call to the blobstore would not be required, at least not for validating their presence; all other information could/should be part of the filevault package.
> In my opinion the ValueFactory should be able to create a binary property without reaching out to the blobstore, avoiding the network latency. This would speed up the import process dramatically: in this situation we can create approximately 20 binary properties per second, while we can create thousands of non-binary properties in the same time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)