Posted to dev@jackrabbit.apache.org by KÖLL Claus <C....@TIROL.GV.AT> on 2007/06/14 11:05:05 UTC

Loading Node without loading Binary Data

Is it possible to get a node without loading its binary data?
My search is very fast, but I have a lot of huge PDF files. If I get a result set with about 20 files,
in the first step I only want to get the name and some metadata from each node, not the binaries.

Is there a solution?

BR
claus


Re: Loading Node without loading Binary Data

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 6/14/07, KÖLL Claus <C....@tirol.gv.at> wrote:
> OK, so the information from Christoph is unfortunately not right :-(
> I hope that my problem will be resolved by JCR-926.
>
> How long do you think it will take to solve this problem?

I have a working prototype and should have no trouble properly
integrating it with Jackrabbit in a month or so. It looks like we
could include the feature in Jackrabbit 1.4, but since it will mean
backwards-incompatible changes in Jackrabbit core we will need to
implement, test and document some heavy-duty content migration tools
before we can release the feature. Such tools will also be very useful
for a number of other purposes, but it is not yet clear when we will
have them.

BR,

Jukka Zitting

Re: Loading Node without loading Binary Data

Posted by KÖLL Claus <C....@TIROL.GV.AT>.
Hi Jukka,

OK, so the information from Christoph is unfortunately not right :-(
I hope that my problem will be resolved by JCR-926.

How long do you think it will take to solve this problem?

BR,
claus
-----Original Message-----
From: Jukka Zitting [mailto:jukka.zitting@gmail.com] 
Sent: Thursday, June 14, 2007 14:20
To: dev@jackrabbit.apache.org
Subject: Re: RE: Re: Loading Node without loading Binary Data

Hi,

On 6/14/07, Christoph Kiehl <ck...@sulu3000.de> wrote:
> If you didn't change the default, your binaries (those bigger than
> 4096 bytes) will be stored separately from the bundle. That means if you
> call nextNode(), only the bundle without the binary should be loaded. As
> long as you do not access the binary property with
> node.getProperty(<name>) or node.getProperties(), the binary will not be
> loaded. (Please correct me if I'm wrong.)

Note that the default bundle persistence manager will do a SELECT from
the binval table when the node that contains a binary property is
loaded. The blob is then turned into a BLOBFileValue that
automatically spools the binary stream into a local temporary file.
:-(
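To make the difference concrete, here is a toy Java model contrasting eager loading, where fetching the node also runs the equivalent of the binval SELECT, with the lazy handle Christoph expected, where the blob is only read on first access. All class names are hypothetical and this is not Jackrabbit code; it also keeps bytes in memory instead of spooling them to a temporary file the way BLOBFileValue does.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the binval table; counts how often it is read.
class BinvalTable {
    private final byte[] blob;
    private int selects = 0;

    BinvalTable(byte[] blob) { this.blob = blob; }

    // Plays the role of "SELECT ... FROM binval" for one binary value.
    byte[] select() {
        selects++;
        return blob.clone();
    }

    int selectCount() { return selects; }
}

// Eager: the blob is fetched while the node itself is being loaded.
class EagerNode {
    final String name;
    final byte[] data;

    EagerNode(String name, BinvalTable table) {
        this.name = name;
        this.data = table.select(); // binary read happens on node load
    }
}

// Lazy: loading the node only stores a loader; the blob is read on first access.
class LazyNode {
    final String name;
    private final Supplier<byte[]> loader;
    private byte[] data;

    LazyNode(String name, BinvalTable table) {
        this.name = name;
        this.loader = table::select; // nothing is read yet
    }

    byte[] getData() {
        if (data == null) {
            data = loader.get(); // first access triggers the read
        }
        return data;
    }
}
```

In this model, 20 search hits cost 20 blob reads with the eager variant before any metadata is looked at, while the lazy variant performs none until getData() is called.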

This is one of the problems I'm trying to address with JCR-926.

BR,

Jukka Zitting

RE: RE: Re: Loading Node without loading Binary Data

Posted by KÖLL Claus <C....@TIROL.GV.AT>.
Thanks Jukka,

the bundle PM is faster than SimpleDbPersistenceManager, but it loads the binary when the node is loaded :-(
Storing the binaries on the file system is not a good option, because the Windows file system doesn't like millions of files ;-)
So I hope the data store arrives ASAP...
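For the "millions of files" concern, one general technique is to name each binary by a hash of its content and fan the files out over nested directories. This is an assumption for illustration only: the class name, the SHA-1 choice and the two-level layout are hypothetical and are not taken from Jackrabbit or from JCR-926.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical content-addressed layout; not Jackrabbit's actual storage format.
final class FanOutPath {
    private FanOutPath() {}

    // Lowercase hex encoding of a byte array.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Content identifier: SHA-1 digest of the binary as 40 hex characters.
    static String contentId(byte[] content) {
        try {
            return hex(MessageDigest.getInstance("SHA-1").digest(content));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    // Two directory levels named by the first two hex pairs: "ab/cd/abcd...".
    static String relativePath(byte[] content) {
        String id = contentId(content);
        return id.substring(0, 2) + "/" + id.substring(2, 4) + "/" + id;
    }
}
```

With two levels of two hex characters there are 256 × 256 = 65,536 leaf directories, so one million binaries average roughly 15 files per directory. As a side effect, identical binaries map to the same path.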

BR,
claus

-----Original Message-----
From: Jukka Zitting [mailto:jukka.zitting@gmail.com] 
Sent: Thursday, June 14, 2007 14:59
To: dev@jackrabbit.apache.org
Subject: Re: RE: Re: Loading Node without loading Binary Data

Hi,

On 6/14/07, Stefan Guggisberg <st...@gmail.com> wrote:
> Note that Jukka's statement only applies to the bundle DB PM.
> SimpleDbPersistenceManager (and derived classes) doesn't preload
> properties when the parent node is loaded.

Exactly. And even in bundle persistence the problem only affects cases
where you have blobs stored in the database instead of the local file
system.
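The combinations discussed so far can be summarised as a small decision function. This is a hypothetical sketch that only encodes the statements made in this thread; it is not an API that Jackrabbit provides, so treat it as a reading aid rather than a specification of any particular release.

```java
// Hypothetical types that summarise the thread; not part of Jackrabbit.
enum PersistenceManagerKind { SIMPLE_DB, BUNDLE_DB }

enum BlobLocation { DATABASE, FILE_SYSTEM }

final class BinaryLoadPolicy {
    private BinaryLoadPolicy() {}

    // True if loading a node is expected to read its binary values from storage.
    static boolean readsBlobOnNodeLoad(PersistenceManagerKind pm, BlobLocation blobs) {
        switch (pm) {
            case SIMPLE_DB:
                // Per Stefan: properties are not preloaded on parent node load.
                return false;
            case BUNDLE_DB:
                // Per Jukka: blobs in the DB are selected from binval on node load;
                // blobs on the local file system are not.
                return blobs == BlobLocation.DATABASE;
            default:
                throw new IllegalArgumentException("unknown persistence manager: " + pm);
        }
    }
}
```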

BR,

Jukka Zitting

Re: AW: Re: Loading Node without loading Binary Data

Posted by Stefan Guggisberg <st...@gmail.com>.
On 6/14/07, Christoph Kiehl <ck...@sulu3000.de> wrote:
> Jukka Zitting wrote:
> > Hi,
> >
> > On 6/14/07, Christoph Kiehl <ck...@sulu3000.de> wrote:
> >> If you didn't change the default, your binaries (those bigger than
> >> 4096 bytes) will be stored separately from the bundle. That means if you
> >> call nextNode(), only the bundle without the binary should be loaded. As
> >> long as you do not access the binary property with
> >> node.getProperty(<name>) or node.getProperties(), the binary will not be
> >> loaded. (Please correct me if I'm wrong.)
> >
> > Note that the default bundle persistence manager will do a SELECT from
> > the binval table when the node that contains a binary property is
> > loaded. The blob is then turned into a BLOBFileValue that
> > automatically spools the binary stream into a local temporary file.
> > :-(
>
> Ouch! This sounds like a serious problem. I didn't know that. What's the
> reason, then, to save binaries separately from the bundle?

Note that Jukka's statement only applies to the bundle DB PM.
SimpleDbPersistenceManager (and derived classes) doesn't preload
properties when the parent node is loaded.

cheers
stefan

>
> Cheers,
> Christoph
>
>

Re: AW: Re: Loading Node without loading Binary Data

Posted by Christoph Kiehl <ck...@sulu3000.de>.
Jukka Zitting wrote:
> Hi,
> 
> On 6/14/07, Christoph Kiehl <ck...@sulu3000.de> wrote:
>> If you didn't change the default, your binaries (those bigger than
>> 4096 bytes) will be stored separately from the bundle. That means if you
>> call nextNode(), only the bundle without the binary should be loaded. As
>> long as you do not access the binary property with
>> node.getProperty(<name>) or node.getProperties(), the binary will not be
>> loaded. (Please correct me if I'm wrong.)
> 
> Note that the default bundle persistence manager will do a SELECT from
> the binval table when the node that contains a binary property is
> loaded. The blob is then turned into a BLOBFileValue that
> automatically spools the binary stream into a local temporary file.
> :-(

Ouch! This sounds like a serious problem. I didn't know that. What's the 
reason, then, to save binaries separately from the bundle?

Cheers,
Christoph


RE: Re: RE: Re: Loading Node without loading Binary Data

Posted by KÖLL Claus <C....@TIROL.GV.AT>.
Hi Christoph,

thanks for the information, but it does not work as you described.
The minBlobSize variable is only used in writeState() to decide
whether or not to store the binary value in the binval table.
I have debugged the code, and it works like this:

Sample code (the result contains only jcr:content nodes):

Workspace workspace = session.getWorkspace();
QueryManager queryManager = workspace.getQueryManager();
Query query = queryManager.createQuery(
        "//element(*, nt:base)[jcr:contains(., 'abc')] order by jcr:score()", Query.XPATH);
QueryResult queryResult = query.execute();
NodeIterator nodeIterator = queryResult.getNodes();
nodeIterator.nextNode();
    -> (some debug info ...)
    BundleDBPM.loadBundle()
        BundleBinding.readBundle()
            ... some code in the method ...

            // properties
            name = readIndexedQName(in);
            while (name != null) {
                PropertyId pId = new PropertyId(bundle.getId(), name);
                NodePropBundle.PropertyEntry pState = readPropertyEntry(in, pId);
                ... some code in readPropertyEntry() ...

                case PropertyType.BINARY:
                    if (blobStore instanceof ResourceBasedBLOBStore) {
                        val = InternalValue.create(((ResourceBasedBLOBStore) blobStore).getResource(blobIds[i]));
                    } else {
                        val = InternalValue.create(blobStore.get(blobIds[i]), false);
                    }

                    BundleDBPM.DBBlobStore.get()  <- in this method the binary value is loaded from the DB!

So every time I call nextNode() on the node iterator, the binary value is loaded.
Am I doing something wrong? Please help.

BR,
claus


-----Original Message-----
From: news [mailto:news@sea.gmane.org] On Behalf Of Christoph Kiehl
Sent: Thursday, June 14, 2007 12:05
To: dev@jackrabbit.apache.org
Subject: Re: RE: Re: Loading Node without loading Binary Data

KÖLL Claus wrote:

> As you can see from my second mail, I am using the Oracle bundle PM.
> The node iterator that I get from a full-text search contains only jcr:content nodes.
> My idea of loading only the parent (where the metadata is stored) does not really work,
> because when I call nextNode() on the iterator the binaries are loaded, and after that it makes no difference
> whether I get the parent of the jcr:content node.

If you didn't change the default, your binaries (those bigger than 
4096 bytes) will be stored separately from the bundle. That means if you 
call nextNode(), only the bundle without the binary should be loaded. As 
long as you do not access the binary property with 
node.getProperty(<name>) or node.getProperties(), the binary will not be 
loaded. (Please correct me if I'm wrong.)
Are you sure that you do not call one of those methods? Maybe you can 
post some of your code?

> If I set the minimum blob size to a low value, will the binaries be loaded only when I try to get the property from
> the jcr:content node?

Yes, as explained above. But the default value of 4096 should be a good 
fit in most cases.
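Claus's debugging result can be stated as a write-time rule: minBlobSize only decides where a binary goes when it is written. The sketch below is hypothetical (the names are illustrative, not Jackrabbit's) and assumes, following Christoph's wording, that values larger than the threshold go to the binval table. Lowering the threshold therefore only moves more binaries into binval; per Jukka's reply, the bundle DB PM still reads them when the node is loaded.

```java
// Hypothetical sketch of the write-time placement decision; not Jackrabbit code.
final class BlobPlacement {
    enum Target { INLINE_IN_BUNDLE, BINVAL_TABLE }

    // Default threshold mentioned in this thread.
    static final int DEFAULT_MIN_BLOB_SIZE = 4096;

    private BlobPlacement() {}

    // Values larger than minBlobSize go to the binval table; this sketch keeps a
    // value of exactly minBlobSize bytes inline (Jackrabbit's own boundary may differ).
    static Target placeOnWrite(long length, int minBlobSize) {
        return length > minBlobSize ? Target.BINVAL_TABLE : Target.INLINE_IN_BUNDLE;
    }
}
```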

Cheers,
Christoph


RE: Re: Loading Node without loading Binary Data

Posted by KÖLL Claus <C....@TIROL.GV.AT>.
Hi Christoph,
as you can see from my second mail, I am using the Oracle bundle PM.
The node iterator that I get from a full-text search contains only jcr:content nodes.
My idea of loading only the parent (where the metadata is stored) does not really work,
because when I call nextNode() on the iterator the binaries are loaded, and after that it makes no difference
whether I get the parent of the jcr:content node.

If I set the minimum blob size to a low value, will the binaries be loaded only when I try to get the property from
the jcr:content node?

BR,
claus


-----Original Message-----
From: news [mailto:news@sea.gmane.org] On Behalf Of Christoph Kiehl
Sent: Thursday, June 14, 2007 11:15
To: dev@jackrabbit.apache.org
Subject: Re: Loading Node without loading Binary Data

KÖLL Claus wrote:
> Is it possible to get a node without loading its binary data?
> My search is very fast, but I have a lot of huge PDF files. If I get a result set with about 20 files,
> in the first step I only want to get the name and some metadata from each node, not the binaries.

AFAIK this depends on which persistence manager you use. If you use a 
bundle persistence manager, you can configure the minimum blob size. If 
your binary is bigger than that (the default is 4096 bytes), it is written 
and read separately from the node state.
If you don't use a bundle persistence manager, your properties are 
probably only loaded on demand anyway.
In your case I would guess that binaries are not loaded until you try to 
access that property.
Which persistence manager do you use?

Cheers,
Christoph


RE: Loading Node without loading Binary Data

Posted by KÖLL Claus <C....@TIROL.GV.AT>.
Oh sorry, I wasn't thinking before writing...
I only have to read the parent node, not the content node ;-)

BR,
claus

-----Original Message-----
From: KÖLL Claus [mailto:C.KOELL@TIROL.GV.AT] 
Sent: Thursday, June 14, 2007 11:19
To: dev@jackrabbit.apache.org
Subject: RE: Loading Node without loading Binary Data

One more piece of information:
I'm using the new Oracle bundle PM, so the binaries are in the DB,
and the binary data size can rise to about 200-300 MB.
Reading the nodes takes very long (up to 30 seconds).

-----Original Message-----
From: KÖLL Claus [mailto:C.KOELL@TIROL.GV.AT] 
Sent: Thursday, June 14, 2007 11:05
To: dev@jackrabbit.apache.org
Subject: Loading Node without loading Binary Data

Is it possible to get a node without loading its binary data?
My search is very fast, but I have a lot of huge PDF files. If I get a result set with about 20 files,
in the first step I only want to get the name and some metadata from each node, not the binaries.

Is there a solution?

BR
claus

