Posted to commits@cayenne.apache.org by aa...@apache.org on 2013/02/21 08:03:49 UTC
svn commit: r1448528 - in
/cayenne/main/branches/STABLE-3.1/docs/docbook/cayenne-guide/src/docbkx:
customizing-cayenne-runtime.xml performance-tuning.xml
Author: aadamchik
Date: Thu Feb 21 07:03:49 2013
New Revision: 1448528
URL: http://svn.apache.org/r1448528
Log:
docs
performance tuning
(cherry picked from commit 89bca29152afc43442ee3c1f6f1a1659bf17d5f0)
Modified:
cayenne/main/branches/STABLE-3.1/docs/docbook/cayenne-guide/src/docbkx/customizing-cayenne-runtime.xml
cayenne/main/branches/STABLE-3.1/docs/docbook/cayenne-guide/src/docbkx/performance-tuning.xml
Modified: cayenne/main/branches/STABLE-3.1/docs/docbook/cayenne-guide/src/docbkx/customizing-cayenne-runtime.xml
URL: http://svn.apache.org/viewvc/cayenne/main/branches/STABLE-3.1/docs/docbook/cayenne-guide/src/docbkx/customizing-cayenne-runtime.xml?rev=1448528&r1=1448527&r2=1448528&view=diff
==============================================================================
--- cayenne/main/branches/STABLE-3.1/docs/docbook/cayenne-guide/src/docbkx/customizing-cayenne-runtime.xml (original)
+++ cayenne/main/branches/STABLE-3.1/docs/docbook/cayenne-guide/src/docbkx/customizing-cayenne-runtime.xml Thu Feb 21 07:03:49 2013
@@ -184,7 +184,7 @@ ServerRuntime runtime =
Supported property names are listed in "Appendix A".</para>
<para>There are two ways to set service properties. The most obvious one is to pass it
to the JVM with -D flag on startup.
- E.g.<programlisting>java -Dorg.apache.cayenne.sync_contexts=false ...</programlisting></para>
+ E.g.<programlisting>java -Dcayenne.server.contexts_sync_strategy=false ...</programlisting></para>
<para>A second one is to contribute a property to
<code>org.apache.cayenne.configuration.DefaultRuntimeProperties.properties
</code>map (see the next section on how to do that). This map contains the default
Modified: cayenne/main/branches/STABLE-3.1/docs/docbook/cayenne-guide/src/docbkx/performance-tuning.xml
URL: http://svn.apache.org/viewvc/cayenne/main/branches/STABLE-3.1/docs/docbook/cayenne-guide/src/docbkx/performance-tuning.xml?rev=1448528&r1=1448527&r2=1448528&view=diff
==============================================================================
--- cayenne/main/branches/STABLE-3.1/docs/docbook/cayenne-guide/src/docbkx/performance-tuning.xml (original)
+++ cayenne/main/branches/STABLE-3.1/docs/docbook/cayenne-guide/src/docbkx/performance-tuning.xml Thu Feb 21 07:03:49 2013
@@ -7,8 +7,9 @@
<para>Prefetching is a technique that allows to bring back in one query not only the queried
objects, but also objects related to them. In other words it is a controlled eager
relationship resolving mechanism. Prefetching is discussed in the "Performance Tuning"
- chapter, as it is a powerful performance optimization method. Another common application
- of prefetching is for refreshing stale object relationships.</para>
+ chapter, as it is a powerful performance optimization method. However, another common
+ application of prefetching is to refresh stale object relationships, so more generally
+ it can be viewed as a technique for managing subsets of the object graph.</para>
<para>Prefetching example:
<programlisting language="java">SelectQuery query = new SelectQuery(Artist.class);
@@ -17,8 +18,8 @@ query.addPrefetch("paintings");
// query is executed as usual, but the resulting Artists will have
// their paintings "inflated"
-List<Artist> artists = context.performQuery(query);</programlisting>
- All types of relationships can be preftetched - to-one, to-many, flattened. </para>
+List<Artist> artists = context.performQuery(query);</programlisting>All
+ types of relationships can be prefetched - to-one, to-many, flattened. </para>
<para>A prefetch can span multiple relationships:
<programlisting language="java"> query.addPrefetch("paintings.gallery");</programlisting></para>
<para>A query can have multiple
@@ -86,7 +87,7 @@ query.addPrefetch("paintings").setSemant
</section>
<section xml:id="joint-prefetch-semantics">
<title>Joint Prefetching Semantics</title>
- <para>Joint senantics results in a single SQL statement for root objects and any number
+ <para>Joint semantics results in a single SQL statement for root objects and any number
of jointly prefetched paths. Cayenne processes in memory a cartesian product of the
entities involved, converting it to an object tree. It uses OUTER joins to connect
prefetched entities.</para>
@@ -99,12 +100,120 @@ query.addPrefetch("paintings").setSemant
</section>
<section xml:id="datarows">
<title>Data Rows</title>
+ <para>Converting result set data to Persistent objects and registering these objects in the
+ ObjectContext can be an expensive operation, comparable to the time spent running the
+ query (and frequently exceeding it). Internally Cayenne builds the result as a list of
+ DataRows that are later converted to objects. Skipping the last step and using data in
+ the form of DataRows can significantly increase performance. </para>
+ <para>A DataRow is simply a map of values keyed by their DB column name. It is a ubiquitous
+ representation of DB data used internally by Cayenne, and in many cases it is quite usable
+ as is in the application. So performance-sensitive selects should consider
+ DataRows, as they save memory and CPU cycles. All selecting queries support the DataRows
+ option,
+ e.g.:<programlisting language="java">SelectQuery query = new SelectQuery(Artist.class);
+query.setFetchingDataRows(true);
+
+List<DataRow> rows = context.performQuery(query); </programlisting><programlisting language="java">SQLTemplate query = new SQLTemplate(Artist.class, "SELECT * FROM ARTIST");
+query.setFetchingDataRows(true);
+
+List<DataRow> rows = context.performQuery(query);</programlisting></para>
+ <para>Moreover, DataRows may be converted to Persistent objects later as needed. For example, you
+ may implement some in-memory filtering, only converting a subset of fetched
+ objects:<programlisting language="java">// you need to cast ObjectContext to DataContext to get access to 'objectFromDataRow'
+DataContext dataContext = (DataContext) context;
+
+for(DataRow row : rows) {
+ if(row.get("DATE_OF_BIRTH") != null) {
+ Artist artist = dataContext.objectFromDataRow(Artist.class, row);
+ // do something with Artist...
+ ...
+ }
+}</programlisting></para>
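The filter-then-convert idea can also be tried without a database. The following is a minimal sketch, not Cayenne code: plain `Map` instances stand in for DataRows, the `Artist` class is a hypothetical stand-in for a Persistent object, and the `ARTIST_NAME` column name is an assumption made for illustration. It shows that only rows passing the in-memory filter pay the conversion cost.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DataRowFilterSketch {

    // Hypothetical stand-in for a Persistent object; not a Cayenne class.
    public static class Artist {
        public final String name;

        public Artist(String name) {
            this.name = name;
        }
    }

    // Builds a map keyed by column name, mimicking the shape of a DataRow.
    // ARTIST_NAME is an assumed column name used only for this sketch.
    public static Map<String, Object> row(String name, String dateOfBirth) {
        Map<String, Object> row = new HashMap<>();
        row.put("ARTIST_NAME", name);
        row.put("DATE_OF_BIRTH", dateOfBirth);
        return row;
    }

    // Converts only the rows that have a non-null DATE_OF_BIRTH value;
    // all other rows are skipped without creating an object for them.
    public static List<Artist> convertWithBirthDate(List<Map<String, Object>> rows) {
        List<Artist> artists = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            if (row.get("DATE_OF_BIRTH") != null) {
                artists.add(new Artist((String) row.get("ARTIST_NAME")));
            }
        }
        return artists;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = Arrays.asList(
                row("Artist A", "1881-10-25"),
                row("Artist B", null),
                row("Artist C", "1840-11-14"));

        List<Artist> artists = convertWithBirthDate(rows);
        System.out.println(artists.size()); // prints 2
    }
}
```

In a real application the conversion step would be the `objectFromDataRow` call shown in the listing above, which is where the object registration cost is incurred.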
</section>
<section xml:id="iterated-queries">
<title>Iterated Queries</title>
+ <para>While contemporary hardware may easily allow applications to fetch hundreds of
+ thousands or even millions of objects into memory, this doesn't mean it is always a good
+ idea to do so. You can optimize processing of very large result sets with two techniques
+ discussed in this and the following section - iterated and paginated queries. </para>
+ <para>An iterated query is not actually a special query. Any selecting query can be executed in
+ iterated mode by the DataContext (as in the previous example, a cast to DataContext is
+ needed). DataContext returns an object called <code>ResultIterator</code> that is backed
+ by an open ResultSet. Data is read from the ResultIterator one row at a time until it is
+ exhausted. Data comes as DataRows regardless of whether the originating query was
+ configured to fetch DataRows or not. A ResultIterator must be explicitly closed to avoid
+ a JDBC resource leak.</para>
+ <para>An iterated query provides constant memory performance for arbitrarily large ResultSets.
+ This is true at least on the Cayenne end, as the JDBC driver may still decide to bring the
+ entire ResultSet into the JVM memory. </para>
+ <para>Here is a full
+ example:<programlisting language="java">// you need to cast ObjectContext to DataContext to get access to 'performIteratedQuery'
+DataContext dataContext = (DataContext) context;
+
+// create a regular query
+SelectQuery q = new SelectQuery(Artist.class);
+
+// ResultIterator operations all throw checked CayenneException
+// moreover 'finally' is required to close it
+try {
+
+ ResultIterator it = dataContext.performIteratedQuery(q);
+
+ try {
+ while(it.hasNextRow()) {
+ // normally we'd read a row, process its data, and throw it away
+ // this gives us constant memory performance
+ Map row = (Map) it.nextRow();
+
+ // do something with the row...
+ ...
+ }
+ }
+ finally {
+ it.close();
+ }
+}
+catch(CayenneException e) {
+ e.printStackTrace();
+}
+</programlisting>Also
+ common sense tells us that ResultIterators should be processed and closed as soon as
+ possible to release the DB connection. E.g. storing open iterators between HTTP requests
+ and for an unpredictable length of time would quickly exhaust the connection pool.</para>
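The read-one-row, process, discard pattern can be sketched without a database. The following is a minimal illustration under stated assumptions: `RowIterator` is a hypothetical stand-in that generates rows on demand (it is not Cayenne's `ResultIterator` and throws no checked exceptions), and the `ARTIST_ID` and `ARTIST_NAME` column names are made up for the example. The caller keeps only a running total, so memory use does not grow with the number of rows, and the iterator is closed in `finally` even if processing fails.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.stream.IntStream;

public class IteratedSketch {

    // Hypothetical stand-in for an iterator backed by an open ResultSet.
    // Rows are generated lazily, one at a time, instead of being read from a DB.
    public static class RowIterator {
        private final Iterator<Integer> source;
        private boolean closed;

        public RowIterator(int rowCount) {
            this.source = IntStream.range(0, rowCount).iterator();
        }

        public boolean hasNextRow() {
            return !closed && source.hasNext();
        }

        public Map<String, Object> nextRow() {
            int id = source.next();
            Map<String, Object> row = new HashMap<>();
            row.put("ARTIST_ID", id);               // assumed column name
            row.put("ARTIST_NAME", "Artist " + id); // assumed column name
            return row;
        }

        public void close() {
            closed = true;
        }

        public boolean isClosed() {
            return closed;
        }
    }

    // Processes rows one by one, keeping only a running sum in memory.
    public static long sumIds(RowIterator it) {
        long sum = 0;
        try {
            while (it.hasNextRow()) {
                Map<String, Object> row = it.nextRow();
                sum += (Integer) row.get("ARTIST_ID");
                // the row becomes garbage here; nothing is accumulated
            }
        } finally {
            // always release the underlying resource
            it.close();
        }
        return sum;
    }

    public static void main(String[] args) {
        RowIterator it = new RowIterator(1000);
        System.out.println(sumIds(it));    // prints 499500
        System.out.println(it.isClosed()); // prints true
    }
}
```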
</section>
<section xml:id="paginated-queries">
<title>Paginated Queries</title>
+ <para>Enabling query pagination makes it possible to load very large result sets in a Java app with
+ very little memory overhead (much smaller than even the DataRows option discussed
+ above). Moreover, it is completely transparent to the application - a user gets what
+ appears to be a list of Persistent objects, with no iterator to close or DataRows to
+ convert to objects:</para>
+ <para>
+ <programlisting language="java">SelectQuery query = new SelectQuery(Artist.class);
+query.setPageSize(50);
+
+// the fact that result is paginated is transparent
+List<Artist> artists = ctxt.performQuery(query);</programlisting>
+ </para>
+ <para>Having said that, the DataRows option can be combined with pagination, providing the best
+ of both
+ worlds:<programlisting language="java">SelectQuery query = new SelectQuery(Artist.class);
+query.setPageSize(50);
+query.setFetchingDataRows(true);
+
+List<DataRow> rows = ctxt.performQuery(query);</programlisting></para>
+ <para>Internally, pagination first fetches a list of IDs for the root
+ entity of the query. This is very fast and initially takes very little memory. Then when
+ an object is requested at an arbitrary index in the list, this object and adjacent
+ objects (a "page" of objects that is determined by the query pageSize parameter) are
+ fetched together by ID. Subsequent requests to the objects of this "page" are served
+ from memory.</para>
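The mechanism described in the paragraph above can be modeled in a few lines of plain Java. This is a hypothetical sketch and not Cayenne's paginated list implementation: `PagedListSketch`, its `loader` function (standing in for a "fetch these IDs" query), and `loadCount` are names invented for illustration. It holds the full ID list up front, resolves a whole page on first access to any index in it, and serves later requests for that page from memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; this is not a Cayenne class.
public class PagedListSketch<T> {

    private final List<Long> ids;                        // fetched up front, cheap to hold
    private final int pageSize;
    private final Function<List<Long>, List<T>> loader;  // stands in for "fetch by ID"
    private final Map<Integer, List<T>> pages = new HashMap<>();
    private int loadCount;

    public PagedListSketch(List<Long> ids, int pageSize, Function<List<Long>, List<T>> loader) {
        this.ids = ids;
        this.pageSize = pageSize;
        this.loader = loader;
    }

    // The full size is known from the ID list alone, without loading objects.
    public int size() {
        return ids.size();
    }

    public T get(int index) {
        int page = index / pageSize;
        List<T> objects = pages.get(page);
        if (objects == null) {
            // first access to this page: resolve all of its IDs in one call
            int from = page * pageSize;
            int to = Math.min(from + pageSize, ids.size());
            objects = loader.apply(ids.subList(from, to));
            pages.put(page, objects);
            loadCount++;
        }
        return objects.get(index - page * pageSize);
    }

    // Number of pages resolved so far.
    public int loadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 0; i < 120; i++) {
            ids.add(i);
        }

        PagedListSketch<String> artists = new PagedListSketch<>(ids, 50, pageIds -> {
            List<String> result = new ArrayList<>();
            for (Long id : pageIds) {
                result.add("Artist#" + id);
            }
            return result;
        });

        System.out.println(artists.size());      // prints 120
        System.out.println(artists.get(75));     // prints Artist#75
        System.out.println(artists.get(60));     // prints Artist#60, same page as index 75
        System.out.println(artists.loadCount()); // prints 1
    }
}
```

Note that every resolved page stays in the map, which mirrors the limitation that touching all indexes eventually loads the whole list.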
+ <para>An obvious limitation of pagination is that if you eventually access all objects in
+ the list, the memory use will end up being the same as with no pagination. However, it is
+ still a very useful approach. With some lists (e.g. multi-page search results) only a
+ few top objects are normally accessed. At the same time, pagination allows you to estimate
+ the full list size without fetching all the objects. And again, it is completely
+ transparent and looks like a normal query.</para>
</section>
<section xml:id="caching-and-fresh-data">
<title>Caching and Fresh Data</title>
@@ -117,5 +226,49 @@ query.addPrefetch("paintings").setSemant
</section>
<section xml:id="turning-off-synchronization-of-objectcontexts">
<title>Turning off Synchronization of ObjectContexts</title>
+ <para>By default, when a single ObjectContext commits its changes, all other contexts in the
+ same runtime receive an event that contains all the committed changes. This allows them
+ to update their cached object state to match the latest committed data. There are,
+ however, many problems with this ostensibly helpful feature. In short, it works well in
+ environments with few contexts and in unclustered scenarios, such as single user desktop
+ applications, or simple webapps with only a few users. More specifically:<itemizedlist>
+ <listitem>
+ <para>The performance of synchronization is (probably worse than) O(N) where N
+ is the number of peer ObjectContexts in the system. In a typical webapp N
+ can be quite large. Besides, for any given context, due to locking on
+ synchronization, the context's own performance will depend not only on the queries
+ that it runs, but also on external events that it does not control. This is
+ unacceptable in most situations. </para>
+ </listitem>
+ <listitem>
+ <para>Commit events are untargeted - even contexts that do not hold a given
+ updated object will receive the full event that they will have to
+ process.</para>
+ </listitem>
+ <listitem>
+ <para>Clustering between JVMs doesn't scale - apps with large volumes of commits
+ will quickly saturate the network with events, while most of those will be
+ thrown away on the receiving end as mentioned above.</para>
+ </listitem>
+ <listitem>
+ <para>Some contexts may not want to be refreshed. A refresh in the middle of an
+ operation may lead to unpredictable results. </para>
+ </listitem>
+ <listitem>
+ <para>Synchronization will interfere with optimistic locking. </para>
+ </listitem>
+ </itemizedlist>So we've made a good case for disabling synchronization in most webapps.
+ To do that, set the following DI property to "false" -
+ <code>Constants.SERVER_CONTEXTS_SYNC_PROPERTY</code>, using one of the standard
+ Cayenne DI approaches. E.g. from the command
+ line:<programlisting language="java">java -Dcayenne.server.contexts_sync_strategy=false</programlisting>Or
+ by changing the standard properties Map in a custom extensions
+ module:<programlisting language="java">public class MyModule implements Module {
+
+ @Override
+ public void configure(Binder binder) {
+ binder.bindMap(Constants.PROPERTIES_MAP).put(Constants.SERVER_CONTEXTS_SYNC_PROPERTY, "false");
+ }
+}</programlisting></para>
</section>
</chapter>