Featured image of post DB2 Battle: Optimization Profile vs. Statement Concentrator (Part 1)

DB2 Battle: Optimization Profile vs. Statement Concentrator (Part 1)

DB2 explain output

Today I wanted to try using a DB2 Optimization Profile for a statement impacted by the DB2 Statement Concentrator. It turned out to be a longer exercise than thought, but also with more fun, covering a lot of the DB2 tuning and monitoring infrastructure. So set some minutes aside and follow my journey into the world of query optimization, workload monitoring, session actuals, and more. I split it into two parts because of length (Update on 02-Jul-2015: A third part is here).

The original question I tried to answer was related to REOPT and a query like this:

select id, s 
from betw 
where id between ? and ?

Is it possible to force an application to always reoptimize a statement, so that depending on the given range either an index scan or table scan is used. DB2 has the concept of multi-phase statement compilation, optimization, and execution. The individual steps are briefly explained in the introduction to this article on improving query performance by using REOPT bind option. Using that option it is possible to have (complex) statements re-optimized before every execution, e.g., to address different values in input variables.

The mentioned statement concentrator is a means for reducing query compilation and optimization effort by replacing literals in similar statements by parameter markers. It can have lots of benefits, but also could cause suboptimal access plans if values are not distributed evenly.

For my tests I used a JDBC-based SQL GUI to send statements like “explain plan for select id, s from betw where id between 20 and 30” to DB2, changing the upper bound to much higher values. Without the statement concentrator turned on this resulted in access plans with either an index scan or table scan. Here is such an index-based access plan:

Access Plan:
-----------
    Total Cost:         20.3338
    Query Degree:        1

            Rows 
           RETURN
           (   1)
            Cost 
             I/O 
             |
           11.996 
           FETCH 
           (   2)
           20.3338 
              3 
         /---+----\
     11.996        98304 
     IXSCAN   TABLE: HLOESER 
     (   3)        BETW
     13.5606        Q1
        2 
       |
      98304 
 INDEX: HLOESER 
     BETWIX
       Q1

With the statement concentrator turned on, all statements variations are transformed into a statement with variables. In the explain output the statements that have been rewritten by an active statement concentrator can be identified because of an extra section labeled “effective statement”:

Effective Statement:
-------------------
select 
  id,
  s 
from 
  betw 
where 
  id between :L0 and :L1

After getting this far I wanted to apply an optimization profile to cause the specific statement to be always reoptimized. Optimization profiles provide many ways for performance tuning on a, at least sometimes, low-level basis. They are less intrusive than database hints (see Oracle-style hints in DB2 :), can be turned on and off on different levels, and provide all kinds of options. The first step to use profiles is to prepare the DB2 tools catalog for them:

db2 "call sysinstallobjects('opt_profiles', 'c', '', '')" 

An optimization profile is created by writing an XML document according to the XML schema found here. I created the following profile:

 <?xml version="1.0" encoding="UTF-8"?>
<OPTPROFILE>
  <STMTMATCH EXACT='FALSE'/>
  <STMTPROFILE ID="REOPT betw">
     <STMTKEY>
        <![CDATA[select id,s from betw where id between :L0 and :L1 ]]>
     </STMTKEY>
     <OPTGUIDELINES>
       <REOPT VALUE="ALWAYS" />
     </OPTGUIDELINES>
  </STMTPROFILE>
</OPTPROFILE>

The optimization profile consists of a single so-called statement profile which is labeled “REOPT betw”. It is for our SQL statement and I decided to use the version as seen in the effective statement. As a guideline to the optimizer the “REOPT ALWAYS” is passed. What I included in the profile is the switch to turn on inexact statement matching. It causes DB2 to consider more variations of the SQL statement for optimization by our rule. Once the optimization profile is complete, it can be inserted into the catalog table. For that I created a file “betw.csv” with a profile name and reference to the XML file:

"HLOESER","PROFILE_BETW","betw.xml"

That file is then imported into DB2:

db2 "import from betw.csv of del modified by lobsinfile insert_update into systools.opt_profile"

Whenever profiles are changed, it is a good idea to flush the cache:

db2 flush optimization profile cache

Usage of an optimization profile can be enabled in different ways, e.g., in an application by setting a registry variable like this:

set current optimization profile="HLOESER"."PROFILE_BETW"

Once this is done there is feedback in the DB2 explain output on whether profiles are in use and if yes, which one. This could be in the form of diagnostic information pointing to errors as shown below or in a section “profile information” that gives details about the name of the profile and the statement identifier:

A problem is reported

Extended Diagnostic Information:

--------------------------------

Diagnostic Identifier:     1
Diagnostic Details:     EXP0005W  The optimization profile could not be
            processed or found in the OPT_PROFILE table in the
            SYSTOOLS schema.

Diagnostic Identifier:     2
Diagnostic Details:     EXP0001W  An unexpected error occurred processing
            or applying an optimization profile or embedded
            optimization guideline. Line number "0", character
            number "0".

Success, the profile is in use

Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
    HLOESER.PROFILE_BETW
STMTPROF: (Statement Profile Name)

    REOPT betw

So far, so good. An optimization profile has been created and depending on the which statement is executed by DB2, the explain output shows that my statement level guideline has been considered. To test whether the guideline is really applied, I executed several statements like “explain plan for select id, s from betw where id between 20 and x”. X was in the range of 21 to 20000. With the statement concentrator on the explain output showed that all statements ended up in the following optimized statement.

SELECT 
  Q3.ID AS "ID",
  Q3.S AS "S" 
FROM 
  (SELECT 
     Q1.$C0 
   FROM 
     (VALUES 
        0) AS Q1 
   WHERE 
     (:L0 <= :L1 SELECTIVITY 1.000000)
  ) AS Q2,
  HLOESER.BETW AS Q3 
WHERE 
  (Q3.ID <= :L1) AND 
  (:L0 <= Q3.ID)

All statements had the same complex access plan:

Access Plan:
-----------
    Total Cost:         20.469
    Query Degree:        0
                Rows 
             Rows Actual
               RETURN
               (   1)
                Cost 
                 I/O 
                 |
               167.205 
                 19 
               NLJOIN
               (   2)
               20.469 
                 NA 
          /------+-------\
         1               167.205 
        NA                 19 
      TBSCAN             FETCH 
      (   3)             (   4)
    0.000186812          20.4688 
        NA                 NA 
        |              /---+----\
         0         167.205       98304 
        NA           19           NA 
 TABFNC: SYSIBM    IXSCAN   TABLE: HLOESER 
      GENROW       (   5)        BETW
        Q1         13.6489        Q3
                     NA 
                     |
                    98304 
                     NA 
               INDEX: HLOESER 
                   BETWIX
                     Q3

Did my guideline work? I wasn’t sure so this called for deeper look at the optimization. Read on here for the second part of the journey.