Parallel PL/SQL launcher update

A while ago I wrote an experimental function to submit up to four PL/SQL commands simultaneously using Oracle's Parallel Query capabilities and a pipelined function. The results were promising but somewhat frustrating, as I could only get two parallel threads in my 11g test setup, and I couldn't figure out why.

I still don't have the full picture (perhaps one or more of the trace events will light some shed - 10975 trace execution of parallel propagation, 10383 auto parallelization testing event and 10384 parallel dataflow scheduler tracing look promising, amongst others) but I can now get all four parallel threads on my single-CPU 11g test system (11.1.0.6.0 Enterprise Edition on Windows XP Pro, in a Parallels VM on my 2GHz iMac, to be precise).

Finding myself with some spare time, I had another look at my initialisation parameters, and I noticed CPU_COUNT was set to 1 (which was true, there is only 1 cpu) and PARALLEL_THREADS_PER_CPU was 2 (the default setting). I found that setting PARALLEL_THREADS_PER_CPU = 4 (or CPU_COUNT = 2) enabled the full 4 parallel threads requested in parallel_launch.test(). Also, since gathering statistics on the PQ_DRIVER table had the effect of serialising queries against it, and since by default the background stats job will do this, I added a call to dbms_stats.lock_table_stats(user,'pq_driver'). I still don't know what statistics cause this to happen and why, and it's a little puzzling because I would have expected dynamic sampling (enabled by default since 10g) to give the same results.

Also I noticed a potential bug in the pq_submit function. It queries v$px_session as a self-diagnostic check, but if for any reason no parallel slaves were used there will be no rows in it, leading to a no_data_found exception because I used a SELECT INTO construction. Since no_data_found exceptions are automatically handled by the SQL engine, my pq_submit might appear to return successfully after mysteriously doing nothing. That's now fixed.

Parallel PL/SQL launcher

This is an experimental package for submitting PL/SQL calls in parallel using the Oracle SQL Parallel Query mechanism.

Many have complained that PL/SQL has no multi-threading support, as Java (for example) does. The conventional solution is to use DBMS_JOB or DBMS_SCHEDULER, and if necessary write your own wrapper to handle completion status reporting (Has it finished yet? Did it succeed?) - for an example of this approach, see job_ot.

However, SQL does have a parallel threading mechanism, in the form of Parallel Query (PQ). You simply execute your query using a PARALLEL() hint, or against tables with a parallel degree defined - particularly effective with partitioned tables, where partitions are divided among available PQ slave processes depending on resources available, and the results are combined at the end. You can think of this as a PQ controller dividing up the job, handing out tasks to PQ slaves, and collating the results. Meanwhile over in PL/SQL, you can define parallel-enabled pipelined functions that accept a ref cursor. The PQ controller decides how to divide up the ref cursor among available PQ slaves and passes them one function call each, thus invoking multiple instances of the function to handle the cursor.

Now, if you create a table "PQ_DRIVER" with 4 partitions and a parallel degree of 4, with one row in each partition, and you execute a query along the lines of SELECT /*+ PARALLEL(pq,4) */ thread_id FROM pq_driver pq; that should persuade the PQ controller to set four PQ slave processes to work on it (one per partition). And if you pass that query as a cursor parameter to a parallel-enabled pipelined function, then shouldn't that create a situation where each row is processed by a separate PQ slave process? So here is a way to use (alright, hack) the PQ engine so that it processes arbitrary PL/SQL procedure calls in parallel.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> set timing on
SQL> exec exec parallel_launch.test
sid=151 serial#=39222: Degree of parallelism: requested 4, actual 4: 20:04:16 - 20:04:18
sid=137 serial#=27074: Degree of parallelism: requested 4, actual 4: 20:04:16 - 20:04:18
sid=146 serial#=33830: Degree of parallelism: requested 4, actual 4: 20:04:16 - 20:04:18
sid=132 serial#=11582: Degree of parallelism: requested 4, actual 4: 20:04:16 - 20:04:18

PL/SQL procedure successfully completed.

Elapsed: 00:00:03.55


Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> set timing on
SQL> exec parallel_launch.test
sid=123 serial#=21233: Degree of parallelism: requested 4, actual 2: 06:07:47 - 06:07:49
sid=123 serial#=21233: Degree of parallelism: requested 4, actual 2: 06:07:49 - 06:07:51
sid=127 serial#=21506: Degree of parallelism: requested 4, actual 2: 06:07:47 - 06:07:49
sid=127 serial#=21506: Degree of parallelism: requested 4, actual 2: 06:07:49 - 06:07:51

PL/SQL procedure successfully completed.

Elapsed: 00:00:06.83

The degree of parallelism is lower in 11g than in 10.2 - possibly 11g is more intelligent and decides that four slaves would be excessive for a table with only four rows, especially in my single-processor RAM-starved desktop VM.

I also tested in 10.1.0.3.0:

sid=439 serial#=58467:   Degree of parallelism: requested 4, actual 4: 16:18:57 - 16:18:59
sid=439 serial#=58467: Degree of parallelism: requested 4, actual 4: 16:18:59 - 16:19:01
sid=439 serial#=58467: Degree of parallelism: requested 4, actual 4: 16:19:01 - 16:19:03
sid=439 serial#=58467: Degree of parallelism: requested 4, actual 4: 16:19:03 - 16:19:05

I got the four PQ slaves I asked for, but everything was executed by just one of them. What this is really telling me is that I don't fully understand PQ.

This has not been stress tested and I make no promises that it will work. Let me know what you find.

Sudoku solver 10g

Here is a 10g update of last year's 9i Sudoku solver. The row/column/square free values intersection code is simplified by the new MULTISET INTERSECT operator, the validation procedure now uses IS A SET, and I've used MEMBER OF in the new "cross hatching" algorithm. I also used some conditional compilation for diagnostic output.

My original idea was to find the intersection of the three sets of free values for each cell - that is, the available values for the row, the column and the box. I thought this was pretty neat until I looked up Sudoku-solving and found that it was called "Candidate Elimination" (or "Forced Move") and regarded as the most basic approach used by beginners on simple puzzles.

I built on this by adding a guessing loop, in which if the elimination approach cannot complete the puzzle, it takes each candidate value for each empty cell in turn, plugs that value in and retries the eliminator process. It turns out this is another standard technique, known variously as "What If", "Guess-and-Check", "Bifurcation", "Backtracking" or "Ariadne's thread".

Version adds a second deduction approach, in which a possible value for a cell is checked to see if there is anywhere else it could go in its row. If not, then it must go in that cell. This seems to be known as "Cross hatching" within the Sudoku community.

Updated: version 4 now checks the current column and square, and can also eliminate values using a "what if" test, along with some internal refactoring I won't bore you with.

The next straightforward algorithm I want to add in a future version is "matched groups", described in Wikipedia as follows:

One method works by identifying "matched cell groups". For instance, if precisely two cells within a scope (a particular row, column, or region) contain the same two candidate numerals (p,q), or if precisely three cells within a scope contain the same three candidate numerals (p,q,r), these cells are said to be matched. The placement of those candidate numerals anywhere else within that same scope would make a solution impossible; therefore, those candidate numerals can be deleted from all other cells in the scope.


I also want to add what the Brainbashers page calls Intersection Removal, in which for example you narrow down the location of the 5 within a box to one of two positions, and these fall within the same row or column: now even though you don't know exactly which position it goes in, you still have enough information to exclude the rest of that row or column.

Now possibly by this point the code will be sufficiently complex that it would be simpler just to focus on a brute-force "check every possible combination" approach, but that seems to me to defeat the purpose - I wanted a solution using logical deduction.

I should mention that the names in use for sudoku solving techniques seem to vary widely, and I originally made up some of my own before reading up on the subject. The excellent Brainbashers Sudoku Help site, for example, uses "Intersections" and "Pinned Squares" for what I call Locations (there is only one place for the 7 to go), "Forced Moves" for what I call Intersection (the only candidate value for the cell is 2), and "Locked Sets" for what Wikipedia (see above) calls "Matched Groups" (actually I quite like this term and might use it myself). I also realise that I've used the word "square" where others have used "region" or "box".

To install using SQL*Plus, either download sudoku-mk4.pls and run it from the SQL> prompt, or (if you trust me!) paste the following command directly into SQL*Plus:

@http://www.williamrobertson.net/code/sudoku_mk4.pls

To reinstall over an existing version, add the argument "rebuild":

@http://www.williamrobertson.net/code/sudoku_mk4.pls rebuild

This will cleanly drop all sudoku_* types, avoiding type dependency issues.

DBMS_ADVISOR tuning advice

One new feature in 10g is DBMS_ADVISOR, which analyses a specified query and recommend ways to improve performance. This script analyzes whatever query is in the SQL*Plus buffer, so when testing a query you can simply enter

@tune

SQL*Plus Index report

Report indexes for the specified table. Specify an index name to see details about that index (which table, what columns). Allows user.table, or user.index.

Sudoku solver

The idea behind this solution is to find the intersect set of {row.freelist, column.freelist, square.freelist}. If that gives only one value, that must be the value of the cell. Now if row, column and square could all be instances of a Sudoku Element object type, and that had a free_list() method, it shouldn't be too hard to write...

Utility assortment

ECHO (word-wrapper for DBMS_OUTPUT), TO_DECIMAL, TO_BASE (number conversion functions), SEND_MAIL (HTML formatted email), LIST_ELEMENT (string tokeniser), SPLIT, TO_STRING etc, all in one rather arbitrary package.

Exception object

An Exception object you can pass around between procedures. It was a bit of an experiment and I haven't found a real use for it yet, but maybe that's just because I've been working on warehouse systems for a while. Let me know if you find a use for it!

Job control object

Job control modelled on Unix: declare "job" objects, submit them, wait for them to complete. Uses DBMS_JOB to submit tasks, DBMS_ALERT for jobs to pass back status and error messages, so the calling procedure can easily track the background jobs it started.

Timer object

Timer object to simplify the routine task of capturing start and end times and displaying formatted results. Includes "stop", "resume", "restart", "elapsed" and "show" methods.

AVG and SUM for INTERVALs

Presumably due to an oversight, Oracle have not yet overloaded the SUM and AVG functions to support the INTERVAL DAY TO SECOND datatype. Here are my AVG_DSINTERVAL and SUM_DSINTERVAL aggregate/analytic functions.

Explain Plan Utility

My handy Explain Plan script for SQL*Plus. Just type:
@xplan

Which

Resolves an object name following the same resolution path as SQL or PL/SQL:
e.g. given the name "EMP", figure out that it's a public synonym for SCOTT.EMPLOYEES.

Usage - use any of the following:
@which emp
@which scott.emp
@which em%

The report is in two parts. First it queries DBA_OBJECTS (or if that is not accessible it will automaticallly switch to ALL_OBJECTS) for any matching objects. The results include whether there is a synonym and whether you have been granted access. Then it calls DBMS_UTILITY.NAME_RESOLVE, which prior to 9i resolved the names of stored code (packages, procedures etc) but in recent versions has been extended to other database objects such as tables.

If you use a wildcard, the report will stop after querying DBA_OBJECTS.

It is possible for an object to appear in the first part of the results because it was found in DBA_OBJECTS, but not actually be resolvable, for example because it resides in another schema and you are missing a synonym.

My sessions

A variation of sessions.sql, but limited to the current OS user account. Handy when there are a lot of sessions and you only want to see your own. I should probably write a generic one that takes parameters...

Sessions

List all database sessions. Handy when you don't have TOAD, PL/SQL Developer etc around.

Dict

Find Oracle data dictionary information, grouping the results into USER_/ALL_/DBA_/other. e.g. "@dict dict" finds DICT, DICTIONARY, DICT_COLUMNS, GV$LOGMNR_DICTIONARY and V$LOGMNR_DICTIONARY together with their descriptions.

Deps

Dependency report for specified object (table, package etc) using DBA_DEPENDENCIES view.

Constr

Database constraints report, for the specified table, e.g. "@constr emp". Requires three stored types, which are included (commented out) within the script.

Cardinality

Reports on how unique a column is. For example, 90% of its values might be unique, with the remaining duplicates appearing in groups of twos and threes. This is useful when investigating a system that doesn't have a full set of unique constraints, or just for analyzing data in general. (I've also implemented this as a PL/SQL Developer right-click option using Browser Extender.)

Count

Shows the distinct values of [column] within [table], e.g. "@count object_type all_objects", which is the same as "SELECT object_type, COUNT(*) FROM all_objects GROUP BY object_type", but less typing.

Most

Find the value that occurs most often in a column. For example, "@most object_type all_objects" reports "VIEW" and the number of occurrences; or "@most object_type all_objects 5" to see the top 5 object types.

Percent

Report on the population of [column] within [table]. Shows you how many null values exist in a column.

Biggest

Specify a table and a column name, to find the longest value held in that column. Can be useful when analyzing usage patterns or diagnosing "value too large" errors.

Subscribe to newsfeed (Atom format)

This page is powered by Blogger. Isn't yours?