Introduction
In the last episode, we extended the IslandSQL grammar covering the complete lock table
statement. However, the support for expressions was very limited. It was not possible to use a date literal or to_date
function to determine a partition to be locked. Time to fix that. In this episode, we will have a closer look at some SQL expressions and how to deal with the complexity of the SQL language.
The full source code is available on GitHub and the binaries are on Maven Central.
Lock Table Partition
To lock a table partition we use the partition name like this:
lock table sales partition (sales_q2_2000) in exclusive mode nowait;
Or we let the Oracle Database determine the partition by passing the values of a partition key like this:
lock table sales partition for (date '2000-04-01') in exclusive mode nowait;
Both statements will lock the same partition of the table sales
. The latter is IMO better since it works also with system-generated partition names, which you might use with interval partitioning.
There are a lot of possibilities to specify the date. Here is a selection of sensible and less sensible alternatives:
lock table sales partition for (to_date('2000-04', 'YYYY-MM')) in exclusive mode nowait;
lock table sales partition for (to_date('2000-092', 'YYYY-DDD')) in exclusive mode nowait;
lock table sales partition for (timestamp '2000-04-01 08:42:42') in exclusive mode nowait;
lock table sales partition for (add_months(trunc(to_date('2000-12-31', 'YYYY-MM-DD'), 'YYYY'), 3)) in exclusive mode nowait;
lock table sales partition for (date '2000-01-01' + interval '3' month) in exclusive mode nowait;
lock table sales partition for (timestamp '2000-12-31 18:00:00' + 1/4 + -9 * interval '1' month) in exclusive mode nowait;
All these alternatives lock the partition in the sales
table where the data for 1st April 2000 is stored. Using different SQL expressions, of course.
Expressions
I mentioned in my last post that expressions are the most extensive part of the SQL grammar. However, there are ways to optimise the extent of the grammar. ANTLR 4 helps because it allows defining left recursive parts of a grammar naturally. See the expressions labelled with binaryExpression
in the excerpt of the parser in version 0.4.0 of the grammar below.
expression:
expr=STRING # simpleExpressionStringLiteral
| expr=NUMBER # simpleExpressionNumberLiteral
| K_DATE expr=STRING # dateLiteral
| K_TIMESTAMP expr=STRING # timestampLiteral
| expr=intervalExpression # intervalLiteral
| expr=sqlName # simpleExpressionName
| LPAR exprs+=expression (COMMA exprs+=expression)* RPAR # expressionList
| expr=caseExpression # caseExpr
| operator=unaryOperator expr=expression # unaryExpression
| expr=functionExpression # functionExpr
| expr=AST # allColumnWildcardExpression
| left=expression operator=(AST|SOL) right=expression # binaryExpression
| left=expression
(
operator=PLUS
| operator=MINUS
| operator=VERBAR VERBAR
)
right=expression # binaryExpression
| left=expression operator=K_COLLATE right=expression # binaryExpression
| left=expression operator=PERIOD right=expression # binaryExpression
;
In line 13 we deal with multiplication and division. Both sides of the operation allow an expression. Unlike grammars based on ANTLR 3, it is no longer necessary to left-factor this left-recursion in ANTLR 4.
ANTLR 4 solves the ambiguity by prioritizing the alternatives in the order of the definition. As a result, multiplication/division has a higher priority than addition/subtraction and a lower priority than the function expression on line 11.
Left-factoring still might be helpful to optimize the runtime performance of the parser. However, it’s not necessary anymore. And I’m happy with a simpler grammar.
Function Expressions
It would be a bad idea to handle each function like to_date
or to_char
separately in the grammar. Why? It would be more work and we still would need a solution for custom functions. As a result, we generically implement functions. The most naïve grammar definition would look like this:
functionExpression:
name=sqlName LPAR (params+=expression (COMMA params+=expression)*)? RPAR
;
Parameterless functions that do not allow parentheses like sysdate
will be treated as simpleExpressionName
. This rule handles parameterless functions that require parentheses like sys_guid()
. And it can handle functions with an unbounded number of parameters.
So what’s the problem? Why do I call this definition “naïve”? Because the following cases are not covered:
- named parameters, e.g.
dbms_utility.get_hash_value(name => 'text1', base => 0, hash_size => 16)
- parameter prefixes, e.g.
distinct
orall
in any_value - parameter suffixes, e.g.
deterministic
in approx_median or partition_by_clause/order_by_clause in approx_rank - partial analytic clauses, e.g.
within group
in approx_percentile - analytic clauses, e.g.
over
in avg
In most cases, it makes probably sense to handle them generically. However, there are cases where it’s probably better to define dedicated grammar rules for very special functions such as json_table or xml_table.
You find a less naïve implementation on GitHub. However, it is still incomplete. IMO a good way to assess completeness is to write tests. For function expressions, this means tests for every single function according to the SQL Language Reference. My tests currently cover abs
through cardinality
. The next function on the to-do list is cast, which contains the following uncovered grammar constructs:
- parameter prefix
multiset
for a subquery expression, which is also not yet covered - parameter suffix
as type_name
- parameter suffix
DEFAULT return_value ON CONVERSION ERROR
Shall we define a dedicated grammar rule for the cast
function? I guess yes. However, I’d probably implement multiset
as a unary operator and the subquery
as part of the expression
rule. For that, we need to implement the grammar for the complete select
statement.
Knowing the Scope and the Limitations
A typical question I get is “Why are you writing a SQL grammar? There are already some available for ANTLR 4, right?”. That’s an excellent question. The ANLTR organisation on GitHub manages a repository with example grammars. You find the ones for SQL here.
Most of the 3rd party grammars cover just a (large) subset of the underlying languages. They define what they cover in ANTLR or by EBNF. However, they often do not define which versions they cover and they do not define what they don’t cover. As a result, you have to try if the grammar is sufficient for your use case. Furthermore, you have to assess if it is sufficient for future use cases and if it will cover the changes in newer versions. And of course, you have to decide if you are ready to extend/fix the grammar in areas where it does not meet your expectations. You will have to maintain a fork. This can become painful, especially if the grammar contains a lot of stuff you are not interested in and if the existing test suites are incomplete.
Furthermore, I’m not aware of an open-sourced grammar that covers the relevant portions for the Oracle Database and PostgreSQL. Yes, the goal is that IslandSQL supports the recent versions of both dialects as if they were a single dialect.
We develop the grammar iteratively. As a result, there are a lot of interim limitations like the one mentioned regarding the cast
function.
SQL*Plus Substitution Variables
However, some limitations are supposed to be permanent. Like the one for substitution variables. Substitution variables can contain arbitrary text. They are replaced before the execution of a script. The IslandSQL grammar provides limited support for substitution variables. They can be used in places where a sqlName
is valid. This is basically everywhere you can use an expression
.
Here’s an example of a supported usage:
lock table &table_name in exclusive mode wait &seconds;
And here’s an example of an unsupported usage:
lock table dept in &lock_mode mode nowait;
The grammar expects certain keywords at the position of &lock_mode
. Here’s the excerpt of the grammar that should make that clear:
lockTableStatementUnterminated:
K_LOCK K_TABLE objects+=lockTableObject (COMMA objects+=lockTableObject)*
K_IN lockmode=lockMode K_MODE waitOption=lockTableWaitOption?
;
lockMode:
K_ROW K_SHARE # rowShareLockMode
| K_ROW K_EXCLUSIVE # rowExclusiveLockMode
| K_SHARE K_UPDATE # shareUpdateLockMode
| K_SHARE # shareLockMode
| K_SHARE K_ROW K_EXCLUSIVE # shareRowExclusiveLockMode
| K_EXCLUSIVE # exclusiveLockMode
;
And the next excerpt shows how substitution variables are defined in the grammar.
sqlName:
unquotedId
| QUOTED_ID
| substitionVariable
;
substitionVariable:
AMP AMP? name=substitionVariableName period=PERIOD?
;
substitionVariableName:
NUMBER
| sqlName
;
A substitution variable starts with one or two &
. The name can be either a NUMBER
(like 1
) or a sqlName
(like tableName
). And a substitution variable can optionally end on .
. That’s it. This way we can provide limited support for SQL*Plus substitution variables.
Outlook
The grammar evolves quite nicely. However, expressions are still incomplete. This will be covered with the full support of the select
statement.
As I’m sure you’ve already found out for yourself, the version of the grammar matches the episodes in this blog post series. And these are the planned versions of IslandSQL with their main features:
- v0.5.0: Fully parse
select
statement, complete expressions and conditions - v0.6.0: Fully parse remaining DML statements (
call
,delete
,explain plan
,insert
,merge
,update
) - v0.7.0: PostgreSQL syntax compatibility of implemented statements
- v0.8.0: Fully parse PL/SQL block
- v0.9.0: Fully parse
create
statements of the Oracle Database (function
,package
,procedure
,trigger
,type
,view
) - v0.10.0: pgPL/SQL syntax compatibility (
sql
andplpgsql
language increate function
,create procedure
,create trigger
anddo
)
And after episode 10 the fun begins. We can start to provide value for database developers and others. A linter is one option. But there is more. Stay tuned.
2 Comments
Super great project and my respect, Philipp!
Thanks, for the source deployment in GitHub and Maven and I also like the goals.
Good luck, Friedhold.
Thank you, Friedhold. There is still a lot of work ahead. Cheers, Philipp