IslandSQL Episode 2: All DML Statements

2023-02-07

Introduction

In the last episode, we build the initial version of IslandSQL. An Island grammar for SQL scripts covering select statements. In this blog post, we extend the grammar to handle the remaining DML statements.

The full source code is available on GitHub and the binaries on Maven Central.

Lexer Changes

The lexer grammar contains a new fragment COMMENT_OR_WS on line 98. We use this fragment in all DML lexer rules after the starting keywords. Why? Because we can use comments beside whitespace after a keyword as in with/*comment*/function e_count.... The previous lexer version required a whitespace after the with keyword for select statements with a plsql_declarations clause.

I also merged the former PLSQL_DECLARATION rule into the SELECT rule. Mainly to have a single lexer rule for all DML statements. It’s more consistent and easier to understand IMO.

IslandSqlLexer.g4: based on https://github.com/IslandSQL/IslandSQL/tree/v0.2.0/

lexer grammar IslandSqlLexer;

options {
    superClass=IslandSqlLexerBase;
    caseInsensitive = true;
}

/*----------------------------------------------------------------------------*/
// Comments and alike to be ignored
/*----------------------------------------------------------------------------*/

ML_COMMENT: '/*' .*? '*/' -> channel(HIDDEN);
SL_COMMENT: '--' .*? (EOF|SINGLE_NL) -> channel(HIDDEN);

REMARK_COMMAND:
    {isBeginOfCommand()}? 'rem' ('a' ('r' 'k'?)?)?
        (WS SQLPLUS_TEXT*)? SQLPLUS_END -> channel(HIDDEN)
;

PROMPT_COMMAND:
    {isBeginOfCommand()}? 'pro' ('m' ('p' 't'?)?)?
       (WS SQLPLUS_TEXT*)? SQLPLUS_END -> channel(HIDDEN)
;

STRING:
    'n'?
    (
          (['] .*? ['])+
        | ('q' ['] '[' .*? ']' ['])
        | ('q' ['] '(' .*? ')' ['])
        | ('q' ['] '{' .*? '}' ['])
        | ('q' ['] '<' .*? '>' ['])
        | ('q' ['] . {saveQuoteDelimiter1()}? .+? . ['] {checkQuoteDelimiter2()}?)
    ) -> channel(HIDDEN)
;

CONDITIONAL_COMPILATION_DIRECTIVE: '$if' .*? '$end' -> channel(HIDDEN);

/*----------------------------------------------------------------------------*/
// Islands of interest on DEFAULT_CHANNEL
/*----------------------------------------------------------------------------*/

CALL:
    {isBeginOfStatement()}? 'call' COMMENT_OR_WS+ SQL_TEXT+? SQL_END
;

DELETE:
    {isBeginOfStatement()}? 'delete' COMMENT_OR_WS+ SQL_TEXT+? SQL_END
;

EXPLAIN_PLAN:
    {isBeginOfStatement()}? 'explain' COMMENT_OR_WS+ 'plan' COMMENT_OR_WS+ SQL_TEXT+? SQL_END
;

INSERT:
    {isBeginOfStatement()}? 'insert' COMMENT_OR_WS+ SQL_TEXT+? SQL_END
;

LOCK_TABLE:
    {isBeginOfStatement()}? 'lock' COMMENT_OR_WS+ 'table' COMMENT_OR_WS+ SQL_TEXT+? SQL_END
;

MERGE:
    {isBeginOfStatement()}? 'merge' COMMENT_OR_WS+ SQL_TEXT+? SQL_END
;

UPDATE:
    {isBeginOfStatement()}? 'update' COMMENT_OR_WS+ SQL_TEXT+? SQL_END
;

SELECT:
    {isBeginOfStatement()}?
    (
          ('with' COMMENT_OR_WS+ ('function'|'procedure') SQL_TEXT+? PLSQL_DECLARATION_END)
        | ('with' COMMENT_OR_WS+ SQL_TEXT+? SQL_END)
        | (('(' COMMENT_OR_WS*)* 'select' COMMENT_OR_WS SQL_TEXT+? SQL_END)
    )
;

/*----------------------------------------------------------------------------*/
// Whitespace
/*----------------------------------------------------------------------------*/

WS: [ \t\r\n]+ -> channel(HIDDEN);

/*----------------------------------------------------------------------------*/
// Any other token
/*----------------------------------------------------------------------------*/

ANY_OTHER: . -> channel(HIDDEN);

/*----------------------------------------------------------------------------*/
// Fragments to name expressions and reduce code duplication
/*----------------------------------------------------------------------------*/

fragment SINGLE_NL: '\r'? '\n';
fragment CONTINUE_LINE: '-' [ \t]* SINGLE_NL;
fragment COMMENT_OR_WS: ML_COMMENT|SL_COMMENT|WS;
fragment SQLPLUS_TEXT: (~[\r\n]|CONTINUE_LINE);
fragment SQL_TEXT: (ML_COMMENT|SL_COMMENT|STRING|.);
fragment SLASH_END: SINGLE_NL WS* '/' [ \t]* (EOF|SINGLE_NL);
fragment PLSQL_DECLARATION_END: ';'? [ \t]* (EOF|SLASH_END);
fragment SQL_END:
      EOF
    | (';' [ \t]* SINGLE_NL?)
    | SLASH_END
;
fragment SQLPLUS_END: EOF|SINGLE_NL;

Parser Changes

The start rule file on line 11 in the parser grammar is now defined as a unbounded number of dmlStatment. Each DML statement is a single lexer token. It’s still not possible to produce a parse error with this grammar. We only process DML statements. Everything else is hidden and therefore ignored.

IslandSqlParser.g4: based on https://github.com/IslandSQL/IslandSQL/tree/v0.2.0/

parser grammar IslandSqlParser;

options {
    tokenVocab=IslandSqlLexer;
}

/*----------------------------------------------------------------------------*/
// Start rule
/*----------------------------------------------------------------------------*/

file: dmlStatement* EOF;

/*----------------------------------------------------------------------------*/
// Rules for reduced SQL grammar (islands of interest)
/*----------------------------------------------------------------------------*/

dmlStatement:
      callStatement
    | deleteStatement
    | explainPlanStatement
    | insertStatement
    | lockTableStatement
    | mergeStatement
    | selectStatement
    | updateStatement
;

callStatement: CALL;
deleteStatement: DELETE;
explainPlanStatement: EXPLAIN_PLAN;
insertStatement: INSERT;
lockTableStatement: LOCK_TABLE;
mergeStatement: MERGE;
updateStatement: UPDATE;
selectStatement: SELECT;

IslandSQL for VS Code

The extension for Visual Studio Code version 0.2.0 finds text in all DML statements and is not limited to select statements anymore. And a symbol for each DML statement is shown now in the outline view.

Outlook

A grammar that can parse all DML statements sounds like something complete. However, this grammar is far from complete. For code analysis, getting a single token for a DML statement is at best a good starting point.

What we need is a more detailed result. For this, we need to move the logic from the lexer to the parser. In the next episode, we will do this with one of the DML statements.

1 Comment

IslandSQL Episode 3: Lock Table - Philipp Salvisberg's Blog says:

2023-04-22 at 19:58

[…] the last episode we extended the IslandSQL grammar to cover all DML statements as single lexer token. Now it’s […]

Reply