Note: this document is still a “work in progress”!
The parser provides the following functionality:
The tokeniser takes a string as input and breaks it up into tokens: identifiers, operators, etc.
TODO: more details required.
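As a rough illustration of the tokenising step, here is a minimal sketch in C++. It is not TOra's actual tokeniser: the function name tokenise and its splitting rules (identifiers and numbers, single-quoted strings, single-character operators) are assumptions made for this example only.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not TOra's real tokeniser: splits SQL text into
// identifiers/numbers, single-quoted strings and one-character operators.
std::vector<std::string> tokenise(const std::string &sql)
{
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < sql.size()) {
        const unsigned char c = static_cast<unsigned char>(sql[i]);
        if (std::isspace(c)) {
            ++i;                                  // whitespace only separates tokens
        } else if (std::isalnum(c) || c == '_') {
            const std::size_t start = i;          // identifier, keyword or number
            while (i < sql.size() &&
                   (std::isalnum(static_cast<unsigned char>(sql[i])) || sql[i] == '_'))
                ++i;
            assert(i > start);                    // every token is non-empty
            tokens.push_back(sql.substr(start, i - start));
        } else if (c == '\'') {
            const std::size_t start = i++;        // quoted string literal
            while (i < sql.size() && sql[i] != '\'')
                ++i;
            if (i < sql.size())
                ++i;                              // include the closing quote
            tokens.push_back(sql.substr(start, i - start));
        } else {
            tokens.push_back(std::string(1, sql[i]));  // operator or punctuation
            ++i;
        }
    }
    return tokens;
}
```

Calling tokenise("select sysdate from dual;") with this sketch yields the five tokens select, sysdate, from, dual and ; in that order. Multi-character operators such as := are split into single characters here, which a real tokeniser would need to handle.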
The parser groups the given tokens into statements and their parts in hierarchical order. All tokens matching reserved words are marked as “Keywords”.
The parser is used for a number of purposes:
Note: because the purpose of the parser in TOra is limited (it is not required to check whether a given statement is correct), it only does approximate parsing!
The parser identifies a number of statement types (as described in tosqlparse.h):
Tip: if you want to check the structure of parsed text, you can use the printstatement function (in tosqlparse.cpp). The output of that function is used in the examples below.
Each statement also has its class identified. Class information is used later when passing the statement to Oracle. For example, the trailing semicolon must be removed from DDL/DML statements, while PL/SQL blocks must keep it. Therefore the following classes are identified in TOra:
The simplest statement example would be:
select sysdate from dual;
This is parsed as one statement with five sub-tokens.
Statement:
Keyword: select
Keyword: sysdate
Keyword: from
Token: dual
Token: ;
As you can see, the parser has not only identified all of this as one statement, it has also marked the keywords and left all other tokens as simple “tokens” (this information is later used in code formatting).
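The keyword marking can be sketched as a lookup against a set of reserved words. The following C++ sketch is only an illustration: kReservedWords is a tiny, assumed subset (with sysdate included so the result matches the output above), and labelToken and describeStatement are hypothetical helper names, not functions from tosqlparse.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Illustrative subset only; the real reserved-word list is much longer.
// sysdate is included so the result matches the parse output shown above.
static const std::set<std::string> kReservedWords = {
    "select", "from", "sysdate", "create", "table", "varchar"};

// Hypothetical helper: labels one token as "Keyword" or "Token".
// The comparison is case-insensitive; the original spelling is kept.
std::string labelToken(const std::string &token)
{
    assert(!token.empty());  // the tokeniser never produces empty tokens
    std::string lower(token);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    const char *kind = kReservedWords.count(lower) ? "Keyword" : "Token";
    return std::string(kind) + ": " + token;
}

// Builds the flat listing for one statement from already-split tokens.
std::vector<std::string> describeStatement(const std::vector<std::string> &tokens)
{
    std::vector<std::string> lines;
    lines.push_back("Statement:");
    for (const std::string &t : tokens)
        lines.push_back(labelToken(t));
    return lines;
}
```

Passing the tokens select, sysdate, from, dual and ; to describeStatement gives the same six lines as the listing above, under the assumed reserved-word set.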
If there are two such simple statements:
select sysdate from dual; select sysdate from dual;
The parser identifies two statements consisting of the same sub-tokens:
Statement:
Keyword: select
Keyword: sysdate
Keyword: from
Token: dual
Token: ;
Statement:
Keyword: select
Keyword: sysdate
Keyword: from
Token: dual
Token: ;
Let's parse a simple DDL statement containing some lists:
create table test(col varchar(12));
It is parsed like this:
Block:
Statement:
Keyword:create
Keyword:table
Token:test
Token:(
List:
Token:col
Keyword:varchar
Token:(
List:
Token:12
Token:)
Statement:
Token:)
Token:;
Note: lists can contain other, inner lists!
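Nested lists fall out naturally if every node simply owns a vector of child nodes. The sketch below shows this idea in C++. The Node struct, printTree and makeColumnList are hypothetical names invented for this example (the real types are declared in tosqlparse.h), and the two-space indentation per level is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical node type for illustration; not the class from tosqlparse.h.
struct Node {
    enum Kind { Block, Statement, List, Keyword, Token };
    Kind kind;
    std::string text;            // used by Keyword and Token nodes
    std::vector<Node> children;  // used by Block, Statement and List nodes
};

static const char *kindName(Node::Kind kind)
{
    switch (kind) {
    case Node::Block:     return "Block";
    case Node::Statement: return "Statement";
    case Node::List:      return "List";
    case Node::Keyword:   return "Keyword";
    case Node::Token:     return "Token";
    }
    return "Unknown";
}

// Writes one line per node, indenting each nesting level by two spaces.
void printTree(const Node &node, std::ostream &out, int depth = 0)
{
    assert(depth >= 0);
    out << std::string(static_cast<std::size_t>(depth) * 2, ' ')
        << kindName(node.kind) << ':' << node.text << '\n';
    for (const Node &child : node.children)
        printTree(child, out, depth + 1);  // a list inside a list just recurses deeper
}

// Builds the "col varchar(12)" part of the create table example:
// an outer List that holds another List for the length argument.
Node makeColumnList()
{
    Node inner{Node::List, "", {Node{Node::Token, "12", {}}}};
    return Node{Node::List, "", {
        Node{Node::Token, "col", {}},
        Node{Node::Keyword, "varchar", {}},
        Node{Node::Token, "(", {}},
        inner,
        Node{Node::Token, ")", {}}}};
}
```

Printing makeColumnList() with printTree into a std::ostringstream shows the inner List one level deeper than the tokens around it.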
A statement containing a PL/SQL block:
CREATE OR REPLACE PROCEDURE A AS
BEGIN
CASE a
WHEN 1 THEN NULL;
WHEN 2 THEN NULL;
ELSE NULL;
END CASE;
END;
It is parsed as:
Block:
Statement:
Keyword:CREATE
Keyword:OR
Keyword:REPLACE
Keyword:PROCEDURE
Token:A
Keyword:AS
Statement:
Keyword:BEGIN
Block:
Statement:
Keyword:CASE
Token:a
Keyword:WHEN
Token:1
Keyword:THEN
Statement:
Token:NULL
Token:;
Statement:
Keyword:WHEN
Token:2
Keyword:THEN
Statement:
Token:NULL
Token:;
Statement:
Keyword:ELSE
Statement:
Token:NULL
Token:;
Statement:
Token:END
Keyword:CASE
Token:;
Statement:
Token:END
Token:;
Indentation uses the parser's result. It is important for indentation that the parser correctly recognises statements, blocks, keywords, etc.
The comment is added first (if one was attached to the statement). Then the sub-statements are looped through, and the same indent functionality is called recursively for each of them.
List and statement types are processed similarly.
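The recursion described above can be sketched as follows. This is a simplified illustration in C++, not TOra's indentation code: the Stmt struct and indentStatement are hypothetical, lists and statements are treated identically, and the four-space indent per level is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified node; not TOra's real statement class.
struct Stmt {
    std::string comment;             // comment attached to this statement, may be empty
    std::vector<std::string> words;  // keywords and tokens on this level
    std::vector<Stmt> children;      // nested statements or lists
};

// Returns the statement as indented text, four spaces per nesting level.
std::string indentStatement(const Stmt &stmt, int level = 0)
{
    assert(level >= 0);
    const std::string pad(static_cast<std::size_t>(level) * 4, ' ');
    std::string out;
    if (!stmt.comment.empty())
        out += pad + stmt.comment + "\n";  // the attached comment is added first
    if (!stmt.words.empty()) {
        out += pad;
        for (std::size_t i = 0; i < stmt.words.size(); ++i) {
            if (i > 0)
                out += ' ';
            out += stmt.words[i];
        }
        out += '\n';
    }
    for (const Stmt &child : stmt.children)
        out += indentStatement(child, level + 1);  // same function, one level deeper
    return out;
}
```

For a BEGIN statement with an attached comment and a single nested NULL ; statement, this sketch emits the comment, then BEGIN, then the nested statement indented by four spaces. Because the output depends only on the tree shape, a mis-parsed block or statement shows up directly as wrong indentation.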
Parser functionality is used in many parts of TOra. Small changes can fix some problems and at the same time break a lot of other things. Unit tests are used to increase reliability and reduce regression testing time.
Currently the unit test code can be found in the main function of tosqlparse.cpp. Therefore, if you want to compile TOra to run the unit tests (instead of launching the application itself), you have to take main.cpp out of the cmake build. The easiest (but not optimal) way to do that is to comment out main.cpp in src/CMakeLists.txt and uncomment tosqlparsertest.cpp.
The unit tests perform the following checks: