Parsing a left-recursive grammar with recursive descent

Posted on April 17, 2014 by bierpilot

At the moment I am racking my brain over how to parse a left-recursive grammar with a recursive-descent parser.

What got me onto this is the open-source project RpaTk by Martin Stoilov, which is, by the way, a brilliant piece of work. It takes a fundamentally different approach from what I have built so far: a recursive-descent parser that uses backtracking, plus several speed optimizations through caching. All of this runs on top of a virtual machine, which makes for pretty fast parsing. It is a very well thought-out concept, even though it competes directly with the latest developments in libphorward and is miles ahead of them. Hats off. I have also contacted the author, simply because I want to understand how a recursive-descent parser can parse a left-recursive grammar.

Take this grammar as an example:

s: e
e: e + X
e: e - X
e: X

If we implemented it as a naive recursive-descent parser, it would fall into endless recursion before reading the first character, because e calls itself right away without consuming any input.
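To make the problem concrete, here is a minimal sketch in C. It does not show how RpaTk handles left recursion (that is exactly what I have not figured out yet). It shows the common textbook workaround instead: rewrite `e: e + X | e - X | X` as `e: X (('+' | '-') X)*` and loop. The function names `match`, `parse_e` and `parse_s` and the parenthesised output format are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* A naive translation of "e: e + X" would begin like this:
 *
 *     int parse_e(const char **p) { return parse_e(p) && match(p, '+') ...
 *
 * parse_e() calls itself before any input is consumed, so it never returns.
 */

/* Matches the single character c and advances *p on success. */
static int match(const char **p, char c)
{
    if (**p != c)
        return 0;
    (*p)++;
    return 1;
}

/* e: X (('+' | '-') X)*
 * The loop replaces the left recursion. Each iteration wraps the result
 * so far, which keeps the operators left-associative.
 * Renderings longer than tmp are silently truncated in this sketch. */
static int parse_e(const char **p, char *out, size_t outsz)
{
    char tmp[256];

    if (!match(p, 'X'))
        return 0;
    snprintf(out, outsz, "X");

    while (**p == '+' || **p == '-') {
        char op = *(*p)++;
        if (!match(p, 'X'))
            return 0;
        snprintf(tmp, sizeof tmp, "(%s%cX)", out, op);
        snprintf(out, outsz, "%s", tmp);
    }
    return 1;
}

/* s: e   (succeeds only if the whole input was consumed) */
int parse_s(const char *input, char *out, size_t outsz)
{
    const char *p = input;

    assert(input != NULL && out != NULL && outsz > 0);
    return parse_e(&p, out, outsz) && *p == '\0';
}
```

Calling `parse_s("X+X-X", buf, sizeof buf)` returns 1 and leaves `((X+X)-X)` in `buf`, so the grouping is the same one the left-recursive rules describe. The price is that the grammar itself had to be changed, which is precisely what a parser that accepts left recursion directly would avoid.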


Emscripten

Posted on February 26, 2014 by bierpilot

I have now taken a look at Emscripten… cool stuff!

It compiles C/C++ code via the LLVM compiler framework into JavaScript code, which then runs more or less platform-independently.

libphorward via clang and emcc => see the commit log.

nerdfroi

Posted on February 17, 2014 by bierpilot

Since Dennis complained that nothing gets posted here anymore… a little *nerdfroi*!

--- AST visualization ---
termdef
 @ = >@< w><
 IDENT = >ident< w><
 REGEX = >/A-Za-z+/< w><
nontermdef
 IDENT = >start< w><
 COLON = >:< w><
 rhs
  symbol
   @ = >@< w><
   IDENT = >ident< w><
nontermdef
 IDENT = >test< w><
 COLON = >:< w><
 rhs
  symbol
   IDENT = >start< w><
  symbol
   @ = >@< w><
   IDENT = >ident< w><

I had to rebuild the whole AST generator once more. Now it does what it should, as you can see.

That was quite a bit of “Fummeley” (fiddling), as the medieval nerd would now say…

libphorward can do ASTs

Posted on January 24, 2014 by bierpilot

Oh yes!

After managing over the last few days to bring libphorward and the pggrammar extension, at least in outline, into the state they are ultimately supposed to reach, I also made quite a bit of progress on that front today.

At the very least, libphorward can now even generate an abstract syntax tree from parsed input, and, as of three days ago, a plain syntax tree as well!

Demo program:

#include <stdio.h>	/* getchar(), fprintf() */
#include <phorward.h>

int main()
{
	pggrammar*		g;	/* the grammar */
	pgparser*		p;	/* parser built from g */

	pgterminal*		i;
	pgterminal*		op_a;
	pgterminal*		op_s;
	pgterminal*		op_d;
	pgterminal*		op_m;
	pgterminal*		br_op;
	pgterminal*		br_cl;

	pgnonterminal*	start;
	pgnonterminal*	expr;
	pgnonterminal*	term;
	pgnonterminal*	factor;

	g = pg_grammar_create();

	i = pg_terminal_create( g, "@INTEGER", "[0-9]+" );
	op_a = pg_terminal_create( g, "+", "\\+" );
	op_s = pg_terminal_create( g, "-", "-" );
	op_d = pg_terminal_create( g, "/", "/" );
	op_m = pg_terminal_create( g, "*", "\\*" );
	br_op = pg_terminal_create( g, "(", "\\(" );
	br_cl = pg_terminal_create( g, ")", "\\)" );

	start = pg_nonterminal_create( g, "start" );
	expr = pg_nonterminal_create( g, "expr" );
	term = pg_nonterminal_create( g, "term" );
	factor = pg_nonterminal_create( g, "factor" );

	/* start */
	pg_production_create( start, expr, (pgsymbol*)NULL );

	/* expr */
	pg_production_create_as_node( expr, "add", NULL,
		expr, op_a, term, (pgsymbol*)NULL );
	pg_production_create_as_node( expr, "sub", NULL,
		expr, op_s, term, (pgsymbol*)NULL );
	pg_production_create( expr, term, (pgsymbol*)NULL );

	/* term */
	pg_production_create_as_node( term, "mul", NULL,
		term, op_m, factor, (pgsymbol*)NULL );
	pg_production_create_as_node( term, "div", NULL,
		term, op_d, factor, (pgsymbol*)NULL );
	pg_production_create( term, factor, (pgsymbol*)NULL );

	/* factor */
	pg_production_create( factor, br_op, expr, br_cl, (pgsymbol*)NULL );
	pg_production_create( factor, i, (pgsymbol*)NULL );

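	/* dump the grammar, then build an LALR(1) parser from it */
	pg_grammar_print( g );

	p = pg_parser_create( g, PGPARADIGM_LALR1 );

	/* first run: parse "1*2+3" from a string */
	pg_lexer_set_source( p->lexer, PG_LEX_SRCTYPE_STRING, "1*2+3" );
	pg_parser_parse( p );

	getchar();	/* wait for a key press before the second run */
	fprintf( stderr, "------------------------------\n" );

	/* second run: parse "(7+3)*2-5" with the same parser */
	pg_lexer_set_source( p->lexer, PG_LEX_SRCTYPE_STRING, "(7+3)*2-5" );
	pg_parser_parse( p );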

	return 0;
}

This now produces the following:

*** FINAL STATES***

-- State 0 0x1191330 --
Kernel:
	start : . expr
	-> Shift on '(' to state 3
	<- Shift/Reduce on '@INTEGER' by production 'factor : @@INTEGER'
	-> Goto state 1 on 'expr'
	-> Goto state 2 on 'term'
	<- Goto/Reduce by production 'term : factor' in 'factor'
-- State 1 0x11925a0 --
Kernel:
	start : expr .   [ @@eof ]
	expr : expr . @+ term
	expr : expr . @- term
	-> Shift on '+' to state 4
	-> Shift on '-' to state 5
	<- Reduce on '@eof' by production 'start : expr'
-- State 2 0x1192830 --
Kernel:
	expr : term .   [ @- @) @+ @@eof ]
	term : term . @* factor
	term : term . @/ factor
	-> Shift on '*' to state 6
	-> Shift on '/' to state 7
	<- Reduce on '-' by production 'expr : term'
	<- Reduce on ')' by production 'expr : term'
	<- Reduce on '+' by production 'expr : term'
	<- Reduce on '@eof' by production 'expr : term'
-- State 3 0x1192b10 --
Kernel:
	factor : @( . expr @)
	<- Shift/Reduce on '@INTEGER' by production 'factor : @@INTEGER'
	-> Goto state 8 on 'expr'
	-> Goto state 2 on 'term'
	<- Goto/Reduce by production 'term : factor' in 'factor'
-- State 4 0x1192e90 --
Kernel:
	expr : expr @+ . term
	-> Shift on '(' to state 3
	<- Shift/Reduce on '@INTEGER' by production 'factor : @@INTEGER'
	-> Goto state 9 on 'term'
	<- Goto/Reduce by production 'term : factor' in 'factor'
-- State 5 0x1193080 --
Kernel:
	expr : expr @- . term
	-> Shift on '(' to state 3
	<- Shift/Reduce on '@INTEGER' by production 'factor : @@INTEGER'
	-> Goto state 10 on 'term'
	<- Goto/Reduce by production 'term : factor' in 'factor'
-- State 6 0x11936d0 --
Kernel:
	term : term @* . factor
	-> Shift on '(' to state 3
	<- Shift/Reduce on '@INTEGER' by production 'factor : @@INTEGER'
	<- Goto/Reduce by production 'term : term @* factor' in 'factor'
-- State 7 0x11938c0 --
Kernel:
	term : term @/ . factor
	-> Shift on '(' to state 3
	<- Shift/Reduce on '@INTEGER' by production 'factor : @@INTEGER'
	<- Goto/Reduce by production 'term : term @/ factor' in 'factor'
-- State 8 0x1194640 --
Kernel:
	factor : @( expr . @)
	expr : expr . @+ term
	expr : expr . @- term
	<- Shift/Reduce on ')' by production 'factor : @( expr @)'
	-> Shift on '+' to state 4
	-> Shift on '-' to state 5
-- State 9 0x11956f0 --
Kernel:
	expr : expr @+ term .   [ @- @) @+ @@eof ]
	term : term . @* factor
	term : term . @/ factor
	-> Shift on '*' to state 6
	-> Shift on '/' to state 7
	<- Reduce on '-' by production 'expr : expr @+ term'
	<- Reduce on ')' by production 'expr : expr @+ term'
	<- Reduce on '+' by production 'expr : expr @+ term'
	<- Reduce on '@eof' by production 'expr : expr @+ term'
-- State 10 0x1196080 --
Kernel:
	expr : expr @- term .   [ @- @) @+ @@eof ]
	term : term . @* factor
	term : term . @/ factor
	-> Shift on '*' to state 6
	-> Shift on '/' to state 7
	<- Reduce on '-' by production 'expr : expr @- term'
	<- Reduce on ')' by production 'expr : expr @- term'
	<- Reduce on '+' by production 'expr : expr @- term'
	<- Reduce on '@eof' by production 'expr : expr @- term'
#00: columns=24 accept=-1 default=-1 trans=47(/);47(/):01 trans=45(-);45(-):02 trans=43(+);43(+):03 trans=41());41()):04 trans=48(0);57(9):05 trans=40(();40(():06 trans=42(*);42(*):07
#01: columns=3 accept=5 default=-1
#02: columns=3 accept=4 default=-1
#03: columns=3 accept=3 default=-1
#04: columns=3 accept=8 default=-1
#05: columns=6 accept=2 default=-1 trans=48(0);57(9):05
#06: columns=3 accept=7 default=-1
#07: columns=3 accept=6 default=-1
>>>
00: sym: '(X)' state: 0 token: (X)((X))
get token
Token 2 len 1 lexem >1<
got token '@INTEGER' lexem '1'
shift/reduce by production 8
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: '@INTEGER' state: -1 token: @INTEGER(1)
reduce by production 8
popping 1 items off the stack, replacing by 'factor'
<<<
00: sym: '(X)' state: 0 token: (X)((X))
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'factor' state: -1 token: (X)((X))
reduce by production 6
popping 1 items off the stack, replacing by 'term'
<<<
00: sym: '(X)' state: 0 token: (X)((X))
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'term' state: 2 token: (X)((X))
get token
Token 6 len 1 lexem >*<
got token '*' lexem '*'
shift to state 6
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'term' state: 2 token: (X)((X))
02: sym: '*' state: 6 token: *(*)
get token
Token 2 len 1 lexem >2<
got token '@INTEGER' lexem '2'
shift/reduce by production 8
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'term' state: 2 token: (X)((X))
02: sym: '*' state: 6 token: *(*)
03: sym: '@INTEGER' state: -1 token: @INTEGER(2)
reduce by production 8
popping 1 items off the stack, replacing by 'factor'
<<<
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'term' state: 2 token: (X)((X))
02: sym: '*' state: 6 token: *(*)
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'term' state: 2 token: (X)((X))
02: sym: '*' state: 6 token: *(*)
03: sym: 'factor' state: -1 token: (X)((X))
reduce by production 4
popping 3 items off the stack, replacing by 'term'
<<<
<<<
<<<
00: sym: '(X)' state: 0 token: (X)((X))
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'term' state: 2 token: (X)((X))
get token
Token 3 len 1 lexem >+<
got token '+' lexem '+'
reduce by production 3
popping 1 items off the stack, replacing by 'expr'
<<<
00: sym: '(X)' state: 0 token: (X)((X))
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'expr' state: 1 token: (X)((X))
get token
got token '+' lexem '+'
shift to state 4
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'expr' state: 1 token: (X)((X))
02: sym: '+' state: 4 token: +(+)
get token
Token 2 len 1 lexem >3<
got token '@INTEGER' lexem '3'
shift/reduce by production 8
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'expr' state: 1 token: (X)((X))
02: sym: '+' state: 4 token: +(+)
03: sym: '@INTEGER' state: -1 token: @INTEGER(3)
reduce by production 8
popping 1 items off the stack, replacing by 'factor'
<<<
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'expr' state: 1 token: (X)((X))
02: sym: '+' state: 4 token: +(+)
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'expr' state: 1 token: (X)((X))
02: sym: '+' state: 4 token: +(+)
03: sym: 'factor' state: -1 token: (X)((X))
reduce by production 6
popping 1 items off the stack, replacing by 'term'
<<<
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'expr' state: 1 token: (X)((X))
02: sym: '+' state: 4 token: +(+)
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'expr' state: 1 token: (X)((X))
02: sym: '+' state: 4 token: +(+)
03: sym: 'term' state: 9 token: (X)((X))
get token
EOF read
got token '@eof' lexem ''
reduce by production 1
popping 3 items off the stack, replacing by 'expr'
<<<
<<<
<<<
00: sym: '(X)' state: 0 token: (X)((X))
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'expr' state: 1 token: (X)((X))
get token
got token '@eof' lexem ''
reduce by production 0
popping 1 items off the stack, replacing by 'start'
<<<
00: sym: '(X)' state: 0 token: (X)((X))
goal symbol reduced!
add
 mul
  @INTEGER = >1<
  @INTEGER = >2<
 @INTEGER = >3<
------------------------------
>>>
00: sym: '(X)' state: 0 token: (X)((X))
get token
Token 7 len 1 lexem >(<
got token '(' lexem '('
shift to state 3
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: '(' state: 3 token: ((()
get token
Token 2 len 1 lexem >7<
got token '@INTEGER' lexem '7'
shift/reduce by production 8
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: '(' state: 3 token: ((()
02: sym: '@INTEGER' state: -1 token: @INTEGER(7)
reduce by production 8
popping 1 items off the stack, replacing by 'factor'
<<<
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: '(' state: 3 token: ((()
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: '(' state: 3 token: ((()
02: sym: 'factor' state: -1 token: (X)((X))
reduce by production 6
popping 1 items off the stack, replacing by 'term'
<<<
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: '(' state: 3 token: ((()
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: '(' state: 3 token: ((()
02: sym: 'term' state: 2 token: (X)((X))
get token
Token 3 len 1 lexem >+<
got token '+' lexem '+'
reduce by production 3
popping 1 items off the stack, replacing by 'expr'
<<<
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: '(' state: 3 token: ((()
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: '(' state: 3 token: ((()
02: sym: 'expr' state: 8 token: (X)((X))
get token
got token '+' lexem '+'
shift to state 4
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: '(' state: 3 token: ((()
02: sym: 'expr' state: 8 token: (X)((X))
03: sym: '+' state: 4 token: +(+)
get token
Token 2 len 1 lexem >3<
got token '@INTEGER' lexem '3'
shift/reduce by production 8
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: '(' state: 3 token: ((()
02: sym: 'expr' state: 8 token: (X)((X))
03: sym: '+' state: 4 token: +(+)
04: sym: '@INTEGER' state: -1 token: @INTEGER(3)
reduce by production 8
popping 1 items off the stack, replacing by 'factor'
<<<
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: '(' state: 3 token: ((()
02: sym: 'expr' state: 8 token: (X)((X))
03: sym: '+' state: 4 token: +(+)
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: '(' state: 3 token: ((()
02: sym: 'expr' state: 8 token: (X)((X))
03: sym: '+' state: 4 token: +(+)
04: sym: 'factor' state: -1 token: (X)((X))
reduce by production 6
popping 1 items off the stack, replacing by 'term'
<<<
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: '(' state: 3 token: ((()
02: sym: 'expr' state: 8 token: (X)((X))
03: sym: '+' state: 4 token: +(+)
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: '(' state: 3 token: ((()
02: sym: 'expr' state: 8 token: (X)((X))
03: sym: '+' state: 4 token: +(+)
04: sym: 'term' state: 9 token: (X)((X))
get token
Token 8 len 1 lexem >)<
got token ')' lexem ')'
reduce by production 1
popping 3 items off the stack, replacing by 'expr'
<<<
<<<
<<<
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: '(' state: 3 token: ((()
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: '(' state: 3 token: ((()
02: sym: 'expr' state: 8 token: (X)((X))
get token
got token ')' lexem ')'
shift/reduce by production 7
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: '(' state: 3 token: ((()
02: sym: 'expr' state: 8 token: (X)((X))
03: sym: ')' state: -1 token: )())
reduce by production 7
popping 3 items off the stack, replacing by 'factor'
<<<
<<<
<<<
00: sym: '(X)' state: 0 token: (X)((X))
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'factor' state: -1 token: (X)((X))
reduce by production 6
popping 1 items off the stack, replacing by 'term'
<<<
00: sym: '(X)' state: 0 token: (X)((X))
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'term' state: 2 token: (X)((X))
get token
Token 6 len 1 lexem >*<
got token '*' lexem '*'
shift to state 6
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'term' state: 2 token: (X)((X))
02: sym: '*' state: 6 token: *(*)
get token
Token 2 len 1 lexem >2<
got token '@INTEGER' lexem '2'
shift/reduce by production 8
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'term' state: 2 token: (X)((X))
02: sym: '*' state: 6 token: *(*)
03: sym: '@INTEGER' state: -1 token: @INTEGER(2)
reduce by production 8
popping 1 items off the stack, replacing by 'factor'
<<<
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'term' state: 2 token: (X)((X))
02: sym: '*' state: 6 token: *(*)
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'term' state: 2 token: (X)((X))
02: sym: '*' state: 6 token: *(*)
03: sym: 'factor' state: -1 token: (X)((X))
reduce by production 4
popping 3 items off the stack, replacing by 'term'
<<<
<<<
<<<
00: sym: '(X)' state: 0 token: (X)((X))
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'term' state: 2 token: (X)((X))
get token
Token 4 len 1 lexem >-<
got token '-' lexem '-'
reduce by production 3
popping 1 items off the stack, replacing by 'expr'
<<<
00: sym: '(X)' state: 0 token: (X)((X))
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'expr' state: 1 token: (X)((X))
get token
got token '-' lexem '-'
shift to state 5
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'expr' state: 1 token: (X)((X))
02: sym: '-' state: 5 token: -(-)
get token
Token 2 len 1 lexem >5<
got token '@INTEGER' lexem '5'
shift/reduce by production 8
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'expr' state: 1 token: (X)((X))
02: sym: '-' state: 5 token: -(-)
03: sym: '@INTEGER' state: -1 token: @INTEGER(5)
reduce by production 8
popping 1 items off the stack, replacing by 'factor'
<<<
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'expr' state: 1 token: (X)((X))
02: sym: '-' state: 5 token: -(-)
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'expr' state: 1 token: (X)((X))
02: sym: '-' state: 5 token: -(-)
03: sym: 'factor' state: -1 token: (X)((X))
reduce by production 6
popping 1 items off the stack, replacing by 'term'
<<<
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'expr' state: 1 token: (X)((X))
02: sym: '-' state: 5 token: -(-)
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'expr' state: 1 token: (X)((X))
02: sym: '-' state: 5 token: -(-)
03: sym: 'term' state: 10 token: (X)((X))
get token
EOF read
got token '@eof' lexem ''
reduce by production 2
popping 3 items off the stack, replacing by 'expr'
<<<
<<<
<<<
00: sym: '(X)' state: 0 token: (X)((X))
>>>
00: sym: '(X)' state: 0 token: (X)((X))
01: sym: 'expr' state: 1 token: (X)((X))
get token
got token '@eof' lexem ''
reduce by production 0
popping 1 items off the stack, replacing by 'start'
<<<
00: sym: '(X)' state: 0 token: (X)((X))
goal symbol reduced!
sub
 mul
  add
   @INTEGER = >7<
   @INTEGER = >3<
  @INTEGER = >2<
 @INTEGER = >5<
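To show what the indented dumps at the end of each run represent, here is a small self-contained sketch in C. The `node` type and the functions `ast_dump` and `ast_eval` are hypothetical and written only for this illustration; libphorward's real AST structures and API may well look different. The sketch assumes that every operator node has exactly two children and that leaves carry their lexeme.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type, not libphorward's own data structure. */
typedef struct node {
    const char  *name;    /* "add", "sub", "mul", "div" or "@INTEGER" */
    const char  *lexeme;  /* matched text for leaves, NULL for operators */
    struct node *child;   /* first child */
    struct node *next;    /* next sibling */
} node;

/* Appends an indented dump of n and its siblings to out, one space per
 * nesting level. out must hold a NUL-terminated string on entry. */
void ast_dump(const node *n, int depth, char *out, size_t outsz)
{
    for (; n != NULL; n = n->next) {
        size_t len = strlen(out);

        if (n->lexeme)
            snprintf(out + len, outsz - len, "%*s%s = >%s<\n",
                     depth, "", n->name, n->lexeme);
        else
            snprintf(out + len, outsz - len, "%*s%s\n",
                     depth, "", n->name);

        ast_dump(n->child, depth + 1, out, outsz);
    }
}

/* Evaluates the tree bottom-up. Division by zero is not guarded here. */
long ast_eval(const node *n)
{
    long l, r;

    if (n->lexeme)
        return strtol(n->lexeme, NULL, 10);

    l = ast_eval(n->child);
    r = ast_eval(n->child->next);

    if (strcmp(n->name, "add") == 0) return l + r;
    if (strcmp(n->name, "sub") == 0) return l - r;
    if (strcmp(n->name, "mul") == 0) return l * r;
    return l / r;   /* "div" */
}
```

Building the tree for `1*2+3` by hand (an `add` node whose children are a `mul` node and the leaf `3`) and passing it to `ast_dump` gives the same five lines as the first dump above, and `ast_eval` returns 5. Only the productions that were given a node name show up in the tree. The pass-through productions such as `expr : term` leave no trace, which is what separates the abstract syntax tree from the plain syntax tree.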


reduce by production 4

popping 3 items off the stack, replacing by 'term'

<<<

00: sym: '(X)' state: 0 token: (X)((X))

>>>

00: sym: '(X)' state: 0 token: (X)((X))

01: sym: 'term' state: 2 token: (X)((X))

get token

Token 4 len 1 lexem >-<

got token '-' lexem '-'

reduce by production 3

popping 1 items off the stack, replacing by 'expr'

<<<

00: sym: '(X)' state: 0 token: (X)((X))

>>>

00: sym: '(X)' state: 0 token: (X)((X))

01: sym: 'expr' state: 1 token: (X)((X))

get token

got token '-' lexem '-'

shift to state 5

>>>

00: sym: '(X)' state: 0 token: (X)((X))

01: sym: 'expr' state: 1 token: (X)((X))

02: sym: '-' state: 5 token: -(-)

get token

Token 2 len 1 lexem >5<

got token '@INTEGER' lexem '5'

shift/reduce by production 8

>>>

00: sym: '(X)' state: 0 token: (X)((X))

01: sym: 'expr' state: 1 token: (X)((X))

02: sym: '-' state: 5 token: -(-)

03: sym: '@INTEGER' state: -1 token: @INTEGER(5)

reduce by production 8

popping 1 items off the stack, replacing by 'factor'

<<<

00: sym: '(X)' state: 0 token: (X)((X))

01: sym: 'expr' state: 1 token: (X)((X))

02: sym: '-' state: 5 token: -(-)

>>>

00: sym: '(X)' state: 0 token: (X)((X))

01: sym: 'expr' state: 1 token: (X)((X))

02: sym: '-' state: 5 token: -(-)

03: sym: 'factor' state: -1 token: (X)((X))

reduce by production 6

popping 1 items off the stack, replacing by 'term'

<<<

00: sym: '(X)' state: 0 token: (X)((X))

01: sym: 'expr' state: 1 token: (X)((X))

02: sym: '-' state: 5 token: -(-)

>>>

00: sym: '(X)' state: 0 token: (X)((X))

01: sym: 'expr' state: 1 token: (X)((X))

02: sym: '-' state: 5 token: -(-)

03: sym: 'term' state: 10 token: (X)((X))

get token

EOF read

got token '@eof' lexem ''

reduce by production 2

popping 3 items off the stack, replacing by 'expr'

<<<

00: sym: '(X)' state: 0 token: (X)((X))

>>>

00: sym: '(X)' state: 0 token: (X)((X))

01: sym: 'expr' state: 1 token: (X)((X))

get token

got token '@eof' lexem ''

reduce by production 0

popping 1 items off the stack, replacing by 'start'

<<<

00: sym: '(X)' state: 0 token: (X)((X))

goal symbol reduced!

sub
 mul
  add
   @INTEGER = >7<
   @INTEGER = >3<
  @INTEGER = >2<
 @INTEGER = >5<

Note what comes before the divider line (line 233) and what appears here at the end (line 467): an abstract syntax tree! The AST of the input (7+3)*2-5 is sub( mul( add( 7, 3 ), 2 ), 5 ).
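To make that tree a bit more tangible, here is a minimal sketch in C. It assumes a made-up node type `ast` with made-up helper names (`ast_leaf`, `ast_op`, `ast_print`, `ast_eval`); none of this is libphorward's pgast API, it only mirrors the shape of the tree shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical node type for this sketch; NOT libphorward's pgast. */
typedef struct ast {
    const char *emit;     /* "add", "sub" or "mul"; NULL for a leaf */
    int value;            /* only meaningful for leaves */
    struct ast *left;
    struct ast *right;
} ast;

/* Create an integer leaf. */
ast *ast_leaf(int value)
{
    ast *n = calloc(1, sizeof *n);
    if (n) n->value = value;
    return n;
}

/* Create an operator node with two children. */
ast *ast_op(const char *emit, ast *left, ast *right)
{
    ast *n = calloc(1, sizeof *n);
    if (n) { n->emit = emit; n->left = left; n->right = right; }
    return n;
}

/* Print one node per line, indented by one space per tree level. */
void ast_print(const ast *n, int depth)
{
    if (!n) return;
    printf("%*s", depth, "");
    if (n->emit) printf("%s\n", n->emit);
    else printf("@INTEGER = >%d<\n", n->value);
    ast_print(n->left, depth + 1);
    ast_print(n->right, depth + 1);
}

/* Evaluate the tree bottom-up, children first. */
int ast_eval(const ast *n)
{
    int l, r;
    assert(n != NULL);
    if (!n->emit) return n->value;
    l = ast_eval(n->left);
    r = ast_eval(n->right);
    switch (n->emit[0]) {
        case 'a': return l + r;   /* add */
        case 's': return l - r;   /* sub */
        default:  return l * r;   /* mul */
    }
}

/* Release a tree; returns NULL so the caller can reassign its pointer. */
ast *ast_free(ast *n)
{
    if (n) { ast_free(n->left); ast_free(n->right); free(n); }
    return NULL;
}

/* Build sub( mul( add( 7, 3 ), 2 ), 5 ) by hand. */
ast *ast_example(void)
{
    return ast_op("sub",
               ast_op("mul",
                   ast_op("add", ast_leaf(7), ast_leaf(3)),
                   ast_leaf(2)),
               ast_leaf(5));
}
```

Calling `ast_print(ast_example(), 0)` walks the tree in pre-order, which is the order the nodes appear in the dump above, and `ast_eval()` on the same tree yields 15, the value of (7+3)*2-5. In a real LR parser the nodes would of course be created during the reductions instead of by hand.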

OK, this probably doesn't sound all that exciting to most people, but for me it's just.. freaking awesome!!! 😀

A bit aimless…

Posted on January 14, 2014 by bierpilot

pglexer now produces pgtoken. Nice, huh?

#include <phorward.h>

int main()
{
    pggrammar*       g;
    pgparser*        p;

    pgnonterminal*   start;

    pgtoken*         tok;

    pgterminal*      test;
    pgterminal*      test2;

    g = pg_grammar_create();
    test = pg_terminal_create( g, "INTEGER", "[0-9]+" );
    test2 = pg_terminal_create( g, "NAME", "[A-Za-z_!]+" );
    start = pg_nonterminal_create( g, "start" );
    pg_production_create( start, test, test2, (pgsymbol*)NULL );
    pg_grammar_print( g );

    p = pg_parser_create( g, PGPARADIGM_LALR1 );

    p->lexer->flags = PG_LEXMOD_NONE;
    pg_lexer_set_source( p->lexer, PG_LEX_SRCTYPE_STRING,
        "Die Welt is voller Bier 1337 so_nimm_es_dir!" );

    while( ( tok = pg_lexer_fetch( p->lexer ) ) )
        pg_token_print( tok );

    return 0;
}

In any case, the result works well so far. It is possible to read from different data sources ("sources"), e.g. from a file stream, a string, a wide-character string, or via a function such as getchar(). pglexer takes care of the buffering automatically.
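How such a source abstraction can look is sketched below. This is only an illustration under my own assumptions: the type `source` and the functions `source_init`, `source_next`, `string_getc` and `file_getc` are invented for this example and are not the pglexer API. The idea is that every source is reduced to a "give me the next character" callback, and a small buffer sits in front of it.

```c
#include <assert.h>
#include <stdio.h>

#define SRC_BUFSIZE 8   /* tiny on purpose, so refills actually happen */

/* Callback that delivers the next character, or EOF when exhausted. */
typedef int (*src_getc_fn)(void *ctx);

/* Hypothetical buffered source; names are NOT libphorward's. */
typedef struct {
    src_getc_fn getc_fn;      /* where the characters come from */
    void *ctx;                /* opaque state handed to getc_fn */
    char buf[SRC_BUFSIZE];    /* read-ahead buffer */
    size_t len;               /* number of valid bytes in buf */
    size_t pos;               /* next unread byte in buf */
} source;

/* Adapter: read from a NUL-terminated string (ctx is a const char **). */
int string_getc(void *ctx)
{
    const char **p = ctx;
    if (**p == '\0') return EOF;
    return (unsigned char)*(*p)++;
}

/* Adapter: read from a stdio stream (ctx is a FILE *). */
int file_getc(void *ctx)
{
    return fgetc((FILE *)ctx);
}

/* Bind a source to a callback and its state. */
void source_init(source *s, src_getc_fn fn, void *ctx)
{
    assert(s != NULL && fn != NULL);
    s->getc_fn = fn;
    s->ctx = ctx;
    s->len = s->pos = 0;
}

/* Refill the buffer from the callback; returns the bytes now available. */
static size_t source_fill(source *s)
{
    int c;
    s->len = s->pos = 0;
    while (s->len < SRC_BUFSIZE && (c = s->getc_fn(s->ctx)) != EOF)
        s->buf[s->len++] = (char)c;
    return s->len;
}

/* Return the next character, or EOF at the end of the input. */
int source_next(source *s)
{
    if (s->pos == s->len && source_fill(s) == 0)
        return EOF;
    return (unsigned char)s->buf[s->pos++];
}
```

A string source is set up with `const char *text = "..."; source_init(&s, string_getc, &text);`, a file source with `source_init(&s, file_getc, stdin);`. The lexer on top only ever calls `source_next()` and does not need to know which kind of source it is reading from.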

Current state (as it is here on SourceForge).

Well, not thaaat much has actually happened in libphorward.
But here is what already works:

Defining grammars via API functions
Parse tables for LR(0), LR(1) and LALR(1) are generated, with table compression
Lexer as described above
Regex library greatly improved: it no longer uses llist, only plist :-), and all bugs found so far are fixed.
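As a side note on the table compression mentioned in the list: the post does not say which scheme libphorward uses, so the following is only a sketch of one common approach, namely storing a default action per row and keeping just the cells that differ from it. All names (`cell`, `crow`, `row_compress`, `row_lookup`) and the column count `NSYMS` are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define NSYMS 6     /* number of grammar symbols (columns); an assumption */

/* One explicit table cell: "on symbol sym, perform action". */
typedef struct {
    int sym;
    int action;
} cell;

/* A compressed row: a default action plus the cells that deviate from it. */
typedef struct {
    int def;              /* most frequent action of the dense row */
    size_t count;         /* number of explicit cells */
    cell cells[NSYMS];    /* worst case: every column is explicit */
} crow;

/* Compress one dense row by factoring out its most frequent action. */
void row_compress(const int dense[NSYMS], crow *out)
{
    size_t i, j, best = 0;

    assert(dense != NULL && out != NULL);

    /* Find the action that occurs most often in this row. */
    out->def = dense[0];
    for (i = 0; i < NSYMS; i++) {
        size_t hits = 0;
        for (j = 0; j < NSYMS; j++)
            if (dense[j] == dense[i]) hits++;
        if (hits > best) { best = hits; out->def = dense[i]; }
    }

    /* Keep only the cells that differ from the default. */
    out->count = 0;
    for (i = 0; i < NSYMS; i++) {
        if (dense[i] != out->def) {
            out->cells[out->count].sym = (int)i;
            out->cells[out->count].action = dense[i];
            out->count++;
        }
    }
}

/* Look up the action for sym in a compressed row. */
int row_lookup(const crow *row, int sym)
{
    size_t i;
    for (i = 0; i < row->count; i++)
        if (row->cells[i].sym == sym)
            return row->cells[i].action;
    return row->def;    /* not stored explicitly: use the default */
}
```

Since most cells of a typical LR table row hold the same value (usually "error"), a row such as { 0, 0, 4, 0, 5, 0 } shrinks to the default 0 plus two explicit cells, while `row_lookup()` still returns the same action as the dense row for every symbol.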

Todo:

Get pgparser to a state where it only supports UTF-8 for now but actually runs, with the goal of defining a parser directly through libphorward
Return the parse tree as a pgast structure; later, TBNF-based construction of an AST (abstract syntax tree).
Implement a function in pggrammar that parses a grammar using itself and returns it (pg_grammar_parse()).

UniCC will also be switched over to plist and rewritten to use the new function pregex_dfa_to_matrix(). At the moment UniCC cannot be linked against the current libphorward 0.18, even though it has already been adapted to some of the 0.18-specific changes.

Sorting algorithms

Posted on October 30, 2013 by bierpilot

An interesting and also funny video showing various sorting algorithms (thanks to Andi for finding it!).

So much to code…

Posted on September 26, 2013 by bierpilot

…so little time!

How true that saying is!

Right now I'm once again busy pushing Phorward Software forward. What annoys me a little is (once again) my perfectionism, which currently shows in a radical rework of libphorward, and also in the fact that the UniCC Parser Generator, a project I worked on sporadically for 6 years (!), was more or less for nothing. The program is feature complete, and I don't see much of a future in it, because it isn't flexible enough. In the end. Hm.

No, the future currently lies in libphorward, and in the extension I introduced a while ago as pggrammar. Meanwhile that beginning has grown into a considerable amount of code. pggrammar, or whatever it turns into, will be a playground for grammars, lexers and parsers, i.e. what UniCC embodies as a code generator, only in the form of a library. The nice thing is that with a library based on an object-oriented approach you can build extremely flexible software that ultimately leaves nothing to be desired. At least that's what I think.

At the moment I'm trying to build what is shown in this class diagram into libphorward.

Of course I'm trying to program all of this in C again, which is exactly where the next problem lies. C++ would actually make more sense here, especially since I'd like to really learn C++ some day. But then it can no longer be used from C… which in turn makes me not do it, and instead build up a jumble of structures and oddly named functions… damned perfectionism. Well, we'll see. It doesn't really make any sense to "just" rewrite in C++ a C library you've put a lot of time and work into, especially since a large part of the library's functions would then become redundant; just think of the functions for linked lists, hash tables, dynamic arrays… as well as the new plist object, which can be a doubly linked list, an array and a hash table all at once. Sick, isn't it?? 🙁

The original plan was to change libphorward's regular expression module so that it can also work directly on FILE streams… but that again wouldn't be flexible enough. After a lot of thought I've therefore decided, instead of turning the regex module upside down once more (and I think that module in particular has turned out rather well so far!), to split the pggrammar idea into three modules of libphorward:

grammar (pgrammar)
- pgrammar
- pterminal
- pnonterminal
- pproduction
lexer (plexer)
- plexer
parser (pparser)
- pparser

The grammar and parser modules (as far as the LR/LALR part goes) are already well on their way, only for now still combined as the module "parser" in libphorward. The lexer module would then form the interface between the regex module and the parser, but would also work without either of them.

A module of libphorward is always defined by its directory in src. The functions all live in one library, but are assigned to specific topics.

libphorward would then consist of the following areas:

base (basic functions, data structures)
- debug
- llist** (damn good, because it's easy to use! Used very often)
- hashtab*
- stack*
- plist (hash table, doubly linked list, stack as a single object)
string (extended string functions)
regex (functions for regular expressions, NFA/DFA, character classes)
union* (dynamic data type structure)
xml* (XML DOM tools)
util* (system tools)
grammar*** (grammar tools)
lexer*** (tools for building lexical analyzers)
parser*** (tools for building parsers (LR, LL) on top of grammar and lexer)

* Module/data structure no longer useful?
** The data structure LIST, also called LLIST or llist, is a phenomenon. I wrote its functions back in 2006 or so, and this library is unbeatably good because it is so simple. Singly linked pointer lists without any frills: LIST. Even in pggrammar I used many of them, because it is so extremely simple. That's why plist is "only" used where it really makes sense, i.e. for everything that should and must be a hashtab, llist and/or array. Optimizing LIST away? Impossible. But LIST is cool!
*** Module planned!
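For readers who wonder what "singly linked pointer lists without any frills" boils down to, here is a minimal sketch. The real LIST/llist functions are not shown in this post, so the names used here (`node`, `list_push`, `list_get`, `list_count`, `list_free`) are invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type; the real LIST/llist API is not shown in the post. */
typedef struct node {
    void *data;           /* payload pointer, owned by the caller */
    struct node *next;    /* next element, or NULL at the end */
} node;

/* Append data to the list and return the (possibly new) head. */
node *list_push(node *head, void *data)
{
    node *n, *it;

    /* NULL payloads are rejected so list_get() can use NULL for "no item". */
    assert(data != NULL);

    n = malloc(sizeof *n);
    if (!n) abort();      /* keep the sketch simple: give up on OOM */
    n->data = data;
    n->next = NULL;

    if (!head) return n;
    for (it = head; it->next; it = it->next)
        ;                 /* walk to the last element */
    it->next = n;
    return head;
}

/* Return the payload at position idx, or NULL if idx is out of range. */
void *list_get(const node *head, size_t idx)
{
    for (; head; head = head->next)
        if (idx-- == 0) return head->data;
    return NULL;
}

/* Count the elements of the list. */
size_t list_count(const node *head)
{
    size_t n = 0;
    for (; head; head = head->next) n++;
    return n;
}

/* Free all nodes (not the payloads); returns NULL for easy reassignment. */
node *list_free(node *head)
{
    while (head) {
        node *next = head->next;
        free(head);
        head = next;
    }
    return NULL;
}
```

Usage is just `l = list_push(l, &item);` starting from `node *l = NULL;`, and `l = list_free(l);` at the end. Appending walks the whole list each time, which is the price of keeping the structure this simple.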

That's how things look at the moment. Well… a lot of mental garbage. And all about pointless software. But: I love it! 😀 And isn't that what counts?

Matching a string with escape sequences via regular expression

Posted on August 27, 2013 by bierpilot

Another quick tip to myself, after yet another needless search today and the evil thought that the Phorward Foundation Library might have a bug here.

regex = pregex_create();
pregex_set_flags( regex, PREGEX_MOD_GLOBAL | PREGEX_MOD_NONGREEDY );
pregex_compile( regex, "'(\\\\.|[^\\\\\n'])*'", 0 );

if( ( match_cnt = pregex_match( regex, "A'\\''B'",
            &matches ) ) > 0 )
{
    for( i = 0; i < match_cnt; i++ )
    {
        printf( "%d >%.*s<\n", matches[i].accept,
                                matches[i].len, matches[i].begin );
    }
}

regex = pregex_free( regex );
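What the pattern '(\\.|[^\\\n'])*' actually accepts is easier to see when it is written out by hand. The following sketch is a plain C scanner that follows the same two branches; `match_quoted` and `find_quoted` are names made up for this example and have nothing to do with the pregex API.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hand-written equivalent of the pattern '(\\.|[^\\\n'])*'
 * Returns the length of the quoted string starting at s, or 0 if
 * s does not start a complete quoted string.
 */
size_t match_quoted(const char *s)
{
    const char *p = s;

    assert(s != NULL);

    if (*p++ != '\'')
        return 0;                     /* must open with a quote */

    while (*p != '\0') {
        if (*p == '\\') {             /* branch \\. : backslash + one char */
            if (p[1] == '\0')
                return 0;             /* dangling backslash at the end */
            p += 2;                   /* (here '.' also accepts a newline;
                                         a regex engine may differ) */
        } else if (*p == '\'') {      /* unescaped quote closes the string */
            return (size_t)(p - s) + 1;
        } else if (*p == '\n') {      /* newline is excluded by the class */
            return 0;
        } else {
            p++;                      /* branch [^\\\n'] */
        }
    }
    return 0;                         /* input ended before the closing quote */
}

/* Find the first quoted string in text; its length is stored in *len. */
const char *find_quoted(const char *text, size_t *len)
{
    assert(text != NULL && len != NULL);
    for (; *text != '\0'; text++)
        if ((*len = match_quoted(text)) > 0)
            return text;
    return NULL;
}
```

For the input A'\''B' from the snippet above, `find_quoted()` returns the position right after the A with a length of 4, i.e. the string '\''. The escaped quote is consumed by the backslash branch, whereas a naive '[^']*' would already stop at it and then treat 'B' as a second string.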

http://stackoverflow.com/questions/4166194/how-do-i-write-a-non-greedy-match-in-lex-flex

awk Reference Card

Posted on August 27, 2013 by bierpilot

Hi there!

Last week at work I put together a compact awk Reference Card to print out and keep on the desk. It is based on the reference from here, but has been trimmed to fit A4. The great thing is that you don't even have to flip the card over; everything is on one side! 🙂

Free to use.

New club website finished!

Link

Finally, as of midnight today, the new club website is done! 🙂

Bierpiloten-Blog

On flying, coding and other stuff…

Category Archives: Coding / Programmieren
