Reuseware Composition Framework
Components, Modules, Aspects or something new?
Introduce new Composition Techniques into your Language of Choice with Reuseware!
TIP
This site refers to a modified version of the Reuseware Composition Framework 0.5.1.


Modular Syntax Definitions

In the following, we glance at the main parts of the composition system. First, the main parts of the 'syntaxdefinitionmodel' are discussed. Second, the model is extended with appropriate slots and composer calls. Finally, concrete syntax mappings for JavaCC and SableCC are presented.

A Simple Generic Model For Parser-Specifications

The model consists of several parts. A cardinality model is used by the EBNF and regular expression models, which in turn are reused by the actual syntax definition model. All models are written in the grammar-like notation that Reuseware uses to define metamodels:

Choice rules (|) declare alternative implementations of an interface. The following example declares A as an abstract class with the subclasses A1, A2 and A3.

A = A1|A2|A3

Definitions specify the members of a concrete class. Members may be attributes of a primitive type (e.g. S for String) or composites (complex objects such as instances of A above). The following example defines the non-abstract type A1 with an optional name of type String, which may therefore be null in instances. A1 objects also have one or more children of type Other.

A1 = name:S?, children:Other+;

A third kind of production specifies inheritance across models. In the following example, the class Other inherits from SpecialConstruct in the model othermodel.

Other ==> othermodel.SpecialConstruct;
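
As a rough analogy only (Reuseware does not represent metamodels as Python classes), the three kinds of rules can be mirrored in Python. SpecialConstruct, A2 and A3 are placeholders, and the runtime checks stand in for the abstractness and cardinality constraints that the notation expresses declaratively.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class SpecialConstruct:
    """Placeholder for othermodel.SpecialConstruct (hypothetical)."""


class Other(SpecialConstruct):
    """Other ==> othermodel.SpecialConstruct; (inheritance across models)"""


class A:
    """A = A1|A2|A3; (A is abstract, only its subclasses are instantiated)"""

    def __init__(self) -> None:
        if type(self) is A:
            raise TypeError("A is abstract")


@dataclass
class A1(A):
    """A1 = name:S?, children:Other+;"""
    name: Optional[str] = None                            # S? : optional String
    children: List[Other] = field(default_factory=list)   # Other+ : one or more

    def __post_init__(self) -> None:
        if not self.children:
            raise ValueError("children:Other+ needs at least one element")


class A2(A):
    pass


class A3(A):
    pass
```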

Cardinality Model

Cardinality = OptionalRepetition | Repetition | Optional;
OptionalRepetition;
Repetition;
Optional;
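
The three cardinalities carry the usual EBNF meaning. The concrete syntax mappings later on this page write them as `*`, `+` and `?`. Here is a minimal Python sketch of the occurrence bounds they imply. `bounds` and `accepts` are hypothetical helpers. Reading a missing cardinality as "exactly once" is an assumption, based on BNFFactor's cardinality attribute being optional.

```python
import math
from typing import Optional, Tuple

# (min, max) occurrence bounds; math.inf means "unbounded".
CARDINALITY_BOUNDS = {
    "OptionalRepetition": (0, math.inf),  # "*"
    "Repetition": (1, math.inf),          # "+"
    "Optional": (0, 1),                   # "?"
}


def bounds(cardinality: Optional[str]) -> Tuple[float, float]:
    """A factor without a cardinality is assumed to occur exactly once."""
    if cardinality is None:
        return (1, 1)
    return CARDINALITY_BOUNDS[cardinality]


def accepts(cardinality: Optional[str], count: int) -> bool:
    """True if `count` repetitions are allowed by the cardinality."""
    low, high = bounds(cardinality)
    return low <= count <= high
```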

EBNF Model

AbstractBNFExpression = BNFExpression;
BNFExpression = arguments:BNFConcat*;
BNFConcat = arguments:BNFFactor*; 
BNFFactor = atom:BNFAtom , cardinality:cardinalities.Cardinality?;
BNFAtom = GrammarSymbol | BNFSubExpression;
BNFSubExpression = nested:AbstractBNFExpression;
GrammarSymbol = AbstractTerminal | AbstractNonTerminal;
AbstractTerminal;
AbstractNonTerminal;
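
The EBNF model nests four levels. An expression is a list of alternatives, each alternative is a sequence of factors, and each factor is an atom with an optional cardinality. A small Python sketch can illustrate this by rendering an instance back to text. The classes simplify the model: Terminal and NonTerminal stand in for the abstract symbol types, and AbstractBNFExpression is collapsed into BNFExpression. `render` is a hypothetical helper that borrows the `T.` token prefix of the SableCC mapping.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Terminal:            # stands in for AbstractTerminal
    name: str


@dataclass
class NonTerminal:         # stands in for AbstractNonTerminal
    name: str


@dataclass
class BNFExpression:       # alternatives, separated by "|"
    arguments: List["BNFConcat"]


@dataclass
class BNFConcat:           # a sequence of factors
    arguments: List["BNFFactor"]


@dataclass
class BNFSubExpression:    # a parenthesised nested expression
    nested: BNFExpression


@dataclass
class BNFFactor:           # an atom with an optional "*", "+" or "?"
    atom: Union[Terminal, NonTerminal, BNFSubExpression]
    cardinality: Optional[str] = None


def render(node) -> str:
    if isinstance(node, BNFExpression):
        return " | ".join(render(c) for c in node.arguments)
    if isinstance(node, BNFConcat):
        return " ".join(render(f) for f in node.arguments)
    if isinstance(node, BNFFactor):
        return render(node.atom) + (node.cardinality or "")
    if isinstance(node, BNFSubExpression):
        return "(" + render(node.nested) + ")"
    if isinstance(node, Terminal):
        return "T." + node.name      # token prefix borrowed from the SCC mapping
    if isinstance(node, NonTerminal):
        return node.name
    raise TypeError(f"unexpected node: {node!r}")


# expr = term (T.plus term)*
expr = BNFExpression([BNFConcat([
    BNFFactor(NonTerminal("term")),
    BNFFactor(BNFSubExpression(BNFExpression([BNFConcat([
        BNFFactor(Terminal("plus")), BNFFactor(NonTerminal("term"))])])), "*"),
])])
print(render(expr))  # term (T.plus term)*
```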

Regular Expression Model

AbstractREGExpression = REGExpression;
REGExpression = arguments:REGConcat*;
REGConcat = arguments:REGFactor*;
REGFactor = atom:REGAtom, cardinality:cardinalities.Cardinality?;
REGSubExpression = nested:AbstractREGExpression;
REGAtom = REGSubExpression | CharacterSet | CharacterSequence;
CharacterSet = CharacterEnumeration | RangeEnumeration;
CharacterEnumeration = option:Invert?,enumeration:AtomicCharacter*;
RangeEnumeration = enumeration:CharacterRange+;
CharacterRange = min:AtomicCharacter, max:AtomicCharacter; 
CharacterSequence = AtomicCharacter | AtomicLiteral;
AtomicCharacter = character:S;
AtomicLiteral = characters:S;
QuantifiedRepetition = min:NaturalNumber, max:NaturalNumber;
QuantifiedRepetition ==> cardinalities.Cardinality;
NaturalNumber = value:S;
Invert;
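
To illustrate what the regular expression model encodes, the following Python sketch translates an instance into Python `re` syntax. This is only an analogy, since the parser generators discussed here build their own lexers. `to_python_regex` is a hypothetical helper. Cardinalities are modelled as the strings `*`, `+` and `?`, or as a `(min, max)` tuple for QuantifiedRepetition.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class AtomicCharacter:
    character: str


@dataclass
class AtomicLiteral:
    characters: str


@dataclass
class CharacterRange:
    min: str
    max: str


@dataclass
class RangeEnumeration:
    enumeration: List[CharacterRange]


@dataclass
class CharacterEnumeration:
    enumeration: List[AtomicCharacter]
    invert: bool = False            # the optional Invert marker


@dataclass
class REGExpression:
    arguments: List["REGConcat"]    # alternatives


@dataclass
class REGConcat:
    arguments: List["REGFactor"]


@dataclass
class REGSubExpression:
    nested: REGExpression


@dataclass
class REGFactor:
    atom: object
    cardinality: Union[None, str, Tuple[int, int]] = None


def to_python_regex(node) -> str:
    """Translate a regular expression model instance into Python `re` syntax."""
    if isinstance(node, REGExpression):
        return "|".join(to_python_regex(c) for c in node.arguments)
    if isinstance(node, REGConcat):
        return "".join(to_python_regex(f) for f in node.arguments)
    if isinstance(node, REGFactor):
        atom, card = to_python_regex(node.atom), node.cardinality
        if card is None:
            return atom
        if isinstance(card, tuple):              # QuantifiedRepetition {min,max}
            return f"{atom}{{{card[0]},{card[1]}}}"
        return atom + card                       # "*", "+" or "?"
    if isinstance(node, REGSubExpression):
        return "(?:" + to_python_regex(node.nested) + ")"
    if isinstance(node, AtomicCharacter):
        return re.escape(node.character)
    if isinstance(node, AtomicLiteral):          # grouped so a cardinality covers it all
        return "(?:" + re.escape(node.characters) + ")"
    if isinstance(node, RangeEnumeration):
        return "[" + "".join(f"{re.escape(r.min)}-{re.escape(r.max)}"
                             for r in node.enumeration) + "]"
    if isinstance(node, CharacterEnumeration):
        chars = "".join(re.escape(c.character) for c in node.enumeration)
        return "[" + ("^" if node.invert else "") + chars + "]"
    raise TypeError(f"unexpected node: {node!r}")


letter = RangeEnumeration([CharacterRange("a", "z"), CharacterRange("A", "Z")])
alnum = RangeEnumeration([CharacterRange("a", "z"), CharacterRange("A", "Z"),
                          CharacterRange("0", "9")])
identifier = REGExpression([REGConcat([REGFactor(letter), REGFactor(alnum, "*")])])
pattern = to_python_regex(identifier)
print(pattern)  # [a-zA-Z][a-zA-Z0-9]*
```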

Syntax Definition Model

Syntax definitions consist of two main parts. Lexer productions define the language's tokens. Each token is defined by one lexer rule. It is either regular (passed to the parser) or ignored (not passed to the parser), depending on the type of the lexer production that contains it. Each lexer production belongs to one lexer state. Lexer states are typically used to separate the token sets of different parts of a language. Grammar productions define the language's context-free syntax: each production has a nonterminal on its left-hand side and an EBNF expression on its right-hand side.

AbstractSyntaxDefinition = SyntaxDefinition;
SyntaxDefinition = lexerproductions:LexerProductionList,
grammarproductions:GrammarProductionList;
GrammarProductionList = productions:AbstractGrammarProduction*;
LexerProductionList = productions:AbstractLexerProduction*;

Production = AbstractGrammarProduction | AbstractLexerProduction;
AbstractLexerProduction = LexerProduction;
LexerProduction = productiontype:ProductionType,state:AbstractLexerState?,
definitions:LexerRuleList;
ProductionType = RegularType | IgnoreType;
RegularType;
IgnoreType;
LexerRuleList = rules:AbstractLexerRule*;
AbstractLexerRule = LexerRule;
LexerRule = terminal:TokenName, followstate:AbstractLexerState?,
regex:regularexpression.AbstractREGExpression;
AbstractLexerState = LexerState | DefaultLexerState;
LexerState = name:S;
DefaultLexerState;
AbstractGrammarProduction = GrammarProduction;
GrammarProduction = head:bnfexpression.AbstractNonTerminal,
bnfexp:bnfexpression.AbstractBNFExpression;
NonTerminalSymbol ==> bnfexpression.AbstractNonTerminal;
NonTerminalSymbol = name:S;
TokenReference ==> bnfexpression.AbstractTerminal;
TokenReference = tokenname:TokenName;
TokenName = name:S;
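
The following small state-based tokenizer in Python sketches how lexer states, `followstate` and the regular/ignore production types interact. It illustrates the idea under assumptions and is not code generated from the model. Regular expressions are given as Python regex strings instead of REGExpression instances, and `tokenize` and `DEFAULT` are hypothetical names. The matching policy (longest match, with the earlier rule winning ties) is the one commonly used by generated lexers such as JavaCC's.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

DEFAULT = "DEFAULT"   # plays the role of DefaultLexerState


@dataclass
class LexerRule:
    terminal: str                      # TokenName
    regex: str                         # Python regex standing in for a REGExpression
    followstate: Optional[str] = None  # state to switch to after a match


@dataclass
class LexerProduction:
    productiontype: str                # "regular" (passed on) or "ignore" (dropped)
    rules: List[LexerRule]
    state: str = DEFAULT


def tokenize(text: str, productions: List[LexerProduction]) -> List[Tuple[str, str]]:
    state, pos, tokens = DEFAULT, 0, []
    while pos < len(text):
        best = None  # (length, production, rule) of the longest match so far
        for prod in productions:
            if prod.state != state:    # only rules of the current state are active
                continue
            for rule in prod.rules:
                m = re.compile(rule.regex).match(text, pos)
                length = m.end() - pos if m else 0
                # strict ">" keeps the earlier rule when two matches are equally long
                if length > 0 and (best is None or length > best[0]):
                    best = (length, prod, rule)
        if best is None:
            raise SyntaxError(f"no token matches at offset {pos} in state {state}")
        length, prod, rule = best
        if prod.productiontype == "regular":
            tokens.append((rule.terminal, text[pos:pos + length]))
        if rule.followstate is not None:
            state = rule.followstate
        pos += length
    return tokens


productions = [
    LexerProduction("ignore", [LexerRule("WS", r"\s+"),
                               LexerRule("COMMENT_START", r"/\*", followstate="COMMENT")]),
    LexerProduction("regular", [LexerRule("IDENT", r"[A-Za-z_]\w*"),
                                LexerRule("PLUS", r"\+")]),
    LexerProduction("ignore", [LexerRule("COMMENT_END", r"\*/", followstate=DEFAULT),
                               LexerRule("COMMENT_TEXT", r"[^*]+|\*")], state="COMMENT"),
]
print(tokenize("a /* note */ + b", productions))
# [('IDENT', 'a'), ('PLUS', '+'), ('IDENT', 'b')]
```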

Composition Language (and Component Model)

Composition systems usually contain a composition language for writing composition programs. By default, Reuseware 0.5.x provides a generic language for writing composition programs and composers. However, genericity is not always the best way to realise appropriate reuse abstractions for a particular language. To provide the right reuse abstractions for syntax definitions, a dedicated reuse language was created and embedded into the model presented above. This lets language designers use module concepts directly in their language specifications.


The Extended Metamodel

The original metamodel is extended so that additional constructs may occur at an instance's root. A ReuseSyntaxDefinition holds a list of DeclarationStatements, a nested SyntaxDefinition and an optional call to the Adapt composer, which is explained later.

syntaxdefinitionmodel.AbstractSyntaxDefinition = ReuseSyntaxDefinition;
ReuseSyntaxDefinition = 
  declarations:DeclarationStatement*,
  nested:syntaxdefinitionmodel.SyntaxDefinition, 
  super:Adapt?;
DeclarationStatement = ModuleDeclaration | Parametrize | Refactor;

Module Declaration

Module declarations make components available for reuse by name.

ModuleDeclaration ==> componentmodel.NamedFragmentList;
ModuleDeclaration;

Parametrization

This composer binds values to slots (parametrization) in declared modules. Each template parameter maps a variation point either to a fragment reference (referenced modules have to be declared) or to a concrete value to be bound.

Parametrize ==> minimalcl.Composer;
Parametrize =  
  modulename:componentmodel.FragmentName,
  parameters:TemplateParameter*;
TemplateParameter = 
  variationPoint:componentmodel.VariationPointName,
  reference:componentmodel.FragmentReference;
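
Here is a minimal Python sketch of module declaration and parametrization. It assumes that a grammar is a dictionary from nonterminal to alternatives and that a slot is a marker object inside an alternative. `declare`, `parametrize` and `MODULES` are hypothetical names. For simplicity, bound values are plain symbol names rather than fragment references, and the declared module is copied rather than modified.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Slot:
    kind: str    # e.g. "NonTerminalSlot" or "TerminalSlot"
    name: str    # the variation point name


Symbol = Union[str, Slot]
Grammar = Dict[str, List[List[Symbol]]]    # head -> alternatives -> symbols

MODULES: Dict[str, Grammar] = {}           # what module declarations make available


def declare(name: str, grammar: Grammar) -> None:
    MODULES[name] = grammar


def parametrize(modulename: str, bindings: Dict[str, str]) -> Grammar:
    """Return a copy of a declared module with the named slots bound."""
    grammar = MODULES[modulename]          # KeyError: module was never declared
    slots = {s.name for alts in grammar.values() for alt in alts
             for s in alt if isinstance(s, Slot)}
    unknown = set(bindings) - slots
    if unknown:
        raise KeyError(f"unknown variation point(s): {sorted(unknown)}")
    return {head: [[bindings.get(s.name, s) if isinstance(s, Slot) else s for s in alt]
                   for alt in alts]
            for head, alts in grammar.items()}


declare("lists", {"List": [[Slot("NonTerminalSlot", "Element")],
                           [Slot("NonTerminalSlot", "Element"), "<COMMA>", "List"]]})
identifiers = parametrize("lists", {"Element": "Identifier"})
print(identifiers)  # {'List': [['Identifier'], ['Identifier', '<COMMA>', 'List']]}
```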

Refactorings

The Refactor composer consistently renames or removes grammar symbols. It can also rename lexer states.

Refactor ==> minimalcl.Composer;
Refactor =
  modulename:componentmodel.FragmentName,
  parameters:RefactorParameter*;
RefactorParameter = 
  RenameSymbolParameter | RemoveSymbolParameter | RenameLexerStateParameter;
RenameSymbolParameter =  RenameTerminalParameter | RenameNonTerminalParameter;
RenameTerminalParameter = 
  target:bnfexpression.AbstractTerminal,
  replacement:bnfexpression.AbstractTerminal;
RenameNonTerminalParameter = 
  target:bnfexpression.AbstractNonTerminal,
  replacement:bnfexpression.AbstractNonTerminal;
RenameLexerStateParameter = 
  target:syntaxdefinitionmodel.AbstractLexerState,
  replacement:syntaxdefinitionmodel.AbstractLexerState;
RemoveSymbolParameter = target:bnfexpression.GrammarSymbol;
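
Consistent renaming means that every occurrence of a symbol changes, including the left-hand side of its production. The Python sketch below uses a plain dictionary representation of a grammar, and `refactor` is a hypothetical helper. Lexer state renaming is left out. Interpreting removal as deleting every occurrence of a symbol, plus the production it heads, is an assumption.

```python
from typing import Dict, List, Set

Grammar = Dict[str, List[List[str]]]   # nonterminal -> alternatives -> symbols


def refactor(grammar: Grammar, rename: Dict[str, str], remove: Set[str]) -> Grammar:
    """Consistently rename and remove symbols; the input grammar is not modified."""
    result: Grammar = {}
    for head, alternatives in grammar.items():
        if head in remove:                 # removing a nonterminal drops its production
            continue
        new_head = rename.get(head, head)
        if new_head in result:
            raise ValueError(f"renaming yields two productions for {new_head!r}")
        result[new_head] = [[rename.get(sym, sym) for sym in alt if sym not in remove]
                            for alt in alternatives]
    return result


grammar = {"Stmt": [["IF", "Expr", "THEN", "Stmt"], ["Assign"]],
           "Assign": [["ID", "ASSIGN", "Expr"]],
           "Expr": [["ID"]]}
renamed = refactor(grammar, rename={"ASSIGN": "BECOMES", "Expr": "Condition"},
                   remove={"THEN"})
print(renamed["Stmt"])  # [['IF', 'Condition', 'Stmt'], ['Assign']]
```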

Import

Import can be used to import single constructs from a declared module and bind them in place of the import composer call.

ParametrizedImport ==> minimalcl.Composer;
ParametrizedImport =
  modulename:componentmodel.FragmentName,
  parameters:TemplateParameter*;
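
The import composer replaces its own call site with the imported construct(s). The Python sketch below makes several simplifying assumptions:

- Lexer rules are (terminal, regex) pairs.
- A module name of the form `module.Construct` selects a single construct, mirroring the optional subfragment part of FragmentName in the EJJ mapping.
- Template parameters are ignored.

`Import`, `resolve_imports` and the `javalexer` module are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Rule = Tuple[str, str]                  # (terminal, regular expression as text)


@dataclass(frozen=True)
class Import:
    modulename: str                     # "module" or "module.Construct"


MODULES: Dict[str, List[Rule]] = {      # hypothetical declared module
    "javalexer": [("IDENT", "[a-z]+"), ("INT", "[0-9]+")],
}


def resolve_imports(items: List[Union[Rule, Import]]) -> List[Rule]:
    """Replace each Import by the construct(s) it names, keeping their position."""
    out: List[Rule] = []
    for item in items:
        if not isinstance(item, Import):
            out.append(item)
            continue
        module, _, construct = item.modulename.partition(".")
        rules = MODULES[module]
        if construct:
            picked = [r for r in rules if r[0] == construct]
            if not picked:
                raise KeyError(f"{construct!r} is not defined in {module!r}")
            out.extend(picked)
        else:
            out.extend(rules)           # import the whole module in place
    return out


print(resolve_imports([("PLUS", "[+]"), Import("javalexer.IDENT"), ("MINUS", "[-]")]))
# [('PLUS', '[+]'), ('IDENT', '[a-z]+'), ('MINUS', '[-]')]
```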

Grammar Inheritance with Adapt

Adapt adapts and extends existing syntax definitions according to different types of merge rules, which can be annotated to grammar productions and lexer rules.

Adapt ==> minimalcl.Composer;
Adapt = modulename:componentmodel.FragmentName,parameters:TemplateParameter*;
syntaxdefinitionmodel.AbstractGrammarProduction = 
  ReuseGrammarProduction | MergeGrammarProduction;
syntaxdefinitionmodel.AbstractLexerRule = MergeLexerRule | ParametrizedImport;
ReuseGrammarProduction = 
  StandardGrammarProduction | ParametrizedImport | EmbedProduction; 
StandardGrammarProduction ==> syntaxdefinitionmodel.GrammarProduction;
StandardGrammarProduction;
MergeGrammarProduction =  
  type:GrammarMergeType,
  production:ReuseGrammarProduction;
MergeLexerRule = type:LexicalMergeType, rule:syntaxdefinitionmodel.LexerRule;
GrammarMergeType = Replace | Refine | Remove;
LexicalMergeType = Replace| Refine | Before | Remove | First;
Replace;
Refine;
Remove;
Before = terminal:syntaxdefinitionmodel.TokenName;
First;
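
The following Python sketch shows how the merge annotations might act when an extending definition is merged into a base definition. Only the names of the merge types come from the model. The semantics of Refine (here, adding alternatives or combining regular expressions) and the handling of unannotated clashes are assumptions, and both functions are hypothetical. Rule order matters for lexer rules because generated lexers such as JavaCC's prefer the earlier rule when two rules match input of the same length. This is why a keyword would typically be merged BEFORE a general identifier rule.

```python
from typing import Dict, List, Optional, Tuple

Grammar = Dict[str, List[List[str]]]    # head -> alternatives
LexerRules = List[Tuple[str, str]]       # ordered (terminal, regex) pairs


def merge_production(base: Grammar, head: str, alts: List[List[str]],
                     mode: Optional[str] = None) -> None:
    """Merge one production of the extending definition into base (in place)."""
    if mode == "REPLACE":                # EJJ: OVERRIDE
        base[head] = alts
    elif mode == "REFINE":               # assumption: adds alternatives
        base[head] = base[head] + alts
    elif mode == "REMOVE":
        del base[head]
    elif head in base:                   # assumption: unannotated clashes are errors
        raise ValueError(f"{head!r} is already defined; annotate a merge type")
    else:
        base[head] = alts


def merge_lexer_rule(rules: LexerRules, terminal: str, regex: str,
                     mode: Optional[str] = None, before: Optional[str] = None) -> None:
    """Merge one lexer rule into the ordered rule list (in place)."""
    names = [name for name, _ in rules]
    if mode == "REPLACE":
        rules[names.index(terminal)] = (terminal, regex)
    elif mode == "REFINE":               # assumption: alternative regular expression
        i = names.index(terminal)
        rules[i] = (terminal, f"(?:{rules[i][1]})|(?:{regex})")
    elif mode == "REMOVE":
        del rules[names.index(terminal)]
    elif mode == "BEFORE":               # EJJ: MERGE BEFORE <terminal>
        rules.insert(names.index(before), (terminal, regex))
    elif mode == "FIRST":                # EJJ: MERGE FIRST
        rules.insert(0, (terminal, regex))
    else:                                # assumption: plain rules are appended
        rules.append((terminal, regex))


grammar = {"Expr": [["Term"]], "Term": [["INT"]]}
merge_production(grammar, "Term", [["IDENT"]], "REFINE")
rules = [("INT", "[0-9]+"), ("IDENT", "[a-z]+")]
merge_lexer_rule(rules, "IF", "if", "BEFORE", before="IDENT")
print(grammar["Term"], [name for name, _ in rules])
# [['INT'], ['IDENT']] ['INT', 'IF', 'IDENT']
```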


Language Embedding

Embed and EmbedProduction are language embedding operators that generate lexer states to separate the token sets of the participating languages. Embed is a single-argument operator that creates regular in- and out-transitions. In contrast, EmbedProduction embeds multiple languages based on whitespace tokens. This allows users to declare transitions between embedded languages in instances of the declared language.

Embed ==> minimalcl.Composer;
Embed =
  modulename:componentmodel.FragmentName,
  target:syntaxdefinitionmodel.NonTerminalSymbol,
  intransition:regularexpression.AbstractREGExpression?,
  outtransition:regularexpression.AbstractREGExpression?;
EmbedProduction ==> minimalcl.Composer;
EmbedProduction = 
  nt:syntaxdefinitionmodel.NonTerminalSymbol?,
  languages:EmbedParameter+, 
  transition:regularexpression.AbstractREGExpression;
EmbedParameter = 
  transition:regularexpression.AbstractREGExpression,
  modulename:componentmodel.FragmentName,
  target:syntaxdefinitionmodel.NonTerminalSymbol?,
  parameters:TemplateParameter*;
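
The idea behind Embed is to generate a fresh lexer state for the embedded language and to enter and leave it through transition tokens. This can be sketched in Python. All names are hypothetical: `embed`, `tokenize`, the `IN_<MODULE>` state and the `<MODULE>_` terminal prefix. Both transitions are assumed to be given, although the model makes them optional, and EmbedProduction is not covered.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    state: str
    terminal: str
    regex: str
    followstate: Optional[str] = None


def embed(host: List[Rule], guest: List[Tuple[str, str]], module: str,
          intransition: str, outtransition: str) -> List[Rule]:
    """Place the guest tokens in a fresh lexer state, entered and left by transitions."""
    prefix = module.upper()
    state = f"IN_{prefix}"
    rules = list(host)
    rules.append(Rule("DEFAULT", f"{prefix}_IN", intransition, state))
    rules.append(Rule(state, f"{prefix}_OUT", outtransition, "DEFAULT"))
    # Prefixing keeps guest terminal names from clashing with host terminal names.
    rules += [Rule(state, f"{prefix}_{name}", regex) for name, regex in guest]
    return rules


def tokenize(text: str, rules: List[Rule]) -> List[str]:
    """Longest match among the rules of the current state; earlier rules win ties."""
    state, pos, out = "DEFAULT", 0, []
    while pos < len(text):
        best = None
        for rule in rules:
            if rule.state != state:
                continue
            m = re.compile(rule.regex).match(text, pos)
            if m and m.end() > pos and (best is None or m.end() > best[0].end()):
                best = (m, rule)
        if best is None:
            raise SyntaxError(f"unexpected input at offset {pos} in state {state}")
        m, rule = best
        out.append(rule.terminal)
        state = rule.followstate or state
        pos = m.end()
    return out


host = [Rule("DEFAULT", "IDENT", "[a-z]+"), Rule("DEFAULT", "ASSIGN", "=")]
sql = [("SELECT", "SELECT"), ("STAR", r"\*"), ("WORD", "[A-Za-z]+")]
rules = embed(host, sql, "sql", intransition="<<", outtransition=">>")
print(tokenize("q=<<SELECT*>>", rules))
# ['IDENT', 'ASSIGN', 'SQL_IN', 'SQL_SELECT', 'SQL_STAR', 'SQL_OUT']
```

Because the host's `IDENT` rule is inactive inside `IN_SQL`, the same input characters can produce different tokens depending on which language the lexer is currently in.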

Variation Points

In the following, we introduce variation points into the component model. In Reuseware, variation points are called slots. Slot types are introduced by inheriting from a predefined upper component metamodel.

GrammarSymbols can be replaced by a slot:

bnfexpression.GrammarSymbol = GrammarSymbolSlot;
GrammarSymbolSlot ==> componentmodel.Slot;
GrammarSymbolSlot;

TerminalSymbols can be replaced by a slot:

bnfexpression.AbstractTerminal = TerminalSlot;
TerminalSlot ==> componentmodel.Slot;
TerminalSlot;

Nonterminal symbols can be replaced by a slot or by an in-place call to the Embed composer.

bnfexpression.AbstractNonTerminal = NonTerminalSlot | Embed;
NonTerminalSlot ==> componentmodel.Slot;
NonTerminalSlot;

Regular expressions can be replaced by a slot or by an in-place call to Import.

regularexpression.AbstractREGExpression = REGExpressionSlot |
ParametrizedImport;
REGExpressionSlot ==> componentmodel.Slot;
REGExpressionSlot;

EBNF expressions can be replaced by a slot.

bnfexpression.AbstractBNFExpression = BNFExpressionSlot;
BNFExpressionSlot ==> componentmodel.Slot;
BNFExpressionSlot;
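
Each slot type thus inherits twice: from componentmodel.Slot and, through the choice rule, from the abstract type it may replace. Python's multiple inheritance can mimic this, as the sketch below shows. The `accepts` attribute and the `bind` check are hypothetical. They illustrate why a typed slot only accepts matching fragments and are not Reuseware's API.

```python
class GrammarSymbol:
    pass


class AbstractTerminal(GrammarSymbol):
    pass


class AbstractNonTerminal(GrammarSymbol):
    pass


class TokenReference(AbstractTerminal):
    def __init__(self, tokenname: str) -> None:
        self.tokenname = tokenname


class NonTerminalSymbol(AbstractNonTerminal):
    def __init__(self, name: str) -> None:
        self.name = name


class Slot:
    """Plays the role of componentmodel.Slot."""
    accepts: type = object              # the type this slot stands in for

    def __init__(self, name: str) -> None:
        self.name = name


# "X = XSlot" makes the slot a subtype of X; "XSlot ==> componentmodel.Slot" makes it a Slot.
class GrammarSymbolSlot(Slot, GrammarSymbol):
    accepts = GrammarSymbol


class TerminalSlot(Slot, AbstractTerminal):
    accepts = AbstractTerminal


class NonTerminalSlot(Slot, AbstractNonTerminal):
    accepts = AbstractNonTerminal


def bind(slot: Slot, value: object) -> object:
    """Check that a value may replace the slot (a hypothetical typing rule)."""
    if not isinstance(value, slot.accepts):
        raise TypeError(f"{type(value).__name__} cannot fill "
                        f"{type(slot).__name__} {slot.name!r}")
    return value


body = [NonTerminalSymbol("Expr"), TerminalSlot("operator"), NonTerminalSymbol("Expr")]
slots = [s for s in body if isinstance(s, Slot)]   # the slot sits where a symbol may
filled = [bind(s, TokenReference("PLUS")) if isinstance(s, Slot) else s for s in body]
```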


Applications in Real Parser Generators

SCC: SableCC Concrete Syntax Mapping

CONCRETESYNTAX scc FOR syntaxdefinitionmodel

SyntaxDefinition ::= "Tokens" lexerproductions "Productions" grammarproductions;
GrammarProductionList ::= productions+;
LexerProductionList	::=  productions+;
GrammarProduction ::= head "=" bnfexp;
bnfexpression.BNFExpression ::= arguments ("|" arguments)*;
bnfexpression.BNFConcat	::= arguments+;
bnfexpression.BNFFactor	::= atom cardinality? ;
bnfexpression.BNFSubExpression	::= "(" nested ")";
TokenReference ::= "T" "." tokenname ;
NonTerminalSymbol ::= 
 ("N" "." )? name[('a'..'z'|'A'..'Z'|'_')  ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*];
cardinalities.OptionalRepetition ::= "*";
cardinalities.Repetition ::= "+";
cardinalities.Optional ::= "?";
regularexpression.QuantifiedRepetition ::= "{" min "," max "}";
LexerProduction ::= ( productiontype )? ( "{" state "}" )?  definitions;
LexerRuleList ::= ( rules )+;
LexerRule ::= ("{" "->" followstate "}")? terminal "=" regex ";" ;
IgnoreType ::= "Ignored Tokens";
RegularType ::= "Normal Tokens";
TokenName ::= name[('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*];
regularexpression.REGExpression ::= arguments ("|" arguments)*;
regularexpression.REGConcat	::= arguments+;
regularexpression.REGFactor	::= atom cardinality?;
regularexpression.REGSubExpression ::= "(" nested ")";
regularexpression.NaturalNumber ::= value[('0'..'9')+];
regularexpression.CharacterEnumeration ::= 
  option? "[" ( enumeration  (","  enumeration )*)?  "]";
regularexpression.Invert ::= "!";
regularexpression.RangeEnumeration ::= "[" (enumeration)("," enumeration)* "]";
regularexpression.CharacterRange ::= min ".." max;
LexerState  ::= 
  name[('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*];
DefaultLexerState ::= "normal";
regularexpression.AtomicCharacter ::= character[' ... '];
regularexpression.AtomicLiteral ::= characters[' ... '];
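
The following Python helpers sketch what the SCC mapping prescribes for the lexer part by printing lexer productions in that concrete syntax. They follow this page's mapping, which is not necessarily identical to stock SableCC syntax. The helper names are hypothetical, and regular expressions are passed in as already-rendered text. Quoting characters in single quotes is an assumption, since the page elides the token definition for AtomicCharacter.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LexerRule:
    terminal: str
    regex: str                         # already rendered regular expression text
    followstate: Optional[str] = None


@dataclass
class LexerProduction:
    ignored: bool                      # IgnoreType if True, RegularType otherwise
    rules: List[LexerRule]
    state: Optional[str] = None        # "normal" is the DefaultLexerState


def scc_rule(rule: LexerRule) -> str:
    # LexerRule ::= ("{" "->" followstate "}")? terminal "=" regex ";"
    prefix = f"{{-> {rule.followstate}}} " if rule.followstate else ""
    return f"{prefix}{rule.terminal} = {rule.regex};"


def scc_production(prod: LexerProduction) -> str:
    # LexerProduction ::= (productiontype)? ("{" state "}")? definitions
    header = "Ignored Tokens" if prod.ignored else "Normal Tokens"
    if prod.state:
        header += f" {{{prod.state}}}"
    return "\n".join([header] + ["  " + scc_rule(r) for r in prod.rules])


def scc_ranges(ranges: List[Tuple[str, str]]) -> str:
    # RangeEnumeration ::= "[" enumeration ("," enumeration)* "]"; CharacterRange ::= min ".." max
    return "[" + ", ".join(f"'{lo}'..'{hi}'" for lo, hi in ranges) + "]"


identifier = LexerRule("identifier", scc_ranges([("a", "z"), ("A", "Z")]) + "+")
comment_start = LexerRule("comment_start", "'/*'", followstate="comment")
print(scc_production(LexerProduction(False, [identifier, comment_start], state="normal")))
# Normal Tokens {normal}
#   identifier = ['a'..'z', 'A'..'Z']+;
#   {-> comment} comment_start = '/*';
```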

JJ: JavaCC Concrete Syntax Mapping

CONCRETESYNTAX jj FOR syntaxdefinitionmodel

SyntaxDefinition ::= "{" lexerproductions grammarproductions "}";
GrammarProductionList ::= productions+;
LexerProductionList	::= productions+;
GrammarProduction ::= "void" head ":" "{" "}" "{" bnfexp "}";
bnfexpression.BNFExpression ::= arguments ("|" arguments)*;
bnfexpression.BNFConcat	::= arguments+;
bnfexpression.BNFFactor	::= atom cardinality? ;
bnfexpression.BNFSubExpression	::= "(" nested ")";
TokenReference ::= "<" tokenname ">";
NonTerminalSymbol ::= 
  name[('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*]"(" ")";
cardinalities.OptionalRepetition ::= "*";
cardinalities.Repetition ::= "+";
cardinalities.Optional ::= "?";
regularexpression.QuantifiedRepetition ::= "{" min "," max "}";
LexerProduction ::= ("<" state ">")? productiontype ":" definitions;
LexerRuleList ::= "{" rules ( "|" rules)* "}";
LexerRule ::= "<" terminal ":" regex ">"  (":" followstate)?;
IgnoreType ::= "SKIP";
RegularType ::= "TOKEN";
TokenName ::= name[('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*];
regularexpression.REGExpression ::= arguments ("|" arguments)*;
regularexpression.REGConcat	::= arguments+;
regularexpression.REGFactor	::= atom cardinality?;
regularexpression.REGSubExpression ::= "(" nested ")";
regularexpression.NaturalNumber ::= value[('0'..'9')+];
regularexpression.CharacterEnumeration ::= 
  option? "[" ( enumeration  ("," enumeration )*)?  "]";
regularexpression.Invert ::= "~";
regularexpression.RangeEnumeration ::= "[" (enumeration)("," enumeration)* "]";
regularexpression.CharacterRange ::= min "-" max;
LexerState  ::= name[('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*];
DefaultLexerState ::= "DEFAULT";
regularexpression.AtomicCharacter ::= character[" ... "];
regularexpression.AtomicLiteral ::= characters[" ... "];
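
For comparison, here is a Python sketch that prints a grammar production in the JJ mapping. Nonterminals gain `()`, terminals are wrapped in `<...>`, and each production gets an empty `{}` declaration block. The tuple-based representation of EBNF expressions and the helper names are hypothetical simplifications of the EBNF model.

```python
from typing import List, Optional, Tuple, Union

Atom = Union[Tuple[str, str], list]        # ("T", name), ("N", name) or a nested expression
Factor = Tuple[Atom, Optional[str]]        # atom plus "*", "+", "?" or None
Expression = List[List[Factor]]            # alternatives of concatenations


def jj_atom(atom: Atom) -> str:
    if isinstance(atom, list):             # BNFSubExpression ::= "(" nested ")"
        return "(" + jj_expression(atom) + ")"
    kind, name = atom
    # TokenReference ::= "<" tokenname ">"; NonTerminalSymbol ::= name "(" ")"
    return f"<{name}>" if kind == "T" else f"{name}()"


def jj_expression(expr: Expression) -> str:
    return " | ".join(
        " ".join(jj_atom(atom) + (card or "") for atom, card in alt) for alt in expr
    )


def jj_production(head: str, expr: Expression) -> str:
    # GrammarProduction ::= "void" head ":" "{" "}" "{" bnfexp "}"
    return f"void {head}() : {{}} {{ {jj_expression(expr)} }}"


sum_expr = [[(("N", "Term"), None),
             ([[(("T", "PLUS"), None), (("N", "Term"), None)]], "*")]]
print(jj_production("Sum", sum_expr))  # void Sum() : {} { Term() (<PLUS> Term())* }
```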

EJJ: JavaCC Based Composition Language

CONCRETESYNTAX ejj FOR reusesyntaxdefinitionmodel EXTENDS jj

ReuseSyntaxDefinition ::=  (declarations)* (super)? nested;
syntaxdefinitionmodel.SyntaxDefinition ::= "{" lexerproductions  grammarproductions "}";
StandardGrammarProduction ::= "void" head ":" "{" "}" "{" bnfexp "}";
MergeGrammarProduction ::= type production;
MergeLexerRule ::= type rule;
Replace ::= "OVERRIDE";
Refine ::= "REFINE";
Remove ::= "REMOVE";
Before ::= "MERGE" "BEFORE" terminal;
First  ::= "MERGE" "FIRST";
ParametrizedImport ::= "IMPORT" modulename ("<" parameters ("," parameters)* ">")?;
ModuleDeclaration ::= "REUSING" type "IN" refValue "AS" name ";";
Parametrize ::= "PARAMETRIZE" modulename "<" parameters ("," parameters)* ">" ";";
Refactor ::= "REFACTOR" modulename "<" parameters ("," parameters)* ">" ";";
TemplateParameter ::= variationPoint "->" reference;
RemoveSymbolParameter ::= "!" target;
RenameTerminalParameter ::= target "->" replacement;
RenameNonTerminalParameter ::= target "->" replacement;
RenameLexerStateParameter ::= target "->" replacement;
Adapt ::= "EXTEND" modulename ("<" parameters ("," parameters)* ">")?;
Embed ::= "@" "(" (intransition)? modulename "::" target (outtransition)? ")" ;
EmbedProduction ::= "@" (nt ( ":" "{" "}" )? )? "{" languages ( "|" languages)* ":" transition "}";
EmbedParameter ::= transition ":" modulename ("<" parameters ("," parameters)* ">")? ("::" target)?;
NonTerminalSlot ::= "[" "NonTerminalSlot" ":" name "]";
TerminalSlot ::= "[" "TerminalSlot" ":" name "]";
GrammarSymbolSlot ::= "[" "GrammarSymbolSlot" ":" name "]";
REGExpressionSlot ::= "[" "ExpressionSlot" ":" name "]";
BNFExpressionSlot ::= "[" "ExpressionSlot" ":" name "]";
componentmodel.VariationPointName ::= name[('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*];
componentmodel.FragmentType       ::= 
  language[('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*] "."
  construct[('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*];
componentmodel.Location     ::=  path['http:/'? ('/' ('A'..'Z'|'a'..'z'|'0'..'9'|'_'|'.'|'-')+)+];
componentmodel.FragmentDefinition  ::=  
  codePieces ('+' codePieces)* "." 
  concreteSyntax[('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*] ;
componentmodel.CodePiece   ::=  code['\' ' (~('\' '))* '\' '] ;
componentmodel.FragmentName ::=  
  name[('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*]
  ("["fragmentFilter "]")? ("." subFragmentName)?;
componentmodel.PositionFilter ::= index[('-')? ('0'..'9')+];
componentmodel.PositionFirst  ::= "first";
componentmodel.PositionLast   ::= "last";
componentmodel.NameFilter     ::= fragmentName "=" pattern['\' ' (~('\' '))* '\' '];
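
To show how the EJJ statement syntax fits together, here is a small hand-written Python parser for PARAMETRIZE and REFACTOR statements. It is not Reuseware's generated parser. It uses regular expressions and assumes that parameters contain no commas and that subfragment names are plain identifiers. `parse_statement` and `Statement` are hypothetical names.

```python
import re
from typing import Dict, List, NamedTuple

# FragmentName ::= name ("[" fragmentFilter "]")? ("." subFragmentName)?
FRAGMENT_NAME = r"[A-Za-z_]\w*(?:\[[^\]]*\])?(?:\.[A-Za-z_]\w*)?"
STATEMENT = re.compile(
    rf"^\s*(PARAMETRIZE|REFACTOR)\s+({FRAGMENT_NAME})\s*<(.*)>\s*;\s*$", re.S)


class Statement(NamedTuple):
    keyword: str
    module: str
    mappings: Dict[str, str]   # variationPoint/target -> reference/replacement
    removals: List[str]        # REFACTOR only: "!" target


def parse_statement(text: str) -> Statement:
    m = STATEMENT.match(text)
    if not m:
        raise SyntaxError(f"not a PARAMETRIZE/REFACTOR statement: {text!r}")
    keyword, module, body = m.groups()
    mappings: Dict[str, str] = {}
    removals: List[str] = []
    for param in body.split(","):
        param = param.strip()
        if param.startswith("!"):          # RemoveSymbolParameter ::= "!" target
            if keyword != "REFACTOR":
                raise SyntaxError("removals are only allowed in REFACTOR")
            removals.append(param[1:].strip())
        else:                              # target "->" replacement
            target, sep, value = param.partition("->")
            if not sep:
                raise SyntaxError(f"expected 'a -> b', got {param!r}")
            mappings[target.strip()] = value.strip()
    return Statement(keyword, module, mappings, removals)


print(parse_statement("REFACTOR expr <<PLUS> -> <ADD>, !<MINUS>>;"))
# Statement(keyword='REFACTOR', module='expr', mappings={'<PLUS>': '<ADD>'}, removals=['<MINUS>'])
```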

Retrieved from "http://www.reuseware.org/index.php/Grammar_Composition"

This page was last modified 16:40, 2 February 2009.