...is good for you

Manifesto

  1. Programming languages' syntax should be configurable and extensible. Adding a construct should take only a few lines and be usable right away.
  2. There should be no need to write repetitive code. Meta-programming exists for that purpose.
  3. It should be easy to use the compiler as a front-end to other tools (static analysis, debuggers, documentation generators, dependency graphs, perhaps IDEs).
  4. C Macros and C++ Templates are evil because they output text instead of manipulating the AST (Abstract Syntax Tree). Manipulating the AST is both safer and more powerful.
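To make point 4 concrete, here is a minimal sketch in Python (not meta, whose implementation isn't shown here) contrasting a text-pasting macro with an AST-building one. The names square_text and square_ast are made up for this illustration; Python's standard ast module stands in for the compiler's AST.

```python
import ast

def square_text(arg_src: str) -> str:
    # Text-level macro, in the spirit of `#define SQUARE(x) x*x`:
    # the argument is pasted verbatim, so operator precedence can bite.
    return f"{arg_src}*{arg_src}"

def square_ast(arg_src: str) -> ast.Expression:
    # AST-level macro: the argument is parsed into a node first,
    # then two copies of that node are combined in a multiplication node.
    left = ast.parse(arg_src, mode="eval").body
    right = ast.parse(arg_src, mode="eval").body
    node = ast.BinOp(left=left, op=ast.Mult(), right=right)
    return ast.fix_missing_locations(ast.Expression(body=node))

text_result = eval(square_text("1+2"))  # evaluates the text 1+2*1+2
ast_result = eval(compile(square_ast("1+2"), "<macro>", "eval"))

print(text_result)  # 5
print(ast_result)   # 9
```

The text version silently computes the wrong thing because the pasted argument is re-parsed in a new context. The AST version keeps the argument as one subtree, so the problem cannot occur.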

Use cases

  • Adding pieces of syntax to the ooc language (e.g. the 'times' statement, regexp literals, operator overloading, etc.)
  • Actually building whole new sublanguages (I think we could implement the python/ruby syntax pretty easily, actually)
  • A C header parser written in meta that adds (external) FunctionDecl nodes, so we don't need to wrap APIs explicitly.
  • A Doxygen/Javadoc-style tool, coded in meta: it just has to inspect the AST, process it, and output a friendlier format (e.g. xml), which can then be processed further with another tool.
  • A Python glue code generator: meta lets you inspect the AST, process it, and output to other files.
  • Aspect-oriented programming (just inject code in the AST directly)
  • Semi-automatic migration of user code from one library version to the next: a kind of "computer-readable" changelog that refactors your code.

Did I say AST too much? ;)

Concepts

  • Token: a character sequence. The source code is divided into tokens when it is first read.
  • Construct: a sequence of tokens/constructs which either maps directly to an AST part that can be inserted in place, or calls a macro which manipulates the AST.
  • Group: a named group of tokens/constructs/whatever that can be used to describe alternatives (e.g. match any member of the group) or to apply characteristics (e.g. ordering).
  • Macro: a piece of code that inspects/manipulates the AST and/or communicates with a third-party tool and/or writes to files, etc. Can be parametrized (i.e. take arguments) and can return a node.

Basic syntax elements

  • {thing1; thing2; thing3} list literal
  • "blah", "blah\n" string literal
  • 234, 3.14 int and float literal
  • new BlahBlah(arg1; arg2); create a new object of type BlahBlah (often AST nodes)
  • blahblah: new BlahBlah(); create a new object and assign it to name blahblah.

  • Rationale for ';' as a list separator: a macro/function is a list of statements, separated by ';'. Unifying ',' and ';' is more consistent.
  • Rationale for ':' as a name association operator: logical meaning from English, also reminiscent of Pascal's ':=' but shorter.
  • Rationale for the 'new' keyword: unambiguous syntax.
  • Rationale for the lack of 'f' suffix for float literals: implementation detail.
  • Rationale for {} for lists but () for constructors/macro calls: unambiguous syntax, allows "late binding" of macro names, for example. (Note: not necessarily a satisfying reason)

Tokens

Sequence definition

A token is defined like this:

TOKEN_NAME: new Token( { [A-Za-z_]+ ; .* ; "str" } );

Quantifiers

Each syntax element in the sequence of a token's definition can carry a quantifier:

  • (nothing): only one
  • !: 0 or 1
  • *: 0 or more
  • +: 1 or more

Character class literals

They are enclosed in []. Ranges can be defined like A-Z. ']' has to be escaped with a backslash, e.g. [\]]. Example:

DUMB_TOKEN: new Token( [abc]; [abc] ); // matches aa, ab, ac, ba, bb, bc, ca, cb, cc

String literals

C-like string literals, delimited by ", escaped with backslash:

FUNC_KEYWORD: new Token( "func" );

Special chars

. matches any character, e.g.:

SIMPLE_XML_TAG: new Token( "<"; .*; ">" );
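One way to get a feel for these token sequences is to translate them into ordinary regular expressions. The sketch below does that in Python. The helper names (element_to_regex, sequence_to_regex) and the mapping of '!' to the regex '?' are assumptions made for this illustration, not a description of how meta is implemented. Escape sequences inside string literals are not handled, and Python's '.' does not match newlines by default.

```python
import re

# Assumed mapping from meta quantifiers to Python regex quantifiers.
QUANTIFIERS = {"!": "?", "*": "*", "+": "+"}

def element_to_regex(element: str) -> str:
    """Translate one sequence element (hypothetical helper) to a regex fragment."""
    quantifier = ""
    if len(element) > 1 and element[-1] in QUANTIFIERS:
        quantifier = QUANTIFIERS[element[-1]]
        element = element[:-1]
    if element.startswith('"'):
        # String literal: match its characters literally, grouped so that
        # a quantifier applies to the whole string.
        body = "(?:" + re.escape(element[1:-1]) + ")"
    else:
        # Character class or '.': the notation already agrees with regex syntax.
        body = element
    return body + quantifier

def sequence_to_regex(elements):
    """Concatenate the translated elements into one compiled pattern."""
    return re.compile("".join(element_to_regex(e) for e in elements))

dumb_token = sequence_to_regex(["[abc]", "[abc]"])
simple_xml_tag = sequence_to_regex(['"<"', ".*", '">"'])

print(bool(dumb_token.fullmatch("ca")))       # True
print(bool(dumb_token.fullmatch("ad")))       # False
print(bool(simple_xml_tag.fullmatch("<b>")))  # True
```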

Order

Tokens can be ordered to avoid conflicts (e.g. between keywords and names, where both token definitions match). Example:

NAME: new Token( [A-Za-z_]; [A-Za-z_0-9]* );
FUNC_KEYWORD: new Token( "func" );

Here, (Houston), we have a problem. Now the string "func" matches both NAME and FUNC_KEYWORD. As the situation is ambiguous, meta will refuse straight out to compile and yield an error (with a detailed message of course). It's easily solved by defining an order:

new Order( FUNC_KEYWORD; NAME );

Now, as it should, FUNC_KEYWORD will have a greater priority than NAME, and the conflict is gone.
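The conflict and its resolution can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the tokens are written as regexes, an Order is modelled as a set of (higher, lower) name pairs, and the ambiguity is detected when a string is classified, whereas meta is meant to reject the definitions at compile time. The classify function is hypothetical.

```python
import re

TOKENS = {
    "NAME": re.compile(r"[A-Za-z_][A-Za-z_0-9]*"),
    "FUNC_KEYWORD": re.compile(r"func"),
}

def classify(text, tokens, orders=frozenset()):
    """Return the name of the single token matching `text`, or raise."""
    candidates = [name for name, rx in tokens.items() if rx.fullmatch(text)]
    # Drop every candidate that loses to another candidate in some order pair.
    winners = [c for c in candidates
               if not any((other, c) in orders for other in candidates)]
    if len(winners) != 1:
        raise ValueError(f"ambiguous or unmatched token {text!r}: {candidates}")
    return winners[0]

# Stand-in for: new Order( FUNC_KEYWORD; NAME );
ORDERS = {("FUNC_KEYWORD", "NAME")}

print(classify("func", TOKENS, ORDERS))  # FUNC_KEYWORD
print(classify("foo", TOKENS, ORDERS))   # NAME
```

Calling classify("func", TOKENS) without the order raises ValueError, which mirrors the error described above.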

Of course, if you have more keywords, it would be cumbersome to define an Order for each of them individually, so you'd want to use Groups.

Constructs

Just as Tokens are usually sequences of characters, Constructs are usually sequences of Tokens and other Constructs, e.g.:

<#
PLUS: new Token( "+" );
DIGIT: new Token( [1-9]; [0-9]* ); // a decimal number doesn't begin with a 0
NumberLiteral: new Construct( DIGIT );
Addition: new Construct( Expr; PLUS; Expr );
Expr: new Group( NumberLiteral; Addition );
#>
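To see what these definitions would accept, here is a small hand-written Python parser for the same grammar. It is a sketch, not meta's parsing strategy: Addition refers to Expr on its left side, and how meta deals with that left recursion isn't specified here, so this version simply folds additions left to right. The tuple-based tree and the function names are made up for the example.

```python
import re

# DIGIT and PLUS as defined above; surrounding whitespace is skipped (an assumption).
TOKEN_RE = re.compile(r"\s*(?:(?P<DIGIT>[1-9][0-9]*)|(?P<PLUS>\+))")

def tokenize(source):
    """Split the source into (token_name, text) pairs."""
    source = source.rstrip()
    pos, out = 0, []
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        out.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return out

def parse_expr(tokens):
    """Expr: Group( NumberLiteral; Addition ), additions folded left to right."""
    def number(i):
        if i >= len(tokens) or tokens[i][0] != "DIGIT":
            raise SyntaxError("expected a number literal")
        return ("NumberLiteral", int(tokens[i][1])), i + 1

    node, i = number(0)
    while i < len(tokens) and tokens[i][0] == "PLUS":
        right, i = number(i + 1)
        node = ("Addition", node, right)
    if i != len(tokens):
        raise SyntaxError("trailing tokens")
    return node

print(parse_expr(tokenize("1 + 20")))
# ('Addition', ('NumberLiteral', 1), ('NumberLiteral', 20))
```

Note that with DIGIT defined as above, the input "0" is rejected by the tokenizer, since the first character must be in 1-9.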

Groups

The easiest way to understand groups is to reconsider the keyword vs. name token priority problem explained above. Let's say we now have two keywords in our language, "func" and "class". Naively, we'd write:

NAME: new Token( [A-Za-z_]; [A-Za-z_0-9]* );
FUNC_KEYWORD: new Token( "func" );
CLASS_KEYWORD: new Token( "class" );

new Order( FUNC_KEYWORD; NAME );
new Order( CLASS_KEYWORD; NAME );

But there's a better way:

NAME: new Token( [A-Za-z_]; [A-Za-z_0-9]* );
FUNC_KEYWORD: new Token( "func" );
CLASS_KEYWORD: new Token( "class" );

KEYWORDS: new Group( FUNC_KEYWORD; CLASS_KEYWORD );
new Order( KEYWORDS; NAME );

This way you can easily add keywords to the group. The same trick can be used for constructs, e.g.:

// Line and Block defined earlier
Code: new Group( Line; Block );
FOREACH: new Token( "foreach" );
Foreach: new Construct({
  in: { FOREACH; Code }; // now you can foreach with a single statement or a code block =)
});
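Going back to the keyword example, one plausible reading of an Order over a Group is that it expands into one pairwise order per group member. The Python sketch below shows that expansion. The expand_order function and the dict-based group table are assumptions for illustration only.

```python
def expand_order(order, groups):
    """Expand an ordered list of token/group names into (higher, lower) token pairs."""
    def members(name):
        # A group contributes all of its (possibly nested) members;
        # a plain token contributes itself.
        if name in groups:
            return [m for child in groups[name] for m in members(child)]
        return [name]

    levels = [members(name) for name in order]
    pairs = set()
    for i, high_level in enumerate(levels):
        for low_level in levels[i + 1:]:
            pairs.update((high, low) for high in high_level for low in low_level)
    return pairs

# Stand-in for: KEYWORDS: new Group( FUNC_KEYWORD; CLASS_KEYWORD );
GROUPS = {"KEYWORDS": ["FUNC_KEYWORD", "CLASS_KEYWORD"]}

# Stand-in for: new Order( KEYWORDS; NAME );
print(sorted(expand_order(["KEYWORDS", "NAME"], GROUPS)))
# [('CLASS_KEYWORD', 'NAME'), ('FUNC_KEYWORD', 'NAME')]
```

Adding a keyword then only means appending it to the group. The pairwise orders follow automatically.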

Rules about syntax

Capitalization:

  • Tokens: TOKEN_NAME, VERY_LONG_TOKEN_NAME_YOU_SHOULD_THINK_ABOUT_RENAMING
  • Types: TypeName, ClassName, CoverName, HttpIsCamelCaseToo, SoIsUrl, IKnowItsUgly, ButItsConsistent
  • Variables and functions: variableName, functionName

Departure from C/C++/Java? Yeah, pretty much. Get used to it. int → Int, float → Float, and so on.

 
Creative Commons License 2010, nddrylliog & the ooc crew