Langscape is an open source Python library of grammar-based source code processing tools. This may sound like a roundabout way of saying “yet another compiler compiler” or simply “parser generator”, but parsers are only one, though arguably the most important, of several source tools. Here is an incomplete list of grammar-based source tools:
A central feature of Langscape is that it provides all of those tools once a language’s syntax is specified by an EBNF grammar.
Langscape is an open source Python framework used to build and extend language components. A language component is called a langlet in Langscape. A langlet is a syntax definition together with a set of classes that control how source code is parsed, transformed, imported and so on. Both syntax and functionality can be inherited from parent langlets. Classes are extended in ordinary OOP style using subclasses. Syntax is overridden by redefining and adding grammar rules.
Langscape contains some “pythonisms”, such as an interactive console for each langlet. Even though code may not be executable in that style for some languages, console interaction helps in the syntax definition and language transformation phases. Langscape has generally good debugging support. All sorts of objects created during parsing and transformation can be displayed. The display is configured using command line parameters. This doesn’t mean that debugging syntax and parsers is no longer a headache, but at least one isn’t groping in the dark.
Langscape is about this:
{(4, 2, 0, 1044): [(5, 3, 0, 1044)],
(5, 3, 0, 1044): [(1011, 4, 0, 1044)],
(6, 5, 0, 1044): [(-1, -1, 0, 1044)],
(1011, 4, 0, 1044): [(6, 5, 0, 1044), (1011, 4, 0, 1044)],
(1012, 1, 0, 1044): [(-1, -1, 0, 1044)],
(1044, 0, 0, 1044): [(4, 2, 0, 1044), (1012, 1, 0, 1044)]}
No, it is not a joke. Of course, this finite state automaton representation looks like a fierce implementation detail, but this “machine code” is probably the single most important thing to teach about Langscape. Using a translation dictionary
{4: 'NEWLINE', 5: 'INDENT', 6: 'DEDENT', 1011: 'stmt', 1012: 'simple_stmt', 1044: 'suite'}
and a tool I’ve written called nfa2grammar, one can translate the automaton back into the grammar rule from which it originates:
suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT
This rule contains a symbolic representation of Python’s significant whitespace in EBNF. The rule is easier to read, write and understand than the automaton, but the automaton is much easier to reason about, both intellectually and programmatically. Pure geek joy.
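To give a feel for the programmatic side, here is a small sketch in plain Python. It is not nfa2grammar and it uses no Langscape API. It rests on assumptions about the tuple layout that the text above does not spell out: the first component of each state is taken to be the symbol id, (1044, 0, 0, 1044) is taken to be the start state of the rule, and a first component of -1 is taken to mark the exit. The helper name simple_paths is made up for this illustration.

```python
# The automaton and the translation dictionary, copied from the text above.
NFA = {
    (4, 2, 0, 1044): [(5, 3, 0, 1044)],
    (5, 3, 0, 1044): [(1011, 4, 0, 1044)],
    (6, 5, 0, 1044): [(-1, -1, 0, 1044)],
    (1011, 4, 0, 1044): [(6, 5, 0, 1044), (1011, 4, 0, 1044)],
    (1012, 1, 0, 1044): [(-1, -1, 0, 1044)],
    (1044, 0, 0, 1044): [(4, 2, 0, 1044), (1012, 1, 0, 1044)],
}
NAMES = {4: 'NEWLINE', 5: 'INDENT', 6: 'DEDENT',
         1011: 'stmt', 1012: 'simple_stmt', 1044: 'suite'}

START = (1044, 0, 0, 1044)  # assumption: the state carrying the rule's own id
EXIT_ID = -1                # assumption: -1 marks the exit of the automaton


def simple_paths(nfa, state, seen=()):
    """Yield the symbol names along every path from `state` to the exit,
    visiting each state at most once (so loops are not unrolled)."""
    for nxt in nfa[state]:
        if nxt[0] == EXIT_ID:
            yield []
        elif nxt not in seen:
            for rest in simple_paths(nfa, nxt, seen + (nxt,)):
                yield [NAMES[nxt[0]]] + rest


# Each loop-free path corresponds to one alternative of the rule.
paths = sorted(simple_paths(NFA, START))

# A state that lists itself as a successor corresponds to a repetition.
loops = [NAMES[state[0]] for state, successors in NFA.items()
         if state in successors]

for path in paths:
    print(NAMES[START[0]] + ':', ' '.join(path))
print('repeatable:', loops)
```

Under these assumptions the two loop-free paths line up with the two alternatives of the suite rule, and the only self-looping state is the one for stmt, which is where the + in stmt+ comes from. A real translator has to do more, for example folding loops and shared prefixes back into EBNF operators, but the point stands: the automaton is an ordinary dictionary that yields to a few lines of code.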
The different facets of Langscape can be summarized in the Langscape Triangle
Which side of the triangle attracts your attention is up to you. Of course, Langscape is the whole. Take one side away and Langscape vanishes.