Easy Extend


                                                                                     
                                                


              Author: Kay Schluehr
              Latest Change: 05.11.2006
              Version: 0.9

 

     

Abstract

This document describes the macro fiber, which is a template language used to transform fiber source code. The macro fiber extends the facilities of the EasyExtend framework and illustrates some properties of the fiber-space architecture. The idea of this templating language is to describe a CST transformation by writing the target as macro fiber source code. The macro fiber defines a new type of variable, the node variable, and an operator for evaluating Python code within a macro fiber template, called cit ( compile in template ). Finally, the presence or absence of statements, and also of parts of statements, can be controlled by optional statements and so-called semi-statements.

Overview

1. Node variables

How to define basic template variables and how to call macros.

2. compile-in-template

Introduces the cit ( compile-in-template ) operator that lets us evaluate expressions / statements within the macro definition.

3. Semi-statements and optional statements

Optional parts of Python statements are expressed by so-called semi-statements in the macro fiber.


1. Node Variables


1.1. Simple templates

In Gallery we have defined the following transformation for a user-defined repeat-statement.


repeat:
    SUITE
until:
    TEST
 

    ==>
while 1:
    SUITE-STMTS
    if TEST:
        break


Instead of defining the transformation in an imperative manner, by creating the target statement stepwise using the CST functions defined in cst.py or cstgen.py, we can also declare the goal of the transformation and let an algorithm perform the necessary steps. In the example above we have to cut the stmt nodes out of the SUITE of the source, as well as the test node, and insert them into the target statement. In order to guide the transformation process we use angle brackets < > to mark the places where we want to insert the extracted CSTs. Hence we use <test> and <suite_stmts> as dedicated places for node insertion. The names between the angle brackets are arbitrary choices. We call those entities node variables.


repeat:
    SUITE
until:
    TEST
 

    ==>
while 1:
    <suite_stmts>
    if <test>:
        break


A template macro expansion parses the target first, then searches for node variables and substitutes them with CST content. The substitution acts entirely on the CST level.

from EasyExtend.fibers.macro.fiber import macro

class FastTransformer(Transformer):
    @transform
    def repeat_stmt(self, node):
        target = """
        while 1:
            <suite_stmts>
            if <test>:
                break
        """
        _stmts = find_all(node, symbol.suite, level=1)    # read stmts of the SUITE
        _test  = find_node(node, symbol.test, level=0)    # read TEST

        return macro(target).expand( {'suite_stmts': _stmts, 'test': _test} )


Note that the target string must be indented correctly. However, a common leading whitespace offset can be ignored: the macro fiber automatically prepares the string before it is parsed.
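The whitespace preparation can be pictured with the Python standard library. The following is only a minimal sketch in modern Python 3, assuming that the preparation amounts to removing the common leading indentation; the actual code inside the macro fiber may differ, and prepare_target is a hypothetical helper name.

```python
import textwrap

def prepare_target(target):
    """Remove the common leading indentation and the surrounding blank lines.

    Hypothetical helper: it only imitates the effect described in the text.
    """
    return textwrap.dedent(target).strip("\n")

target = """
        while 1:
            <suite_stmts>
            if <test>:
                break
        """

prepared = prepare_target(target)
print(prepared)
```

The relative indentation of the while body survives, which is all the parser needs.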


# repeat_transform.py  ---  macro fiber

def repeat_to_while_stmt(suite_stmts, test_node):
    expand as while_node:
        while 1:
            <suite_stmts>
            if <test_node>:
                break
    return while_node
  
----------------------------------------------------------------------------------------
# fiber.py  ---  Gallery

import repeat_transform

class FastTransformer(Transformer):
    @transform
    def repeat_stmt(self, node):
        _stmts = find_all(node, symbol.suite, level=1)    # read stmts of the SUITE
        _test  = find_node(node, symbol.test, level=0)    # read TEST

        return repeat_transform.repeat_to_while_stmt(_stmts, _test)



The code also presents one kind of fiber usage within another fiber definition. While the target string is a quoted macro fiber statement, the macro object creates a macro fiber transformer during expansion that acts on the target string just like any other transformer. It can also be extended as such.

1.2. Starred node variables

In our first example we passed a sequence of stmt nodes into the macro target. This was easy because the stmt sequence can be embedded unaltered into the suite of the while-stmt. But suppose we want to pass a sequence of <test> nodes as parameters into a function. These nodes might represent expressions like x or a<1 or c-d. Expanding <test> naively without regarding comma separation would lead to the expression x a<1 c-d, which is not even well formed. A node variable is basically interpreted as a raw node or node sequence without additional structure applied to it during transformation. As we have seen above this makes sense for some node types. In order to pass the <test> sequence correctly, starred node variables are introduced.


func(*<test>)
==>
func(<test>[0], <test>[1], ..., <test>[n])

This preliminary transformation works much like passing a list or tuple into an ordinary Python function which is defined with a starred variable argument list. Actually there is no additional grammar rule used to describe this kind of function call; the node variable is just passed as a conventional starred parameter. The expansion of func(*<test>) into the latter form happens before the list items <test>[i] get expanded.

If func does not define a variable argument list but requires just a single list parameter we can work around this using the listify function, which is defined as:

    def listify(*args):
        return list(args)


With listify we can transform func as:


func(listify(*<test>))
==>
func([<test>[0], <test>[1], ..., <test>[n]])

Instead of applying listify(*<test>) each time we need this argument form we may use the syntactical form <*test> to express the same transformation:


func(<*test>)
==>
func([<test>[0], <test>[1], ..., <test>[n]])

So we state the following identity:




listify(*<test>) == <*test>
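The two starred forms can be imitated on plain text. This is a toy sketch in modern Python 3 and not the macro fiber implementation: the real expansion works on CSTs, whereas expand_starred ( a hypothetical name ) simply rewrites strings with regular expressions.

```python
import re

def listify(*args):
    return list(args)

def expand_starred(target, node_vars):
    """Toy text-level expansion of *<name> and <*name>."""
    def call_star(match):
        # *<name>: comma-separated items, suitable for an argument list
        return ", ".join(node_vars[match.group(1)])

    def list_star(match):
        # <*name>: the same items wrapped into a list display
        return "[" + ", ".join(node_vars[match.group(1)]) + "]"

    target = re.sub(r"\*<(\w+)>", call_star, target)
    target = re.sub(r"<\*(\w+)>", list_star, target)
    return target

tests = ["x", "a<1", "c-d"]
print(expand_starred("func(*<test>)", {"test": tests}))   # func(x, a<1, c-d)
print(expand_starred("func(<*test>)", {"test": tests}))   # func([x, a<1, c-d])
print(listify(1, 2, 3) == [1, 2, 3])                      # True
```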

1.3. Node variable syntax

Syntax:
< ['*'] node>
Node variable. A node variable contains a node name. The node name is a transformation advice that some particular node in the parse tree shall be substituted by this type of node ( usually an equivalent node type ). Below we refine this notion.

Node variable syntax:

node_var ::= '<' ['*'] NAME '>'

1.4. The macro class

The macro expansion is exposed by the macro class. The macro class has a single method named expand() which triggers the macro transformer and returns a new parse tree.
class macro:
   __init__(target, [transformer])
A macro object is initialized using a transformation target. The target has to be a string, but it can also be a callable whose __doc__ is used as a string target. Internally, leading whitespace is trimmed so that you only have to take care of correct indentation as usual. The second, optional argument is a Transformer object. If a transformer T calls a macro transformer, T shall pass self to the macro constructor, enabling mutually recursive calls.
   expand(node_vars, locals = {})
The expand method performs the actual transformation and returns a CST of the transformed target. Node variables are passed as a dictionary into expand. A node variable <X> corresponds to a key "X" in the node_vars dict. You can pass additional local variables in another dict.
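To make the shape of this interface concrete, here is a string-based stand-in written in modern Python 3. It is a sketch under loud assumptions: ToyMacro is a hypothetical class, it substitutes text rather than CSTs, it ignores the transformer argument, and it does not use the locals dict. It only shows how a string or a callable with a docstring can serve as target and how <X> maps to the key "X".

```python
import re
import textwrap

class ToyMacro:
    """String-based stand-in for the macro class (illustration only)."""

    def __init__(self, target):
        # Accept a string, or a callable whose docstring holds the target.
        if callable(target):
            target = target.__doc__
        self.target = textwrap.dedent(target).strip()

    def expand(self, node_vars, locals=None):
        # A node variable <X> is looked up under the key "X".
        # The locals dict is accepted for symmetry but unused in this toy.
        return re.sub(r"<(\w+)>", lambda m: node_vars[m.group(1)], self.target)

def negate():
    """
    not (<test>)
    """

negated = ToyMacro(negate).expand({"test": "a < 1"})
assigned = ToyMacro("<lhs> = <rhs>").expand({"lhs": "x", "rhs": "42"})
print(negated)    # not (a < 1)
print(assigned)   # x = 42
```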

2. compile in template


So far we have seen how to perform simple substitutions of node variables within macro fiber code. We want to extend this evaluation scheme now by inserting an operator that performs arbitrary code evaluations at expansion time.

For motivation, let's start with the following macro fiber code:


if x<0:
   <items>[0]
else:
   <items>[1]


Let's further suppose that the <items> node variable is a placeholder for a list named lst. Then the transformed code will look like this:


if x<0:
   lst[0]
else:
   lst[1]


But what if the value of the <items> node variable is not a NAME token, but itself a list of the two flow control statements continue and break? Now we expect the result of the transformation to be


if x<0:
   continue
else:
   break


So we would like to apply __getitem__ on the associated values passed into <items> and not on the result of the transformation.
This means we have to evaluate <items>[0] and <items>[1] at expansion time and use the received value ( i.e. the CST <items>[k] for k = 0, 1 ) for substitution.

2.1 cit itself

We have just seen that we need to distinguish between operations before and after node variable substitutions. The cit ( compile in template ) operator supports evaluations at expansion time. We need to insert cit into macro definitions explicitly. For example with <items> = ( continue_stmt, break_stmt ) we get the following transformation:

         
if x<0:               
   (cit <items>[0])
else:                          
   (cit <items>[1])


==>
if x<0:
   continue
else:
   break
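The transformation above can be mimicked on plain text to show the order of operations. This is a toy sketch in modern Python 3, not the macro fiber itself: expand_cit is a hypothetical helper, it only understands the pattern (cit <name>[index]), and the node values are strings instead of CSTs.

```python
import re

def expand_cit(target, node_vars):
    """Toy expansion: evaluate each '(cit <name>[index])' at expansion time."""
    def evaluate(match):
        name, index = match.group(1), int(match.group(2))
        # The indexing is applied to the passed value, before substitution.
        return node_vars[name][index]
    return re.sub(r"\(cit <(\w+)>\[(\d+)\]\)", evaluate, target)

target = """\
if x<0:
   (cit <items>[0])
else:
   (cit <items>[1])"""

expanded = expand_cit(target, {"items": ("continue", "break")})
print(expanded)
```

The printed text is the if/else statement with continue and break in place of the two cit expressions.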


cit: CST -> CST
cit evaluates a macro fiber expression as a CST and returns a CST to the macro transformer. The cit operator is defined by its own grammar rule. The macro transformer substitutes the cit node in the surrounding CST by the value returned by cit.

cit operator syntax:

cit      ::= 'cit' test
argument ::= cit | [test '='] test [gen_for]
atom     ::= node_var | '(' [testlist_gexp | cit] ')' | ...

From the embedding of cit into atoms and arguments we see immediately that cit always has to be parenthesized. This syntax might look unusual and unpythonic but keep in mind that cit is not a function that returns a value but an operator that returns a CST within an expression. To understand the rationale for cit's syntax consider the following expression:

cit 0 + cit 1

First the rightmost cit evaluates to a CST representing '1'. This CST("1") replaces 'cit 1' and so we get the intermediate result:

cit 0 + CST(1)

Next the left cit tries to evaluate 0 + CST(1) and fails with a message about a CST wrapper ( a list ) that cannot be added to an int. On the other hand, we do not want cit to bind more tightly to 0 than an addition does.

The safest way to prevent this behaviour is to put parens around cit and throw a syntax error when they are missing. This one is most easily detected.

(cit 0) + (cit 1)  # evaluates to 0 + 1  -> o.k.
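The failure of the unparenthesized form is easy to reproduce in plain Python 3. The list below is only a stand-in for a CST ( its exact shape is an assumption ); what matters is that a list cannot be added to an int.

```python
cst_one = [2, "1"]    # stand-in for CST("1"): a plain list, not a number

try:
    0 + cst_one       # what the left cit in "cit 0 + cit 1" would attempt
except TypeError as error:
    message = str(error)
    print("TypeError:", message)
```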

2.2  Local variables

For non-trivial bodies the evaluation of cit depends on local variables. The only variables initially known to cit are node variables, which get passed in their own dict. Additional variables can be passed using the locals dictionary:

   target = "f(cit 2+j) + g(cit j*9)"
   cst = macro(target).expand({}, locals={"j":9})   # no node vars needed but we pass j with j = 9

   unparse(cst) -> "f(11) + g(81)"
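The role of the locals dictionary can be imitated with eval on plain text. Again this is a toy sketch in modern Python 3 under stated assumptions: expand_cit_with_locals is a hypothetical helper, it works on strings rather than CSTs, and it only handles cit expressions without nested parentheses.

```python
import re

def expand_cit_with_locals(target, local_vars):
    """Toy expansion: evaluate every 'cit EXPR' with the given local variables."""
    def evaluate(match):
        # eval is acceptable here only because the input is a trusted literal.
        return str(eval(match.group(1), {}, dict(local_vars)))
    # EXPR extends up to the closing parenthesis of the enclosing call.
    return re.sub(r"cit\s+([^()]+)", evaluate, target)

result = expand_cit_with_locals("f(cit 2+j) + g(cit j*9)", {"j": 9})
print(result)   # f(11) + g(81)
```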

2.3 cit on the console

You can try out cit on the console. There is nothing spectacular about it:

mc> (cit 42)
42
mc> (cit "Spam ") +"with " +(cit "eggs")
'Spam with eggs'

Nested calls are allowed:

mc> (cit 2 + (cit 40))
42

A little more complicated is the behaviour of cit when it tries to evaluate lists or tuples. The basic assumption is that each list/tuple is already a CST representation of some expression, and cit returns it unchecked. But when it is returned unchecked the transformer tries to substitute cit by the supposed CST and is likely to fail:

mc> (cit [0])
Traceback (most recent call last):
  File "C:\lang\Python24\lib\site-packages\EasyExtend\eeconsole.py", line 223, in compile_cst
    transformer.run(parseTree)
   ....   
    self.run(sub, chain = chain+[tree], locals=locals, prio = prio)
  File "C:\lang\Python24\lib\site-packages\EasyExtend\eetransformer.py", line 257, in run
    raise TranslationError( S )
TranslationError: unable to subst node (4942, 'cit') by node (0, 'ENDMARKER') in:

        file_input  -- S`4865 -- 257
          stmt  -- S`4874 -- 266
            simple_stmt  -- S`4875 -- 267
                ....
                  power  -- S`4918 -- 310
                    atom  -- S`4919 -- 311
                      LPAR  -- T`4615 -- 7     L`1
                        (
                        1
                      cit  -- S`4942 -- 334       <-------
                        ....
                      RPAR  -- T`4616 -- 8     L`1
                        )

When the node transformer finds 0 as the first element of a list it assumes it has found the ENDMARKER token, because token.ENDMARKER = 0. Passing lists that are not CSTs into cit is harmful. But you can wrap a list into an envelope and pass it into cit.
env( arg )
env wraps its single argument into a CST node of node type test. This way env can be combined with cit. Only arguments of builtin types int, long, float, basestring, list, tuple, dict and None can be passed to env. Otherwise a TypeError is raised.

mc> (cit env([1,2,3]))
[1,2,3]
mc> len(cit env([1,2,3]))  
3
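The ENDMARKER confusion can be checked with the standard token module, where ENDMARKER is indeed 0. The rest of the sketch is an assumption for illustration: Envelope is a hypothetical stand-in for what env produces, written in modern Python 3 ( so long and basestring are represented by int and str ), and it wraps a payload in an object rather than in a CST node.

```python
import token

def first_element_as_token_name(value):
    """Return the token name a CST reader would see for a list's first element."""
    return token.tok_name.get(value[0])

class Envelope:
    """Hypothetical stand-in for the wrapper produced by env."""
    ALLOWED = (int, float, str, list, tuple, dict, type(None))

    def __init__(self, payload):
        if not isinstance(payload, self.ALLOWED):
            raise TypeError("cannot wrap %s" % type(payload).__name__)
        self.payload = payload

print(first_element_as_token_name([0]))   # ENDMARKER
wrapped = Envelope([1, 2, 3])
print(len(wrapped.payload))               # 3
```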

2.4 cit and node variables

The main purpose of cit is to evaluate functions defined on node variables. A technical detail of the evaluation of node variables is that their values are never evaluated by cit themselves. To evaluate its argument cit always has to execute eval() on some piece of Python code. But what does a function call eval(<stmt>[0]) mean, given that <stmt> is already retranslated to Python source? The answer is: nothing. It simply won't work because eval cannot evaluate statements. But we don't want to keep the statement at all, just the corresponding CST! So we have to apply a trick and wrap the node variable into a tunnel object. The evaluation will look like eval("self.tunnel.get_node(2924720)[0]", locals()) and returns exactly the first <stmt> node among a list of these nodes. That's because the method call on the tunnel returns the <stmt> nodes that were pushed into the tunnel before.
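The tunnel idea can be sketched in a few lines of modern Python 3. Only the method name get_node appears in the text above; the Tunnel class, the push method and the use of id() as key are assumptions made for this illustration, and the nodes are strings instead of CSTs.

```python
class Tunnel:
    """Toy tunnel: stores objects under an integer key and hands them back."""

    def __init__(self):
        self._nodes = {}

    def push(self, node):              # hypothetical method name
        key = id(node)
        self._nodes[key] = node
        return key

    def get_node(self, key):
        return self._nodes[key]

tunnel = Tunnel()
stmts = ["continue_stmt CST", "break_stmt CST"]   # stand-ins for real CST nodes
key = tunnel.push(stmts)

# The node variable is replaced by a lookup expression, which eval can handle.
source = "tunnel.get_node(%d)[0]" % key
first = eval(source, {"tunnel": tunnel})
print(first)                                      # continue_stmt CST
```

The evaluated source never contains the statement itself, only an integer key, so eval has nothing but an ordinary expression to deal with.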

The tunnel wrapping is not ( and should not be ) immediately visible to the programmer. It is still instructive to observe how cit operates on node variables. To do this on the console ( for experimentation ) the macro fiber provides the function meval.
meval( target, **kwd )
meval evaluates a target using node variables encoded in keyword arguments. As values of these keyword arguments either strings or lists ( tuples ) of strings are accepted.
mc> meval("<test>", test = "0")
'( 0 )'
mc> meval("(cit <test>)", test = "0")
'0'                                          

You might have observed tiny artifacts of transformations just like the parens around 0 in the first meval. They do not affect the semantics of the expressions being returned. The true reason for the appearance of (0)  is that "(0)" is represented by a node of type atom and this can be advantageous for certain node replacements. There are still some other semantically indifferent transformation artifacts that might vanish in future due to "smoothing" the source code.

mc> meval("(cit <test>)", test = ("0","1"))
'0 1'

mc> meval("<*test>", test = ("0","1")) 
'listify ( 0 , 1 , )'

mc> meval("cit <*test>", test = ("0","1"))     # Does not work! <*test> will be expanded into
Traceback (most recent call last):             # listify(0,1,) first which is equivalent to [0,1].
  File "<input>", line 2, in ?                 # Returning [0,1] leads to substitution havoc.
...
TranslationError: unable to subst node (4942, 'cit') by node (0, 'ENDMARKER')

mc> meval("cit env(<*test>)", test = ("0","1"))  # wrapping argument into env is o.k.
'[ 0 , 1 ]'

Adding two node variables leads at best to an arbitrary list when expanded. You would have to be very lucky for this to produce any useful result. Adding the values of these variables after applying cit to each of them is a lot more reasonable.




3. Semi-statements and optional statements

3.1 Semi-statements


Semi-statements are the fragments of compound statements. Typically a compound statement consists of a main branch and one or more subsequent branches. Take for instance the if_stmt with its elif and else subbranches. A semi-statement is nothing but one of these subbranches, which is not itself a statement.

Semi stmt syntax:

semi_stmt      ::= elif_branch | else_branch | except_branch | finally_branch

with

elif_branch    ::= 'elif' test ':' suite
else_branch    ::= 'else' ':' suite
except_branch  ::= except_clause ':' suite
finally_branch ::= 'finally' ':' suite

3.2 The optional_stmt statement


Depending on the source of transformation one or more semi-statements have to be generated to complete a compound statement. The coordination of these insertions is managed by a new statement called optional_stmt.

Optional branch syntax:

optional_stmt ::= 'optional' test | (exprlist 'in' testlist) ':' suite | semi_stmt

Optional statements occur in two shapes depending on the first node after the optional keyword. The first shape is a conditional insertion of a suite or semi_stmt ( if-option ); the other one mimics a for loop ( for-option ).

3.2.1 optional_stmt semantics

First consider the if-option.


if TEST_1:
    BLOCK_1
optional TEST_2:
    else:
        BLOCK_2

Depending on the value of TEST_2 the else_branch semi-statement will be added to the if branch. The final compound statement is one of the following two versions.

             
# TEST_2 -> True

if TEST_1:
    BLOCK_1
else:
    BLOCK_2

# TEST_2 -> False

if TEST_1:
    BLOCK_1
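
The if-option can be pictured as a conditional concatenation of text. This is a toy sketch in modern Python 3; expand_if_option is a hypothetical helper and works on strings, whereas the macro fiber inserts the semi-statement into the CST of the compound statement.

```python
def expand_if_option(main_branch, semi_stmt, condition):
    """Toy if-option: append the semi-statement only when the condition holds."""
    parts = [main_branch]
    if condition:
        parts.append(semi_stmt)
    return "\n".join(parts)

main = "if TEST_1:\n    BLOCK_1"
else_branch = "else:\n    BLOCK_2"

with_else = expand_if_option(main, else_branch, True)
without_else = expand_if_option(main, else_branch, False)
print(with_else)
print(without_else)
```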



The for-option is evaluated differently. Consider for instance the following optional statement.


optional item in <*expr_stmt>:
    print (cit unparse(item).strip())
    (cit item)

Several transformation steps have to be performed:
  1. Create a list comprehension [item for item in <*expr_stmt>] and evaluate it using this macro transformer.
  2. Create {item: node} pairs from the evaluated list_comp that can be passed as local variables to the body of the optional_stmt.
  3. Create as many versions of the optional_stmt body as {item:node} pairs are available.
  4. Macro transform each body version with one {item:node} pair. The resulting stmt nodes are listed.
  5. Create a suite node from all resulting stmt nodes and replace the optional_stmt in the CST by this suite.
If we use the optional_stmt above to macro transform the following source


a = 2 
b = a*5
c = a+b 

the result will be


print ("a = 2")
a = 2
print ("b = a*5")
b = a*5
print ("c = a+b")
c = a+b
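
The five transformation steps of the for-option can be followed in a small text-based imitation. This is a toy sketch in modern Python 3: expand_for_option is a hypothetical helper, the items are source strings instead of expr_stmt nodes, and the body of the optional statement is hard-coded.

```python
def expand_for_option(expr_stmts):
    """Toy for-option: one copy of the body per item, collected into one suite."""
    suite = []
    for item in expr_stmts:                   # steps 1-3: one binding per item
        text = item.strip()
        suite.append('print ("%s")' % text)   # step 4: body expanded with this item
        suite.append(text)
    return "\n".join(suite)                   # step 5: all results form the new suite

source = ["a = 2 ", "b = a*5", "c = a+b "]
generated = expand_for_option(source)
print(generated)
```

Each source statement is preceded by a print of its own text, so the generated code both reports and executes the three assignments.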

3.3 More examples

The Gallery fiber defines a switch statement regarding chainlets.

switch SWITCH_TEST:
    case CASE_TEST_0:
          SUITE_1
    case CASE_TEST_1:
          SUITE_2
    ...
else:
    SUITE_N

 ==> 

if isChainlet(<switch_test>):
    <SELECT> = <switch_test>.select(*<case_tests>)
else:
    <SELECT> = <switch_test>
if <SELECT> == (cit <case_tests>[0]):
    cit <case_suites>[0]
optional i in range(1, (cit len(<case_tests>)) ):
    elif <SELECT> == (cit <case_tests>[i]):
        cit <case_suites>[i]
optional <else_suite>:
    else:
        <else_suite>
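
How the for-option and the if-option cooperate in the switch template can be imitated on text. This is a toy sketch in modern Python 3 that covers only the if/elif/else part, not the chainlet selection; expand_switch is a hypothetical helper and each suite is a single line of source.

```python
def expand_switch(select, case_tests, case_suites, else_suite=None):
    """Toy text expansion of the if/elif/else part of the switch template."""
    lines = ["if %s == %s:" % (select, case_tests[0]),
             "    " + case_suites[0]]
    for i in range(1, len(case_tests)):       # the for-option over range(1, n)
        lines.append("elif %s == %s:" % (select, case_tests[i]))
        lines.append("    " + case_suites[i])
    if else_suite:                            # the if-option on <else_suite>
        lines.append("else:")
        lines.append("    " + else_suite)
    return "\n".join(lines)

switch_code = expand_switch("SELECT", ["1", "2"],
                            ["r = 'one'", "r = 'two'"], "r = 'many'")
print(switch_code)
```

Without an else suite the final branch is simply left out, which is the text-level counterpart of an if-option whose test is false.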

 
Another, simpler example involving just a few node variables and one if-option can be found in the with_stmt transformation of the Py25Lite fiber.

with EXPR as VAR:
    BLOCK

 ==> 

mgr = (<EXPR>)
exit = mgr.__exit__ 
value = mgr.__enter__()
exc = True
try:
    try:
        optional <VAR>:
            <VAR> = value 
        <BLOCK>
    except:  
        exc = False
        if not exit(*sys.exc_info()):
            raise      
finally:
    if exc:
        exit(None, None, None)
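
The expanded form can be checked against a native with statement in modern Python 3. The following sketch writes the expansion out by hand; Recorder and run_expanded are hypothetical names, and the callable block(value) plays the role of <BLOCK> with <VAR> bound to value.

```python
import sys

class Recorder:
    """Minimal context manager that records what happens to it."""
    def __init__(self):
        self.events = []
    def __enter__(self):
        self.events.append("enter")
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit(%s)" % (exc_type.__name__ if exc_type else None))
        return False              # do not swallow exceptions

def run_expanded(mgr, block):
    """Hand-written copy of the expanded template."""
    exit = mgr.__exit__
    value = mgr.__enter__()
    exc = True
    try:
        try:
            block(value)
        except:
            exc = False
            if not exit(*sys.exc_info()):
                raise
    finally:
        if exc:
            exit(None, None, None)

expanded, native = Recorder(), Recorder()
run_expanded(expanded, lambda value: value.events.append("body"))
with native as value:
    value.events.append("body")
print(expanded.events == native.events)   # True
```

The exc flag ensures that __exit__ is called exactly once, either with the exception details or with three None values.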