Abstract
This document describes the macro fiber, a template language used to
transform fiber source code. The macro fiber is an enhancement of the
facilities of the EasyExtend framework and an illustration of some
properties of the fiber-space architecture. The idea of this templating
language is to describe a CST transformation by a target description
written as macro fiber source code. The macro fiber defines a new type
of variable, the node variable, and an operator for evaluating Python
code within a macro fiber template, called cit ( compile in template ).
Finally, the presence or absence of statements, and of parts of
statements, can be controlled by optional statements and so-called
semi-statements.
Overview
How to define basic template variables and how to call macros.
Introduces the cit ( compile-in-template ) operator that lets us
evaluate expressions / statements within the macro definition.
Optional parts of Python statements are expressed by so-called
semi-statements in the macro fiber.
1. Node Variables
1.1. Simple templates
In Gallery we have defined the following transformation for a
user-defined repeat-statement.
|
repeat:
    SUITE
until:
    TEST
|
==>
|
while 1:
    SUITE-STMTS
    if TEST:
        break
|
Instead of defining the transformation in an imperative manner by
creating the target statement stepwise using CST functions defined in
cst.py or cstgen.py we can also declare the goal of the transformation
and let an algorithm perform the necessary steps. In the example above
we have to cut the stmt nodes out of the SUITE of the source, as well
as the test node, and insert them into the target statement. In order
to guide the transformation process we use angle brackets < > to mark
the places where we want to insert the extracted CSTs. Hence we use
<test> and <suite_stmts> as dedicated places for node insertion. The
names between the angle brackets are arbitrary choices. We call those
entities node variables.
|
repeat:
    SUITE
until:
    TEST
|
==>
|
while 1:
    <suite_stmts>
    if <test>:
        break
|
A template macro expansion parses the target first, then looks for node
variables and substitutes them with CST content. The substitution acts
entirely on the CST level.
from EasyExtend.fibers.macro.fiber import macro

class FastTransformer(Transformer):
    @transform
    def repeat_stmt(self, node):
        target = """
        while 1:
            <suite_stmts>
            if <test>:
                break
        """
        _stmts = find_all(node, symbol.suite, level=1)    # read stmts of the SUITE
        _test = find_node(node, symbol.test, level=0)     # read TEST
        return macro(target).expand( {'suite_stmts': _stmts, 'test': _test} )
|
Note that the target string must be indented correctly. However, a
common leading whitespace offset can be neglected: the macro fiber
automatically prepares the string before it is parsed.
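The parse-then-substitute scheme described above can be imitated with Python's standard ast module. This is only a sketch under stated assumptions: it works on ASTs rather than on EasyExtend CSTs, it uses the plain names SUITE_STMTS and TEST as stand-ins for node variables because angle brackets are not valid Python, and NodeVarFiller and expand_template are hypothetical helper names that do not belong to EasyExtend.

```python
import ast
import textwrap


class NodeVarFiller(ast.NodeTransformer):
    """Hypothetical helper: swaps placeholder names for AST fragments."""

    def __init__(self, node_vars):
        self.node_vars = node_vars

    def visit_Expr(self, node):
        # A placeholder standing alone on a line is replaced by statements.
        if isinstance(node.value, ast.Name) and node.value.id in self.node_vars:
            return self.node_vars[node.value.id]
        return self.generic_visit(node)

    def visit_Name(self, node):
        # A placeholder inside an expression is replaced by an expression.
        return self.node_vars.get(node.id, node)


def expand_template(target, node_vars):
    tree = ast.parse(textwrap.dedent(target))      # 1. parse the target
    tree = NodeVarFiller(node_vars).visit(tree)    # 2. substitute node variables
    return ast.fix_missing_locations(tree)


target = """
    while 1:
        SUITE_STMTS
        if TEST:
            break
"""
suite_stmts = ast.parse("x += 1\ntotal += x").body    # statements of the SUITE
test = ast.parse("x > 3", mode="eval").body           # the TEST expression
tree = expand_template(target, {"SUITE_STMTS": suite_stmts, "TEST": test})

# Run the expanded loop to check that it behaves like repeat ... until.
namespace = {"x": 0, "total": 0}
exec(compile(tree, "<template>", "exec"), namespace)
```

The common whitespace offset of the target is removed with textwrap.dedent before parsing, which mirrors the preparation step mentioned above without claiming to reproduce it.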
# repeat_transform.py --- macro fiber
def repeat_to_while_stmt(suite_stmts, test_node):
    expand as while_node:
        while 1:
            <suite_stmts>
            if <test_node>:
                break
    return while_node
----------------------------------------------------------------------------------------
# fiber.py --- Gallery
import repeat_transform

class FastTransformer(Transformer):
    @transform
    def repeat_stmt(self, node):
        _stmts = find_all(node, symbol.suite, level=1)    # read stmts of the SUITE
        _test = find_node(node, symbol.test, level=0)     # read TEST
        return repeat_transform.repeat_to_while_stmt(_stmts, _test)
|
The code also presents one kind of fiber usage within another fiber
definition. While the target string is a quoted macro fiber statement,
the macro object creates a macro fiber transformer during expansion
that acts just like any other transformer on the target string. It can
also be extended as such.
1.2. Starred node variables
In our first example we passed a sequence of stmt nodes into the
macro target. This was easy because the stmt sequence can be embedded
unaltered into the suite of the while-stmt. But suppose we want to pass
a sequence of <test> nodes as parameters into a function. These
nodes might represent expressions like x or a<1 or c-d.... Expanding
<test> naively without regarding comma separation would lead to
the expression x a<1 c-d
which is not even well formed. A node variable is basically interpreted
as a raw node or node sequence without additional structure applied
to it during transformation. As we have seen above this makes sense for
some node types. In order to pass the <test> sequence correctly,
starred node variables are introduced.
|
func(*<test>)
|
==>
|
func(<test>[0], <test>[1], ..., <test>[n])
|
This preliminary transformation works much like passing a list or tuple
into an ordinary Python function which is defined with a starred
variable argument list. Actually there is no additional grammar rule
used to describe this kind of function call; the node variable is
just passed as a conventional starred parameter. The expansion of
func(*<test>) into the latter form happens before the list items
<test>[i] get expanded.
If func does not define a variable argument list but requires
just a single list parameter we can work around this using the listify
function which is defined as:
def listify(*args):
    return list(args)
With listify we can transform func as:
|
func(listify(*<test>))
|
==>
|
func([<test>[0], <test>[1], ..., <test>[n]])
|
Instead of applying listify(*<test>) each time we need this argument
form we may use the syntactical form <*test> to express the same
transformation:
|
func(<*test>)
|
==>
|
func([<test>[0], <test>[1], ..., <test>[n]])
|
So we state the following identity:
|
listify(*<test>) == <*test>
|
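The starred expansion can be tried out in plain Python at the text level. The sketch below assumes that the node variable holds the three expressions x, a<1 and c-d as source fragments; expand_starred is a hypothetical helper that only mimics the comma-separated form the expansion produces, it is not the macro fiber's own routine.

```python
def listify(*args):
    return list(args)


def expand_starred(func_name, fragments):
    """Hypothetical helper: text that func(*<test>) stands for after expansion."""
    return "%s(%s)" % (func_name, ", ".join(fragments))


fragments = ["x", "a<1", "c-d"]                      # assumed content of <test>
call_text = expand_starred("listify", fragments)     # comma-separated call
values = eval(call_text, {"listify": listify, "x": 5, "a": 0, "c": 7, "d": 3})
```

Joining the fragments with commas is what keeps the result well formed; a naive concatenation would give x a<1 c-d again.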
1.3. Node variable syntax
- Node variable. A node variable contains a node name. The
node name is a transformation hint stating that some particular node in
the parse tree shall be substituted by a node of this type ( usually an
equivalent node type ). Below we refine this notion.
Node variable syntax:
node_var ::= '<' ['*'] NAME '>'
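For experimentation, node variables can be located in a target string with a regular expression that follows the rule above. This is a text-level approximation only: the macro fiber recognizes node_var through its grammar, and a textual scan can misread comparison chains such as a<b>c. NODE_VAR and find_node_vars are hypothetical names.

```python
import re

# '<' ['*'] NAME '>' written as a regular expression
NODE_VAR = re.compile(r"<(\*?)([A-Za-z_][A-Za-z_0-9]*)>")


def find_node_vars(target):
    """Return (name, is_starred) pairs in order of appearance."""
    return [(name, star == "*") for star, name in NODE_VAR.findall(target)]


found = find_node_vars("func(<*test>, <x>) + g(<test>[0])")
```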
1.4. The macro class
The macro expansion is exposed by the macro class. The macro class has
a single method named expand() which triggers the macro transformer and
returns a new parse tree.
class macro:
- __init__(target, [transformer])
A macro object is initialized using a transformation target. The target
is usually a string, but it can also be a callable whose __doc__ is
used as the target string. Internally leading whitespace is trimmed so
that you only have to care for correct relative indentation as usual.
The second, optional argument is a Transformer object. If a transformer
T calls a macro transformer, T shall pass self to the macro constructor,
enabling mutually recursive calls.
- expand(node_vars, locals = {})
The expand method performs the actual transformation and returns a CST
of the transformed target. Node variables are passed as a dictionary
into expand. A node variable <X> corresponds to a key "X" in the
node_vars dict. You can pass additional local variables in another
dict.
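The two accepted target forms, a string or a callable with a docstring, can be normalized along the following lines. This is a guess at the behaviour described above, not the actual constructor code; normalize_target is a hypothetical name and textwrap.dedent is used as a stand-in for the internal whitespace trimming.

```python
import textwrap


def normalize_target(target):
    """Hypothetical helper: accept a string or a callable with a docstring."""
    if callable(target):
        target = target.__doc__ or ""
    # Remove the common leading offset, keep the relative indentation.
    return textwrap.dedent(target).strip()


def repeat_target():
    """
    while 1:
        <suite_stmts>
        if <test>:
            break
    """


as_string = normalize_target("""
        while 1:
            <suite_stmts>
            if <test>:
                break
""")
as_callable = normalize_target(repeat_target)
```

Both calls yield the same text, so a caller may choose whichever form reads better in the surrounding transformer.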
2. compile in template
So far we have seen how to perform simple substitutions of node
variables within macro fiber code. We want to extend this evaluation
scheme now by inserting an operator that performs arbitrary code
evaluations at expansion time.
For motivation, let's start with the following macro fiber code:
|
if x<0:
    <items>[0]
else:
    <items>[1]
|
Let's further suppose that the <items> node variable is a placeholder
for a list named lst. Then the transformed code will look like this:
|
if x<0:
    lst[0]
else:
    lst[1]
|
But what if the value of the <items> node variable is not a NAME
token, but itself a list of the two flow control statements continue
and break? Now we expect the result of the transformation to be
|
if x<0:
    continue
else:
    break
|
So we would like to apply __getitem__ on the associated values passed
into <items> and not on the result of the transformation. This means
we have to evaluate <items>[0] and <items>[1] at expansion time and use
the received value ( i.e. the CST <items>[k] for k = 0, 1 ) for
substitution.
2.1 cit itself
We have just seen that we need to distinguish between operations before
and after node variable substitution. The cit ( compile in template )
operator supports evaluations at expansion time. We need to insert cit
into macro definitions explicitly. For example with <items> =
( continue_stmt, break_stmt ) we get the following transformation:
|
if x<0:
    (cit <items>[0])
else:
    (cit <items>[1])
|
==>
|
if x<0:
    continue
else:
    break
|
- cit evaluates a macro fiber expression as a CST and returns a CST to
the macro transformer. The cit operator is defined by its own grammar
rule. The macro transformer substitutes the cit node in the surrounding
CST by the value returned by cit.
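The difference between substituting first and indexing at expansion time can be shown with a small text-level model. It assumes that node variable values are source fragments instead of CSTs and handles only the form (cit <name>[k]); expand_cit_index and substitute_after are hypothetical helpers, not part of the macro fiber.

```python
import re

CIT_INDEX = re.compile(r"\(cit <(\w+)>\[(\d+)\]\)")


def expand_cit_index(target, node_vars):
    """Index into the node variable's value at expansion time."""
    def substitute(match):
        name, index = match.group(1), int(match.group(2))
        return node_vars[name][index]
    return CIT_INDEX.sub(substitute, target)


def substitute_after(target, names):
    """Plain substitution: the index stays in the generated code."""
    return re.sub(r"<(\w+)>", lambda m: names[m.group(1)], target)


target = (
    "if x<0:\n"
    "    (cit <items>[0])\n"
    "else:\n"
    "    (cit <items>[1])\n"
)
expanded = expand_cit_index(target, {"items": ("continue", "break")})
plain = substitute_after("<items>[0]", {"items": "lst"})
```

In the first case the subscript disappears because it was consumed during expansion; in the second it survives as ordinary Python code applied to lst.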
cit operator syntax:
cit      ::= 'cit' test
argument ::= cit | [test '='] test [gen_for]
atom     ::= node_var | '(' [testlist_gexp | cit] ')' | ...
From the embedding of cit into atoms and arguments we see immediately
that cit always has to be parenthesized. This syntax might look unusual
and unpythonic, but keep in mind that cit is not a function that
returns a value but an operator that returns a CST within an
expression. To understand the rationale for cit's syntax consider the
unparenthesized expression cit 0 + cit 1.
First the rightmost cit evaluates to a CST representing '1'. This
CST("1") replaces 'cit 1' and so we get the intermediate result
cit 0 + CST("1").
Next the left cit tries to evaluate 0 + CST("1") and fails with a
message about a CST wrapper ( a list ) that cannot be added to an int.
On the other hand we don't want cit to bind 0 more strongly than an
addition would.
The safest way to prevent this behaviour is to put parens around cit
and throw a syntax error when they are missing. This error is the one
most easily detected.
(cit 0) + (cit 1)    # evaluates to 0 + 1 -> o.k.
|
2.2 Local variables
For non-trivial bodies the evaluation of cit depends on local
variables. The only type of variables initially known by cit are node
variables, which get passed in their own dict. Additional variables can
be passed using the locals dictionary instead:
target = "f(cit 2+j) + g(cit j*9)"
cst = macro(target).expand({}, locals={"j":9})    # no node vars needed but we pass j with j = 9
unparse(cst) -> "f(11) + g(81)"
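How the locals dictionary feeds the evaluation can be modelled with eval and a regular expression. The sketch below is an assumption-laden text-level stand-in: it only recognizes a parenthesized cit whose body contains no further parentheses, and expand_cit is a hypothetical name.

```python
import re

# A parenthesized cit whose body has no nested parentheses.
CIT = re.compile(r"\(cit ([^()]+)\)")


def expand_cit(target, local_vars):
    def evaluate(match):
        # Evaluate the cit body at expansion time with the given locals.
        value = eval(match.group(1), {}, dict(local_vars))
        return "(%r)" % (value,)
    return CIT.sub(evaluate, target)


result = expand_cit("f(cit 2+j) + g(cit j*9)", {"j": 9})
```

Passing an empty dict for the globals keeps the evaluation restricted to builtins and the names supplied in local_vars.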
2.3 cit on the console
You might try out cit on the console. There is nothing spectacular:
mc> (cit 42)
42
mc> (cit "Spam ") +"with " +(cit "eggs")
'Spam with eggs'
|
Nested calls are allowed:
mc> (cit 2 + (cit 40))
42
|
A little more complicated is the behaviour of cit when it tries to
evaluate lists or tuples. The basic assumption is that each list/tuple
is already a CST representation of some expression and cit returns it
unchecked. But when it is returned unchecked the transformer tries to
substitute cit by the supposed CST and is likely to fail:
mc> (cit [0])
Traceback (most recent call last):
  File "C:\lang\Python24\lib\site-packages\EasyExtend\eeconsole.py", line 223, in compile_cst
    transformer.run(parseTree)
  ....
    self.run(sub, chain = chain+[tree], locals=locals, prio = prio)
  File "C:\lang\Python24\lib\site-packages\EasyExtend\eetransformer.py", line 257, in run
    raise TranslationError( S )
TranslationError: unable to subst node (4942, 'cit') by node (0, 'ENDMARKER') in:
file_input -- S`4865 -- 257
stmt -- S`4874 -- 266
simple_stmt -- S`4875 -- 267
....
power -- S`4918 -- 310
atom -- S`4919 -- 311
LPAR -- T`4615 -- 7 L`1
(
1
cit -- S`4942 -- 334 <-------
....
RPAR -- T`4616 -- 8 L`1
)
|
When the node transformer finds 0 as the first element of a list it
assumes it has found the ENDMARKER token because token.ENDMARKER = 0.
Passing lists that are not CSTs into cit is harmful. But you might wrap
a list into an envelope and pass it into cit.
- env wraps its single argument into a CST node of node type test. This
way env can be combined with cit. Only arguments of the builtin types
int, long, float, basestring, list, tuple, dict and None can be passed
to env. Otherwise a TypeError is raised.
mc> (cit env([1,2,3]))
[1,2,3]
mc> len(cit env([1,2,3]))
3
|
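The reason why a plain list such as [0] is mistaken for a CST can be reproduced with the standard token module, whose ENDMARKER constant is 0. The type check is a sketch of the restriction stated for env, adapted to Python 3 where long and basestring no longer exist; first_element_as_token and check_env_argument are hypothetical names and do not show how env builds its test node.

```python
import token

# Python 3 counterparts of the builtin types listed for env.
ALLOWED = (int, float, str, list, tuple, dict, type(None))


def first_element_as_token(candidate):
    """Name of the token a list would denote if read as a CST node."""
    return token.tok_name.get(candidate[0])


def check_env_argument(value):
    """Reject values that the envelope is not supposed to wrap."""
    if not isinstance(value, ALLOWED):
        raise TypeError("cannot wrap %s" % type(value).__name__)
    return value


misread = first_element_as_token([0])
accepted = check_env_argument([1, 2, 3])
```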
2.4 cit and node variables
The main purpose of cit is to evaluate functions defined on node
variables. A technical detail on the evaluation of node variables is
that their values are never evaluated by cit. To evaluate its argument
cit always has to execute eval() on some piece of Python code. But what
does a function call eval(<stmt>[0]) mean given that <stmt> is already
retranslated to Python source? The answer is: nothing. It simply won't
work because eval cannot evaluate statements. But we don't want to keep
the statement at all, just the corresponding CST! So we have to apply a
trick and wrap the node variable into a tunnel object. The evaluation
will look like eval("self.tunnel.get_node(2924720)[0]", locals()) and
returns exactly the first <stmt> node among a list of these nodes.
That's because the method call on the tunnel will return <stmt> as it
was pushed into the tunnel before.
The tunnel wrapping is not ( and should not be ) immediately visible
to the programmer. It is nevertheless useful to observe how cit
operates on node variables. To do this on the console ( for
experimentation ) the macro fiber provides the function meval.
- meval evaluates a target using node variables encoded in keyword
arguments. As values of these keyword arguments either strings or
lists ( tuples ) of strings are accepted.
mc> meval("<test>", test = "0")
'( 0 )'
mc> meval("(cit <test>)", test = "0")
'0'
|
You might have observed tiny artifacts of transformations such as the
parens around 0 in the first meval. They do not affect the semantics of
the expressions being returned. The true reason for the appearance of
(0) is that "(0)" is represented by a node of type atom and this
can be advantageous for certain node replacements. There are still some
other semantically irrelevant transformation artifacts that might
vanish in future by "smoothing" the source code.
mc> meval("(cit <test>)", test = ("0","1"))
'0 1'
mc> meval("<*test>", test = ("0","1"))
'listify ( 0 , 1 , )'
# Does not work! <*test> will be expanded into
# listify(0,1,) first which is equivalent to [0,1].
# Returning [0,1] leads to substitution havoc.
mc> meval("cit <*test>", test = ("0","1"))
Traceback (most recent call last):
  File "<input>", line 2, in ?
  ...
TranslationError: unable to subst node (4942, 'cit') by node (0, 'ENDMARKER')
mc> meval("cit env(<*test>)", test = ("0","1"))    # wrapping argument into env is o.k.
'[ 0 , 1 ]'
|
Adding two node variables leads at best to an arbitrary list when
expanded. You must be very lucky if this leads to any useful result.
Adding the values of these variables after first applying cit to each
of them sounds a lot more reasonable.
3. Semi-statements and optional statements
3.1 Semi-statements
Semi-statements are the fragments of compound statements. Typically a
compound statement consists of a main branch and one or more
subsequent branches. Take for instance the if_stmt with its elif and
else subbranches. A semi-statement is nothing but one of these
subbranches that is not itself a statement.
Semi stmt syntax:
semi_stmt      ::= elif_branch | else_branch | except_branch | finally_branch
with
elif_branch    ::= 'elif' test ':' suite
else_branch    ::= 'else' ':' suite
except_branch  ::= except_clause ':' suite
finally_branch ::= 'finally' ':' suite
3.2 The optional_stmt statement
Depending on the source of transformation one or more semi-statements
have to be generated to complete a compound statement. The coordination
of these insertions is managed by a new statement called optional_stmt.
Optional branch syntax:
optional_stmt ::= 'optional' test | (exprlist 'in' testlist) ':' suite | semi_stmt
Optional statements occur in two shapes depending on the first node
after the optional keyword. The first kind of optional statement is
about the conditional insertion of a suite or semi_stmt ( if-option ),
the other one mimics a for loop ( for-option ).
3.2.1 optional_stmt semantics
First consider the if-option.
|
if TEST_1:
    BLOCK_1
optional TEST_2:
    else:
        BLOCK_2
|
Depending on the value of TEST_2 the else_branch semi-statement
will be added to the if branch. The final compound statement is one of
those two versions.
|
# TEST_2 -> True
if TEST_1:
    BLOCK_1
else:
    BLOCK_2
|
# TEST_2 -> False
if TEST_1:
    BLOCK_1
|
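The if-option amounts to a conditional append of the else branch. A minimal text-level sketch, assuming single-line blocks and a hypothetical helper expand_if_option that receives the already evaluated value of TEST_2:

```python
def expand_if_option(test_1, block_1, test_2, block_2):
    """Hypothetical helper: add the else branch only when test_2 holds."""
    lines = ["if %s:" % test_1, "    %s" % block_1]
    if test_2:
        lines += ["else:", "    %s" % block_2]
    return "\n".join(lines)


with_else = expand_if_option("x < 0", "y = -1", True, "y = 1")
without_else = expand_if_option("x < 0", "y = -1", False, "y = 1")
```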
The for-option is evaluated differently. Consider for instance the
following optional statement.
|
optional item in <*expr_stmt>:
    print (cit unparse(item).strip())
    (cit item)
|
Several transformation steps have to be performed:
- Create a list comprehension [item for item in <*expr_stmt>] and
evaluate it using this macro transformer.
- Create {item: node} pairs from the evaluated list_comp that can be
passed as local variables to the body of the optional_stmt.
- Create as many versions of the optional_stmt body as {item:node}
pairs are available.
- Macro transform each body version with one {item:node} pair. The
resulting stmt nodes are listed.
- Create a suite node from all resulting stmt nodes and replace the
optional_stmt in the CST by this suite.
If we use the optional_stmt above to macro transform a source
consisting of the three expression statements a = 2, b = a*5 and
c = a+b, the result will be
|
print ("a = 2")
a = 2
print ("b = a*5")
b = a*5
print ("c = a+b")
c = a+b
|
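The steps of the for-option can be imitated at the text level: the body is instantiated once per item and the resulting statements are collected into one suite. The sketch assumes that items are source fragments rather than CST nodes; expand_for_option and print_and_keep are hypothetical names.

```python
def expand_for_option(items, body):
    """Instantiate the body once per item and collect all statements."""
    stmts = []
    for item in items:
        stmts.extend(body(item))      # one body version per item
    return "\n".join(stmts)           # the replacement suite as text


def print_and_keep(item):
    # Mirrors: print (cit unparse(item).strip()) followed by (cit item)
    text = item.strip()
    return ['print ("%s")' % text, text]


expr_stmts = ["a = 2", "b = a*5", "c = a+b"]
suite = expand_for_option(expr_stmts, print_and_keep)

# Run the generated suite to check that the assignments still work.
namespace = {}
exec(suite, namespace)
```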
3.3 More examples
The Gallery fiber defines a switch statement regarding chainlets.
switch SWITCH_TEST:
    case CASE_TEST_0:
        SUITE_1
    case CASE_TEST_1:
        SUITE_2
    ...
    else:
        SUITE_N
|
==>
|
if isChainlet(<switch_test>):
    <SELECT> = <switch_test>.select(*<case_tests>)
else:
    <SELECT> = <switch_test>
if <SELECT> == (cit <case_tests>[0]):
    cit <case_suites>[0]
optional i in range(1, (cit len(<case_tests>)) ):
    elif <SELECT> == (cit <case_tests>[i]):
        cit <case_suites>[i]
optional <else_suite>:
    else:
        <else_suite>
|
Another, simpler example involving just a few node variables and one
if-option can be found in the with_stmt transformation of the Py25Lite
fiber.
with EXPR as VAR:
    BLOCK
|
==>
|
mgr = (<EXPR>)
exit = mgr.__exit__
value = mgr.__enter__()
exc = True
try:
    try:
        optional <VAR>:
            <VAR> = value
        <BLOCK>
    except:
        exc = False
        if not exit(*sys.exc_info()):
            raise
finally:
    if exc:
        exit(None, None, None)
|
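The expanded with_stmt template can be checked in plain Python by writing it out by hand for a concrete context manager. In the sketch below the block is passed in as a callable, the optional <VAR> assignment is always taken, and exit is renamed to exit_ to avoid shadowing the builtin; run_expanded_with, Recorder and failing_block are hypothetical names used for illustration only.

```python
import sys


def run_expanded_with(manager, block):
    mgr = manager
    exit_ = mgr.__exit__
    value = mgr.__enter__()
    exc = True
    try:
        try:
            var = value           # corresponds to: optional <VAR>: <VAR> = value
            block(var)            # corresponds to: <BLOCK>
        except:
            exc = False
            if not exit_(*sys.exc_info()):
                raise
    finally:
        if exc:
            exit_(None, None, None)


class Recorder:
    """Context manager that records the calls it receives."""

    def __init__(self, suppress):
        self.events = []
        self.suppress = suppress

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit:%s" % (exc_type.__name__ if exc_type else None))
        return self.suppress


def failing_block(recorder):
    recorder.events.append("body")
    raise ValueError("boom")


ok = Recorder(suppress=False)
run_expanded_with(ok, lambda r: r.events.append("body"))
swallowed = Recorder(suppress=True)
run_expanded_with(swallowed, failing_block)
```

The exc flag guarantees that __exit__ is called exactly once: with the exception details when the block fails, with three None values otherwise.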