Introduction
This is an introductory level tutorial for EasyExtend. It guides you
through the creation of an extension language that is simple enough to
be understood in any detail but also complex enough to shed light on
many aspects of EasyExtend. After reading this tutorial you shall be
able to create
your own extension languages, called fibers
in the EasyExtend context.
Prerequisites:
understanding the Python language on tutorial level and you have installed the
EasyExtend package on your machine and opened a Python console and an
editor.
Create a fiber
A fiber encodes an
extension language or a code generator within the EasyExtend framework.
It might be helpfull to think of a fiber as a Python dialect that
adds some new syntax and semantics to the language. EasyExtend
preprocesses source files of the new extension languages using the
custom fiber definitions and generates proper Python code that will be
compiled to bytecodes.
Physically a fiber is represented by a small set of files which are
mandatory for each fiber definition. EasyExtend implements a package
level command named new_fiber which does the job of creating these
files for you. Our first fiber is called hex_fiber, which implements a
modified semantics of a hexadecimal literal.
>>>
import EasyExtend
>>> EasyExtend.new_fiber("hex_fiber", prompt = "hx> ")
New fiber 'hex_fiber' created:
[EasyExtend]+-[fibers]
+-
[hex_fiber]
+- conf.py
+- fiber.py
+- PyGrammar.py
+- Grammar
+- [fiber_mod]
+- symbol.py
+- token.py
>>> EasyExtend.run("hex_fiber") # checkout
our new fiber
___________________________________________________________________________________
hex_fiber
Running on Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310
32 bit (Intel)]
___________________________________________________________________________________
hx> 0x0F # normal Python
commands shall be available
15
hx> quit # use quit to
cancel a fiber - even in Python 2.4 !
___________________________________________________________________________________
|
The targetted hex_fiber will contain just one simple extension : a
modified
hexadecimal number literal. Python contains already a way to type
hexadecimal
numbers indicated by the 0x prefix but those are represented as decimal integers. We want to
prevent an immediate rebasing of the number and work with hexadecimals
directly. Hence the hex_fiber changes the semantics of
hexadecimal literals such
that a number 0x7F is represented by a still to be defined Hex object.
A use pattern of our modified Hex literal is shown in the next box:
hx>
0x00
# 0x00 is a Hex object represented by the hex-literal notation
0x00
hx> h = 1
2 #
sequences of numbers are allowed and will be converted into Hex objects
hx> h
0x01 0x02
hx> h ++ 0x03 0x04 # new binary operator '++' defines
concatenation among Hex
objects
0x01 0x02 0x03 0x04
|
Once being defined we can add methods to our Hex object on our own
behalf. The hex_fiber is
usefull when bit fiddling is involved in our application a lot and we
try to enable a native type
feeling of all our
hexadecimal operations. The hex_fiber is our first domain specific
language - Voilá.
Define grammar rules
EasyExtend acts as a transformer of parse-trees or CSTs ( Concrete Syntax Trees ).
CSTs reflect the grammatical structure of a language directly. Once you
have added a new grammar rule to the fibers Grammar
file ( Grammar is a file
in the file tree of your fiber ) the structure
of the CST is completely determined by this rule. The terminals of the grammar
contain the material content of the source such as names, numbers or
punctuation like parens and dots. They don't get expanded anymore and
refer to nothing. That's why they are terminal.
Non-terminals are rule templates that refer to other non-terminals or
terminals. Pythons grammar is of class LL(1) and Python requires a mild
preprocessing during tokenization. Don't be scared of semantic
whitespace! In the Grammar it has a residual status in the form of the
two terminals INDENT and DEDENT which are moral equivalents of braces.
Most
fiber definitions start with the addition of a new grammar rule and the
modification of an existing one. The modification is necessary because
we have to link our new grammar rule with the existing rule tree.
Otherwise new rules will stay disconnected and might not be applied by
the
parser. New grammar rules are always appended at the end of the Grammar
file. This is enforced by compliency with Pythons bytecode compiler
which expects a strict order of rules. So if you want to drop a rule
don't delete it from the Grammar but unlink it so that it can't be
found when parsing the source.
Requirements
In case of our hex_fiber exactly two requirements have to be translated
into grammar rules:
- A sequence of hexadecimal numbers 0xXX.. 0xYY... shall be
mapped onto one Hex object.
- A concatenation operator ++ is defined among Hex objects.
Together with the new operator ++ we define a new __concat__ protocol (
and
also an __rconcat__ protocol, o.k.? ) that can be bound to other
objects.
The numbers
non-terminal
For handling the first requirement we introduce a numbers non-terminal. Pythons
default
Grammar handles just one NUMBER terminal as an atom as shown below:
#
Grammar
...
...
factor: ('+'|'-'|'~') factor | power
power: atom trailer* ['**' factor]
atom: '('
[testlist_gexp] ')' | '[' [listmaker] ']' | '{' [dictmaker] '}' | '`'
testlist1 '`' |
NAME | NUMBER | STRING+
...
trailer: '(' [arglist] ')' | '[' subscriptlist ']' | '.' NAME
...
#
# adding new grammar rules here
#
|
When trying to type a sequence of numbers on our hex_fiber console the
runtime thows a SyntaxError because the parser expects a subsequent
non-numeral
node. ( See how this is different from STRING terminals! ) When trying
to type 1 [0] instead a
grammatically correct expression is created on the first sight. However
this expression is rejected on a deeper
level within the compiler machinery and causes a TypeError as well:
hx> 1 3
File "<stdin>", line 1
1 3
^
SyntaxError: invalid syntax
hx> 1 [0]
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: unsubscriptable object...
|
Our first rule will replace the NUMBER terminal within atom by the numbers non-terminal which is
defined at the end of the Grammar file.
#
Grammar
...
...
factor: ('+'|'-'|'~') factor | power
power: atom trailer* ['**' factor]
atom: '('
[testlist_gexp] ')' | '[' [listmaker] ']' | '{' [dictmaker] '}' | '`'
testlist1 '`' |
NAME | numbers | STRING+
...
trailer: '(' [arglist] ')' | '[' subscriptlist ']' | '.' NAME
...
#
# adding new grammar rules here
#
numbers: NUMBER+
|
In order to process the numbers nonterminal we have to eliminate it from the parse tree again.
Actually we have to assign meaning
to it and this happens by
transforming the numbers
non-terminal into a valid Python expression. The location for these
kinds of transformations is the FiberTransformer class in the fiber.py
file of our hex_fiber package.
Inspecting
parse trees
Before we proceed with defining transformers for our fiber lets examine
how we can inspect token
streams and parse trees to gain valuable information. The tools
presented here are essential for debugging our fiber transformer.
Without them we will likely spend hours with error seeking.
c:\lang\Python24\Lib\site-packages\EasyExtend\fibers\hex_fiber>python
fiber.py
*** Modify PyGrammar.py
file ***
___________________________________________________________________________________
hex_fiber
Running on Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310
32 bit (Intel)]
___________________________________________________________________________________
hx> 1+2
3
hx> quit
___________________________________________________________________________________
|
It takes some startup time because a new PyGrammar.py file is generated
containing modified parse tables. The message *** Modify PyGrammar.py
file *** is displayed when the Grammar is processed.
It might be surprising that typing
1+2 into the console does not cause an error because we still act on an
untransformed CST. To watch the CST there are two command
line options -b and -a. The option -b stands for before and means that a CST
is displayed before
transformation. The option -a
stands for after and means
that a CST is displayed after
transformation.
c:\lang\Python24\Lib\site-packages\EasyExtend\fibers\hex_fiber>python
fiber.py -a
___________________________________________________________________________________
hex_fiber
Running on Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310
32 bit (Intel)]
___________________________________________________________________________________
hx> 1+2
[cst-after>
file_input -- S`257 -- 257
stmt -- S`266 -- 266
simple_stmt -- S`267 -- 267
small_stmt -- S`268 -- 268
expr_stmt -- S`269 --
269
testlist
-- S`320 -- 320
test -- S`298 -- 298
and_test -- S`299 -- 299
not_test -- S`300 -- 300
comparison -- S`301 -- 301
expr -- S`303 -- 303
xor_expr -- S`304 -- 304
and_expr -- S`305 -- 305
shift_expr -- S`306 -- 306
arith_expr -- S`307 -- 307
term -- S`308 -- 308
factor -- S`309 -- 309
power -- S`310 -- 310
atom -- S`311 -- 311
numbers -- S`334 -- 334
<-------
NUMBER -- T`2 -- 2 L`1
1
1
PLUS -- T`14 -- 14 L`1
+
1
term -- S`308 -- 308
factor -- S`309 -- 309
power -- S`310 -- 310
atom -- S`311 -- 311
numbers -- S`334 -- 334
<-------
NUMBER -- T`2 -- 2 L`1
2
1
NEWLINE -- T`4 --
4 L`1
1
ENDMARKER -- T`0 -- 0 L`2
2
<cst-after]
3
hx>
|
The arrows <---------
indicate nodes that cannot be compiled by the Python parser. So the
compiler has to reject the CST and report a compiler error. The reason
why the expression 1+2 works nevertheless is that the console has
to unparse the CST back to
source code before it can be
evaluated on the fly. But this is just the consoles behaviour.
The same code snippet would fail as part of a
script because the script can be compiled directly from the CST without
a
preceeding CST to source transformation.
In the listing above the parse tree is somewhat pretty printed. Another
representation that is more closely related to the nature of the
underlying datastructure is that of a nested list:
[symbol.fileinput,
[symbol.stmt,[simple.stmt,[....]]]
When using the parser module of Pythons standard library we get (
almost ) the
same representation when parsing a Python expression ( or statement )
and listify it:
hx>
import parser
hx> l = parser.expr("1+2").tolist()
hx> l
[258, [320, [298, [299, [300, [301, [303, [304, [305, [306, [307, [308,
[309, [310, [311, [2, '1']]]]], [14, '+'], [308, [309, [310, [
311, [2, '2']]]]]]]]]]]]]]], [4, ''], [0, '']]
hx> pprint(l)
eval_input -- S`258 -- 258
testlist -- S`320 -- 320
test -- S`298 -- 298
and_test -- S`299 -- 299
not_test -- S`300 --
300
comparison
-- S`301 -- 301
expr -- S`303 -- 303
xor_expr -- S`304 -- 304
and_expr -- S`305 -- 305
shift_expr -- S`306 -- 306
arith_expr -- S`307 -- 307
term -- S`308 -- 308
factor -- S`309 -- 309
power -- S`310 -- 310
atom -- S`311 -- 311
NUMBER -- T`2 -- 2
1
PLUS -- T`14 -- 14
+
term -- S`308 -- 308
factor -- S`309 -- 309
power -- S`310 -- 310
atom -- S`311 -- 311
NUMBER -- T`2 -- 2
2
NEWLINE -- T`4 -- 4
ENDMARKER -- T`0 -- 0
hx>
|
Of course Python won't generate a numbers
node in the tree. It reasonably operates on its own grammar.
How to read nodes
In the CST display each nonterminal is formatted like:
which has the following general meaning:
|
<nonterminal
node name> -- S`<node id> --
<projected node id> |
The S` prefix is for Symbol.
Analog we have to read terminals like:
as
|
<terminal
node name> -- T`<node id> --
<projected node id> L`<Line number>
|
The T` prefix is for Token
and the L` prefix is for Line.
Since EasyExtend permits multiple fibers to coexist
in the same context, it assigns a unique range of node id's to the node
types of a fiber.
Therefore fiber definitions can be unambigously
transformed into Python expressions. The node id range for Python is 1..512. It contains the node id's
being recognized by the
compiler. After fiber transformation, all node id's of the
parse tree are projected onto the Python range. This is easy because
each Python node id is the restclass of division by 512. For the node
id of the stmt node we have the equation:
py_smbol.suite =
fiber_symbol.suite % 512
The only restriction this imposes on a fiber definition is a rigid
segmentation. For terminals ( token.py ) 256 node id's are reserved for
Python. The same number is available for non-terminals ( symbol.py ).
Currently around 60 terminals and 80 nonterminals are used. So there is
still lots of space left. Let me know when you need a larger node id
range.
Display the
translated Python code
Each CST can be unparsed. You might display the unparsed CST using the
option -p for Python. Restart the console and take
a look:
c:\lang\Python24\Lib\site-packages\EasyExtend\fibers\hex_fiber>python
fiber.py -p
___________________________________________________________________________________
hex_fiber
Running on Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310
32 bit (Intel)]
___________________________________________________________________________________
hx> 1+2
[python-source>
1 + 2
<python-source]
3
hx>
|
We have examined the -a
option for watching the CST after
being transformed and the -p
option for watching the Python code of the transformed expression.
Additionally there are following options:
-b
|
show
CST before it gets transformed
|
-t
|
show
token sequence
|
-m
node_name[,node_name]* |
mark
nodes in CST with node(s) name given by the parameters of the -m
option. If multiple node names are used they have to be comma seperated
|
Check them out and also combine them:
c:\lang\Python24\Lib\site-packages\EasyExtend\fibers\hex_fiber>python
fiber.py -b -m
NUMBER,term -t
___________________________________________________________________________________
hex_fiber
Running on Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310
32 bit (Intel)]
___________________________________________________________________________________
hx> 1 2 3
[token>
----------------------------------------------------------------------------------------.
Line | Columns | Token Value | Token Name
| Token Id | Token class | Class Id |
-------+---------+-------------+---------------+-------------+-------------+------------+
1 | 0-1 |
'1' |
NUMBER | 1538 --
2 | NUMBER | 1538 -- 2 |
1 | 2-3 |
'2' |
NUMBER | 1538 --
2 | NUMBER | 1538 -- 2 |
1 | 4-5 |
'3' |
NUMBER | 1538 --
2 | NUMBER | 1538 -- 2 |
1 | 5-6 |
\n |
NEWLINE | 1540 -- 4 |
NEWLINE | 1540 -- 4 |
2 | 0-0 |
'' |
ENDMARKER | 1536 -- 0 |
ENDMARKER | 1536 -- 0 |
----------------------------------------------------------------------------------------'
<token]
[cst-before>
file_input -- S`1793 -- 257
stmt -- S`1802 -- 266
simple_stmt -- S`1803 -- 267
small_stmt -- S`1804 -- 268
expr_stmt -- S`1805 --
269
testlist
-- S`1856 -- 320
test -- S`1834 -- 298
and_test -- S`1835 -- 299
not_test -- S`1836 -- 300
comparison -- S`1837 -- 301
expr -- S`1839 -- 303
xor_expr -- S`1840 -- 304
and_expr -- S`1841 -- 305
shift_expr -- S`1842 -- 306
arith_expr -- S`1843 -- 307
term -- S`1844 -- 308
<-------
factor -- S`1845 -- 309
power -- S`1846 -- 310
atom -- S`1847 -- 311
numbers -- S`1870 -- 334
NUMBER -- T`1538 -- 2
<-------
1
1
NUMBER -- T`1538 -- 2
<-------
2
1
NUMBER -- T`1538 -- 2
<-------
3
1
NEWLINE -- T`1540 --
4 L`1
1
ENDMARKER -- T`1536 -- 0 L`2
2
<cst-before]
Traceback (most recent call last):
File
"C:\lang\Python24\lib\site-packages\EasyExtend\eeconsole.py", line 183,
in compile_cst
_code = compile(src,"<input>","single",
PyCF_DONT_IMPLY_DEDENT)
File "<input>", line 2
1 2 3
^
SyntaxError: invalid syntax
hx>
|
Obviously a seqence of blank separated numbers 1 2 3 cannot yet be
backtransformed into a Python expression. Note however that we parsed
accurately. Note also the node ids of the hex_fiber node. These node
ids will likely differ in your own display.
The Fiber Transformer
Specify the
numbers transformation
A numbers nonterminal has always the structure
[symbol.numbers, [token.NUMBER, "1", line-number], [token.NUMBER,
"0x00", 42],...].
Here 42 is a fictitious line number and can should be ignored by all
our transformations. Note that the value
of a terminal is always represented by a string.
The transformation of numbers
is specified by those two rules:
- When a single NUMBER terminal is found and its value starts with
"0x"
it shall be replaced by a Hex object. Otherwise it shall not be
translated ( e.g. "1" shall not translated into a Hex ).
- When a blank separated sequence of NUMBER terminals of at
least length = 2 is found each terminal value
shall be translated into a Hex object independent of all prefix
considerations.
The fiber module
The transformation has to be encoded as a method in the
FiberTransformer class of the fiber.py
The canonical fiber.py layout consists of some module imports, an "if
__name__ == '__main__'" startup
procedure that either executes a module or enters a console and two
attributes named __publish__ and FiberTransformer.
from
conf import* # fiber specific modules and settings
import EasyExtend # EasyExtend package
import PyGrammar # contains fiber specific DFA tables
used by the parser
from EasyExtend.csttools import* # tools used to manipulate
CSTs
from EasyExtend.eetransformer import Transformer, transform
class FiberTransformer(Transformer):
'''
Defines fiber specific transformations
'''
__publish__ = []
if __name__ == '__main__':
import fiber
(options,args) = opt.parse_args()
fiber.options = options.__dict__
if args:
py_module = args[-1]
EasyExtend.run_module(py_module, fiber)
else:
console =
EasyExtend.create_console("hex_fiber", fiber)
console.interact()
|
In the first step we add a transformation rule to our FiberTransformer
which is a method named after the terminal or nonterminal that shall be
transformed. To mark it as a transformation rule it shall be decorated
with the @transform decorator.
class
FiberTransformer(Transformer):
'''
Defines fiber specific transformations
'''
@transform
def numbers(self, node):
'''
numbers: NUMBER+
'''
|
Find subnodes
Now we keep the sequence of NUMBER terminals and check the length. If
the length equals 1 the prefix of the
first NUMBER is inspected.
class
FiberTransformer(Transformer):
'''
Defines fiber specific transformations
'''
@transform
def numbers(self, node):
'''
numbers: NUMBER+
'''
_numbers = find_all(node,
token.NUMBER) # find_all returns all NUMBER nodes found as
# subnodes of node
if len(_numbers) == 1:
if
_numbers[0][1].startswith("0x"):
# keep a NUMBER and inspect its form
# implement case
of
0xYYYY...
else:
# implement case of decimal or octal number
else:
#
implement case of sequence of numbers
|
Searching for subnodes of a node is done with the functions find_node
and find_all.
-
find_node( |
cst,
node_id, level=10000) |
- Seeks for a node with id = node_id
in cst. It returns the first
node found in a search is depth-first search or None.
- The
maximal depth of the searched cst is constrained by level which has a high value by
default. If level = 1 only
- the immediate subnodes of cst will be inspected.
-
find_all( |
cst,
node_id, level=10000) |
- Seeks for all nodes with id
= node_id in cst. Return a list of these nodes.
If no node is found the empty list is returned. The search mode is
depth-first. The
maximal depth of the searched CST is constrained by level which has a high value by
default. If level = 1 only the immediate subnodes of cst will be
inspected.
Most of the time we need just a few search operations to keep all
important data to start or constrain a transformation.
Create new CSTs
We need to fill the implementation gaps with function calls that create
new CST fragments. Everything is based on the cst.py module
that contains functions for CST node creation. These functions are
named after the nodes they create: we have a function try_stmt to
create try-statement nodes, a test function to create test nodes etc. A
module that provides abstractions from raw CST node creation is
cstgen.py. It defines a more convenient and "intelligent" API. So you
don't usually need to create the whole node structure from the bottom
up but there are generic wrapper functions like any_test and any_stmt (
defined in csttools.py ). Functions like CST_CallFunc, CST_Add
etc. use these wrapper functions and produce the relevant nodes.
if
len(_numbers) == 1:
if
_numbers[0][1].startswith("0x"):
# keep a NUMBER and inspect its form
return CST_CallFunc("Hex",[_numbers[0]]) # returns a node
of type symbol.power.
else:
return
atom(_numbers[0])
# do not
wrap number by Hex and substitute
else:
# the
numbers nonterminal by an atom
# implement case
of sequence of
numbers
|
This same function can be applied to argument lists.
if
len(_numbers) == 1:
if
_numbers[0][1].startswith("0x"):
# keep a NUMBER and inspect its form
return CST_CallFunc("Hex",[_numbers[0]]) # returns a node
of type symbol.power.
else:
return
atom(_numbers[0])
# do not wrap number by Hex and substitute
else:
return
CST_CallFunc("Hex",_numbers)
|
The _numbers are passed as arguments to the Hex constructor.
Exercise: use option -p
to inspect the translated code. Also checkout options -b, and -a to get
a feeling for raw and transformed CSTs. Python will obviously complain
about not finding Hex. That's o.k. at this stage.
The Hex class
We implement the necessary minimum of a Hex class.
Besides __init__ and __repr__ only the methods __concat__ and
__rconcat__ used to support the "++" operator are defined. Additional
methods can be added on your will.
class
Hex(object):
def __init__(self, *args):
self.bytes = []
for arg in args:
if
isinstance(arg, (int,long)):
self.bytes.append(self.create_from_int(arg).bytes)
elif
isinstance(arg, Hex):
self.bytes.append(arg.bytes)
else:
raise TypeError,"cannot convert to Hex object: '%s'"%arg
self.bytes =
sum(self.bytes,[])
@classmethod
def create_from_int(cls, i):
assert
isinstance(i,(int,long)), "type of i is %s"%type(i)
h = Hex()
h.bytes = []
v = i
if v == 0:
h.bytes = [0]
else:
while v:
v,r = divmod(v,0x100)
h.bytes.insert(0,r)
return h
def __len__(self):
return len(self.bytes)
def __concat__(self, other):
if isinstance(other, Hex):
h =
Hex()
h.bytes = self.bytes+other.bytes
return h
else:
h =
Hex()
h.bytes = self.bytes+Hex(other).bytes
return h
def __rconcat__(self, other):
return
Hex(other).__concat__(self)
def
__repr__(self):
return "
".join(["0x%02X"%item for item in self.bytes])
|
Define a
concatenation operator "++"
The operator
'++' is used to concatenate Hex objects as arrays of bytes. This
operator must be made public to the tokenizer first.
A generic tokenizer is configured using the EEToken module in
eetoken.py. In order to define new token one must subclass EEToken in
the token.py module of the fiber. This can be found in the fibermod
directory of the hex_fiber package.
class
EEToken(object):
def __new__(cls):
# --begin of default
constants --
# Warning! Do not change
this section
cls.ENDMARKER =
(0, _, TOKENIZE_PY)
cls.NAME = (1, _, TOKENIZE_PY)
cls.NUMBER = (2, _, TOKENIZE_PY)
cls.STRING = (3, _, TOKENIZE_PY)
cls.NEWLINE = (4, _, TOKENIZE_PY)
cls.INDENT = (5, _, TOKENIZE_PY)
cls.DEDENT = (6, _, TOKENIZE_PY)
cls.LPAR = (7, "(", LBRA)
cls.RPAR = (8, ")", RBRA)
cls.LSQB = (9, "[", LBRA)
cls.RSQB = (10, "]", RBRA)
cls.COLON = (11, ":", SPECIAL)
cls.COMMA = (12, ",", SPECIAL)
cls.SEMI = (13, ";", SPECIAL)
cls.PLUS = (14, "+", OPERATOR)
cls.MINUS = (15, "-", OPERATOR)
...
...
@classmethod
def Initial(cls):
return map(re.escape,
cls.select(LBRA))
@classmethod
def Final(cls):
return map(re.escape,
cls.select(RBRA))
...
|
The token ( terminals ) are defined in EEToken as triples.
( <node id>, <token value>, <token type> )
The numeral is just the node id of the
token. In case of a token defined by Python these node_ids have to be
fixed. The second entry is the token value. If no static literal is
known an underscore _ can be used as a placeholder instead. The third
entry
is the token type. These token types are used to extract token from
EEToken and form
groups of them. For example the Initial and Final
classmethods select all initial or final parens using the LBRA and RBRA
token types. These methods are called within the gettoken function of
the eetokpattern.py module that prepares token as regular expressions
for the tokenizer
defined in eetokenizer.py.
New custom token are defined in the token.py module of our fiber which
resides in the fibermod directory. Opening the token.py file we
find following FiberToken class definition.
from
EasyExtend.eetoken import*
class FiberToken(EEToken):
def __new__(cls):
EEToken.__new__(cls)
#
# define new token here
#
return cls
|
We add our new token '++'. We have to select a name and a node id.
The node shall not be in the range of the node ids defined in
FiberTokens base class EEToken ( unless we intend to overwrite it
explicitely). The node id is arbitrery but starting with 100 is never
going to
fail. Finally we select the token type OPERATOR.
from
EasyExtend.eetoken import*
class FiberToken(EEToken):
def __new__(cls):
EEToken.__new__(cls)
cls.PLUSPLUS = (100, '++', OPERATOR)
return cls
|
Let's check now if our token definition works
tr>
:hex_fiber -t
___________________________________________________________________________________
hex_fiber
Running on Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310
32 bit (Intel)]
___________________________________________________________________________________
hx> x ++ y
[token>
(1537, 'x', (1, 0), (1, 1), 'x ++ y\n')
(1587, '++', (1, 2), (1,
4), 'x ++ y\n') # yeah !!!
(1537, 'y', (1, 5), (1, 6), 'x ++ y\n')
(1540, '\n', (1, 6), (1, 7), 'x ++ y\n')
(1536, '', (2, 0), (2, 0), '')
<token]
Traceback (most recent call last):
File
"C:\lang\Python24\lib\site-packages\EasyExtend\eeconsole.py", line 218,
in try_parse
parseTree = dfa_parser.parsetok(tokenizer,
self.grammar, self.fiber.symbol.file_input)
File
"C:\lang\Python24\lib\site-packages\EasyExtend\parser\DFAParser.py",
line 292, in parsetok
raise SyntaxError("Error in line %d%s" % (lineno,
errMsg))
SyntaxError: Error in line 1, unexpected '++'
hx>
|
Again we get an error, since we need to transform the according
terminal of the '++' operator.
Recognizing
"++" in the Grammar
We reopen the Grammar file and add the '++' operator. First we have to
clarify the binding strength of ++. Obviously it
shall be weaker than arithmetic operations. It shall also be weaker
than shift and logical operations but stronger than logical and, or and not. The only place between logical
and arithmetical / shift operators is the comparison node and although
it is not obvious to subsume the concatenation operator under the
comparison node we have no other choice:
...
test: and_test ('or' and_test)* | lambdef
and_test: not_test ('and' not_test)*
not_test: 'not' not_test | comparison
comparison: expr (('++'
expr)* | (comp_op expr)*) # was expr (comp_op expr)*
comp_op: '<'|'>'|'=='|'>='|'<='|'<>'|'!='|'in'|'not'
'in'|'is'|'is' 'not'
expr: xor_expr ('|' xor_expr)*
...
|
Exercise: use option -b
to inspect the CST of "x ++ y". Use also -m PLUSPLUS to display the PLUSPLUS
terminal.
Define operator_concat
We define the behaviour of the concatenation operator. The Hex object
is already prepared for usage by implementing the __concat__ and
__rconcat__ protocols. But we approach the meaning of "++" in a more
generic fashion. Each class can implement the __concat__ protocoll just
like it might implement __add__ or __mul__.
def operator_concat(*args):
'''
Defines behaviour of '++'.
'''
a,b = args[:2]
if a in (basestring, list, tuple):
res = a+b
else:
if hasattr(a,"__concat__"):
res
= a.__concat__(b)
elif
hasattr(b,"__rconcat__"):
res
= b.__rconcat__(a)
else:
raise TypeError, ":unsupported operand type(s) for ++: '%s' and
'%s'"%(type(a), type(b))
if args[2:]:
return
operator_concat(*(res,)+args[2:]) # recursion on sequences
return res
|
In the final step we have to implement the transformer of the
comparison node we modified in the Grammar. This time we use the macro fiber.
Macros are expressions of the
macro fiber. Macros contain special
variables called node variables written in angle brackets. A node
variable is a placeholder for CST nodes and passing a CST node into a
macro leads to its expansion in place of the node variable. The macro
fiber will be imported like any other module:
import
EasyExtend.fibers.macro.fiber as macro_fiber
|
The macro fiber defines the class macro which implements the expand
method.
class
FiberTransformer(Transformer):
'''
Defines fiber specific transformations
'''
...
@transform
def comparison(self, node):
if find_node(node,
token.PLUSPLUS, level = 1):
_expressions = find_all(node,
symbol.expr, level = 1)
mc = macro_fiber.macro("operator_concat(*<expr>)")
return
mc.expand({"expr": _expressions},
returns = symbol.comparison)
|
The target string "operator_concat(*<expr>)"
contains the node variable <expr>.
Since the interface of operator_concat
expects a variable argument list and <expr>
is expected to be a list of nodes ( the content of _expressions ), the
node variable is declared as a stared arg *<expr>.
Nodes
are passed as a dictionary into the expand method. The node variables
are represented by the keys of the dictionary while the values shall
substitute the node vars. We have to care for the node type of the
return value. A macro can be executed in any context and returns a top
level node, mostly symbol.file_input. Substituting a node like
comparison by file_input may have fatal consequences because it
substitutes too much, namely the whole surrounding program. So we have
to extract the comparison node from the file_input node and return it.
This is done by declaring the value of the keyword returns to
be symbol.comparison.
Exercise: try to use CST_CallFunc instead of the macro
class. Checkout other symbols to be returned like symbol.test.
The __publish__
attribute
Whenever a list of numbers or a hexadecimal number
is typed into a hex_fiber script it will be immediately converted into
a Hex object and whenever a ++ operator is applied it will be converted
into an operator_concat function call. However neither Hex nor the
operator_concat function are globally visible unless imported
explicitely. A
rather simple means for ensuring global visibility is to put Hex and
operator_concat into
__builtins__. To simplify this even more each fiber has a __publish__
attribute that keeps a list of names. In case of the hex_fiber we have:
__publish__ = ["Hex", "operator_concat"]
hex_fiber modules
We have completed our fiber definition. Now
we want to write scripts using the specific hex_fiber syntax and
execute them as fiber modules.
#
module demo.py ( version 1 )
# ------------------------------
def add_padding(arg, pad = 0x00, size = 8, right_pad = True):
res = arg
for i in range(size-len(arg)%size):
if right_pad:
res = res
++ pad # right padding
else:
res = pad
++ res # left padding
return res
if __name__ == '__main__':
print add_padding( 0x01 0x02 0x03 )
print add_padding( 0x01 0x02 0x03 , pad = 0x80)
print add_padding( 0x01 0x02 0x03 , pad = 0x80,
right_pad = False)
|
The function add_padding adds leading or trailing bytes to a Hex
object. Now we call demo.py from fiber.py.
c:\lang\Python24\Lib\site-packages\EasyExtend\fibers\hex_fiber>python
fiber.py demo.py
0x01 0x02 0x03 0x00 0x00 0x00 0x00 0x00
0x01 0x02 0x03 0x80 0x80 0x80 0x80 0x80 0x80 0x80 0x80
0x80 0x80 0x80 0x80 0x80 0x01 0x02 0x03 |
There is something remarkable about the result, which might not be too
obvious on the first sight.
The if
__name__ == '__main__' :... section of demo.py
is called although demo is not __main__
in this case but fiber! The
answer can be found when we look at the translated code:
c:\lang\Python24\Lib\site-packages\EasyExtend\fibers\hex_fiber>python
fiber.py demo.py -p
[python-source>
# module demo.py
# --------------
def add_padding(arg, pad = Hex( 0x00 ) , size = 8, right_pad = True):
res = arg
for i in range(size-len(arg)%size):
if right_pad:
res
=operator_concat(( res) ,( pad ) ,)
else:
res
=operator_concat(( pad) ,( res ) ,)
return res
def __like_main__ ( ):
print add_padding( Hex( 0x01, 0x02, 0x03 ))
print add_padding( Hex( 0x01, 0x02, 0x03 ) , size =
11, pad = Hex( 0x80 ))
print add_padding( Hex( 0x01, 0x02, 0x03 ) , pad =
Hex( 0x80 ) , right_pad = False )
if __name__ == '__main__':
__like_main__ ( ) ;
<python-source]
0x01 0x02 0x03 0x00 0x00 0x00 0x00 0x00
0x01 0x02 0x03 0x80 0x80 0x80 0x80 0x80 0x80 0x80 0x80
0x80 0x80 0x80 0x80 0x80 0x01 0x02 0x03
|
During code transformation the __like_main__()
function has been created. The idea behind __like_main__ is that it can
be called from the fiber and executes all statements in the function
body that are otherwise defined in the if __name__ ==
'__main__':
section of the module.
Cooperation
with standard Python
The __main__ section is still present for the purpose of demo being
called directly from python. This shall be possible at least for the
compiled version since demo.pyc contains the translated code.
Unfortunately the module is not self contained yet. When we execute
demo.py from fiber.py all fiber definitions are present, in particular
Hex and operator_concat. Not so here. We need to import some names and
initialize the fiber. Our solution is the import __fiber__ declaration.
#
module demo.py ( version 2 )
# ------------------------------
import __fiber__
def add_padding(arg, pad = 0x00, size = 8, right_pad = True):
res = arg
for i in range(size-len(arg)%size):
if right_pad:
res = res
++ pad # right padding
else:
res = pad
++ res # left padding
return res
if __name__ == '__main__':
print add_padding( 0x01 0x02 0x03 )
print add_padding( 0x01 0x02 0x03 , pad = 0x80)
print add_padding( 0x01 0x02 0x03 , pad = 0x80,
right_pad = False)
|
The import __fiber__ statement will be expanded by the transformer into:
import EasyExtend.eecommon as eecommon; __fiber__ =
eecommon.import_and_init_fiber( "hex_fiber" );
|
Now all usefull names are also present in demo.py when it is called as
a standalone module from Python or imported by another Python ( not
hex_fiber! ) module. Just take care that you call demo.pyc and not
demo.py with Python.
c:\lang\Python24\Lib\site-packages\EasyExtend\fibers\hex_fiber>python demo.pyc
0x01 0x02 0x03 0x00 0x00 0x00 0x00 0x00
0x01 0x02 0x03 0x80 0x80 0x80 0x80 0x80 0x80 0x80 0x80
0x80 0x80 0x80 0x80 0x80 0x01 0x02 0x03 |
In a second demonstration we create the demo2.py module that uses
definitions of demo.py:
#
module demo2.py ( version )
# ------------------------------
import demo
print demo.add_padding(Hex(0x01, 0x02, 0x03), pad = 0x7)
|
Note the name Hex is automatically available because it is made
__builtin__ by the hex_fiber being initialized in demo.py
c:\lang\Python24\Lib\site-packages\EasyExtend\fibers\hex_fiber>python demo2.py
0x01 0x02 0x03 0x07 0x07 0x07 0x07 0x07
|
|