1. Introduction
Bytelets are networks of binary data encoded within a P4D data
structure. Bytelets originate from a flexible representation of BER-TLVs
which lets users easily create, modify and parse TLV (Tag-Length-Value)
objects. Bytelets are more general than TLVs: L fields can refer to
arbitrary subfields of V, which reflects many practical applications
found in technical specifications. L fields are represented by functions
that respond to changes of the field values they are bound to. This way
a simple dataflow network emerges. This network can be extended to
support more advanced causal relationships between Bytelet fields.
All Bytelets can be serialized as binary data. That binary data can be
parsed back into Bytelets, again using Bytelets as structural
descriptions. A Bytelet used as such a description is called a Bytelet
Schema or just Schema. Often Bytelet Schemas and the Bytelets created by
parsing hexcode with those Schemas are isomorphic, i.e. they have the
same structural properties and differ only in field values. This is not
unlike C structs, which can be cast onto memory chunks to create new
instances of those structs.
Otherwise Bytelets are somewhat more low-level than C structs or
ASN.1-based encodings because they are untyped. There are no strings,
ints, chars, arrays, pointers etc., and there is no metalevel
description available that translates type information into sizes of
memory chunks.
2. Using Bytelets
Bytelets are part of P4D and you need to install P4D
to run the following code.
Bytelets are represented by ordinary P4D elements. Using the bl
prefix of the namespace
prefix: bl
uri: http://fiber-space.de/byteletns
|
informs the P4D langlet to create a Bytelet object instead of a P4D
object.
The first example shows how to create a simple Bytelet.
p4d> elm bl:A:
.... Tag: 0x01
.... Len: &LEN
.... Val: 0x77AA
....
p4d> A
Bytelet<bl:A>
p4d> A.hex()
0x01 0x02 0x77 0xAA
|
The Bytelet class is derived from P4D. This means all Bytelets are also
P4D objects. The Bytelet method hex() is new and Bytelet specific. It
serializes a Bytelet as a sequence of hexadecimal numbers. The LEN
object we assigned to the Len field as a Python object is one of two
builtin Flow objects. The other one is called VAL. We will treat them
in detail in section 2.2.
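To make the result of hex() in the first example concrete, here is a minimal plain-Python sketch of what a Tag-Length-Value serialization produces. It does not use P4D; the helper names encode_tlv and show are hypothetical and only cover lengths that fit into a single length byte.

```python
def encode_tlv(tag: int, value: bytes) -> bytes:
    """Serialize one TLV object: tag byte, length byte, value bytes.

    Hypothetical helper for illustration only. It handles lengths below
    0x80, which fit into a single length byte.
    """
    if len(value) >= 0x80:
        raise ValueError("this sketch only supports short lengths")
    return bytes([tag, len(value)]) + value


def show(data: bytes) -> str:
    """Render bytes in the 0x.. notation that hex() prints."""
    return " ".join(f"0x{b:02X}" for b in data)


# Tag 0x01, value 0x77 0xAA: the length byte is derived from the value.
print(show(encode_tlv(0x01, bytes([0x77, 0xAA]))))  # 0x01 0x02 0x77 0xAA
```

The point of the Len field in a Bytelet is the same as in this sketch: the length is never typed in by hand but computed from the fields that follow it.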
2.1 Bytelets and Hex objects
The contents of P4D objects are all strings. We assign either strings or
numbers, and numbers will be converted into strings. Bytelets are
somewhat more featureful in this respect. We can also assign Flow
objects like LEN, as seen in the example above, but also hexadecimal
literals which have a special status.
In P4D 1.2 we have broken Python's semantics for hexadecimal numbers.
When you type a hexadecimal number in a Python console it is
automatically displayed as a decimal number: entering 0x10 echoes 16.
So hexadecimal numbers don't have a proper status of their own; their
base changes automatically. In P4D 1.2 hex literals will not be mapped
onto Python ints but onto Hex objects that are defined in
EasyExtend.util.hexobject.
Hex objects are used to store, manipulate and represent hexcode.
p4d> 0x0E 0xA0
0x0E 0xA0
p4d> type(0x0E 0xA0)
<class 'EasyExtend.util.hexobject.Hex'>
|
Hex objects have plenty of conversion functions and representations:
p4d> Hex("89 07 0A")
0x89 0x07 0x0A
p4d> Hex.format = Hex.F_STD      # changing the representation format for Hex objects
p4d> Hex("89 07 0A")
89 07 0A
p4d> Hex.format = Hex.F_0x       # changing the representation format for Hex objects again
p4d> Hex(0x89070A)               # another constructor call
0x89 0x07 0x0A
p4d> Hex("89 {Hello Hex} 00")    # curly braces are used to inline ASCII strings
0x89 0x48 0x65 0x6C 0x6C 0x6F 0x20 0x48 0x65 0x78 0x00
p4d> Hex("89 {escape\{ and \}}") # inlined curlies
0x89 0x65 0x73 0x63 0x61 0x70 0x65 0x7B 0x20 0x61 0x6E 0x64 0x20 0x7D
p4d> Hex.from_decimal("16")      # string conversion from decimals obviously differs from ...
0x10
p4d> Hex("16")                   # ... interpretation as Hex
0x16
p4d> Hex("{Hex Hex}0").binary()  # sort of 0-terminated string
'Hex Hex\x00'
|
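The string notation accepted by the Hex constructor can be approximated in plain Python. The following is a sketch under assumptions, not the EasyExtend implementation: parse_hex_string and _hex_run are hypothetical names, and the handling of an odd number of hex digits (padding with a leading zero) is inferred from the "{Hex Hex}0" example above.

```python
def _hex_run(digits: str) -> bytes:
    """Convert a run of hex digits to bytes, padding odd runs with a 0."""
    if len(digits) % 2:
        digits = "0" + digits
    return bytes.fromhex(digits)


def parse_hex_string(text: str) -> bytes:
    """Decode a string such as '89 {Hello Hex} 00' into bytes.

    Hypothetical re-implementation for illustration: hex digits outside
    braces are read as hex numbers, text inside curly braces is taken as
    ASCII, and a backslash before a curly brace inside braces makes that
    brace literal.
    """
    out = bytearray()
    digits = ""
    in_braces = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_braces:
            if ch == "\\" and i + 1 < len(text) and text[i + 1] in "{}":
                out.append(ord(text[i + 1]))  # escaped curly brace
                i += 2
                continue
            if ch == "}":
                in_braces = False
            else:
                out.append(ord(ch))
        elif ch == "{" or ch.isspace():
            out += _hex_run(digits)           # finish the pending hex run
            digits = ""
            in_braces = ch == "{"
        else:
            digits += ch
        i += 1
    out += _hex_run(digits)
    return bytes(out)


print(parse_hex_string("89 {Hello Hex} 00"))  # b'\x89Hello Hex\x00'
```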
Of course we have plenty of operations we can apply immediately to Hex
object literals:
p4d> h = 0x0E 0xA0
p4d> h // 0x00                         # concatenation
0x0E 0xA0 0x00
p4d> 0x0E + 0x03                       # addition
0x11
p4d> 0x0E 0x89 0x89 & 0x78 0x75 0x56   # bitwise and
0x08 0x01 0x00
p4d> 0x0E 0x89 0x89 ^ 0x78 0x75 0x56   # xor
0x76 0xFC 0xDF
|
For a more comprehensive overview, see the API
description of the Hex object.
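For readers who want to check the results above without P4D, the same operations can be reproduced on Python bytes. This is only a sketch: the function names are hypothetical, and the behaviour for operands of different length or for an overflowing addition is an assumption, since the text does not specify what Hex does in those cases.

```python
import operator


def concat(a: bytes, b: bytes) -> bytes:
    """Counterpart of the // operator on Hex objects."""
    return a + b


def add(a: bytes, b: bytes) -> bytes:
    """Add two byte strings as big-endian numbers (widening on overflow)."""
    total = int.from_bytes(a, "big") + int.from_bytes(b, "big")
    size = max(len(a), len(b), (total.bit_length() + 7) // 8)
    return total.to_bytes(size, "big")


def bytewise(op, a: bytes, b: bytes) -> bytes:
    """Apply a binary operator byte by byte to operands of equal length."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    return bytes(op(x, y) for x, y in zip(a, b))


def show(data: bytes) -> str:
    return " ".join(f"0x{b:02X}" for b in data)


left = bytes([0x0E, 0x89, 0x89])
right = bytes([0x78, 0x75, 0x56])
print(show(concat(bytes([0x0E, 0xA0]), bytes([0x00]))))  # 0x0E 0xA0 0x00
print(show(add(bytes([0x0E]), bytes([0x03]))))           # 0x11
print(show(bytewise(operator.and_, left, right)))        # 0x08 0x01 0x00
print(show(bytewise(operator.xor, left, right)))         # 0x76 0xFC 0xDF
```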
2.2 Flow objects
2.2.1 Anatomy of a Flow object
The Flow object is the key feature of Bytelets. One can understand a
Flow object roughly by considering three properties. A Flow object has
1) an internal state which is expressed by the _value attribute. The
_value attribute can be accessed using the flow_value() method. The type
of _value is Hex. On initialization _value is None.
2) a reference to the containing Bytelet. A Flow object is usually the
CONTENT of a Bytelet having the following simple structure:
[TAG, {}, [], CONTENT]. We aren't so much interested in this containing
Bytelet though but in its parent: [TAG, {}, [[TAG, {}, [], CONTENT]...], ''].
The containing Bytelet and the parent of the containing Bytelet are
represented by the Flow attributes _node and _parent.
3) an update() method. The update() method is called either when the
_value attribute is requested while _value is None, or when a Bytelet
changes, e.g. when a field gets updated using __setattr__. The update()
method implements the behaviour of the Flow object. It is the
distinctive property of Flow objects and shall be overridden in
subclasses of Flow.
Coming back to our initial example:
p4d> elm bl:A:
.... Tag:0x01
.... Len:&LEN
.... Val:0x77
|
The LEN object is a Flow object assigned to Len as a Python object.
Actually LEN is of type FlowLen and it is a global or builtin Flow
object. On assignment to Len it will be copied. So you can use it in
arbitrarily many places without danger of sharing its value. Initially
the attributes _value, _node and _parent are all None.
p4d> import pprint
p4d> pprint.pprint(A._tree)
['bl:A',
 {},
 [['Tag', {}, [], 0x01],
  ['Len',
   {},
   [],
   <EasyExtend.langlets.p4d.bytelet.FlowLen object at 0x014E40B0>],
  ['Val', {}, [], 0x77]],
 '']
p4d> flow = A._tree[2][1][-1]
p4d> flow
p4d> assert flow._value is None
p4d> assert flow._node is None
p4d> assert flow._parent is None
|
This changes when we try to compute the hex value of Len:
p4d> A.Len.hex()
0x01
p4d> flow._value
0x01
p4d> flow._node
Bytelet<Len>
p4d> flow._parent
Bytelet<bl:A>
|
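The lazy behaviour shown in this session (everything is None until the value is first requested) can be modelled with a small class. This is a sketch under assumptions and not the P4D implementation: the classes Flow and FollowingLength, the bind() method and the list-based field representation are all hypothetical, and binding is done explicitly here whereas P4D does it while evaluating hex().

```python
class Flow:
    """Minimal stand-in for a Flow object (hypothetical, not the P4D class)."""

    def __init__(self):
        self._value = None    # cached result, None until first computed
        self._node = None     # the field that contains this Flow object
        self._parent = None   # the list of sibling fields

    def bind(self, node, parent):
        """Attach the Flow object to its field and invalidate the cache."""
        self._node = node
        self._parent = parent
        self._value = None

    def update(self):
        raise NotImplementedError("subclasses define the behaviour")

    def flow_value(self):
        if self._value is None:   # compute on demand only
            self.update()
        return self._value


class FollowingLength(Flow):
    """Counts the bytes of all fields that come after the own field."""

    def update(self):
        position = next(i for i, field in enumerate(self._parent)
                        if field is self._node)
        self._value = sum(len(content)
                          for _, content in self._parent[position + 1:])


flow = FollowingLength()
len_field = ["Len", flow]
fields = [["Tag", b"\x01"], len_field, ["Val", b"\x77"]]
print(flow._value)         # None
flow.bind(len_field, fields)
print(flow.flow_value())   # 1
```

Calling bind() again after a field changes plays the role of the update triggered by __setattr__: the cached value is dropped and recomputed on the next request.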
2.2.2 How LEN works
The update() method of LEN locates _node within _parent and computes the
sum of the lengths of all subsequent fields. So it is roughly:
children = _parent.children()
sum(len(field.hex()) for field in children[children.index(_node)+1:])
Notice that LEN._value is BER encoded: if the length is 0x80 or greater,
the first byte of _value encodes the number of subsequent bytes used to
store the length value. The most significant bit of that first byte is
always set in this case (most significant = leftmost bit in the binary
representation of a number, where the highest power of 2 is written
first).
Examples:
length | BER encoding
0x78 | 0x78
0xA2 | 0x81 0xA2
0x12F | 0x82 0x01 0x2F
With this scheme one can encode byte streams up to a length of 2^120.
This shall be sufficient for most practical purposes.
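The length encoding can be sketched in a few lines of Python. The functions ber_encode_length and ber_decode_length are hypothetical names, not part of P4D. The sketch follows ITU-T X.690, which uses the long form for every length from 0x80 upward, and it reproduces the three rows of the table above.

```python
from typing import Tuple


def ber_encode_length(n: int) -> bytes:
    """Encode a length in BER definite form (short or long)."""
    if n < 0:
        raise ValueError("length must not be negative")
    if n < 0x80:
        return bytes([n])                      # short form: one byte
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    if len(body) > 126:
        raise ValueError("length too large for BER")
    # Long form: first byte has the top bit set and counts the body bytes.
    return bytes([0x80 | len(body)]) + body


def ber_decode_length(data: bytes) -> Tuple[int, int]:
    """Return (length, number of bytes consumed) for a definite length."""
    first = data[0]
    if first < 0x80:
        return first, 1
    count = first & 0x7F
    if count == 0 or len(data) < 1 + count:
        raise ValueError("indefinite or truncated length")
    return int.from_bytes(data[1:1 + count], "big"), 1 + count


for n in (0x78, 0xA2, 0x12F):
    print(hex(n), ber_encode_length(n).hex(" "))
```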
2.2.3 LEN and RAWLEN
Sometimes a BER encoded length field is inappropriate and we just want
to count the raw number of bytes within fixed bounds. In those cases we
use RAWLEN.
Example: for a length field having a fixed length of 2 we have the
following encodings:
length | LEN | RAWLEN
0x78 | 0x78 | 0x00 0x78
0xA2 | 0x81 0xA2 | 0x00 0xA2
0x12F | - | 0x01 0x2F
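A raw length is simply the number written big-endian into a fixed number of bytes. The sketch below uses the hypothetical helper rawlen_encode to reproduce the RAWLEN column for a field width of 2; it is an illustration, not the P4D code.

```python
def rawlen_encode(n: int, width: int) -> bytes:
    """Write the length n into exactly `width` bytes, big-endian."""
    try:
        return n.to_bytes(width, "big")
    except OverflowError:
        raise ValueError(f"length {n:#x} does not fit into {width} byte(s)")


for n in (0x78, 0xA2, 0x12F):
    print(hex(n), rawlen_encode(n, 2).hex(" "))
```

The "-" in the LEN column follows from the sizes: the BER encoding of 0x12F needs three bytes (0x82 0x01 0x2F) and therefore cannot be stored in a two-byte field, while the raw encoding fits.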
2.2.4 LEN Expressions
Computing the length of all fields subsequent to the Bytelet containing
LEN is the most common application of Flow objects in Bytelets. But
sometimes one might want to compute just the length of a particular
field or the length of two adjacent fields, one with a fixed size and
one with a variable size.
In all those cases one can derive new Flow objects from LEN by applying
simple arithmetic operations or subscripting.
p4d> elm bl:A:
.... Tag:0x01
.... Len:&LEN
.... DCS:0x04
.... Text: "{some bytelet text}"
p4d>
p4d> A.Len.hex()
0x12
|
This is what we already know. But we can extend it:
p4d> A.Len = LEN["Text"]
p4d> A.Len.hex()
0x11
|
or
p4d> A.Len = LEN["Text"]+1
p4d> A.Len.hex()
0x12
|
Supported operations:
operator | operand type
+, -, *, / | int
[] | string
The same operations are valid for RAWLEN.
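One way to picture LEN expressions is as small objects that record which field to measure and which arithmetic to apply afterwards. The class LenExpr below is a hypothetical model, not the FlowLen class of P4D; in particular, treating "/" as integer division is an assumption, because the text does not say how division rounds.

```python
class LenExpr:
    """Hypothetical stand-in for LEN: a length rule that can be refined."""

    def __init__(self, field=None, transform=None):
        self.field = field                          # None: all following fields
        self.transform = transform or (lambda n: n)

    def __getitem__(self, name):
        return LenExpr(name, self.transform)        # LEN["Text"]

    def _then(self, step):
        previous = self.transform
        return LenExpr(self.field, lambda n: step(previous(n)))

    def __add__(self, k):
        return self._then(lambda n: n + k)

    def __sub__(self, k):
        return self._then(lambda n: n - k)

    def __mul__(self, k):
        return self._then(lambda n: n * k)

    def __truediv__(self, k):
        return self._then(lambda n: n // k)         # assumption: integer result

    def evaluate(self, fields, own_name):
        names = [name for name, _ in fields]
        if self.field is None:
            start = names.index(own_name) + 1
            raw = sum(len(value) for _, value in fields[start:])
        else:
            raw = len(fields[names.index(self.field)][1])
        return self.transform(raw)


LEN = LenExpr()
fields = [("Tag", b"\x01"), ("Len", b""), ("DCS", b"\x04"),
          ("Text", b"some bytelet text")]
print(hex(LEN.evaluate(fields, "Len")))                 # 0x12
print(hex(LEN["Text"].evaluate(fields, "Len")))         # 0x11
print(hex((LEN["Text"] + 1).evaluate(fields, "Len")))   # 0x12
```

Each operator returns a new object and leaves LEN itself unchanged, which matches the statement in section 2.2.1 that LEN can be reused in many places without sharing its value.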
2.3 Bytelet Schemas
Bytelet Schemas are Bytelets used to parse hexcode into other Bytelets.
Often Bytelet Schemas are isomorphic with their target Bytelets, i.e.
they have the same structure but different values. This isn't very much
different from casting an array to a structure type in C to produce a
new C struct. There are different types of Schemas but two of them are
the most relevant. The first is called the Standard Schema and the
second one the TLVList Schema.
These shall be treated in more detail now.
2.3.1 The Standard Schema
A Schema for the Bytelet
elm bl:A:
    Tag:0x01
    Len:&LEN
    DCS:0x04
    Text: "{some bytelet text}"
|
looks like this
elm bl-schema:AS:
    Tag: 1
    Len: &LEN
    DCS: 1
    Text: &VAL -1
|
Now we can use the AS Schema and parse a Hex object.
p4d> R = AS.parse( A.hex() )
p4d> R
Bytelet<bl:AS>
p4d> R.DCS.hex()
0x04
p4d> R.Text.hex().ascii()
"some bytelet text"
|
The Standard Schema introduces a new namespace given by
prefix: bl-schema
uri: http://fiber-space.de/byteletschemans
|
It corresponds with a new class ByteletSchema which is derived from
Bytelet and implements the single public method
2.3.1.1 How VAL works
The VAL object is a FlowFieldVal instance. It is complementary to LEN
and it is used in Schemas only. It works as follows: VAL refers to a LEN
field, and when the _value of the LEN field is k, k bytes will be read
from the hexcode stream and assigned to Text._value.
In the last example the value read from A.hex() for LEN was 0x12.
Therefore VAL-1 was 0x11, and that many bytes were read from A.hex().
Just as LEN may refer to a particular field F via LEN["F"], a reference
can be used with VAL: VAL["F"] reads F.hex() bytes.
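The interplay of fixed widths, LEN and VAL during parsing can be sketched as a loop over a schema description. Everything in this block is hypothetical: the list-of-tuples schema format and the function parse_with_schema only mimic what the AS Schema does and are not the ByteletSchema API.

```python
# Hypothetical schema description mirroring the AS Schema:
# an int is a fixed width in bytes, "LEN" a BER length field,
# ("VAL", k) a field whose size is the parsed length plus k.
SCHEMA_AS = [("Tag", 1), ("Len", "LEN"), ("DCS", 1), ("Text", ("VAL", -1))]


def parse_with_schema(schema, data: bytes) -> dict:
    result = {}
    pos = 0
    length = None
    for name, spec in schema:
        if spec == "LEN":
            first = data[pos]
            if first < 0x80:                    # BER short form
                length, size = first, 1
            else:                               # BER long form
                count = first & 0x7F
                length = int.from_bytes(data[pos + 1:pos + 1 + count], "big")
                size = 1 + count
        elif isinstance(spec, tuple):
            if length is None:
                raise ValueError("VAL used before any LEN field")
            size = length + spec[1]
        else:
            size = spec
        if pos + size > len(data):
            raise ValueError(f"not enough data for field {name}")
        result[name] = data[pos:pos + size]
        pos += size
    return result


hexcode = bytes([0x01, 0x12, 0x04]) + b"some bytelet text"
parsed = parse_with_schema(SCHEMA_AS, hexcode)
print(parsed["DCS"].hex(), parsed["Text"].decode("ascii"))  # 04 some bytelet text
```

The length 0x12 covers DCS and Text together, which is why the Text field is declared as VAL-1: one of the 0x12 bytes has already been consumed by DCS.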
2.3.2 The tlvlist Schema
A ByteletSchema element can specify an attribute use. If use is omitted
then the value std is implied. Another value is tlvlist. A tlvlist value
changes the way a hexcode object is parsed.
When a tlvlist Schema is used it is assumed that a TLV object has a
tree structure with TLV objects specified by sub-TLVs and their tags.
Example 1:
elm bl-schema:A(use="tlvlist"):
    TLV(tag = 0x00):
        SubTLV1(tag = 0x01)
        SubTLV2(tag = 0x02):
            SubSubTLV1(tag = 0x00)
            SubSubTLV2(tag = 0x01)
        SubTLV3(tag = 0x03)
|
There are some default assumptions being made about tlvlist Schemas
when parsing hexcode. Those are:
- A TLV object is uniquely specified by its tag within a TLV
structure. So the order of occurrence in the tlvlist Schema is
arbitrary. The schema specified above is the same as
elm bl-schema:A(use="tlvlist"):
    TLV(tag = 0x00):
        SubTLV2(tag = 0x02):
            SubSubTLV1(tag = 0x00)
            SubSubTLV2(tag = 0x01)
        SubTLV1(tag = 0x01)
        SubTLV3(tag = 0x03)
|
- A TLV object may occur arbitrarily many times within a
hexcode object. You can interpret the presence of a sub-TLV
specifier as: zero or more occurrences in the target.
- If an unknown tag is found it is not ignored; instead a
Bytelet with tag UNDEF is created.
You see that the parsing rules are rather liberal. There isn't much
checked by the parser except for consistency with a nested TLV
structure. This keeps the Schema simple and delegates more
responsibilities to Python libraries.
Notice further that definitions where a tag-name occurs more than once
are permitted:
elm bl-schema:B(use="tlvlist"):
    TLV(tag = 0x00):
        Sub(tag = 0x01)
        Sub(tag = 0x02)
|
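The liberal parsing rules listed above (order does not matter, repetition is allowed, unknown tags become UNDEF) can be sketched as a short recursive function. The dictionary-based schema format and the function parse_tlvlist are hypothetical and only model the behaviour described in the text; the sketch also assumes one-byte tags and short-form lengths.

```python
# Hypothetical schema: tag -> (name, sub-schema or None for a leaf).
SCHEMA_A = {
    0x00: ("TLV", {
        0x01: ("SubTLV1", None),
        0x02: ("SubTLV2", {0x00: ("SubSubTLV1", None),
                           0x01: ("SubSubTLV2", None)}),
        0x03: ("SubTLV3", None),
    }),
}


def parse_tlvlist(data: bytes, schema: dict) -> list:
    """Parse a sequence of TLV objects into (name, content) pairs."""
    result = []
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated TLV header")
        tag, length = data[pos], data[pos + 1]
        if length >= 0x80:
            raise ValueError("this sketch handles short lengths only")
        value = data[pos + 2:pos + 2 + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        # Unknown tags are kept, but under the name UNDEF.
        name, sub_schema = schema.get(tag, ("UNDEF", None))
        if sub_schema is not None:
            result.append((name, parse_tlvlist(value, sub_schema)))
        else:
            result.append((name, value))
        pos += 2 + length
    return result


inner = bytes([0x01, 0x01, 0xAA,                # SubTLV1
               0x02, 0x03, 0x00, 0x01, 0xBB,    # SubTLV2 with one SubSubTLV1
               0x09, 0x01, 0xCC,                # unknown tag
               0x01, 0x00])                     # SubTLV1 again, empty value
tree = parse_tlvlist(bytes([0x00, len(inner)]) + inner, SCHEMA_A)
print(tree)
```

Because lookup goes by tag rather than by position, reordering the entries of the schema dictionary does not change the result, which is the first rule stated above.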
2.3.2.1 Mixing the Schema modes
You can flip the schema mode as shown below:
elm bl-schema:C(use="tlvlist"):
    TLV(tag = 0x00):
        SubTLV1(tag = 0x01)
        SubTLV2(tag = 0x02 use = "std"):
            Tag: 0x02
            Len: &LEN
            DCS: 1
            TextLen: 1
            Text: &VAL
        SubTLV3(tag = 0x03)
|
It is also possible to refer to another Schema this way and assemble a
Schema:
elm bl-schema:TextSchema:
    Tag: 0x02
    Len: &LEN
    DCS: 1
    TextLen: 1
    Text: &VAL
elm bl-schema:D(use="tlvlist"):
    TLV(tag = 0x00):
        SubTLV1(tag = 0x01)
        SubTLV2(tag = 0x02 use = "std"):
            &TextSchema
        SubTLV3(tag = 0x03)
|
3. Playing with Bits
In section 2 we have seen how Bytelets can be used to specify
structures and TLV objects on byte arrays. Now we move to individual
bits and bit arrays. Specifying bitarrays requires a little more
notation.
3.1 Bin objects
Unlike Hex objects, which encapsulate operations on arrays of bytes, a
Bin object is just a subclass of int with some notational conventions
that are used to specify bit arrays.
>>> 0b89 # this creates a Bin object
0b89
|
With Bin objects we can specify ByteletSchemas on a finer level:
elm bl-schema:OneByte:
    F1: 0b1
    F2: 0b3
    F3: 0b4
|
With the OneByte Schema we parse a single byte:
>>> btl = OneByte.parse(0x85)
>>> btl.F1.hex()
0x01
>>> btl.F2.hex()
0x00
>>> btl.F3.hex()
0x05
>>> btl.hex()
0x85
|
In order to fit the three field values 0x01, 0x00 and 0x05 together
precisely in the hex() method of btl (i.e. as 0x85 rather than
0x01 0x00 0x05), the Bytelet object has to store the bit array
characteristics in the fields. This is done using the w(idth)
attribute:
>>> btl.F1.@w
0b1
>>> btl.F2.@w
0b3
>>> btl.F3.@w
0b4
|
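The way a byte is cut into fields of 1, 3 and 4 bits can be checked with plain integer arithmetic. The function split_bits is a hypothetical helper for illustration and is not part of P4D.

```python
def split_bits(data: bytes, widths):
    """Split data into unsigned integers of the given bit widths.

    The fields are taken from the most significant bit downwards, and the
    widths must add up to the number of bits in data.
    """
    total = sum(widths)
    if total != 8 * len(data):
        raise ValueError("widths do not cover the data exactly")
    number = int.from_bytes(data, "big")
    fields = []
    remaining = total
    for width in widths:
        remaining -= width
        fields.append((number >> remaining) & ((1 << width) - 1))
    return fields


# 0x85 is 1000 0101 in binary: 1 | 000 | 0101
print(split_bits(bytes([0x85]), [1, 3, 4]))  # [1, 0, 5]
```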
Of course one can start with a Bytelet which has predefined widths:
elm bl:btl:
    Tag(w=0b4): 0x07
    Len(w=0b16): &RAWLEN
    DCS(w=0b4): 0x04
    Text: "{Some Text}"
|
We get
>>> btl.hex()
0x70 0x00 0xA4 0x53 0x6F 0x6D 0x65 0x20 0x54 0x65 0x78 0x74
|
Note that using RAWLEN instead of LEN makes no difference in this
example. It is just the recommended way to assign length fields in
cases of fixed-size length fields.
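The first three bytes of that output can be reproduced by packing the field values according to their widths. The helper pack_bits is hypothetical, and the length value 0x000A is copied from the output shown above rather than computed, since the text does not spell out how RAWLEN counts a 4 bit field.

```python
def pack_bits(fields) -> bytes:
    """Pack (value, width_in_bits) pairs into bytes, first field first.

    Assumes every value fits into its width; no range check is done here.
    """
    number = 0
    total = 0
    for value, width in fields:
        number = (number << width) | value
        total += width
    if total % 8:
        raise ValueError("field widths must add up to whole bytes")
    return number.to_bytes(total // 8, "big")


# Tag: 4 bits, Len: 16 bits, DCS: 4 bits, followed by the text bytes.
header = pack_bits([(0x07, 4), (0x000A, 16), (0x04, 4)])
data = header + b"Some Text"
print(" ".join(f"0x{b:02X}" for b in data))
# 0x70 0x00 0xA4 0x53 0x6F 0x6D 0x65 0x20 0x54 0x65 0x78 0x74
```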
3.2 Sort of runtime type checks
Currently there are no checks when you assign values to fields, and P4D
won't complain when you set a field to a value that does not fit into
its bit array.
However, evaluation of hex() will check whether the assigned Hex object
values are within the bounds of the specified bit arrays. Otherwise a
TypeError exception is raised:
>>> btl.Tag = 0x100
>>> btl.hex()
Traceback (most recent call last):
...
TypeError: Hex object 0x100 does not fit into a 4 bit array.
|
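The check itself amounts to testing whether any bits remain once the value is shifted right by the field width. The function check_fits below is a hypothetical sketch of such a test; only the wording of the error message is taken from the traceback above.

```python
def check_fits(value: int, width: int) -> int:
    """Return value unchanged if it fits into `width` bits, else raise."""
    if value < 0 or value >> width:
        raise TypeError(
            f"Hex object 0x{value:X} does not fit into a {width} bit array.")
    return value


print(check_fits(0x07, 4))    # 7
try:
    check_fits(0x100, 4)
except TypeError as error:
    print(error)              # Hex object 0x100 does not fit into a 4 bit array.
```

Running the check at serialization time rather than on assignment matches the behaviour described above: the assignment succeeds, and the error only appears when hex() is evaluated.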
4. History
Date | Version | Change
2008-08-08 | 1.2 |
- One can now set attribute values and content easily without referring to the internal data structure.
- Introduction of Bytelets and the bl namespace, which is supported on language level.
- Introduction of ByteletSchema objects and the bl-schema namespace, which is supported on language level.
- Adding a Hex object context.
- elm foo:bar: ... works like an assignment bar = foo:bar:... which is valid for all non-keyword and non-hyphenated names.
2008-07-11 | 1.1.1 |
- Fix: remove py4as3 import from xmlutils and p4dbase
2008-07-11 | 1.1 |
- Fix XML formatAttributes() in xmlutils.
- Make locals() available for mapeval.
- Add flexutils module
- Add elm keyword
- Add P4D comments
- Fix double-colon access for namespaces with p4d prefix.
- Add element lists or tuples.
- Add P4D.eval()
- Add CDATA handling
- Change filter semantics
2008-06-26 | 1.0 | Initial release