Ebola: first tests and back to square one

9 April

This post is a translation of “Ebola : premiers essais et retour à la case départ”.

I tested my idea for a language a little while ago.
I just have not had the time to write about it…

In the end, after careful thought, I have scrapped my language in favor of an assembler editor.

Ebola was meant to be a language for writing slightly more readable assembler code. I ran some tests, and on that point the result is quite conclusive.
But having to run a command (gea) to convert my Ebola code into gcc assembler code seems a little too restrictive and complicated…
Hence the idea of converting directly to assembler code when the file is saved. So we can no longer speak of a language but rather of a tool, like Dreamweaver is for the web.

I took the opportunity to improve and enrich the syntax, now that I see things more clearly.

Here is where my thinking on the syntax currently stands.

Directives

.trace		1
.output		gcc
.optimize		0
.strict		1
.stack		1

There are only 5 directives for the moment.

  • trace: enables or disables the use of “call” operations. The default value for “trace” is 0.
  • output: sets the output format. At first (and probably for a long time) only gcc will be supported. The default value is gcc.
  • optimize: one day I plan to optimize the generated code by reordering instructions to minimize bubbles in the pipeline. The default value for “optimize” is 0.
  • strict: prohibits the use of the B (branch) opcode. This implies that all loops must use the structures provided by Ebola. The default value for “strict” is 1.
  • stack: indicates whether Ebola is the only one allowed to use the stack. Setting “stack” to 0 rules out recursive functions, because local variables are then stored in the data segment and not on the stack. In this case, the sp register (r13) can be saved so that the function can use it. Conversely, setting “stack” to 1 indicates that Ebola manages the stack. Using sp (r13) is then prohibited, because sp is used to access the function's local variables. r12 is then available and can be used. The default value for “stack” is 1.
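The defaults listed above can be sketched as a small lookup table. This is only an illustration in Python, not part of the tool: the function name `parse_directives` and the dictionary layout are assumptions, and lines that are not one of the five directives are simply skipped.

```python
# Hypothetical sketch: read the directives from a source text and fall back
# to the default values given in the list above.
DEFAULTS = {"trace": "0", "output": "gcc", "optimize": "0", "strict": "1", "stack": "1"}


def parse_directives(source):
    """Return the directive settings for a source text, defaults included."""
    settings = dict(DEFAULTS)
    for line in source.splitlines():
        parts = line.split()
        # A directive line looks like ".name value"; anything else is ignored.
        if len(parts) == 2 and parts[0].startswith("."):
            name = parts[0][1:]
            if name in DEFAULTS:
                settings[name] = parts[1]
    return settings


settings = parse_directives(".trace\t\t1\n.stack\t\t0\n")
```

Here `settings["trace"]` and `settings["stack"]` come from the source text, while the three other entries keep their defaults.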

Structures

struct pixel_rgb
   .byte red
   .byte green
   .byte blue
end struct

Structures are primarily used to define relative address offsets. To access an attribute of a structure, use the . (dot) operator.

	mov			r0, pixel_rgb.red				@ read the value of the offset red of the structure pixel_rgb.
	ldr			r0, [r7, pixel_rgb.red]		@ load the value at address r7 + pixel_rgb.red
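To make the idea of a "relative offset" concrete, here is a minimal Python sketch that computes the offset of each field. It assumes that fields are packed one after another with no alignment padding. Only `.byte` appears in this post, so the `.hword` and `.word` sizes are assumptions borrowed from the GNU assembler, and `struct_offsets` is a hypothetical name.

```python
# Sizes in bytes per field type (only .byte is used in the post itself).
SIZES = {".byte": 1, ".hword": 2, ".word": 4}


def struct_offsets(fields):
    """Map each field name to its byte offset from the start of the structure."""
    offsets = {}
    position = 0
    for kind, name in fields:
        offsets[name] = position
        position += SIZES[kind]  # assumed: packed layout, no padding
    return offsets


pixel_rgb = struct_offsets([(".byte", "red"), (".byte", "green"), (".byte", "blue")])
print(pixel_rgb)  # {'red': 0, 'green': 1, 'blue': 2}
```

Under these assumptions, `pixel_rgb.green` would stand for the constant 1 wherever it is used as an offset.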

Variables

Ebola can handle 3 kinds of variables.
Constants, which can only be read.
Static variables, whose scope is the file. These variables retain their values between calls.
Local variables, whose scope is the function. They are reset at each function call and allow recursion (if the .stack directive is 1).

const

const
	.byte			foo, 0
end const
	...
	mov		r4, foo								@ read the value of foo

Constants are declared before the function. They must be initialized.
They can be accessed directly by name using the assembler opcode “mov”.
Since they live in memory, they cannot be used directly with other assembler opcodes and must be loaded into registers first.

	adr		r0, "Hello Ebola !!!"
	adr		r1, "Hello Ebola !!!"

Static strings are variables of type const.
It is therefore not possible to write to the memory address held in r0 after the assignment. r0 and r1 (in the code above) contain the same value (the same pointer).

data

data
	.byte			foo
	.pixel_rgb	mon_objet
end data
	...
	adr		r4, data								@ get a pointer to data
	mov		r5, r4.foo							@ read the value of foo (short syntax)
	ldr		r5, [r4, data.foo]				@ read the value of foo
	adr		r6, data.foo						@ get a pointer to foo

Data declared in the “data” structure is stored in the .data segment.
Data variables are called static variables because they retain their values between function calls.
Typically, they are used in assembler to store temporary data.
No initialization is possible because the values of the data are unknown when we enter the function.

stack

	stack		foo, 0
	stack		foo2
	...
	mov		r4, foo

Local variables, as we know them in C, are initialized in the function body with the “stack” instruction.
If the .stack directive is 1 (or is not defined), the data is put on the stack.
If the .stack directive is 0, the data is stored in the data segment.

The “stack” instruction allocates the necessary memory and then initializes the variable with the given value (if any).
Initialized local variables can consume a lot of CPU time. Be sure you need this kind of variable before using it.

Reading a local variable works the same way as reading a const variable.
In case of a name conflict, the local variable is read.

Functions

function swap_red_green
   use         r0, r1, r2, r8, r9
   ...
   return

Functions are declared using the keyword “function”. They end with “return”.
The “use” statement defines which registers the function will use.
If a function has no “use” statement, Ebola assumes that no registers need to be saved.
The “use” statement is very important. It makes it possible:

  • to save only the necessary registers.
  • to properly restore the registers before exiting the function.
  • to tell the code optimizer (the day it exists) which registers are available for it to use.
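As a rough illustration of the first two points, a “use” list could be turned into a matching save and restore pair. The post does not say which instructions Ebola emits, so the `push`/`pop` pair and the `bx lr` return below are assumptions, as is the helper name `wrap_function`. The sketch also assumes a leaf function, so lr is not saved.

```python
def wrap_function(name, used_registers, body):
    """Wrap body lines in a prologue and epilogue built from a "use" list."""
    reg_list = ", ".join(used_registers)
    lines = [f"{name}:"]
    if used_registers:
        # Save only the registers named in "use" (assumed instruction: push).
        lines.append(f"\tpush\t{{{reg_list}}}")
    lines.extend(body)
    if used_registers:
        # Restore exactly the same registers before leaving.
        lines.append(f"\tpop\t{{{reg_list}}}")
    lines.append("\tbx\tlr")  # assumed return sequence for a leaf function
    return lines


asm = wrap_function("swap_red_green", ["r0", "r1", "r2", "r8", "r9"], ["\t..."])
print("\n".join(asm))
```

With an empty “use” list the sketch emits no save or restore at all, which matches the rule that no registers are saved when “use” is missing.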

for[cc] … end for loops

	for[cc]		r4, #0, r2, #1
	...
	end for

The “for” loop statement takes 4 parameters:

  • Iteration register
  • Start value (or register)
  • End value (or register)
  • Increment value (or register)

[cc] is the condition code under which the loop iterates again. By default it is lt.

The equivalent assembler code for the “for” loop (before optimization) is the following:

	mov			r4, #0
	cmp			r4, r2
	bge			exit_for
for:
	...
	add			r4, r4, #1
	cmp			r4, r2
	blt			for
exit_for:

Note: if the iteration register is not used inside the loop, the optimizer should be able to replace the “for” loop with a “while” loop, which is faster.
Note: if the iteration register is not used inside the loop and the number of iterations (end value - start value) is constant, the optimizer should be able to replace the “for” loop with a “loop” loop, whose performance is optimal.
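The expansion of a “for” loop can be sketched as a small Python function. This is a hypothetical illustration, not the tool's code: `expand_for` and the `INVERSE` table are assumed names, only six condition codes are covered, and the fixed labels `for` and `exit_for` would have to be made unique in a real program with several loops.

```python
# Each condition code paired with its opposite, used for the entry guard.
INVERSE = {"eq": "ne", "ne": "eq", "lt": "ge", "ge": "lt", "gt": "le", "le": "gt"}


def expand_for(reg, start, end, step, body, cc="lt"):
    """Return the assembler lines for: for[cc] reg, start, end, step."""
    return [
        f"\tmov\t{reg}, {start}",
        f"\tcmp\t{reg}, {end}",
        f"\tb{INVERSE[cc]}\texit_for",  # skip the loop if the condition fails at once
        "for:",
        *body,
        f"\tadd\t{reg}, {reg}, {step}",
        f"\tcmp\t{reg}, {end}",
        f"\tb{cc}\tfor",  # loop back while the condition holds
        "exit_for:",
    ]


print("\n".join(expand_for("r4", "#0", "r2", "#1", ["\t..."])))
```

The guard before the `for:` label uses the opposite condition (ge for the default lt), so a loop whose start value already fails the test never runs its body.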

while[cc] … end while loops

	whilene		r4, #10
	...
	add			r4, r4, #1
	end while

The “while” loop (as in every language) is a simplified version of the “for” loop.
Variable initialization is handled by the developer before entering the “while”.
The increment is also handled by the developer inside the loop.

The equivalent assembler code for the “while” loop (before optimization) is the following:

	cmp			r4, #10
	beq			exit_while
while:
	...
	add			r4, r4, #1
	cmp			r4, #10
	bne			while
exit_while:

loop … do[cc] again loops

	mov			r4, #10
	loop
	...
	subs			r4, r4, #1
	dogt			again

“loop” loops are a faithful representation of a conventional assembler loop.
Their main advantage is that they do not generate any hidden extra code!

The equivalent assembler code for the “loop” loop (before optimization) is the following:

	mov			r4, #10
loop:
	...
	subs			r4, r4, #1
	bgt			loop

Because they are so close to the assembler, these loops are the most efficient.

if[cc] … else … endif

	ifeq			r0, #5
	...
	elsene		r1, r7
	...
	elseeq
	...
	else
	...
	endif

The “if” statement replaces both the C “if” statement and the “switch” statement.
The “else” statement can be conditional, in which case it corresponds to an “else if”.
If no operands are given, the condition is tested against the result of the previous comparison.

The equivalent assembler code for the “if” statement (before optimization) is the following:

	cmp         r0, #5
	bne         else1
	...
	b           endif
else1:
	cmp         r1, r7
	beq         else2
	...
	b           endif
else2:
	bne			else
	...
	b           endif
else:
	...
endif:
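The pattern in the listing above can be sketched in Python as follows. This is a hypothetical illustration: `expand_if` and the `INVERSE` table are assumed names, and the fixed labels would need to be made unique in a real program. A branch with no operands emits no `cmp` and so reuses the flags of the previous comparison.

```python
# Each condition code paired with its opposite: a branch is skipped
# when the opposite of its condition holds.
INVERSE = {"eq": "ne", "ne": "eq", "lt": "ge", "ge": "lt", "gt": "le", "le": "gt"}


def expand_if(branches, else_body=None):
    """Expand a chain of (cc, operands or None, body) into assembler lines."""
    lines = []
    for i, (cc, operands, body) in enumerate(branches):
        if i > 0:
            lines.append(f"else{i}:")
        if operands is not None:
            lines.append(f"\tcmp\t{operands[0]}, {operands[1]}")
        # Where to go when this branch's condition is false.
        if i < len(branches) - 1:
            target = f"else{i + 1}"
        elif else_body is not None:
            target = "else"
        else:
            target = "endif"
        lines.append(f"\tb{INVERSE[cc]}\t{target}")
        lines.extend(body)
        if target != "endif":
            lines.append("\tb\tendif")  # skip the remaining branches
    if else_body is not None:
        lines.append("else:")
        lines.extend(else_body)
    lines.append("endif:")
    return lines


chain = [
    ("eq", ("r0", "#5"), ["\t..."]),
    ("ne", ("r1", "r7"), ["\t..."]),
    ("eq", None, ["\t..."]),  # no operands: reuses the previous comparison
]
print("\n".join(expand_if(chain, else_body=["\t..."])))
```

Run on the example chain, the sketch yields the same sequence of instructions and labels as the listing above.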

Exit from loops

	exit
	...
	cmp			r5, #1
	exiteq
	...
	exiteq			r5, #1

At any time you can exit a loop using the exit[cc] statement.
For a conditional exit, the comparison can either be given as operands, in which case Ebola performs it, or be done by the developer beforehand.
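The post does not show the assembler generated for “exit”. A plausible expansion is a branch to the exit label of the enclosing loop, preceded by a `cmp` when operands are given. The Python sketch below makes that assumption explicit: `expand_exit` is a hypothetical name, and the exit label is passed in by the caller.

```python
def expand_exit(exit_label, cc="", operands=None):
    """Return the assembler lines for exit[cc], assuming a plain branch."""
    lines = []
    if operands is not None:
        # Form "exiteq r5, #1": the comparison is emitted by the tool.
        lines.append(f"\tcmp\t{operands[0]}, {operands[1]}")
    # Forms "exit" and "exiteq": branch (conditionally) out of the loop.
    lines.append(f"\tb{cc}\t{exit_label}")
    return lines


print(expand_exit("exit_for"))                      # unconditional exit
print(expand_exit("exit_for", "eq"))                # flags set by the developer
print(expand_exit("exit_for", "eq", ("r5", "#1")))  # comparison emitted by the tool
```

The three calls correspond to the three forms in the example above: `exit`, `exiteq` after a hand-written `cmp`, and `exiteq r5, #1`.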

Conclusion

The Ebola syntax is not intended to greatly simplify assembler code, but it helps to structure the program and makes it more readable.
It also greatly simplifies porting C source code (for example).
Finally, Ebola should make it possible to edit assembler code and count the program's cycles in the same tool, which should save a huge amount of time, since optimization can be done during development.

Version 0.6 of the cycle counter (which I have almost finished) implements a new engine that computes the program's cycle count in real time (or at least in an extremely short time).

Finally, everything is in place in Ebola for me to start a new hobby in the near future: the automatic optimization of assembler code. I will have the opportunity to explain the choices I made on particular points of the Ebola specification.
