Static Code Analysis - Assembler

Overview

Assembler analysis is intended as an addition to C analysis primarily as support for mixed projects with limited use of assembler code (for example, a module or inline assembler code for initializing a device, and so on) rather than for purely assembler projects. For this reason, assembler analysis does not perform complete analysis of assembler code; the analysis only encompasses those parts of the code which are of interest for C analysis support.

There are two types of assembler analysis:

Empty analysis

Performed if no assembler has been selected, that is, if the value <none> has been selected for the assembler in the Options > Assembler / Source option. Select <none> if you do not require analysis of the assembler code or if the assembler dialect of your choice is not supported. In this mode, all inline assembler blocks are skipped according to the syntax rules for using assembler code in the selected C language dialect. Assembler files are ignored during analysis, and a warning ("Symbol references ignored") is reported for inline assembler blocks.

Full analysis

Performs the analysis of all assembler blocks which reside in the project, emulating the behavior of the original assembler selected in Options > Assembler / Source. Emulation here means macro analysis and expansion, evaluation of the conditional preprocessing directives which are of interest for the analysis, and collection of information on the uses of project symbols.

Full assembler analysis ensures the following functionalities:

Assembler code syntax analysis: Assembler analysis will perform syntax analysis of the assembler code and, if the syntax rules of an assembler are broken, it will report an error. It is important to note that DAC does not guarantee complete syntactic correctness of the analyzed code. As there are parts of assembler syntaxes that are not of interest for C analysis support, their syntax is not checked, which is explained in further detail in the Reference Manual.

Reporting the use of project symbols: All uses of user-defined symbols (definitions, uses of variables, function calls) will be reported to the symbol database. This functionality ensures that symbols defined within the C code are correctly interpreted when they are used in the assembler code. However, as the C programming language has a far larger number of symbol types than an assembler, and assemblers, on the other hand, place fewer limitations on the use of symbols, a heuristic approach has been adopted to ensure better quality information on C symbols used in assemblers. You can read more about this in Chapter "Symbols".

Analysis of the entire assembler code: Assembler analysis ensures not only the analysis of in-line blocks but also the analysis of separate assembler files. DAC will perform assembler analysis of all files included in the project whose extensions correspond to the Options > Project / File Types / Assembler Source File option. In cases when the assembler code resides in C files, it will be updated if the syntax of assembler code use corresponds to the syntax of the C dialect used.

Assembler code syntax coloring: The syntax coloring of the assembler code will be in accordance with the syntax of the selected assembler. Assembler analysis ensures the information necessary for this support.

Code Syntax

There are many differences among assembler dialects, many more than among C language dialects, but some similarities are visible. All explanations are given using a generic assembler example whose principles can be applied to all assemblers. The specifics of particular assemblers are dealt with in greater detail in a separate document - Technical Note.

Typical assembler line syntax looks like this:

[label] [keyword | macro call] [list of arguments with expressions] [comment]

By entering a label, a new user symbol is introduced to the project (symbol definition is performed). Assembler analysis distinguishes between two types of labels:

a. Data labels, and

b. Code labels.

Data labels are analogous to variables and constants in C, while Code labels represent destinations of program jumps. Assembler analysis interprets Code labels as function entry points. As assembler analysis is adapted to the C way of thinking, Data labels will from here on be referred to as variables (or constants), and Code labels as functions.
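The line layout above can be illustrated with a minimal Python sketch. It is not how DAC parses assembler code; it assumes hypothetical generic conventions (a label ends with ":", a comment starts with ";", arguments are comma-separated), and the names AsmLine and parse_line are introduced here only for illustration.

```python
import re
from typing import List, NamedTuple, Optional


class AsmLine(NamedTuple):
    label: Optional[str]
    keyword: Optional[str]
    args: List[str]
    comment: Optional[str]


# Assumed generic conventions, not those of any particular assembler:
# a label ends with ':', a comment starts with ';', arguments are comma-separated.
LINE_RE = re.compile(
    r"^\s*(?:(?P<label>[A-Za-z_]\w*):)?"  # optional label
    r"\s*(?P<keyword>[A-Za-z_.]\w*)?"     # optional keyword or macro call
    r"\s*(?P<args>[^;]*?)"                # optional list of arguments
    r"\s*(?:;(?P<comment>.*))?$"          # optional comment
)


def parse_line(text: str) -> AsmLine:
    """Split one source line into its four optional fields."""
    m = LINE_RE.match(text)
    if m is None:
        raise ValueError(f"not a single assembler line: {text!r}")
    raw_args = m.group("args").strip()
    args = [a.strip() for a in raw_args.split(",")] if raw_args else []
    comment = m.group("comment")
    return AsmLine(
        m.group("label"),
        m.group("keyword"),
        args,
        comment.strip() if comment is not None else None,
    )
```

For example, parse_line("start: MOV R1, count ; load counter") yields the label "start", the keyword "MOV", the arguments "R1" and "count", and the comment "load counter". Dialects in which labels are recognized by column position rather than by a colon would need a different pattern.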

Macro Processor

Most of the supported assemblers have a built-in macro processor. The macro processor handles the definition of user macros, the recognition of macro calls, and the replacement of macro calls by the expanded macro definition body. During expansion, formal macro arguments are replaced by the actual call arguments, according to the rules laid down by the original assembler.

Assembler analysis provides these services for supported assembler dialects.
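The replacement of formal arguments by call arguments can be sketched as follows. This is an illustrative Python sketch under assumptions, not DAC's implementation: the Macro type, the expand function and the LOAD/STORE mnemonics are hypothetical, and real dialects have their own substitution rules.

```python
import re
from typing import Dict, List, NamedTuple


class Macro(NamedTuple):
    params: List[str]  # formal parameter names
    body: List[str]    # body lines, as written in the macro definition


def expand(name: str, actuals: List[str], macros: Dict[str, Macro]) -> List[str]:
    """Return the body of macro `name` with formals replaced by actuals."""
    macro = macros[name]
    if len(actuals) != len(macro.params):
        raise ValueError(f"{name}: expected {len(macro.params)} arguments")
    mapping = dict(zip(macro.params, actuals))
    if not mapping:
        return list(macro.body)
    # Whole-word match, so that a formal named `x` does not rewrite part of `max`.
    # All formals are replaced in a single pass, so actuals are never re-substituted.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return [pattern.sub(lambda m: mapping[m.group(1)], line) for line in macro.body]
```

With macros = {"COPY": Macro(["src", "dst"], ["LOAD R0, src", "STORE R0, dst"])}, the call expand("COPY", ["count", "total"], macros) returns the two lines "LOAD R0, count" and "STORE R0, total". Because the match is purely textual, a formal parameter inside quotes is replaced as well.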

Macro processors may allow nested macro definitions, nested macro calls, macro recursion, generation of unique labels during macro expansion, etc. All these features are supported if they are supported by the selected assembler dialect. Details on exceptions to this rule are given in section "General Limitations".

During macro definition, macros are reported to the symbol database. If the macro definition resides in a file included directly in the project, the macro is reported as a local macro. If the macro definition resides in a file which has been included in one or more files using the INCLUDE (or corresponding) directive, the macro is reported as a global macro.

If an error has been made in the macro definition body, it will be reported every time the macro is expanded at macro call position.

Several macro processors have additional specificities which will not be discussed in this document.

Scope of Symbols

As symbols in assembler and C analysis can differ greatly, a number of rules have been introduced in order to facilitate establishing a connection between symbol uses in various parts of the code. Unlike C, assemblers have no types: all symbols are translated into addresses which can later be used arbitrarily. DAC assembler analysis classifies symbols which appear in the assembler source code into the following groups:

a. Global variable

b. Global function

c. Global constant

d. Local variable

e. Local constant

f. Local function

g. Global macro

h. Local macro

i. Structured type

On the basis of definitions, declarations and the uses of particular symbols, assembler analysis determines the symbol equivalent in C language in order to improve symbol manipulation. The rules which assembler analysis introduces are as follows:

1. If a symbol is entered following an instruction word which means an unconditional jump to a subroutine, then the symbol occurrence is interpreted as a function call.

2. If a symbol is entered preceding an instruction word then the symbol occurrence is interpreted as a function definition.

3. If a symbol is entered preceding a directive which means defining a constant then the symbol occurrence is interpreted as a constant definition.

4. If a symbol is entered preceding a directive which means reserving memory space then the symbol occurrence is interpreted as a variable definition.

5. If a symbol is entered following a directive which means declaration of an external symbol and if later on, in the code flow, the same symbol appears as a function call, then the symbol occurrence is interpreted as an external function declaration.

6. If a symbol is entered following a directive which means declaration of an external symbol and if, later on, in the code flow, the same symbol does not appear as a function call, then the symbol occurrence is interpreted as an external variable declaration.

7. If a symbol is entered following a directive which means that the symbol entered is visible outside the file viewed and if, later on, in the code flow, the same symbol appears as a function definition, then the symbol occurrence is interpreted as a global function declaration.

8. If a symbol is entered following a directive which means that the symbol entered is visible outside the file viewed and if, later on, in the code flow, the same symbol appears as a variable definition, then the symbol occurrence is interpreted as a global variable declaration.

9. If a symbol appears and is interpreted as the definition of a constant, variable, or function and if it has previously appeared in the code within a global declaration, then the occurrence is interpreted as an occurrence of the symbol's global definition.

10. If a symbol appears and is interpreted as the definition of a constant, variable, or function and if it has not previously appeared in the code within a global declaration, then the occurrence is interpreted as an occurrence of the symbol's local definition.

11. If a symbol appears to the right of an instruction which signifies any type of assignment or change of value, the occurrence is interpreted as a value assignment to the symbol.

12. If a symbol appears at the end of a file as a label and there are no instructions or directives following it, it is reported as a function.

13. If a symbol appears to the right of a conditional jump instruction, it is reported as a symbol use (variable or constant).

14. If a symbol is entered preceding (in some assembler dialects following) a macro-defining directive, then the symbol occurrence is reported as a macro definition. If the file being analyzed is directly included in the project, the macro is reported as a local macro, and if the file is included in the project indirectly (using the INCLUDE directive), as a global macro.

15. In all other cases, when a symbol appears to the right of an instruction or directive, a symbol use (variable or constant) is in question.
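A few of the rules above (1, 2, 3, 4, 13 and 15) can be sketched in Python for a single line that has already been split into label, keyword and operands. The sketch is illustrative only: the keyword sets are hypothetical placeholders for a generic assembler, the operands are assumed to contain project symbols only, and the rules which need the whole file (5 to 10, 12, 14) and rule 11 are left out.

```python
from typing import List, Optional, Tuple

# Hypothetical keyword sets for a generic assembler; real dialects differ.
CALLS = {"CALL", "JSR"}            # unconditional jump to a subroutine
CONST_DIRECTIVES = {"EQU", "SET"}  # define a constant
RESERVE_DIRECTIVES = {"DS"}        # reserve memory space


def classify(label: Optional[str], keyword: str,
             operands: List[str]) -> List[Tuple[str, str]]:
    """Return (symbol, interpretation) pairs for one assembler line."""
    kw = keyword.upper()
    found = []
    if label is not None:
        if kw in CONST_DIRECTIVES:
            found.append((label, "constant definition"))  # rule 3
        elif kw in RESERVE_DIRECTIVES:
            found.append((label, "variable definition"))  # rule 4
        else:
            found.append((label, "function definition"))  # rule 2
    for symbol in operands:
        if kw in CALLS:
            found.append((symbol, "function call"))       # rule 1
        else:
            found.append((symbol, "use"))                 # rules 13 and 15
    return found
```

For instance, classify("delay", "MOV", ["count"]) reports "delay" as a function definition and "count" as a use, while classify(None, "CALL", ["delay"]) reports "delay" as a function call.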

These rules follow the way of thinking in C language. Assemblers, by contrast, offer many possibilities for circumventing standard (structured) ways of writing code, for example, changing the value of a symbol defined as a constant. In such cases, assembler analysis can generate an error or warning although the original assembler would not report an error in such places. In order to solve problems of this kind, another group of rules has been introduced. These rules are also heuristic, and do not guarantee solutions in the general case.

1. If there are no PUBLIC or EXTERN symbol declarations in the source code, a symbol is reported as a local symbol, that is, as a local variable, constant, or function. If the symbol is a section (a concept which does not exist in C), DAC checks whether the symbol appears anywhere as a function call argument or jump destination. If such an occurrence exists, the section is reported as a function; otherwise it is reported as a variable.

2. If there is a PUBLIC symbol declaration, a symbol is declared as a global symbol.

3. If there is an EXTERN declaration (in which case the file contains no definition), and if there is a function call among the symbol occurrences in the file, an external function declaration is reported; otherwise an external variable declaration is reported.

4. If both PUBLIC and EXTERN declarations for the same symbol are present in the same file, DAC reports an error.

5. Other symbol uses are reported as follows: if at least one occurrence on the list is of the "function call" type, or the symbol is defined as a function, then all occurrences are reported as function calls. Otherwise, a read is reported as a read; for a write, DAC first checks whether the symbol is a variable: if it is, the occurrence is reported as a write, otherwise as a read.
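The second group of rules can be summarized in a short Python sketch. It is an illustration under assumptions, not DAC's implementation: the function names, the string labels, and the representation of PUBLIC/EXTERN declarations and calls as sets are all hypothetical, and the special handling of sections in rule 1 is omitted.

```python
from typing import List, Set


def resolve_scope(symbol: str, kind: str, public: Set[str],
                  extern: Set[str], called: Set[str]) -> str:
    """Apply rules 1 to 4; `kind` is 'variable', 'constant' or 'function'."""
    if symbol in public and symbol in extern:
        # Rule 4: both declarations in the same file are an error.
        raise ValueError(f"{symbol}: both PUBLIC and EXTERN in the same file")
    if symbol in extern:
        # Rule 3: no definition exists, so calls decide the interpretation.
        if symbol in called:
            return "external function declaration"
        return "external variable declaration"
    scope = "global" if symbol in public else "local"  # rules 2 and 1
    return f"{scope} {kind}"


def normalize_uses(uses: List[str], is_function: bool,
                   is_variable: bool) -> List[str]:
    """Apply rule 5 to a list of 'read', 'write' and 'call' occurrences."""
    if is_function or "call" in uses:
        return ["call"] * len(uses)
    # A read stays a read; a write is kept only for variables.
    return [u if (u == "read" or is_variable) else "read" for u in uses]
```

For example, a symbol declared EXTERN and called somewhere in the file resolves to "external function declaration", and normalize_uses(["read", "write"], False, False) returns two reads because the symbol is not a variable.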

Expressions

An expression is usually considered to be the right side of the assembler line without the comments. To be more precise, it is the list of arguments with expressions which follows the assembler dialect keyword or macro call and ends with the end of the line or the comment.

Expressions can contain:

1. Symbols, that is, user-defined variables, constants, etc.,

2. Binary operators ( +, -, %, *, /, ... ),

3. Unary prefix operators ( !, ~, -, ... ),

4. Unary postfix operators ( .WORD, .BYTE, ... ),

5. Numeric constants ( 123, 0x1A9, 0101b, ... ),

6. Registers, flags, a location counter ( R1, A, X, C, $, ... ),

7. Ways of addressing.

Assembler analysis performs syntactic checks of all expressions. For example, an error is reported if numeric constants do not comply with the syntax of the assembler dialect, if an invalid operator has been used, etc. The exception is expressions that appear with directives which do not contain project symbols; their syntax is not checked.

The semantic checking of expressions is not carried out. This means that assembler analysis will not generate errors and warnings which have come as a result of incorrect expression semantics. For example, semantically incorrect ways of addressing and incorrect usage of registers do not generate error messages.

Assembler analysis evaluates expressions which reduce to integer constants. Such expressions are significant in conditional preprocessing directives. Expressions that depend on code translation, that is, on instruction size, will not be evaluated correctly.
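Evaluating an expression to an integer constant can be sketched in Python as below. This is a minimal sketch under assumptions rather than DAC's evaluator: it accepts only the three numeric forms listed above (123, 0x1A9, 0101b), the operators + - * / %, unary - and ~, and parentheses; division is assumed to be integer division; and the symbols dictionary is a hypothetical table of absolute symbols (for example, ones defined with EQU).

```python
import re
from typing import Dict, List, Optional

# Numeric literal, identifier, or operator. A literal directly followed by a
# word character (for example 12b) is rejected as a malformed constant.
TOKEN_RE = re.compile(
    r"\s*((?:0[xX][0-9A-Fa-f]+|[01]+[bB]|\d+)(?!\w)|[A-Za-z_]\w*|[-+*/%()~])"
)


def parse_number(tok: str) -> int:
    if tok[:2].lower() == "0x":
        return int(tok, 16)
    if tok[-1] in "bB":
        return int(tok[:-1], 2)
    return int(tok, 10)


def tokenize(expr: str) -> List[str]:
    tokens, pos, expr = [], 0, expr.strip()
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if m is None:
            raise SyntaxError(f"invalid token at column {pos}: {expr[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def evaluate(expr: str, symbols: Dict[str, int]) -> int:
    """Evaluate an integer expression over absolute symbols only."""
    tokens, pos = tokenize(expr), 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def take() -> str:
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of expression")
        pos += 1
        return tok

    def primary() -> int:
        tok = take()
        if tok == "(":
            value = additive()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok == "-":
            return -primary()
        if tok == "~":
            return ~primary()
        if tok[0].isdigit():
            return parse_number(tok)
        if tok[0].isalpha() or tok[0] == "_":
            if tok not in symbols:
                # Addresses are unknown to the analysis; only absolute symbols work.
                raise ValueError(f"{tok!r} is not a known absolute symbol")
            return symbols[tok]
        raise SyntaxError(f"unexpected token {tok!r}")

    def term() -> int:
        value = primary()
        while peek() in ("*", "/", "%"):
            op, rhs = take(), primary()
            value = value * rhs if op == "*" else value // rhs if op == "/" else value % rhs
        return value

    def additive() -> int:
        value = term()
        while peek() in ("+", "-"):
            op, rhs = take(), term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = additive()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result
```

With SIZE defined as 20, evaluate("(SIZE - 1) % 8", {"SIZE": 20}) gives 3. A label whose value is an address is simply absent from the table, so an expression that uses it raises an error instead of producing a misleading value.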

General Limitations

This section sums up the general limitations, which apply to all supported assemblers. Limitations pertaining only to particular assemblers or assembler families are given in another document (Technical Note).

1. Directives which specify repetition of some part of the assembler code (REPEAT, DUP, REPT, etc.) do not cause code repetition during assembler analysis. The code sequence enclosed by such a directive is analyzed only once, and the corresponding symbol uses are reported to the symbol database only once.

2. Addresses are not calculated. Therefore, if you use the address of a symbol in a conditional preprocessing directive, expression evaluation will not be carried out correctly. In expressions which follow the IF or WHILE directive, you should use exclusively absolute symbols defined using EQU, SET or similar directives for defining absolute symbols.

3. All ways of addressing are supported, but no difference is made between which instructions support which way of addressing.

4. All user-defined assembler symbols are case sensitive. All predefined assembler symbols (for example, program available registers) are case insensitive. Assembler options which enable case insensitivity of project symbols are not supported.

5. All length specifiers are supported, but no difference is made between which specifier corresponds to which keyword or symbol.

6. Directives pertaining to code generation, file listing, etc., which have no influence on analysis, are ignored.

7. Instructions and directives for which it is known in advance that they do not use project symbols are ignored.

8. Only uses of project symbols are tracked; the "life" of their data in processor registers is not.

9. Symbols which appear in IFDEF/IFNDEF directives are not reported as uses.

10. You can always define or declare a symbol after using it, except for macros and structure types, if they are supported by the original assembler.

11. All kinds of brackets within expressions which appear in an instruction/directive list of arguments are treated as equivalent and have the same meaning. Therefore, if a way of addressing requires the use of "[]" brackets, and you use "()" parentheses, DAC will not report an error.

12. In the case of disagreement between the behavior of the original assembler and its documentation, DAC assembler analysis follows the assembler behavior.

13. Labels that appear at the end of the source file are reported as functions (code labels).

14. The length of structured variables is not calculated.

15. No difference is made between relocatable and absolute assembler code sections.

16. Floating point assembler symbols are not supported; they are reduced to their integer part.

17. On a LOAD/STORE architecture, symbols to the right of STORE-type instructions are written to the database as assignments, and symbols to the right of all other instructions, except jump instructions, as uses. On any other architecture, symbols to the right of any data-moving instruction (MOV, MOVE, and so on) are reported as uses.

18. For each instruction/directive, all operands are reported as the same type of use.

19. All members of assembler structure types are reported as int.

20. Macro parameters are not reported to the symbol database.

21. If a formal macro parameter appears surrounded by single or double quotes, the replacement of the formal parameter by real value is also carried out.

22. The dummy argument in directives for repeating part of the assembler code (DUPA, DUPC, REPT, and so on) is not reported to the database.

23. Maximum allowed nesting depth for assembler macro calls is 5.

24. Maximum allowed nesting depth for assembler include files is 20.
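The macro nesting limit from item 23 can be illustrated with a small Python sketch. It is hypothetical and not DAC's implementation: macros are modeled as a dictionary from macro name to body lines, a body line whose first word is a macro name counts as a nested call, and the outermost call is assumed to count as depth 1, which the text above does not specify.

```python
from typing import Dict, List

MAX_MACRO_DEPTH = 5  # limit stated in item 23


def expand_nested(name: str, macros: Dict[str, List[str]],
                  depth: int = 1) -> List[str]:
    """Expand macro `name`, following nested calls up to the depth limit."""
    if depth > MAX_MACRO_DEPTH:
        raise ValueError(
            f"macro nesting deeper than {MAX_MACRO_DEPTH} at {name!r}")
    out: List[str] = []
    for line in macros[name]:
        words = line.split()
        if words and words[0] in macros:
            # Nested macro call: expand it one level deeper.
            out.extend(expand_nested(words[0], macros, depth + 1))
        else:
            out.append(line)
    return out
```

A chain of five macros, each calling the next, expands normally under this assumption, while a sixth level or an unbounded recursive macro raises an error. The same guard pattern, with a limit of 20, would apply to nested include files (item 24).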


Copyright 1993-2017, RistanCASE GmbH