Introduction
The decompiler attempts to translate from low-level representations of computer programs into high-level representations. Thus it needs to model concepts from both the low-level machine hardware domain and from the high-level software programming domain.
Understanding the classes within the source code that implement these models is the quickest route to an overall understanding of the code.
We list all these fundamental classes here, loosely grouped as follows. One set of classes describes the Syntax Trees, which are built up from the original p-code and transformed during the decompiler's simplification process. The Translation classes do the actual building of the syntax trees from binary executables, and the Transformation classes do the actual work of transforming the syntax trees. Finally, there are the High-level classes, which represent the information the decompiler recovers, describing familiar software development concepts like datatypes, prototypes, symbols, and variables.
Syntax Trees
- AddrSpace
- A place within the reverse engineering model where data can be stored. The typical address spaces are ram, modeling the main databus of a processor, and register, modeling a processor's on-board registers. Data is stored a byte at a time at offsets within the AddrSpace.
- Address
- An AddrSpace and an offset within the space forms the Address of the byte at that offset.
- Varnode
- A contiguous set of bytes, given by an Address and a size, encoding a single value in the model. In terms of the SSA syntax tree, a Varnode is also a node in the tree.
- SeqNum
- A sequence number that extends Address in order to distinguish the multiple PcodeOps that describe a single instruction.
- Overview of SeqNum
- PcodeOp
- A single p-code operation. A single machine instruction is translated into (possibly several) operations in this Register Transfer Language.
- Overview of PcodeOp
- BlockBasic
- A maximal sequence of p-code operations that always executes from the first PcodeOp to the last.
- Overview of BlockBasic
- Funcdata
- The root object holding all information about a function, including: the p-code syntax tree, prototype, and local symbol information.
- Overview of Funcdata
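The relationship between AddrSpace, Address, and Varnode described in the list above can be illustrated with a minimal sketch. The types AddrSpaceSketch, AddressSketch, VarnodeSketch and the helpers makeVarnode and containsByte are hypothetical, simplified stand-ins invented for this example; they are not the decompiler's real classes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for AddrSpace: only a name is modeled.
struct AddrSpaceSketch {
  std::string name;  // e.g. "ram" or "register"
};

// Hypothetical stand-in for Address: a space plus a byte offset within it.
struct AddressSketch {
  const AddrSpaceSketch *space;
  std::uint64_t offset;
};

// Hypothetical stand-in for Varnode: 'size' contiguous bytes starting at 'addr'.
struct VarnodeSketch {
  AddressSketch addr;
  int size;

  // True if the byte at address 'a' lies inside this Varnode's byte range.
  bool containsByte(const AddressSketch &a) const {
    return a.space == addr.space && a.offset >= addr.offset &&
           a.offset - addr.offset < static_cast<std::uint64_t>(size);
  }
};

// Build a Varnode; a Varnode must cover at least one byte.
VarnodeSketch makeVarnode(const AddrSpaceSketch &spc, std::uint64_t off, int size) {
  assert(size > 0);
  return VarnodeSketch{AddressSketch{&spc, off}, size};
}
```

In this sketch, two Varnodes with the same offset but different spaces never overlap, which mirrors the idea that an offset is only meaningful relative to its AddrSpace.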
Translation
Transformation
High-level Representation
Overview of SeqNum
A sequence number is a form of extended address for multiple p-code operations that may be associated with the same address. There is a normal Address field. There is a time field which is a static value, determined when an operation is created, that guarantees the uniqueness of the SeqNum. There is also an order field which preserves order information about operations within a basic block. This value may change if the syntax tree is manipulated.
Address & getAddr();
uintm getTime();
uintm getOrder();
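The roles of the three fields can be sketched as follows. SeqNumSketch and SeqNumFactory are hypothetical names, the address is reduced to a plain integer, and comparing identity on the time field alone is an assumption made for this illustration.

```cpp
#include <cassert>
#include <cstdint>

using uintm = std::uint32_t;  // assumed width, for illustration only

// Hypothetical stand-in for SeqNum.
struct SeqNumSketch {
  std::uint64_t addr;  // stand-in for the Address field
  uintm time;          // fixed when the operation is created
  uintm order;         // position within a basic block; may be renumbered

  // Assumption: two sequence numbers denote the same op iff 'time' matches.
  bool operator==(const SeqNumSketch &o) const { return time == o.time; }
};

// Hands out sequence numbers with a strictly increasing time field.
class SeqNumFactory {
  uintm nextTime = 0;
public:
  SeqNumSketch create(std::uint64_t addr) {
    return SeqNumSketch{addr, nextTime++, 0};
  }
};
```

Two operations created for the same machine instruction share an address but receive different time values, so they remain distinguishable even after their order fields are rewritten.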
Overview of PcodeOp
A single operation in the p-code language. It has, at most, one Varnode output, and some number of Varnode inputs. The inputs are operated on depending on the opcode of the instruction, producing the output.
Address & getAddr();
SeqNum & getSeqNum();
int4 numInput();
Varnode * getOut();
Varnode * getIn(int4 i);
BlockBasic * getParent();
bool isDead();
bool isCall();
bool isBranch();
bool isBoolOutput();
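As a rough illustration of "inputs are operated on depending on the opcode, producing the output", the sketch below evaluates a tiny operation over concrete values. OpCodeSketch, ValueSketch, PcodeOpSketch, and execute are hypothetical names, and only three opcodes are modeled; the real class tracks far more state.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical opcode subset for illustration.
enum class OpCodeSketch { COPY, INT_ADD, INT_AND };

// Stand-in for a Varnode that simply carries a concrete value.
struct ValueSketch {
  std::uint64_t value;
};

// Hypothetical stand-in for PcodeOp: at most one output, some number of inputs.
struct PcodeOpSketch {
  OpCodeSketch opc;
  ValueSketch *out;               // nullptr when the op has no output
  std::vector<ValueSketch *> in;  // input operands, in order

  int numInput() const { return static_cast<int>(in.size()); }
  ValueSketch *getOut() const { return out; }
  ValueSketch *getIn(int i) const {
    assert(i >= 0 && i < numInput());
    return in[i];
  }
};

// Compute the output from the inputs according to the opcode.
void execute(const PcodeOpSketch &op) {
  if (op.getOut() == nullptr) return;  // nothing to write
  switch (op.opc) {
    case OpCodeSketch::COPY:
      op.getOut()->value = op.getIn(0)->value;
      break;
    case OpCodeSketch::INT_ADD:
      op.getOut()->value = op.getIn(0)->value + op.getIn(1)->value;
      break;
    case OpCodeSketch::INT_AND:
      op.getOut()->value = op.getIn(0)->value & op.getIn(1)->value;
      break;
  }
}
```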
Overview of BlockBasic
A sequence of PcodeOps with a single path of execution.
int4 sizeOut();
int4 sizeIn();
BlockBasic *getIn(int4 i);
BlockBasic *getOut(int4 i);
SeqNum & getStart();
SeqNum & getStop();
BlockBasic *getImmedDom();
iterator beginOp();
iterator endOp();
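The accessors above suggest two kinds of traversal: over the operations inside a block, and over the edges between blocks. The following sketch shows both under simplifying assumptions: BlockSketch, addEdge, and countOps are hypothetical, and operations are reduced to mnemonic strings.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Hypothetical stand-in for BlockBasic.
struct BlockSketch {
  std::list<std::string> ops;          // mnemonics stand in for PcodeOps
  std::vector<BlockSketch *> in, out;  // incoming and outgoing edges

  int sizeIn() const { return static_cast<int>(in.size()); }
  int sizeOut() const { return static_cast<int>(out.size()); }
  BlockSketch *getIn(int i) const {
    assert(i >= 0 && i < sizeIn());
    return in[i];
  }
  BlockSketch *getOut(int i) const {
    assert(i >= 0 && i < sizeOut());
    return out[i];
  }
  std::list<std::string>::const_iterator beginOp() const { return ops.begin(); }
  std::list<std::string>::const_iterator endOp() const { return ops.end(); }
};

// Record a control-flow edge on both of its endpoints.
void addEdge(BlockSketch &from, BlockSketch &to) {
  from.out.push_back(&to);
  to.in.push_back(&from);
}

// Walk a block's operations from first to last and count them.
int countOps(const BlockSketch &b) {
  int n = 0;
  for (auto it = b.beginOp(); it != b.endOp(); ++it) ++n;
  return n;
}
```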
Overview of Funcdata
This is a container for the syntax tree associated with a single function and all other function-specific data. It has an associated start address, function prototype, and local scope.
string & getName();
Address & getAddress();
int4 numCalls();
FuncCallSpecs *getCallSpecs(int4 i);
BlockGraph & getBasicBlocks();
iterator beginLoc(Address &);
iterator beginLoc(int4,Address &);
iterator beginLoc(int4,Address &,Address &,uintm);
iterator beginDef(uint4,Address &);
LoadImage
Action
Rule
Translate
Decodes machine instructions and can produce p-code.
int4 oneInstruction(PcodeEmit &,Address &) const;
void printAssembly(ostream &,int4,Address &) const;
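To make the decode-and-emit pattern concrete, here is a toy sketch for an invented two-instruction machine. Everything in it is an assumption made for illustration: EmitSketch, CollectEmit, ToyTranslate, the byte encodings, the textual form of the emitted operations, and the convention of returning the instruction length (or 0 on failure) are not taken from the real interface.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical emitter interface: receives each p-code op as text.
class EmitSketch {
public:
  virtual ~EmitSketch() {}
  virtual void dump(std::uint64_t addr, const std::string &op) = 0;
};

// Emitter that simply collects the ops it is handed.
class CollectEmit : public EmitSketch {
public:
  std::vector<std::string> ops;
  void dump(std::uint64_t, const std::string &op) override { ops.push_back(op); }
};

// Toy translator over an in-memory byte image of an invented machine.
class ToyTranslate {
  std::vector<std::uint8_t> bytes;
public:
  explicit ToyTranslate(std::vector<std::uint8_t> b) : bytes(std::move(b)) {}

  // Decode one instruction at 'addr', emit its p-code, and return its
  // length in bytes (0 if it cannot be decoded; a sketch-only convention).
  int oneInstruction(EmitSketch &emit, std::uint64_t addr) const {
    if (addr >= bytes.size()) return 0;
    switch (bytes[addr]) {
      case 0x01:  // invented "push r0": one instruction, two p-code ops
        emit.dump(addr, "sp = INT_SUB sp, 4");
        emit.dump(addr, "STORE ram[sp], r0");
        return 1;
      case 0x02:  // invented "mov r0, imm8": opcode byte plus immediate
        if (addr + 1 >= bytes.size()) return 0;
        emit.dump(addr, "r0 = COPY " + std::to_string(bytes[addr + 1]));
        return 2;
      default:
        return 0;
    }
  }
};
```

The invented push instruction shows how a single machine instruction can expand into several p-code operations that all share one address.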
Datatype
Many objects have an associated Datatype, including Varnodes, Symbols, and FuncProtos. A Datatype is built to resemble the type systems of common high-level languages like C or Java.
string & getName();
int4 getSize();
There are base types (in varying sizes), distinguished by the value that getMetatype returns.
These base types can then be used to build compound types, using pointer, array, and structure qualifiers.
class TypePointer : public Datatype {
Datatype *getBase();
};
class TypeArray : public Datatype {
Datatype *getBase();
};
class TypeStruct : public Datatype {
TypeField *getField(int4,int4,int4 *);
};
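The way compound types wrap a base type can be sketched with a small class hierarchy. DatatypeSketch, TypePointerSketch, and TypeArraySketch are hypothetical stand-ins; the constructor arguments and the generated names are assumptions chosen for this illustration, not the real class layout.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for Datatype: a name and a size in bytes.
class DatatypeSketch {
  std::string name;
  int size;
public:
  DatatypeSketch(const std::string &n, int s) : name(n), size(s) {}
  virtual ~DatatypeSketch() {}
  const std::string &getName() const { return name; }
  int getSize() const { return size; }
};

// Pointer qualifier: refers to the type being pointed to.
class TypePointerSketch : public DatatypeSketch {
  const DatatypeSketch *ptrto;
public:
  TypePointerSketch(int ptrSize, const DatatypeSketch *base)
      : DatatypeSketch(base->getName() + " *", ptrSize), ptrto(base) {}
  const DatatypeSketch *getBase() const { return ptrto; }
};

// Array qualifier: 'count' elements of the element type, laid out contiguously.
class TypeArraySketch : public DatatypeSketch {
  const DatatypeSketch *elem;
public:
  TypeArraySketch(int count, const DatatypeSketch *base)
      : DatatypeSketch(base->getName() + "[" + std::to_string(count) + "]",
                       count * base->getSize()),
        elem(base) {}
  const DatatypeSketch *getBase() const { return elem; }
};
```

Because each qualifier holds a pointer to its base, arbitrarily nested types such as an array of pointers to char can be unwrapped one layer at a time with getBase.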
TypeFactory
This is a container for Datatypes.
Datatype *findByName(string &);
Datatype *getTypeVoid();
Datatype *getTypeChar();
Datatype *getBase(int4 size,type_metatype);
Datatype *getTypePointer(int4,Datatype *,uint4);
Datatype *getTypeArray(int4,Datatype *);
HighVariable
A single high-level variable can move in and out of various memory locations and registers during the course of its lifetime. A HighVariable encapsulates this concept. It is a collection of (low-level) Varnodes, all of which are used to store data for one high-level variable.
int4 numInstances();
Varnode * getInstance(int4);
Datatype * getType();
Symbol * getSymbol();
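The idea of one variable living in several storage locations can be sketched as a simple collection. StorageSketch and HighVariableSketch are hypothetical, addInstance is an invented helper, and the datatype is reduced to a name string for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for a low-level Varnode: where the value lives at some point.
struct StorageSketch {
  std::string space;     // e.g. "register" or "ram"
  std::uint64_t offset;  // byte offset within the space
  int size;              // number of bytes
};

// Hypothetical stand-in for HighVariable: one variable, many storage locations.
class HighVariableSketch {
  std::string typeName;             // stand-in for the Datatype
  std::vector<StorageSketch> inst;  // every location that holds this variable
public:
  explicit HighVariableSketch(const std::string &t) : typeName(t) {}
  void addInstance(const StorageSketch &s) { inst.push_back(s); }
  int numInstances() const { return static_cast<int>(inst.size()); }
  const StorageSketch &getInstance(int i) const {
    assert(i >= 0 && i < numInstances());
    return inst[i];
  }
  const std::string &getType() const { return typeName; }
};
```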
FuncProto
FuncCallSpecs
Symbol
A particular symbol used for describing memory in the model. This behaves like a normal (high-level language) symbol. It lives in a scope, has a name, and has a Datatype.
string & getName();
Datatype * getType();
Scope * getScope();
SymbolEntry * getFirstWholeMap();
SymbolEntry
This associates a memory location with a particular symbol, i.e. it maps the symbol to memory. It is, in theory, possible to have more than one SymbolEntry associated with a Symbol.
Address & getAddr();
int4 getSize();
Symbol * getSymbol();
RangeList & getUseLimit();
Scope
This is a container for symbols.
SymbolEntry *findAddr(Address &,Address &);
SymbolEntry *findContainer(Address &,int4,Address &);
Funcdata * findFunction(Address &);
Symbol * findByName(string &);
SymbolEntry *queryByAddr(Address &,Address &);
SymbolEntry *queryContainer(Address &,int4,Address &);
Funcdata * queryFunction(Address &);
Scope * discoverScope(Address &,int4,Address &);
string & getName();
Scope * getParent();
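One way to picture a container-style lookup is a search for the mapped entry whose byte range fully covers a queried range. The sketch below is an assumption-laden illustration: EntrySketch, ScopeSketch, and addMap are hypothetical, a single address space is assumed, mapped ranges are assumed not to overlap, and the usepoint argument of the real signatures is ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Stand-in for SymbolEntry: a symbol name mapped to a byte range.
struct EntrySketch {
  std::string symbolName;
  std::uint64_t addr;
  int size;
};

// Hypothetical stand-in for Scope, keyed by the start offset of each entry.
class ScopeSketch {
  std::map<std::uint64_t, EntrySketch> entries;
public:
  void addMap(const std::string &name, std::uint64_t addr, int size) {
    entries[addr] = EntrySketch{name, addr, size};
  }

  // Return the entry whose range contains [addr, addr+size), or nullptr.
  const EntrySketch *findContainer(std::uint64_t addr, int size) const {
    auto it = entries.upper_bound(addr);  // first entry starting after addr
    if (it == entries.begin()) return nullptr;
    --it;  // last entry starting at or before addr
    const EntrySketch &e = it->second;
    std::uint64_t queryEnd = addr + static_cast<std::uint64_t>(size);
    std::uint64_t entryEnd = e.addr + static_cast<std::uint64_t>(e.size);
    return queryEnd <= entryEnd ? &e : nullptr;
  }
};
```

A range that straddles two entries is reported as not found in this sketch, since no single entry contains it.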
Database
This is the container for Scopes.
Scope *getGlobalScope();
Scope *resolveScope(string &,Scope *);
Architecture
This is the repository for all information about a particular processor and executable. It holds the symbol table, the processor translator, the load image, the type database, and the transform engine.
class Architecture {
Database * symboltab;
Translate * translate;
LoadImage * loader;
ActionDatabase allacts;
TypeFactory * types;
};