Control code obfuscation is intended to prevent malicious reverse engineering of software by masking the program control flow. The idea for further advancing the state of the art was presented in 2000 by WANG C. An obfuscating system for Java based on the ideas of WANG C is implemented and experimented. The experiment results show that obfuscation can be done efficiently with moderate increases in code size, execution times, while making the obfuscated code resilient to a variety of reverse engineering attacks.
1. Introduction
In distributed computing surroundings, a software may be deployed on several computers which can not be trusted. In malicious host surroundings, a software can be reverse engineered and also can be tampered with. The confidentiality, integrality and availability of software are threatened, so convenient and effective methods are needed to protect the intellectual property of software. Code obfuscation transformation emerges as a defense technique to protect software from reverse engineering. The advent and widely use of Java language brings the widely research on code obfuscation because the class file is easy to be decompiled.
The goal of code obfuscation is to make it difficult for an attacker from understanding the inner workings of a program by making the obfuscated programs too difficult to understand, that is, by making the task of reverse engineering the program too expensive in terms of the resources or time required.
The obfuscating transformations can be classified according to the kind of information they target [1]. Layout transformations remove source code formatting and scramble identifiers. Control transformations affect the control flow of the program, while data transformations operate on the data structures used in the program [2]. The aim of control obfuscators is to obscure the code by affecting its control flow, in order to make it more difficult to understand for a malicious user (attacker).This paper is concerned primarily with control transformations and concretely considers the approach taken form WANG Chen–xi's dissertation [3]. This choice is motived by two factors. First based on our experiments, it seems more difficult to break than those we had considered earlier. Second, it is a key component of an industrial obfuscation tool by Cloakware Inc [4].
We have implemented and experimented with an obfuscation system for Java which relies on the control flow flattening transformation. Our experiments show that obfuscation can be done efficiently with moderate increases in code size, execution times, while making the obfuscated code resilient to a variety of reverse engineering attacks.
2. Implementation and experimental results
2.1 Control flow flattening algorithm
Control flow flattening aims to obscure the control flow logic of a program by «flattening» the control flow graph so that all basic blocks appear to have the same set of predecessors and successors. The actual control flow during execution is guided by a dispatcher variable. At runtime, each basic block assigns to this dispatcher variable a value indicating which next basic block should be executed next. A switch block then uses the dispatcher variable to jump indirectly, through a jump table, to the intended control flow successor. As an example, considering the program shown in Figure 1.
Figure 1: An example program and its control flow graph
Basic control flow flattening of this program results in the control flow graph shown in Figure 2, where S is the switch block and x the dispatcher variable.1 The initial assignment to the dispatcher variable x in the block Init is intended to route control to A, the original entry block of f (), when control first enters the function; after this, control flow is guided by assignments to x in the various basic blocks.
Figure 2: Control flow graph after flattening
2.2 Byte code obfuscation
We choose the Java class files (those ending with .class) as objects being obfuscated. Since Java is a platform–independent language, source code is compiled into byte code, which stored in a .class file, and Java byte code is the form of instructions that the Java virtual machine executes. It is possible to create class files which no Java compiler can produce, and yet, which pass the Java Verifier with flying colors. Such class files are said to be deviant. The goal of byte code obfuscation is to produce deviant class files, such that the byte code file gets stronger protection against the reverse engineering. Here we use the BCEL to analyze, create, and manipulate (binary) Java class files [5].
Classes are represented by objects which contain all the symbolic information of the given class: methods, fields and byte code instructions, in particular. Such objects can be read from an existing file, transformed by a program (e.g. a class loader in run–time) and dumped to a file again.
To do the byte code obfuscation, we should know the format of class files and the byte code instruction set (which are described in more detail in the Java Virtual Machine Specification). Especially, we will deal with the security constraints that the Java Virtual Machine has to check at run–time, i.e. the byte code verifier.
When we manipulate Java class files, the code must meet the following conditions so that the JVM can accept it [6].
- Type correctness: the arguments of an instruction are always of the types expected by the instruction.
- No stack overflow or underflow: an instruction never pops an argument off an empty stack, nor pushes a result onto a full stack.
- Code constraints: the program counter must always point within the code for the method, to the beginning of a valid instruction encoding.
- Register initialization: a load from a register must always follow at least one store in this register.
- Object initialization: when an instance of a class C is created, one of the initialization method for class C must be invoked before the class instance can be used.
Having learned the constraints from JVM, we can do the obfuscation described in 2.1. Control flow flattening is performed in two steps. First, Partition basic blocks; Second, Flatten the control flow. The details are discussed in 2.3 and 2.4.
2.3 Partition basic blocks
The standard algorithm for grouping a sequence of instructions into basic blocks relies on identifying which instructions are leaders [7]. An instruction is a leader if:
- It is the first instruction in the sequence, or;
- It is the destination of a branch instruction, or;
- It follows a branch instruction.
The instructions are partitioned into basic blocks that start with a leader and contain no other leaders. All of the instructions which follow this leader will be in a single basic block. A basic block includes all of the instructions up to the instruction before the next leader. But this method of partitioning basic blocks will not work when facing the java class file.
The modified code may be rejected by the Java verifier when the code violate the constraint, all the execution paths which lead to a particular point must have operand stacks with the same number and type of items on them. So, we use the modified basic block algorithm [8].
FindLeaders: This process returns a list of the leaders in a method code. A leader is the first instruction in a basic block. For all possible execution paths through a method's code, we keep track of the operand stack size after each instruction is executed. Supposed that after the current instruction executed, the operand stack is empty. Then any instructions that would be executed after the current one are leaders.
PartitionBasicBlocks: partitions the instructions in a methods code so that each partition starts with a leader and contains no other leaders.
As an example, consider the program shown in Figure 3. The Java source code is compiled to class file, and we get the instruction set through the BCEL. Every piece of instruction has a number that denotes its position in the set. Using the algorithm described above, we get 6 basic blocks.

| # | Instructions | Basic blocks # |
|---|---|---|
| 0 | iconst_0 | 1 |
| 1 | istore_1 | 1 |
| 2 | goto 17 | 2 |
| 5 | getstatic 7 | 3 |
| 8 | aload_0 | 3 |
| 9 | iload_1 | 3 |
| 10 | aaload | 3 |
| 11 | invokevirtual 5 | 3 |
| 14 | iinc 1,1 | 4 |
| 17 | iload_1 | 5 |
| 18 | aload_0 | 5 |
| 19 | arraylength | 5 |
| 20 | if_icmplt 5 | 5 |
| 23 | return | 6 |
2.4 Flatten the control flow
Once get the basic blocks, we can flatten the control flow using the following algorithm:
Get all the entrances of basic blocks. We use two arrays to record the entrances. The first is an integer array which records the number of every entrance. The second is an array composed of Instructions which record the instruction entrance, then scan all of the basic blocks and record the value into the two arrays.
Register initialization. A load from a register must always follow at least one store in the register. During the process of partitioning basic blocks, the store instruction and the load instruction to the same variable may be put into two basic blocks, and the verifier will reject the modified file. Here, we scan all of the instructions in the method and record all the store instructions. Then insert corresponding store instructions to the beginning of the modified file and all the load instruction will be accepted as a result.
Create the switch case statement. From the step 1, we have known the entrances of basic blocks and at the same time, give a number to each entrance. The number is used by the switch statement to determine which block is to be executed next. Here we insert a switch case statement to the beginning of the method to control the execution flow.
Insert goto. We insert a goto statement to the end of every basic block, and the goto is used to which jump to the switch statement so that the execution flow will go to the next basic block.
Dump to the file. The result after step 4 is a interior form of BCEL and we should dump it to the class file to finish the obfuscation transformation.
2.5 Experimental results
Our experiments were run under Windows xp on AMD Sempron 1.8GHz, with 512MB RAM. We used the Sun Java 2 SDK Version 1.2.1 Windows Release Virtual Machine with the default Just–In–Time compiler turned on.
We tested our obfuscation tool on three type Java programs, In table 1.There are three programs: jump–intensive program, loop–intensive program and sequence–process program. The jump–intensive program includes many jump statements, while the loop–intensive program includes many for or while statements. There is little jump or loop statement in sequence–process program. Each test was done twelve times. The numbers in the table are the averages. Dimensions: code size is in bytes; execution time is in milliseconds.
The experiments show that the obfuscation makes the code size, the branch number and execution time increased. The more branch number is, the more cost. We should make the balance between performance and security.
We have tried to attack the obfuscated program with several java reverse engineering tools and not get the right control flow.
3. Conclusions
Although obfuscation results in less efficient code, it is relatively cheap to perform and has aroused increased interest in the past two years. This paper experimented with a obfuscation system, and the result shows that obfuscation can be done efficiently with moderate increases in code size, execution times, while making the obfuscated code resilient to a variety of reverse engineering attacks.
References
- D. Low C. Collberg, C. Thomborson. A taxionomy of obduscating transformations. Technical Report 148. Dept. of Computer Science, The Univ. of Auckland, 1997.
- D. Low. Java Control Flow Obfuscation (Master of Science Thesis). Department of Computer Science, The University of Auckland, 1998.
- WANG C. A Security Architecture for Survivability Mechanisms (Ph.D. Thesis). Department of Computer Science, University of Virginia, October 2000.
- S. Chow, Y. Gu, H. Johnson, and V. Zakharov. An approach to the obfuscation of control–flow of sequential computer programs. In G. Davida and Y. Frankel. (Eds.), Information Security, ISC 2001, volume 2200 of Lectures Notes in Computer Science (LNCS): Springer Verlag, 2001, 68.
- Available at: http://jakarta.apache.org/bcel/manual.html.
- Xavier Leroy. Java bytecode verification: Algorithms and formalizations. Journal of Automated Reasoning.
- Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman. Compilers, Principles, Techniques and Tools. Addison–Wesley, ISBN 0–201–10088–6, 1986.
- Douglas Low. Java Control Flow Obfuscation. June 3, 1998.
0 comments:
Post a Comment