Monday, May 5, 2008

A Survey of Anti-Tamper Technologies

This article surveys the various anti–tamper (AT) technologies used to protect software. The primary objective of AT techniques is to protect critical program information by preventing unauthorized modification and use of software. This protection goal applies to any program that requires protection from unauthorized disclosure or inadvertent transfer of leadingedge technologies and sensitive data or systems. In this article, we review the various approaches to AT techniques, their strengths and weaknesses, their advantages and disadvantages, and briefly discuss a process for developing program protection plans. We also survey the tools that are typically used to circumvent AT protections, and techniques that are commonly used to make these protections more resilient against such attack.

The unauthorized modification and subsequent misuse of software is often referred to as software cracking. Usually, cracking requires disabling one or more software features that enforce policies (of access, usage, dissemination, etc.) related to the software. Because there is value and/or notoriety to be gained by accessing valuable software capabilities, cracking continues to be common and is a growing problem.

To combat cracking, anti–tamper (AT) technologies have been developed to protect valuable software. Both hardware and software AT technologies aim to make software more resistant against attack and protect critical program elements. However, before discussing the various AT technologies, we need to know the adversary's goals. What do software crackers hope to achieve. Their purposes vary, and typically include one or more of the following:

  • Gaining unauthorized access.
    The attacker's goal is to disable the software access control mechanisms built into the software. After doing so, the attacker can make and distribute illegal copies whose copy protection or usage control mechanisms have been disabled — this is the familiar software piracy problem. If the cracked software provides access to classified data, then the attacker's real goal is not the software itself, but the data that is accessible through the software. The attacker sometimes aims at modifying or unlocking specific functionality in the program, e.g., a demo or export version of software is often a deliberately degraded version of what is otherwise fully functional software. The attacker then seeks to make it fully functional by re–enabling the missing features.
  • Reverse engineering.
    The attacker aims to understand enough about the software to steal key routines, to gain access to proprietary intellectual property, or to carry out code–lifting, which consists of reusing a crucial part of the code (without necessarily understanding the internals of how it works) in some other software. Good programming practices, while they facilitate software engineering, also tend to simultaneously make it easier to carry out reverse engineering attacks. These attacks are potentially very costly to the original software developer as they allow a competitor (or an enemy) to nullify the developer's competitive advantage by rapidly closing a technology gap through insights gleaned from examining the software.
  • Violating code integrity.
    This familiar attack consists of either injecting malicious code (malware) into a program, injecting code that is not malevolent but illegally enhances a program's functionality, or otherwise subverting a program so it performs new and unadvertised functions (functions that the owner or user would not approve of). While AT technology is related to anti–virus protection, it has some crucial differences. AT technology is similar to virus protection in that it impedes malware infection of an AT–protected executable. However, AT technology differs from virus protection in that the AT technology's goal is not only to protect the client's software from unauthorized modification by malevolent outsiders (infection by malware written by others), but also to protect the software from modification by an authorized client. In many situations, it is important that only authorized applications execute (e.g., in a taximeter, odometer, or any situation where tampering is feared), using only authorized functionality, and that only valid data is used.

It should be clear by now that AT technology is not only about anti–piracy, it has an equal and broader aim of policy enforcement. That aim is to enforce the policies of the software publisher about the proper use of the software, even as the software is running in a potentially hostile environment where the user owns the processor and is intent on violating those policies.

There is a plethora of AT protection mechanisms. These include encryption wrappers, code obfuscation, guarding, and watermarking/fingerprinting in addition to various hardware techniques. While these techniques are discussed separately for pedagogical purposes, the reader should bear in mind that software is best protected when several protection techniques are used together in a mutually supportive manner. No technique is invulnerable or even clearly superior to the others in all circumstances; therefore, a mix of protection techniques allows the defense to capitalize on the strengths of each technique while also masking the shortfalls of other techniques. In the following paragraphs we present a brief overview of these techniques.

Hardware–Based Protections

The most common hardware approach uses a trusted processor. The trusted, tamper–resistant hardware checks and verifies every piece of hardware and software that exists — or that requests to be run on a computer — starting at the boot–up process [1]. This hardware could guarantee integrity by checking every entity when the machine boots up, and every entity that will be run or used on that machine after it boots up. The hardware could, for example, store all of the keys necessary to verify digital signatures, decrypt licenses, decrypt software before running it, and encrypt messages during any online protocols it may need to run (e.g., for updates) with another trusted remote entity (such as the software publisher).

Software downloaded onto a machine would be stored in encrypted form on the hard drive and would be decrypted and executed by the hardware, which would also encrypt and decrypt information it sends and receives from its random access memory. The same software or media could be encrypted in a different way for each trusted processor that would execute it because each processor would have a distinctive decryption key. This would put quite a dent in the piracy problem, as disseminating your software or media files to others would not do them much good (because their own hardware would have different keys).

A less drastic protection than using a separate, trusted, hardware computational device also involves hardware, but is more lightweight such as a smart card or physically secure token. These lightweight hardware protection techniques usually require that the hardware be present for the software to run, to have certain functionality, to access a media file, etc. Defeating this kind of protection usually requires working around the need for the hardware rather than duplicating the hardware. The difficulty of this work–around depends on the role that the tamper–resistant hardware plays in the protection. A device that just outputs a serial number is trivially vulnerable to a replay attack (e.g., an attacker replays a valid serial number to the software, without the presence of the hardware device), whereas a smart card that engages in a challenge–response protocol (different data each time) prevents the simple replay attack but is still vulnerable (e.g., to modification of the software interacting with the smart card). A device that decrypts content or that provides some essential feature of a program or media file is even harder to defeat.

Advantages and Drawbacks

The chief advantage of hardware–based protection techniques is that they run on a trusted CPU and can be made arbitrarily complex — hence, difficult to defeat while inflicting minimal computational cost on the protected software once it has been decrypted within the hardware and is running. However, there is a cost to decrypt it in the first place, and also to encrypt everything that goes out to the non–protected part of the system, and then decrypt it when it comes back into the trusted hardware.

In addition, it is generally more difficult to successfully attack tamper–resistant hardware and make the exploit directly available to others than a software–only protection scheme. This point holds only for a properly designed system. A compromise of hardware that imprudently contains the same secret keys as all other hardware of the same type would lead to widely reproducible exploits.

The advantages of hardware protection also include its capability to enforce such rules as «only approved peripherals can be a part of this computer system,» or «only approved (through digital signatures) software and contents are allowed», etc.

Nevertheless, hardware–based protection also has its drawbacks. There is the usual problem of inflexibility: hardware–based protections are more awkward to modify, port, and update than software–based ones. They are also less secure than commonly assumed and can be broken; see, e.g., [2]. To date, it has not been demonstrated that hardware protections can scale to grid computing or to smallscale computing. In addition, there is no guarantee that all avenues of attack are closed by hardware protection, and there is a significant cost attached to using hardware protection; the cost is driven mainly by the time needed to assemble, integrate, and test the hardware protection technique.

Additional drawbacks to the hardware protection approach include its expense and general fragility to accidents (an electric power surge that fries the processor also renders the hard drive contents unusable because the key that decrypts them is destroyed). The potential implications for censorship are also chilling. Another disadvantage of hardware protection is the boot–up time and the time spent encrypting and decrypting, which makes the approach problematic for low–end machines and embedded systems (unless the whole system lies within tamper–resistant hardware).

Using trusted hardware also incurs many indirect costs as a result of the earlier–mentioned limitations it imposes (e.g., the restriction to only certain approved hardware, software, and media creates a barrier to competition that leads to higher prices). Due to the imperfect protection offered by hardware protection, a more robust approach to software security interweaves hardware protection with other protection techniques such as those discussed in the following sections.

The rest of this article discusses the various software–based protection mechanisms. The reader should keep in mind that hardware and software protection techniques are not mutually exclusive. A judicious combination can serve to increase the security of the system more than any of its individual component techniques.

Encryption Wrappers

With encryption wrapper software security, critical portions of the software (or possibly all of it) are encrypted and decrypted dynamically at run–time. The encryption wrapper approach works well against a static attack, and forces the attacker to run the program in order to get an unencrypted image of it. To make the attacker's task harder, at no time during execution is the whole software in the clear; code decrypts just before it executes, leaving other parts of the program still encrypted. Therefore, no single snapshot of memory can expose the whole decrypted program. Of course, the attacker can take many such snapshots, compare them, and piece together the unencrypted program.

Another avenue of attack is to figure out the various decryption keys that are present in the software. One defensive technique that can be used to delay the attacker is to include defensive mechanisms in the program that deprive the attacker of using run–time attack tools, e.g., anti–debugger, anti–memory dump, and other defensive mechanisms, which make it more difficult for the attacker to run and analyze the program in a synthetic (virtual machine) environment. Yet, a determined attacker can usually defeat these protections (e.g., through the use of virtual machines that faithfully emulate a PC, including the most rarely used instructions, cache behavior, etc).

Encryption wrappers often use lightweight encryption to minimize the computational cost of executing the protected program. The encryption can be advantageously combined with compression: Not only does this result in a smaller amount of storage usage, but it also makes the encryption harder to defeat by cryptanalysis (of course one compresses before encryption, not the other way around).

An encryption wrapper's chief advantage is that it effectively hinders an attacker's ability to statically analyze a program. The attacker is then forced to perform more sophisticated types of dynamic attacks, which can significantly increase the amount of time needed to defeat the protection. The main disadvantage of encryption wrappers is the performance penalty caused by the decryption overhead, and its weakness to memory dumps: before it can run, encryption–protected software must be decrypted, at which point it becomes exposed.

Code Obfuscation

Code obfuscation consists of transforming code so it becomes less intelligible to a human, thus making it not only harder to reverse engineer, but also harder to tamper with. In software that has specific areas where policy checks are made, these areas will be harder to identify and disable after the software has been obfuscated. Obfuscation is usually carried out by inserting or performing obfuscating transformations. It is a requirement that these transformations do not damage a program's functionality, and it must have only a moderate impact on code performance, and on the storage space used on the disk and at run–time (of the two, speed is more important).

The obfuscation must also be resilient to attack, and for this reason it is desirable to maximize the obscurity of the obfuscated software. The obfuscating transformations need to be resilient against tools designed to automatically undo them, and to not be easily detectable by statistical analysis of the resulting code (resilience to statistical analysis makes it harder for automatic tools to find the locations where these transformations were applied).

The different types of obfuscation transformations that have been proposed [3] include the following:

  • Layout obfuscation.
    This modifies the physical appearance of the code, e.g., replacing important variables with random strings, removing all formatting (making nested conditional statements harder to read), etc. Such transformations are easy to make but are effective only when combined with other transformation techniques.
  • Data obfuscation.
    This obscures the data structures used within a program, e.g., the representation and the methods of using that data, independent data merging (and vice–versa — splitting up data that is dependent), etc. Data obfuscation serves to delay the attacker because data structures contain important information that any attacker needs to comprehend before launching an attack.
  • Control obfuscation.
    This manipulates the control flow of a program to make it difficult to discern its original structure, e.g., through merging (or splitting) various fragments of code, reordering expressions, loops, or blocks, etc. It is similar to creating a spurious program that is entangled with the original program so as to obscure the important control features of that program.
  • Preventive transformations.
    These aim at making it difficult for a deobfuscation tool to extract the true program from the obfuscated version of it. Preventive transformations can be implemented by using what Collberg [4] calls opaque predicates, an example of which is a conditional statement that always evaluates as true, but in a manner that is hard to recognize.

Obfuscation can be done at the source–code level (source–to–source translation) or at the assembly level. Although most obfuscators are of the former kind (source–to–source), assembly level obfuscation is better because it effectively hides the operation of the binary. If the source–code level transformations hide information by adding crude and inefficient ways of doing simple tasks, then the code optimizer in the compiler may undo them. If, on the other hand, the transformations are clever enough to fool the optimizer, then it can fail to properly do its job, and the performance of the resulting code suffers. Low– level obfuscation does not prevent the code optimizer from doing its job, but if done carelessly it runs the risk of producing code that looks so different from the kind produced by the compiler that it inadvertently flags the areas where the transformations were applied.

Obfuscation transformations are classified according to several criteria: how much obscurity they add to the program (potency), how difficult they are to break for a de–obfuscator (resilience), and how much computational overhead they add to the obfuscated application (cost). In [4], software complexity metrics are used to formalize the notion of transformation potency and resilience.

The potency of a transformation measures how much more difficult the obfuscated code is to understand for a human than the original code. On the other hand, the resilience of a transformation measures how well it stands up to attack by an automatic de–obfuscator. The resilience measurement takes two factors into account: the programmer effort required to construct the de–obfuscator and the execution time and space required by the de–obfuscator to reduce the potency of the transformation. The best obfuscation is usually achieved by a combination of the above three mentioned transformations. The combination of the three approaches provides a well–balanced mix of highly potent and resilient transformations.

Like all software–only protections, obfuscation can delay — but not prevent — a determined attacker intent on reverse engineering the software. Barak [5] presents a family of functions that are provably impossible to completely and successfully obfuscate. For more information and a discussion of code obfuscation, refer to [3,4,6,7].

Software Watermarking and Fingerprinting

The goal of watermarking is to embed information into software in a manner that makes it hard to remove by an adversary without damaging the software's functionality. The information inserted could be purchaser information, or it could be an integrity check to detect modification, the placing of caption–type information, etc. A watermark need not be stealthy; visible watermarks act as a deterrent (against piracy, for example), but most of the literature has focused on stealthy watermarks. In steganography (the art of concealing the existence of information within seemingly innocuous carriers), the mark is required to be stealthy: its very existence must not be detectable [8].

A specific type of watermarking is fingerprinting, which embeds a unique message in each instance of the software for traitor tracing. This has consequences for the adversary's ability to attack the watermark: two differently marked copies often make possible a diff attack that compares the two differently marked copies and can enable the adversary to create a usable copy that has neither one of the two marks. Thus, in any fingerprinting scheme, it is critical to use techniques that are resilient against such comparison attacks.

A watermark is generally required to be robust (hard to remove). In some situations, however, a fragile watermark is desirable; it is destroyed if even a small alteration is made to the software (e.g., this is useful for making the software tamper–evident).

Software watermarks can be static, i.e., readable without running the software, or could appear only at run–time (preferably in an evanescent form). In either case, reading the watermark usually requires knowing a secret key, without which the watermark remains invisible.

Watermarks may be used for proof of software authorship or ownership, fingerprinting for identifying the source of illegal information/software dissemination, proof of authenticity, tamper–resistant copyright protection, and captioning to provide information about the software. When software watermarks are used for proof of authorship or ownership (culprit–tracing), it is important to use a very resilient scheme. Recall that this is when the watermark contains information about the copyright owner as well as the entity that is licensed to use the software, thus allowing trace–back to the culprit if the item were to be illegally disseminated to others. Breaking the security of such a scheme can enable the attacker to frame an innocent victim.

As you can see, while watermarks can demonstrate authorized possession and the fact that software has been pirated, they do not address the reverse engineering or authorized execution issues of the other schemes discussed; therefore, we advocate the development and use of a spectrum of software protection techniques.

Guarding

A guard is code that is injected into the software for the sake of AT protection. A guard must not interfere with the program's basic functionality unless that program is tampered with — it is the tampering that triggers a guard to take action that deviates from normal program behavior. Examples of guard functionality range from tasks as simple as comparing a checksum of a code fragment to its expected value, to repairing code (in case it was maliciously damaged), to complex and indirect forms of protection through subtle side effects.

The preferred use of the guarding approach consists of injecting into the code to be protected a large number of guards that mutually protect each other as well as the software program in which they now reside. Guards can also be used to good effect in conjunction with hardware–based protection techniques to further ensure that the protected software is only executed in an authorized environment.

The number, types, and stealthiness of guards; the protection topology (who protects who); and where the guards are injected in the original code and how they are entangled with it are some of the parameters in the strength of the resulting protection: They are all tunable in a manner that depends on the type of code being protected, the desired level of protection, etc.

Manually installing such a tangled web of protection is impractical (as it must be done every time the software is updated), so it is important that this protection be done in a highly automated fashion using high–level scripts that specify the protection guidelines and parameters. It should be thought of as a part of the compilation process where an anti–tamper option results in code that is guarded and tamper–resistant.

The rationale for this approach is that a single (even if elaborate) AT protection scheme for a software application is insufficient because a single defense results in a single point of attack that can be located and compromised. To make self–protection robust, the defense must not rely on a single complex protection technique no matter how effective it might be. Instead, there needs to be a multitude of (possibly simple) protection techniques installed in the program that cooperatively enforce the code's integrity as well as protect the other against tampering.

The guard's response when it detects tampering is flexible and can range from a mild response to the disruption of normal program execution through injection of run–time errors (crashes or even subtle errors in the answers computed); the reaction chosen depends on the software publisher's business model and the expected adversary. Generally, it is better for a guard's reaction to be delayed rather than to occur immediately upon detection so that tracing the reaction back to its true cause is as difficult as possible and consumes a great deal of the attacker's time. More on guarding can be found in [9].

AT Process

This section explores the various AT guidelines expressed in the «Defense Acquisition Guidebook» [10], and the recommended process for developing a program protection plan. The «Defense Acquisition Guidebook» specifies the AT measures that should be considered for use on any system with critical program information (CPI), developed with allied partners, likely to be sold or provided to United States allies and friendly foreign governments, or likely to fall into enemy hands. The first step in the recommended AT methodology is to develop a program protection plan. The process of developing this plan includes the following:

  • Develop a list of critical technologies.
  • Develop a threat analysis.
  • Develop a list of identified vulnerabilities.
  • Develop a preliminary AT requirement.
  • Perform an analysis of AT methods that applies to the system, including cost/benefit assessments.
  • Provide an explanation of which AT method(s) will be implemented; develop a plan for validating the AT implementation.

The standard approach of validating AT protections is done via red–teaming. A red team consists of individuals who are well versed in security methods and their corresponding weaknesses. Their primary objectives are to attempt to defeat the protection, to assess the protection's strengths and weaknesses, and to make recommendations for improvement. While this is an effective method of evaluation, a major problem with red teams is that the validation is done by humans, and may not be totally reliable or repeatable. Furthermore, as the need for AT technologies grows, red teams are becoming increasingly called upon to perform evaluations. The teams are overwhelmed with assignments, significant delays in product evaluations, and release results. To improve this process, there is a clear and present need for automated testing and validation tools. Such tools could be used to define a standard set of metrics and guidelines to evaluate software protections.

Conclusion

This article has surveyed the motivation for using AT technology, the hardware and software AT techniques in use today, and the strengths and weaknesses of AT technologies. We also briefly introduced the process and documentation used to develop a program protection plan. The motivation for and primary objective of AT technology is to protect CPI by preventing unauthorized modification and use of software. The main software AT techniques include encryption wrappers, code obfuscation, watermarking/fingerprinting, and guarding.

A fundamental challenge faced by software AT technology is that the protected application is running on a host that is not trusted, and thus cannot be assured to be secure. Guards address this shortfall to a degree and in a flexible and extensible manner. However, in light of the need for robust, seamless, comprehensive software defense, using both software and hardware AT solutions in combination often offers an appealing alternative to using them individually (especially if economic considerations are factored in).

At this time, indications are that if strong software AT technology (e.g., in the form of judiciously constructed guards) is added to an application so that it requires the presence of a lightweight tamper–resistant hardware device in order to execute properly, the result is a strong yet economical software protection capability.

References

  1. Arbaugh, W., D. Farber, and J. Smith. A Secure and Reliable Bootstrap Architecture. Proc. of the IEEE Symposium on Security and Privacy, Oakland, CA, 1997.
  2. Anderson, R., and M. Kuhn. Tamper Resistance — A Cautionary Note. Proc. of Second Usenix Workshop on Electronic Commerce, Oakland, CA, Nov. 1996: 1–11.
  3. Collberg C., and C. Thomborson. «Watermarking, Tamper–Proofing, and Obfuscation Tools for Software Protection». IEEE Transactions on Software Engineering 28.8 (2002): 735–746, 2002.
  4. Collberg, C., C. Thomborson, and D. Low. «A Taxonomy of Obfuscating Transformations». Department of Computer Science, University of Auckland, New Zealand, 1997.
  5. Barak, B., et al. «On the (Im)possibility of Obfuscating Programs». Electronic Colloquium on Computational Complexity. Report No. 57, 2001.
  6. Wang, C., et al. «Software Tamper Resistance: Obstructing Static Analysis of Programs». University of Virginia, Computer Science Technical Report CS–2000–12, Dec. 2000.
  7. Wroblewski, G. «General Method of Program Code Obfuscation». Diss. Wroclaw University of Technology, Institute of Engineering Cybernetics, 2002.
  8. Johnson, N. «Introduction to Steganography and Steganalysis». Workshop on Statistical and Machine Learning Techniques in Computer Intrusion Detection, Johns Hopkins University, 11–13 June 2002.
  9. Chang, H., and M. Atallah. Protecting Software Code By Guards. Proc. of ACM Workshop on Security and Privacy in Digital Rights Management, Philadelphia, PA, Nov. 2001: 160–175.
  10. Office of the Secretary of Defense. Interim Defense Acquisition Guidebook. Washington, D.C.: OSD, 30 Oct. 2002 http://dod5000.dau.mil/DoD5000Interactive/InterimGuidebook.asp

Additional Reading

  1. Lipton, R.J., S. Rajagopalan, and D.N. Serpanos. «Spy: A Method to Secure Clients for Network Services». IEEE Distributed Computing Systems Workshops 2002: 23–28.
  2. Anderson, R., and M. Kuhn. «Low Cost Attacks on Tamper Resistant Devices». 5th International Workshop on Security Protocols, Apr. 1997: 125–136.
Dr. Mikhail J. Atallah, Eric D. Bryant, and Dr. Martin R. Stytz Arxan Technologies, Inc.

Monday, April 28, 2008

Static Disassembly of Obfuscated Binaries

Disassembly is the process of recovering a symbolic representation of a program's machine code instructions from its binary representation. Recently, a number of techniques have been proposed that attempt to foil the disassembly process. These techniques are very effective against state–of–the–art disassemblers, preventing a substantial fraction of a binary program from being disassembled correctly. This could allow an attacker to hide malicious code from static analysis tools that depend on correct disassembler output (such as virus scanners).

The paper presents novel binary analysis techniques that substantially improve the success of the disassembly process when confronted with obfuscated binaries. Based on control flow graph information and statistical methods, a large fraction of the program's instructions can be correctly identified. An evaluation of the accuracy and the performance of our tool is provided, along with a comparison to several state–of–the–art disassemblers.

1. Introduction

Software applications are often distributed in binary form to prevent access to proprietary algorithms or to make tampering with licensing verification procedures more difficult. The general assumption is that understanding the structure of a program by looking at its binary representation is a hard problem that requires substantial resources and expertise.

Software reverse–engineering techniques provide automated support for the analysis of binary programs. The goal of these techniques is to produce a higher–level representation of a program that allows for comprehension and possibly modification of the program's structure.

The software reverse–engineering process can be divided into two parts: disassembly and decompilation. The task of the disassembly phase is the extraction of the symbolic representation of the instructions (assembly code) from the program's binary image [12]. Decompilation [5,6] is the process of reconstructing higher–level semantic structures (and even source code) from the program's assembly–level representation.

A number of approaches have been proposed to make the reverse–engineering process harder [8,9,17]. These techniques are based on transformations that preserve the program's semantics and functionality and, at the same time, make it more difficult for a reverse–engineer to extract and comprehend the program's higher–level structures. The process of applying one or more of these techniques to an existing program is called obfuscation.

Most previouswork on program obfuscation has focused on the decompilation phase. To this end, researchers have proposed to use constructs such as indirect jumps or indirect memory references via pointers that are difficult to analyze [14]. In [13], Linn and Debray introduce novel obfuscation techniques that focus on the disassembly phase instead. Their techniques are independent of and complementary to previous approaches to make decompilation harder. The main idea is to transform the binary such that the parsing of instructions becomes difficult. The approach exploits the fact that the Intel x86 instruction set architecture contains variable length instructions that can start at arbitrary memory address. By inserting padding bytes at locations that cannot be reached during run–time, disassemblers can be confused to misinterpret large parts of the binary. Although their approach is limited to Intel x86 binaries, the obfuscation results against current state–of–the–art disassemblers are remarkable.

Linn and Debray state that their obfuscation techniques can enhance software security by making it harder for an attacker to steal intellectual property, to make unauthorized modifications to proprietary software or to discover vulnerabilities. On the other hand, program obfuscation could also be used by attackers to hide malicious code such as viruses or Trojan Horses from virus scanners [3,16]. Obfuscation also presents a serious threat to tools that statically analyze binaries to isolate or to identify malicious behavior [2,11]. The reason is that if relevant program structures were incorrectly extracted, malicious code could be classified as benign.

This paper presents static analysis techniques to correctly disassemble Intel x86 binaries that are obfuscated to resist static disassembly. The main contribution are general control–flow–based and statistical techniques to deal with hard–to–disassemble binaries. Also, a mechanism is presented that is specifically tailored against the tool implemented by Linn and Debray [13]. An implementation based on our approach has been developed, and the results show that our tool is able to substantially improve the disassembly of obfuscated binaries.

The paper is structured as follows. In Section 2, the principal techniques used in binary disassembly are reviewed, together with a discussion of Linn and Debray's recently proposed obfuscation techniques. In Section 3, we outline the disassembly approach and present our assumptions. Section 4 and Section 5 provide an indepth description of our disassembly techniques. In Section 6, a quantitative evaluation of the accuracy and performance of our disassembler is presented. Finally, in Section 7, we briefly conclude and outline future work.

2. Related Work and Background

Disassembly techniques can be categorized into two main classes: dynamic techniques and static techniques.

Approaches that belong to the first category rely on monitored execution traces of an application to identify the executed instructions and recover a (partial) disassembled version of the binary. Approaches that belong to the second category analyze the binary structure statically, parsing the instruction opcodes as they are found in the binary image.

Both static and dynamic approaches have advantages and disadvantages. Static analysis takes into account the complete program, while dynamic analysis can only operate on the instructions that were executed in a particular set of runs. Therefore, it is impossible to guarantee that the whole executable was covered when using dynamic analysis. On the other hand, dynamic analysis assures that only actual program instructions are part of the disassembly output. In this paper, we focus on static analysis techniques only.

In general, static analysis techniques follow one of two approaches. The first approach, called linear sweep, starts at the first byte of the binary's text segment and proceeds from there, decoding one instruction after another. It is used, for example, by GNU's objdump [10]. The drawback of linear sweep disassemblers is that they are prone to errors that result from data embedded in the instruction stream. The second approach, called recursive traversal, fixes this problem by following the control flow of the program [6,15]. This allows recursive disassemblers to circumvent data that is intertwined with the program instructions. The problem with the second approach is that the control flow cannot always be reconstructed precisely. When the target of a control transfer instruction such as a jump or a call cannot be determined statically (e.g., in case of an indirect jump), the recursive disassembler fails to analyze parts of the program's code. This problem is usually solved with a technique called speculative disassembly [4], which uses a linear sweep algorithm to analyze unreachable code regions.

Linn and Debray's approach [13] to confuse disassemblers are based on two main techniques. First, junk bytes are inserted at locations that are not reachable at run–time. These locations can be found after control transfer instructions such as jumps where control flow does not continue. Consider the example in Figure 1, where a function is presented in source form and in its corresponding assembly representation. At address 0x8048012, two junk bytes are added after the jump instruction at address 0x8048010. Inserting junk bytes at unreachable locations should not effect recursive disassemblers, but has a profound impact on linear sweep implementations.

Example disassembly obfuscated function Figure 1: Example function

The second technique relies on a branch function to change the way regular procedure calls work. This creates more opportunities to insert junk bytes and misleads both types of disassemblers. A normal call to a subroutine is replaced with a call to the branch function. This branch function uses an indirect jump to transfer control to the original subroutine. In addition, an offset value is added to the return address of the subroutine. When the subroutine is done, control is not transfered to the address directly after the call instruction. Instead, the instruction that is offset number of bytes after the call instruction is executed. In the example in Figure 1, two junk bytes are inserted after the call to the branch function at address 0x8048003. During run–time, the branch function modifies the return address such that the next instruction that is executed after the call is at address 0x804800a.

Figure 2 shows the disassembly results for the example function when using a linear sweep and a recursive traversal disassembler. The linear sweep disassembler is successfully confused in both cases where junk bytes are inserted. The two junk bytes at 0x8048008 are interpreted as or instruction, causing the the following four bytes (which are actually a cmp and a jne instruction) as being parsed as a 32–bit argument value. A similar problem occurs at address 0x8048012, resulting in only 5 out of 12 correctly identified instructions.

Traditional disassemblers Figure 2: Traditional disassemblers

This recursive disassembler is not vulnerable to the junk bytes inserted at address 0x8048012 because it recognizes instruction 0x8048010 as an unconditional jump. Therefore, the analysis can continue at the jump target, which is at address 0x8048019. However, the junk bytes after the call instruction at 0x8048003 lead to incorrect disassembly and the subsequent failure to decode the jump at 0x804800c with its corresponding target at 0x8048014. In this example, the recursive traversal disassembler succeeds to correctly identify 9 out of 12 instructions. However, the situation becomes worse when dealing with real binaries. Because calls are redirected to the branch function, large parts of the binary become unreachable for the recursive traversal algorithm. The results in Section 6 demonstrate that recursive traversal disassemblers, such as IDA Pro, perform worse on obfuscated binaries than linear sweep disassemblers, such as objdump.

3. Disassembling Obfuscated Binaries

Our disassembler performs static analysis on Intel x86 binaries. When analyzing an obfuscated binary, one cannot assume that the code was generated by a wellbehaved compiler. In fact, the obfuscation techniques introduced by Linn and Debray [13] precisely exploit the fact that standard disassemblers assume certain properties of compiler–generated code that can be violated without changing the program's functionality. By transforming the binary into functionally equivalent code that does not possess all the assumed properties, standard disassemblers are confused and fail to correctly translate binary code into its corresponding assembly representation. In general, certain properties are easier to change than others and it is not straightforward to transform (i.e., obfuscate) a binary into a functionally equivalent representation in which all the compiler–related properties of the original code are lost. When disassembling obfuscated binaries, we require that certain assumptions are valid. These assumptions (some of which constitute limiting factors for our ability to disassemble obfuscated binaries) are described in the following subsections.

  1. Valid instructions must not overlap. An instruction is denoted as valid if it belongs to the program, that is, it is reached (and executed) at run–time as part of some legal program execution trace. Two instructions overlap if one or more bytes in the executable are shared by both instruction. In other words, the start of one instruction is located at an address that is already used by another instruction. Overlapping instructions have been suggested to complicate disassembly in [7]. However, suitable candidate instructions for this type of transformation are difficult to find in real executables and the reported obfuscation effects were minimal [13].
  2. Conditional jumps can be either taken or not taken. This means that control flow can continue at the branch target or at the instruction after the conditional branch. In particular, it is not possible to insert junk bytes at the branch target or at the address following the branch instruction. Linn and Debray [13] discuss the possibility to transform unconditional jumps into conditional branches using opaque predicates. Opaque predicates are predicates that always evaluate to either true or false, independent of the input. This would allow the obfuscator to insert junk bytes either at the jump target or in place of the fall–through instruction. However, it is not obvious how to generate opaque predicates that are not easily recognizable for the disassembler. Also, the obfuscator presented in [13] does not implement this transformation.
  3. An arbitrary amount of junk bytes can be inserted at unreachable locations. Unreachable locations denotes locations that are not reachable at run–time. These locations can be found after instructions that change the normal control flow. For example, most compilers arrange code such that the address following an unconditional jump contains a valid instruction. However, we assume that an arbitrary number of junk bytes can be inserted there.
  4. The control flow does not have to continue immediately after a call instruction. Thus, an arbitrary number of padding bytes can be added after each call. This is different from the standard behavior where it is expected that the callee returns to the instruction following a call using the corresponding return instruction. More specifically, in the x86 instruction set architecture, the call operation performs a jump to the call target and, in addition, pushes the address following the call instruction on the stack. This address is then used by the corresponding ret instruction, which performs a jump to the address currently on top of the stack. However, by redirecting calls to a branch function, it is trivial to change the return address.

Our disassembly techniques can be divided into two classes: general techniques and tool–specific techniques.

General techniques are techniques that do not rely upon any knowledge on how a particular obfuscator transforms the binary. It is only required that the transformations respect our assumptions. Our general techniques are based on the program's control flow, similar to a recursive traversal disassembler. However, we use a different approach to construct the control flow graph, which is more resilient to obfuscation attempts. Program regions that are not covered by the control flow graph are analyzed using statistical techniques. The general techniques are described in more detail in Section 4.

An instance of an obfuscator that respects our assumptions is presented by Linn and Debray in [13]. By tailoring the static analysis process against a particular tool, it is often possible to reverse some of the performed transformations and improve the analysis results. Section 5 discusses potential modifications to our general techniques to take advantage of tool–specific knowledge when disassembling binaries transformed with Linn and Debray's obfuscator.

In Section 6, we show that the general techniques presented in the next section offer a significant improvement over previous approaches. When combined with tool–specific knowledge, the obfuscated binary is almost completely disassembled.

4. General Techniques

This section discusses the general techniques to reconstruct the program's control flow. Regions in the binary that are not covered by the control flow graph are analyzed using statistical methods.

4.1 Function Identification

The first step when disassembling obfuscated programs is to divide the binary into functions that can then be analyzed independently. The main reason for doing so is run–time performance; it is necessary that the disassembler scales well enough such that the analysis of large real–world binaries is possible.

An important part of our analysis is the reconstruction of the program's control flow. When operating on the complete binary, the analysis does not scale well for large programs. Therefore, the binary is broken into smaller regions (i.e., functions) that can be analyzed consecutively. This results in a run–time overhead of the disassembly process that is linear in the number of instructions (roughly, the size of the code segment).

A straightforward approach to obtain a function's start addresses is to extract the targets of call instructions. When a linker generates an ordinary executable, the targets of calls to functions located in the binary's text segment are bound to the actual addresses of these functions. Given the call targets and assuming that most functions are actually referenced from others within the binary, one can obtain a fairly complete set of function start addresses. Unfortunately, this approach has two drawbacks. One problem is that this method requires that the call instructions are already identified. As the objective of our disassembler is precisely to provide that kind of information, the call instructions are not available at this point. Another problem is that an obfuscator can redirect all calls to a single branching function that transfers control to the appropriate targets. This technique changes all call targets to a single address, thus removing information necessary to identify functions.

We use a heuristic to locate function start addresses. This is done by searching the binary for byte sequences that implement typical function prologs. When a function is called, the first few instructions usually set up a new stack frame. This frame is required to make room for local variables and to be able restore the stack to its initial state when the function returns. In the current implementation, we scan the binary for byte sequences that represent instructions that push the frame pointer onto the stack and instructions that increase the size of the stack by decreasing the value of the stack pointer. The technique works very well for regular binaries and also for the obfuscated binaries used in our experiments. The reason is that the used obfuscation tool [13] does not attempt to hide function prologs. It is certainly possible to extend the obfuscator to conceal the function prolog. In this case, our function identification technique might require changes, possible using tool–specific knowledge.

Note that the partitioning of the binary into functions is mainly done for performance reasons, and it is not crucial for the quality of the results that all functions are correctly identified. When the start point of a function is missed, later analysis simply has to deal with one larger region of code instead of two separate smaller parts. When a sequence of instructions within a function is misinterpreted as a function prolog, two parts of a single function are analyzed individually. This could lead to less accurate results when some intra–procedural jumps are interpreted as inter–procedural, making it harder to reconstruct the intra–procedural control flow graph as discussed in the following section.

4.2 Intra–Procedural Control Flow Graph

To find the valid instructions of a function (i.e., the instructions that belong to the program), we attempt to reconstruct the function's intra–procedural control flow graph. A control flow graph (CFG) is defined as a directed graph G = (V,E) in which vertices u, v ∈ V represent basic blocks and an edge e ∈ E : u → v represents a possible flow of control from u to v. A basic block describes a sequence of instructions without any jumps or jump targets in the middle. More formally, a basic block is defined as a sequence of instructions where the instruction in each position dominates, or always executes before, all those in later positions, and no other instruction executes between two instructions in the sequence. Directed edges between blocks represent jumps in the control flow, which are caused by control transfer instructions (CTIs) such as calls, conditional and unconditional jumps, or return instructions.

The traditional approach to reconstruct the control flow graph of a function works similar to a recursive disassembler. The analysis commences at the function's start address and instructions are disassembled until a control transfer instruction is encountered. The process is then continued recursively at all jump targets that are local to the procedure and, in case of a call instruction or a conditional jump, at the address following the instruction. In case of an obfuscated binary, however, the disassembler cannot continue directly after a call instruction. In addition, many local jumps are converted into non–local jumps to addresses outside the function to blur local control flow. In most cases, the traditional approach leads to a control flow graph that covers only a small fraction of the valid instructions of the function under analysis. This claim is supported by the experimental data shown in Section 6 that includes the results for a state–of–the–art recursive disassembler.

We developed an alternative technique to extract a more complete control flow graph. The technique is composed of two phases: in the first phase, an initial control flow graph is determined. In the following phase, conflicts and ambiguities in the initial CFG are resolved. The two phases are presented in detail in the next two sections.

4.2.1 Initial Control Flow Graph

To determine the initial control flow graph for a function, we first decode all possible instructions between the function's start and end addresses. This is done by treating each address in this address range as the begin of a new instruction. Thus, one potential instruction is decoded and assigned to each address of the function. The reason for considering every address as a possible instruction start stems from the fact that x86 instructions have a variable length from one to fifteen bytes and do not have to be aligned in memory (i.e., an instruction can start at an arbitrary address). Note that most instructions take up multiple bytes and such instructions overlap with other instructions that start at subsequent bytes. Therefore, only a subset of the instructions decoded in this first step can be valid. Figure 3 provides a partial listing of all instructions in the address range of the sample function that is shown in Figure 1. For the reader's reference, valid instructions are marked by an x in the «Valid» column. Of course, this information is not available to our disassembler. An example for the overlap between valid and invalid instructions can be seen between the second and the third instruction. The valid instruction at address 0x8048001 requires two bytes and thus interferes with the next (invalid) instruction at 0x8048002.

Partial instruction listing Figure 3: Partial instruction listing

The next step is to identify all intra–procedural control transfer instructions. For our purposes, an intraprocedural control transfer instruction is defined as a CTI with at least one known successor basic block in the same function. Remember that we assume that control flow only continues after conditional branches but not necessarily after call or unconditional branch instructions. Therefore, an instruction is an intra–procedural control transfer instruction if either (i) its target address can be determined and this address is in the range between the function's start and end addresses or (ii) it is a conditional jump.

Note that we assume that a function is represented by a contiguous sequence of instructions, with possible junk instructions added in between. However, it is not possible that the basic blocks of two different functions are intertwined. Therefore, each function has one start address and one end address (i.e., the last instruction of the last basic block that belongs to this function). However, it is possible that a function has multiple exit points.

In case of a conditional jump, the address that immediately follows the jump instruction is the start of a successor block, and thus, every conditional jump is also an intra–procedural control transfer operation. This is intuitively plausible, as conditional branches are often used to implement local branch (e.g., if–else) and loop (e.g., while, for) statements of higher–level languages, such as C.

To find all intra–procedural CTIs, the instructions decoded in the previous step are scanned for any control transfer instructions. For each CTI found in this way, we attempt to extract its target address. In the current implementation, only direct address modes are supported and no data flow analysis is performed to compute address values used by indirect jumps. However, such analysis could be later added to further improve the performance of our static analyzer. When the instruction is determined to be an intra–procedural control transfer operation, it is included in the set of jump candidates. The jump candidates of the sample function are marked in Figure 3 by an x in the «Candidate» column. In this example, the call at address 0x8048003 is not included into the set of jump candidates because the target address is located outside the function.

Given the set of jump candidates, an initial control flow graph is constructed. This is done with the help of a recursive disassembler. Starting with an initial empty CFG, the disassembler is successively invoked for all the elements in the set of jump candidates. In addition, it is also invoked for the instruction at the start address of the function.

The key idea for taking into account all possible control transfer instructions is the fact that the valid CTIs determine the skeleton of the analyzed function. By using all control flow instructions to create the initial CFG, we make sure that the real CFG is a subgraph of this initial graph. Because the set of jump candidates can contain both valid and invalid instructions, it is possible (and also frequent) that the initial CFG contains a superset of the nodes of the real CFG. These nodes are introduced as a result of argument bytes of valid instructions being misinterpreted as control transfer instructions. The Intel x86 instruction set contains 26 singlebyte opcodes that map to control transfer instructions (out of 219 single–byte instruction opcodes). Therefore, the probability that a random argument byte is decoded as CTI is not negligible. In our experiments (for details, see Section 6), we found that about one tenth of all decoded instructions are CTIs. Of those instructions, only two thirds were part of the real control flow graph. As a result, the initial CFG contains nodes and edges that represent invalid instructions. Most of the time, these nodes contain instructions that overlap with valid instructions of nodes that belong to the real CFG. The following section discusses mechanisms to remove these spurious nodes from the initial control flow graph. It is possible to distinguish spurious from valid nodes because invalid CTIs represent random jumps within the function while valid CTIs constitute a well–structured CFG with nodes that have no overlapping instructions.

Creating an initial CFG that includes nodes that are not part of the real control flow graph can been seen as the opposite to the operation of a recursive disassembler. A standard recursive disassembler starts from a known valid block and builds up the CFG by adding nodes as it follows the targets of control transfer instructions that are encountered. This technique seems favorable at first glance, as it makes sure that no invalid instructions are incorporated into the CFG. However, most control flow graphs are partitioned into several unconnected subgraphs. This happens because there are control flow instructions such as indirect branches whose targets often cannot be determined statically. This leads to missing edges in the CFG and to the problem that only a fraction of the real control flow graph is reachable from a certain node. The situation is exacerbated when dealing with obfuscated binaries, as inter–procedural calls and jumps are redirected to a branching function that uses indirect jumps. This significantly reduces the parts of the control flow graph that are directly accessible to a recursive disassembler, leading to unsatisfactory results.

Although the standard recursive disassembler produces suboptimal results, we use a similar algorithm to extract the basic blocks to create the initial CFG. As mentioned before, however, the recursive disassembler is not only invoked for the start address of the function alone, but also for all jump candidates that have been identified. An initial control flow graph is then constructed according to the code listing shown in Algorithm 1.

There are two differences between a standard recursive disassembler and our implementation. First, we assume that the address after a call or an unconditional jump instruction does not have to contain a valid instruction. Therefore, our recursive disassembler cannot continue at the address following a call or an unconditional jump. Note, however, that we do continue to disassemble after a conditional jump (i.e., branch). This can be seen at Label 5 of Algorithm 1 where the disassembler recursively continues after conditional branch instructions.

The second difference is due to the fact that it is possible to have instructions in the initial call graph that overlap. In this case, two different basic blocks in the call graph can contain overlapping instructions starting at slightly different addresses. When following a sequence of instructions, the disassembler can arrive at an instruction that is already part of a previously found basic block. In the regular case, this instruction is the first instruction of the existing block. The disassembler can complete the instruction sequence of the current block and create a link to the existing basic block in the control flow graph.

Algorithm disassemble

When instructions can overlap, it is possible that the current instruction sequence starts to overlap with another sequence in an existing basic block for some instructions before the two sequences eventually merge. At the point where the two sequences merge, the disassembler finds an instruction that is in the middle (or at the end) of a sequence associated with an existing basic block. In this case, the existing basic block is split into two new blocks. One block refers to the overlapping sequence up to the instruction where the two sequences merge, the other refers to the instruction sequence that both have in common. All edges in the control flow graph that point to the original basic block are changed to point to the first block, while all outgoing edges of the original block are assigned to the second. In addition, the first block is connected to the second one. The reason for splitting the existing block is the fact that a basic block is defined as a continuous sequence of instructions without a jump or jump target in the middle. When two different overlapping sequences merge at a certain instruction, this instruction has two predecessor instructions (one in each of the two overlapping sequences). Therefore, it becomes the first instruction of a new basic block. As an additional desirable side effect, each instruction appears at most once in a basic block of the call graph.

The functionality of splitting an existing basic block is implemented by the split procedure referenced at Label 3 of Algorithm 1. Whenever an instruction is found that is already associated with a basic block (check performed at Label 1), the instruction sequence of the current basic block is completed. When the instruction is in the middle of the existing block (check performed at Label 2), it is necessary to split the block. The current block is then connected either to the existing basic block or, after a split, to the newly created block that contains the common instruction sequence. The check performed at Label 4 takes care of the special case where the recursive disassembler starts with an instruction that is part of an existing basic block. In this case, the current block contains no instructions and a reference to the old block is returned instead.

The situation of two merging instruction sequences is a common phenomenon when disassembling x86 binaries. The reason is called self–repairing disassembly and relates to the fact that two instruction sequences that start at slightly different addresses (that is, shifted by a few bytes) synchronize quickly, often after a few instructions. Therefore, when the disassembler starts at an address that does not correspond to a valid instruction, it can be expected to re–synchronize with the sequence of valid instructions after a few steps [13].

Initial control flow graph Figure 4: Initial control flow graph

The initial control flow graph that is created by Algorithm 1 for our example function is shown in Figure 4. In this example, the algorithm is invoked for the function start at address 0x8048000 and the four jump candidates (0x8048006, 0x804800c, 0x8048010, and 0x8048017). The nodes in this figure represent basic blocks and are labeled with the start address of the first instruction and the end address of the last instruction in the corresponding instruction sequence. Note that the end address denotes the first byte after the last instruction and is not part of the basic block itself. Solid, directed edges between nodes represent the targets of control transfer instructions. A dashed line between two nodes signifies a conflict between the two corresponding blocks. Two basic blocks are in conflict when they contain at least one pair of instructions that overlap. As discussed previously, our algorithm guarantees that a certain instruction is assigned to at most one basic block (otherwise, blocks are split appropriately). Therefore, whenever the address ranges of two blocks overlap, they must also contain different, overlapping instructions. Otherwise, both blockswould contain the same instruction, which is not possible. This is apparent in Figure 4, where the address ranges of all pairs of conflicting basic blocks overlap. To simplify the following discussion of the techniques used to resolve conflicts, nodes that belong to the real control flow graph are shaded. In addition, each node is denoted with an uppercase letter.

4.2.2 Block Conflict Resolution

The task of the block conflict resolution phase is to remove basic blocks from the initial CFG until no conflicts are present anymore. Conflict resolution proceeds in five steps. The first two steps remove blocks that are definitely invalid, given our assumptions. The last three steps are heuristics that choose likely invalid blocks. The conflict resolution phase terminates immediately after the last conflicting block is removed; it is not necessary to carry out all steps. The final step brings about a decision for any basic block conflict and the control flow graph is guaranteed to be free of any conflicts when the conflict resolution phase completes.

The five steps are detailed in the following paragraphs.

Step 1: We assume that the start address of the analyzed function contains a valid instruction. Therefore, the basic block that contains this instruction is valid. In addition, whenever a basic block is known to be valid, all blocks that are reachable from this block are also valid.

A basic block v is reachable from basic block u if there exists a path p from u to v. A path p from u to v is defined as a sequence of edges that begins at u and terminates at v. An edge is inserted into the control flow graph only when its target can be statically determined and a possible program execution trace exists that transfers control over this edge. Therefore, whenever a control transfer instruction is valid, its targets have to be valid as well.

We tag the node that contains the instruction at the function's start address and all nodes that are reachable from this node as valid. Note that this set of valid nodes contains exactly the nodes that a traditional recursive disassembler would identify when invoked with the function's start address. When the valid nodes are identified, any node that is in conflict with at least one of the valid nodes can be removed.

In the initial control flow graph for the example function in Figure 4, only node A (0x8048000) is marked as valid. That node is drawn with a stronger border in Figure 4. The reason is that the corresponding basic block ends with a call instruction at 0x8048003 whose target is not local. In addition, we do not assume that control flow resumes at the address after a call and thus the analysis cannot directly continue after the call instruction. In Figure 4, node B (the basic block at 0x8048006) is in conflict with the valid node and can be removed.

Step 2: Because of the assumption that valid instructions do not overlap, it is not possible to start from a valid block and reach two different nodes in the control flow graph that are in conflict. That is, whenever two conflicting nodes are both reachable from a third node, this third node cannot be valid and is removed from the CFG. The situation can be restated using the notion of a common ancestor node. A common ancestor node of two nodes u and v is defined as a node n such that both u and v are reachable from n.

In Step 2, all common ancestor nodes of conflicting nodes are removed from the control flow graph. In our example in Figure 4, it can be seen that the conflicting node F and node K share a common ancestor, namely node J. This node is removed from the CFG, resolving a conflict with node I. The resulting control flow graph after the first two steps is shown in Figure 5.

CFG after two steps of conflict resolution Figure 5: CFG after two steps of conflict resolution

The situation of having a common ancestor node of two conflicting blocks is frequent when dealing with invalid conditional branches. In such cases, the branch target and the continuation after the branch instruction are often directly in conflict, allowing one to remove the invalid basic block from the control flow graph.

Step 3: When two basic blocks are in conflict, it is reasonable to expect that a valid block is more tightly integrated into the control flow graph than a block that was created because of a misinterpreted argument value of a program instruction. That means that a valid block is often reachable from a substantial number of other blocks throughout the function, while an invalid block usually has only a few ancestors.

The degree of integration of a certain basic block into the control flow graph is approximated by the number of its predecessor nodes. A node u is defined as a predecessor node of v when v is reachable by u. In Step 3, the predecessor nodes for pairs of conflicting nodes are determined and the node with the smaller number is removed from the CFG.

In Figure 5, node K has no predecessor nodes while node F has five. Note that the algorithm cannot distinguish between real and spurious nodes and thus includes node C in the set of predecessor nodes for node F. As a result, node K is removed. The number of predecessor nodes for node C and node H are both zero and no decision is made in the current step.

Step 4: In this step, the number of direct successor nodes of two conflicting nodes are compared. A node v is a direct successor node of node u when v can be directly reached through an outgoing edge from u. The node with less direct successor nodes is then removed. The rationale behind preferring the node with more outgoing edges is the fact that each edge represents a jump target within the function and it is more likely that a valid control transfer instruction has a target within the function than any random CTI.

In Figure 5, node C has only one direct successor node while node H has two. Therefore, node C is removed from the control flow graph. In our example, all conflicts are resolved at this point.

Step 5: In this step, all conflicts between basic blocks must be resolved. For each pair of conflicting blocks, one is chosen at random and then removed from the graph. No human intervention is required at this step, but it would be possible to create different alternative disassembly outputs (one output for each block that needs to be removed) that can be all presented to a human analyst.

It might also be possible to use statistical methods during Step 5 to improve the chances that the «correct» block is selected. However, this technique is not implemented and is left for future work.

The result of the conflict resolution step is a control flow graph that contains no overlapping basic blocks. The instructions in these blocks are considered valid and could serve as the output of the static analysis process. However, most control flow graphs do not cover the function's complete address range and gaps exist between some basic blocks.

4.3 Gap Completion

The task of the gap completion phase is to improve the results of our analysis by filling the gaps between basic blocks in the control flow graph with instructions that are likely to be valid. A gap from basic block b1 to basic block b2 is the sequence of addresses that starts at the first address after the end of basic block b1 and ends at the last address before the start of block b2, given that there is no other basic block in the control flow graph that covers any of these addresses. In other words, a gap contains bytes that are not used by any instruction in the control flow graph.

Gaps are often the result of junk bytes that are inserted by the obfuscator. Because junk bytes are not reachable at run–time, the control flow graph does not cover such bytes. It is apparent that the attempt to disassemble gaps filled with junk bytes does not improve the results of the analysis. However, there are also gaps that do contain valid instructions. These gaps can be the result of an incomplete control flow graph, for example, stemming from a region of code that is only reachable through an indirect jump whose target cannot be determined statically. Another frequent cause for gaps that contain valid instructions are call instructions. Because the disassembler cannot continue after a call instruction, the following valid instructions are not immediately reachable. Some of these instructions might be included into the control flow graph because they are the target of other control transfer instructions. Those regions that are not reachable, however, cause gaps that must be analyzed in the gap completion phase.

The algorithm to identify the most probable instruction sequence in a gap from basic block b1 to basic block b2 works as follows. First, all possibly valid sequences in the gap are identified. A necessary condition for a valid instruction sequence is that its last instruction either (i) ends with the last byte of the gap or (ii) its last instruction is a non intra–procedural control transfer instruction. The first condition states that the last instruction of a valid sequence has to be directly adjacent to the first instruction of the second basic block b2. This becomes evident when considering a valid instruction sequence in the gap that is executed at run–time. After the last instruction of the sequence is executed, the control flow has to continue at the first instruction of basic block b2. The second condition states that a sequence does not need to end directly adjacent to block b2 if the last instruction is a non intra–procedural control transfer. The restriction to non intra–procedural CTIs is necessary because all intra–procedural CTIs are included into the initial control flow graph. When an intra–procedural instruction appears in a gap, it must have been removed during the conflict resolution phase and should not be included again.

Instruction sequences are found by considering each byte between the start and the end of the gap as a potential start of a valid instruction sequence. Subsequent instructions are then decoded until the instruction sequence either meets or violates one of the necessary conditions defined above. When an instruction sequence meets a necessary condition, it is considered possibly valid and a sequence score is calculated for it. The sequence score is a measure of the likelihood that this instruction sequence appears in an executable. It is calculated as the sum of the instruction scores of all instructions in the sequence. The instruction score is similar to the sequence score and reflects the likelihood of an individual instruction. Instruction scores are always greater or equal than zero. Therefore, the score of a sequence cannot decrease when more instructions are added. We calculate instruction scores using statistical techniques and heuristics to identify improbable instructions.

The statistical techniques are based on instruction probabilities and digraphs. Our approach utilizes tables that denote both the likelihood of individual instructions appearing in a binary as well as the likelihood of two instructions occurring as a consecutive pair. The tables were built by disassembling a large set of common executables and tabulating counts for the occurrence of each individual instruction as well as counts for each occurrence of a pair of instructions. These counts were subsequently stored for later use during the disassembly of an obfuscated binary. It is important to note that only instruction opcodes are taken into account with this technique; operands are not considered. The basic score for a particular instruction is calculated as the sum of the probability of occurrence of this instruction and the probability of occurrence of this instruction followed by the next instruction in the sequence.

In addition to the statistical technique, a set of heuristics are used to identify improbable instructions. This analysis focuses on instruction arguments and observed notions of the validity of certain combinations of operations, registers, and accessing modes. Each heuristic is applied to an individual instruction and can modify the basic score calculated by the statistical technique. In our current implementation, the score of the corresponding instruction is set to zero whenever a rule matches. Examples of these rules include the following:

  • operand size mismatches;
  • certain arithmetic on special–purpose registers;
  • unexpected register–to–registermoves (e.g., moving from a register other than %ebp into %esp);
  • moves of a register value into memory referenced by the same register.

When all possible instruction sequences are determined, the one with the highest sequence score is selected as the valid instruction sequence between b1 and b2.

Gap completion and disassembler output Figure 6: Gap completion and disassembler output

The instructions that make up the control flow graph of our example function and the intermediate gaps are shown in the left part of Figure 6. It can be seen that only a single instruction sequence is valid in the first gap, while there is none in the second gap. The right part of Figure 6 shows the output of our disassembler. All valid instructions of the example function have been correctly identified.

5. Tool–Specific Techniques

The techniques discussed in the previous section can disassemble any binary that satisfies our assumptions with reasonable accuracy (see Section 6 for detailed results). As mentioned previously, however, the results can be improved when taking advantage of available tool–specific knowledge. This section introduces a modification to our general techniques that can be applied when disassembling binaries transformed with Linn and Debray's obfuscator.

A significant problem for the disassembler is the fact that it cannot continue disassembling at the address following a call instruction. As discussed in Section 2, Linn and Debray's obfuscator replaces regular calls with calls to a branch function. The branch function is responsible for determining the real call target, that is, the function that is invoked in the original program. This is done using a perfect hash function, using the location of the call instruction as input. During run–time, the location of the call instruction can be conveniently determined from the top of the stack. The reason is that the address following the call instruction is pushed on the stack by the processor as part of the x86 call operation.

Besides finding the real target of the call and jumping to the appropriate address, the branch function is also responsible for adjusting the return address such that control flow does not return directly to the address after the call instruction. This is achieved by having the branch function add a certain offset to the return address on the stack. This offset is constant (but possibly different) for each call instruction and obtained in a way similar to the target address by performing a table lookup based on the location of the caller. When the target function eventually returns using the modified address on the stack, the control flow is transfered to an instruction located at offset bytes after the original return address. This allows the obfuscator to fill these bytes with junk.

By reverse engineering the branch function, the offset can be statically determined for each call instruction. This allows the disassembler to skip the junk bytes and continue at the correct instruction. One possibility is to manually reverse engineer the branch function for each obfuscated binary. However, the process is cumbersome and error prone. A preferred alternative is to automatically extract the desired information.

We observe that the branch function is essentially a procedure that takes one input parameter, which is the address after the call instruction that is passed on the top of the stack. The procedure then returns an output value by adjusting this address on the stack. The difference between the initial value on the stack and the modified value is the offset that we are interested in. It is easy to simulate the branch function because its output only depends on the single input parameter and several static lookup tables that are all present in the binary's initialized data segment. As the output does not depend on any input the program receives during run–time, it can be calculated statically.

To this end, we have implemented a simple virtual processor as part of the disassembler that simulates the instructions of the branch function. Because the branch function does not depend on dynamic input, all memory accesses refer to addresses in the initialized data segment and can be satisfied statically. The execution environment is set up such that the stack pointer of the virtual processor points to an address value for which we want to determine the offset. Then, the simulator executes instructions until the input address value on the stack is changed. At this point, the offset for a call is calculated by subtracting the old address value from the new one.

Whenever the disassembler encounters a call instruction, the value of the address following the call is used to invoke our branch function simulator. The simulator calculates the corresponding offset, and the disassembler can then skip the appropriate number of junk bytes to continue at the next valid instruction.

6. Evaluation

Linn and Debray evaluated their obfuscation tool using the SPECint 95 benchmark suite, a set of eight benchmark applications written in C. These programs were compiled with gcc version egcs–2.91.66 at optimization level –O3 and then obfuscated.

To measure the efficacy of the obfuscation process, the confusion factor for instructions was introduced. This metric measures how many program instructions were incorrectly disassembled. More formally, let V be the set of valid program instructions and O the set of instructions that a disassembler outputs. Then, the confusion factor CF is defined as CF = V-OV . Because our work focuses on the efficacy of the disassembler in identifying valid instructions, we define the disassembler accuracy DA as DA = 1-CF.

Linn and Debray used three different disassemblers to evaluate the quality of their obfuscator. The first one was the GNU objdump utility, which implements a standard linear sweep algorithm. The second disassembler was implemented by Linn and Debray themselves. It is a recursive disassembler that uses speculative linear disassembly (comparable to our gap completion) for regions that are not reachable by the recursive part. This disassembler was also provided with additional information about the start and end addresses of all program functions. The purpose of this disassembler was to serve as an upper bound estimator for the disassembler accuracy and to avoid reporting «unduly optimistic results» [13]. The third disassembler was IDA Pro 4.3x, a commercial disassembler that is often considered to be among the best commercially available disassemblers. This belief is also reflected in the fact that IDA Pro is used to provide disassembly as input for static analysis tools such as [3].

We developed a disassembler that implements the general techniques and the tool–specific modification presented in the two previous sections. Our tool was then run on the eight obfuscated SPECint 95 applications. The results for our tool and a comparison to the three disassemblers used by Linn and Debray are shown in Table 1. Note that we report two results for our disassembler. One shows the disassembler accuracy when only general techniques are utilized. The second result shows the disassembler accuracy when the tool–specific modification is also enabled.

ProgramObjdumpLinn⁄DebrayIDA ProOur tool
generaltool‑specific
compress9556.0769.9624.1991.0498.07
gcc65.5482.1845.0988.4595.17
go66.0878.1243.0191.8196.80
ijpeg60.8274.2331.4691.6097.53
li56.6572.7829.0789.8697.35
m88ksim58.4275.6629.5690.3997.49
perl57.6672.0131.3686.9396.28
vortex66.0276.9742.6590.7196.65
Mean60.9175.2434.5590.1096.92
Table 1: Disassembler accuracy

These results demonstrate that our disassembler provides a significant improvement over the best disassembler used in the evaluation by Linn and Debray. Even without using tool–specific knowledge, the disassembler accuracy is higher than their recursive disassembler used to estimate the upper bound for the disassembler accuracy. When the tool–specific modification is enabled, the binary is disassembled almost completely. The poor results for IDA Pro can be explained with the fact that the program only disassembles addresses that can be guaranteed (according to the tool) to be instructions. As a result, many functions that are invoked through the branch function are not disassembled at all. In addition, IDA Pro continues directly after call instructions and is frequently mislead by junk bytes there.

Given the satisfying results of our disassembler, the disassembly process was analyzed in more detail. It is interesting to find the ratio between the number of valid instructions identified by the control flow graph and the number of valid instructions identified by the gap completion phase. Although the gap completion phase is important in filling regions not covered by the CFG, our key observation is the fact that the control transfer instructions and the resulting control flow graph constitute the skeleton of an analyzed function. Therefore, one would expect that most valid instructions can be derived from the control flow graph, and only small gaps (e.g., caused by indirect calls or unconditional jumps) need to be completed later. Table 2 shows the fraction (in percent) of correctly identified, valid instructions that were obtained using the control flow graph and the fraction obtained in the gap completion phase. Because the numbers refer to correctly identified instructions only, the two fractions sum up to unity. Both the results with toolspecific support and the results with the general techniques alone are provided. When tool specific support is available, the control flow graph contributes noticeable more to the output. In this case, the disassembler can include all regions following call instructions into the CFG. However, in both experiments, a clear majority of the output was derived from the control flow graph, confirming our key observation.

Programgeneraltool‑specific
CFGGapCFGGap
compress9587.0912.9196.363.64
gcc85.1214.8893.106.90
go89.1310.8795.114.89
ijpeg87.0212.9895.034.97
li85.6314.3795.114.89
m88ksim87.1812.8296.004.00
perl86.2213.7895.574.43
vortex88.0411.9694.675.33
Mean86.9313.0795.124.88
Table 2: CFG vs. gap completion

Because most of the output is derived from the control flow graph, it is important that the conflict resolution phase is effective. One third of the control transfer instructions that are used to create the initial control flow graphs are invalid. To achieve a good disassembler accuracy, it is important to remove the invalid nodes from the CFG. The first two steps of the conflict resolution phase remove nodes that are guaranteed to be invalid, given our assumptions. The third and forth step implement two heuristics and the fifth step randomly selects one of two conflicting nodes. It is evident that it is desirable to have as many conflicts as possible resolved by the first and second step, while the fifth step should never be required.

Table 3 shows for each program the number of basic blocks in the initial control flow graphs (column Initial Blocks) and the number of basic blocks in the control flow graphs after the conflict resolution phase (column Final Blocks). In addition, the number of basic blocks that were removed in each of the five steps of the conflict resolution phase are shown. The numbers given in Table 3 were collected when the tool–specific modification was enabled. The results were very similar when only general techniques were used.

ProgramInitial BlocksConflict ResolutionFinal Blocks
Step 1Step 2Step 3Step 4Step 5
compress9554674702146934242934838577
gcc245586217622568029801900565166878
go91140106678934940523115461749
ijpeg702559414606952991409549238
li634598350529749521257844657
m88ksim77344100616933693817710153134
perl10484110940114421175029115270266
vortex1187031500492211342440737380274
Table 3: Conflict resolution

It can be seen that most conflicts were resolved after the first three steps. About two thirds of the removed basic blocks were guaranteed to be invalid. This supports our claim that invalid control flow instructions, caused by the misinterpretation of instruction arguments, often result in impossible control flows that can be easily detected. Most of the remaining blocks are removed by the first heuristic that checks how tight a block is connected with the rest of the CFG. Invalid blocks are often loosely coupled and can taken out during this step. The last two steps were only responsible for a small fraction of the total removed blocks. The heuristic in step four was sometimes able to provide an indication of which block was valid. Otherwise, a random node had to be selected.

Static analysis tools are traditionally associated with poor scalability and the inability to deal with real–world input. Therefore, it is important to ascertain that our disassembler can process even large real–world binaries in an acceptable amount of time. In Section 4, we claimed that the processing overhead of the program is linear in the number of instructions of the binary. The intuitive reason is the fact that the binary is partitioned into functions that are analyzed independently. Assuming that the average size of an individual function is relatively independent of the size of the binary, the amount of work per function is also independent of the size of the binary. As a result, more functions have to be analyzed as the size of the binary increases. Because the number of functions increases linearly with the number of instructions and the work per function is constant (again, assuming a constant average function size), the overhead of the static analysis process is linear in the number of instructions.

ProgramSize (Bytes)InstructionsTime (s)
openssh263,68446,3434
compress951,768,42092,1379
li1,820,768109,6527
ijpeg1,871,776127,0129
m88ksim2,001,616127,3588
go2,073,728145,95311
perl2,176,268169,05415
vortex2,340,196204,23016
gcc2,964,740387,28928
emacs4.765,512405,53538
Table 4: Disassembler processing times

To support this claim with experimental data, the time for a complete disassembly of each evaluation binary was taken. The size of obfuscated programs of the SPECint 95 benchmark are in the range of 1.77 MB to 2.96 MB. To obtain more diversified results, we also disassembled one smaller (openssh 3.7) and one larger binary (emacs 21.3). The processing times were taken as the average of ten runs on a 1.8 GHz Pentium IV system with 512 MB of RAM, running Gentoo Linux 2.6. The results (in seconds) for the disassembler are listed in Table 4. There was no noticeable difference when using tool–specific modification.

Processing times and linear regression Figure 7: Processing times and linear regression

Figure 7 shows a plot of the processing times and the corresponding number of instructions for each binary. The straight line represents the linear regression line. The close proximity of all points to this line demonstrates that the processing time increases proportional to the number of instructions, allowing our disassembler to operate on large binaries with acceptable cost.

7. Conclusions

Correct disassembler output is crucial for many security tools such as virus scanners [3] and intrusion detection systems [11]. Recently, Linn and Debray [13] presented obfuscation techniques that successfully confuse current state–of–the–art disassemblers. We developed and implemented a disassembler that can analyze obfuscated binaries. Using the program's control flow graph and statistical techniques, we are able to correctly identify a large fraction of the program's instructions.

Obfuscation and de–obfuscation is an arms race. It is possible to devise obfuscation techniques that will make the disassembly algorithms describe in this paper less effective. However, this arms race is usually in favor of the de–obfuscator. The obfuscator has to devise techniques that transform the program without seriously impacting the run–time performance or increasing the binary's size or memory footprint while there are no such constraints for the de–obfuscator. Also, the de–obfuscator has the advantage of going second. That is, the obfuscator must resist all attacks, while the de–obfuscator can tailor the attack to a specific obfuscation technique. In this direction, a recent theoretical paper [1] also proved that obfuscation is impossible in the general case, at least for certain properties.

Acknowledgments

This research was supported by the Army Research Office under agreement DAAD19–01–1–0484 and by the National Science Foundation under grants CCR–0209065 and CCR–0238492.

References

  1. B. Barak, O. Goldreich, R. Impagliazzo, S. Rudich, A. Sahai, S. Vadhan, and K. Yang. On the (Im)possibility of Software Obfuscation. In Crypto, 2001.
  2. J. Bergeron, M. Debbabi, M.M. Erhioui, and B. Ktari. Static Analysis of Binary Code to Isolate Malicious Behaviors. In 8thWorkshop on Enabling Technologies: Infrastructure for Collaborative Enterprises, 1999.
  3. M. Christodorescu and Somesh Jha. Static Analysis of Executables to Detect Malicious Patterns. In 12th USENIX Security Symposium, 2003.
  4. C. Cifuentes and M. Van Emmerik. UQBT: Adaptable binary translation at low cost. IEEE Computer, 40(2–3), 2000.
  5. C. Cifuentes and A. Fraboulet. Intraprocedural Static Slicing of Binary Executables. In International Conference on Software Maintenance (ICSM'97), Bari, Italy, October 1997.
  6. C. Cifuentes and K. Gough. Decompilation of Binary Programs. Software Practice & Experience, 25(7):811–829, July 1995.
  7. F. B. Cohen. Operating System Protection through Program Evolution. http://all.net/books/IP/evolve.html.
  8. C. Collberg and C. Thomborson. Watermarking, Tamper–Proofing, and Obfuscation — Tools for Software Protection. IEEE Transactions on Software Engineering, 28(8):735–746, August 2002.
  9. C. Collberg, C. Thomborson, and D. Low. A Taxonomy of Obfuscating Transformations. Technical Report 148, Department of Computer Science, University of Auckland, July 1997.
  10. Free Software Foundation. GNU Binary Utilities, Mar 2002. http://www.gnu.org/software/binutils/manual/.
  11. J.T. Giffin, S. Jha, and B.P. Miller. Detecting manipulated remote call streams. In 11th USENIX Security Symposium, 2002.
  12. W.C. Hsieh, D. Engler, and G. Back. Reverse–Engineering Instruction Encodings. In USENIX Annual Technical Conference, pages 133–146, Boston, Mass., June 2001.
  13. C. Linn and S. Debray. Obfuscation of executable code to improve resistance to static disassembly. In 10th ACM Conference on Computer and Communications Security (CCS), pages 290–299, October 2003.
  14. T. Ogiso, Y. Sakabe, M. Soshi, and A. Miyaji. Software obfuscation on a theoretical basis and its implementation. IEICE Transactions on Fundamentals, E86–A(1), 2003.
  15. R. Sites, A. Chernoff, M. Kirk, M. Marks, and S. Robinson. Binary Translation. Digital Technical Journal, 4(4), 1992.
  16. Symantec. Understanding and Managing Polymorphic Viruses. http://www.symantec.com/avcenter/whitepapers.html.
  17. G. Wroblewski. General Method of Program Code Obfuscation. In Proceedings of the International Conference on Software Engineering Research and Practice (SERP), Las Vegas, NV, June 2002.
Christopher Kruegel, William Robertson, Fredrik Valeur and Giovanni Vigna. Reliable Software Group. University of California Santa Barbara. {chris,wkr,fredrik,vigna}@cs.ucsb.edu

Saturday, April 26, 2008

Obfuscation, Weird Languages, and Code Aesthetics

The standard idea of code aesthetics, when such an idea manifests itself at all, allows for programmers to have elegance and clarity as their standards. This paper explores programming practices in which other values are at work, showing that the aesthetics of code must be enlarged to accommodate them. The two practices considered are obfuscated programming and the creation of «weird languages» for coding. Connections between these two practices, and between these and other mechanical and literary aesthetic traditions, are discussed.

1. Introduction

Programmers write code in order to cause the computer to function in desired ways. But modern computer programs are written in a form, usually textual, that is also meant to be manipulable and understandable by human beings. For a programmer to understand what she herself is writing, and to incorporate code that others have written, and to simply learn how to program with greater facility and on a larger, more complex scale, code has been made legible to people. While a computer system may compile or interpret code, it is important to the nature of code that it is interpreted by people as well.

A typical perspective on code would be that clarity and elegance are the only possible values that programmers can have when writing it, although they may succeed to a greater or lesser extent at writing clear and elegant code. But if this were the case, how is it possible to explain the way that people sometimes intentionally obfuscate their code, making its functioning more or less impenetrable, even when there is no commercial or practical reason to do so? The existence of obfuscated programming as a software development practice, and as an aesthetic practice, throws a wrench into the simplified theory of coding that would claim that coders must always strive for clarity. An additional complication is seen in programming languages that are themselves designed as jokes or parodies, sometimes called «weird programming languages» or «esoteric programming languages.» Such languages are designed to make legibility of any program difficult. Obfuscated code and weird languages highlight the importance of the human reading of code in software development. If some code is only to be read by a machine, it can be neither obfuscated nor clear: it can only function properly or not.

This paper suggests some ways to enlarge an aesthetics of code to account for the existence of obfuscated programming and «weird languages». Such consideration shows that a previously neglected layer of computing and new media is available for rich aesthetic understanding.

2. Reading Code

Version 2.1 of the online lexical reference system WordNet gives 11 senses for «read», including «look at, interpret, and say out loud something that is written or printed» and «interpret the significance of, as of palms, tea leaves, intestines, the sky, etc.; also of human behavior». [14] This discussion is about a fairly literal application of the most common sense, «interpret something that is written or printed», although of course code that appears on a screen (rather than being written or printed out) can also be read.

The understanding of behavior is certainly involved in reading code in the primary sense of «read», however. It is essential to any ordinary human reading of a computer program to develop an understanding of how the computer will behave, and what it will compute, when it runs the code that is being examined. In a popular book on the history of software, one of the developers of FORTRAN is characterized as «an extraordinary programmer who could "execute" a program in his head, as a machine would, and then write error–free code with remarkable frequency». [7] Actually, all programmers must do this to some extent, using some internal model of what code will do. Just as understanding what a program does, and why, is critical on a practical level for the programmer, it is important to the aesthetics of code as well. Because code functions, «the aesthetic value of code lies in its execution, not simply its written form. To appreciate it fully we need to "see" the code to fully grasp what it is we are experiencing and to build an understanding of the code's actions». [2]

The analysis of a computer program or system often involves examining how the program behaves and «reading» (in this other sense, «interpreting the significance of») the intention behind the program, the structure of the program, or the more fundamental causes for the outputs observed. This is very frequently done in reverse–engineering in «black–box» situations, where code and other internals are not available for inspection. A network administrator might also be able to «read» the behavior of a malfunctioning router and figure out the problem without looking at any code. But these types of analysis also apply to systems that are not governed by legible code at all, and are not, by themselves, examples of the phenomenon under consideration, the human reading and interpretation of particular texts, computer programs.

Reading in the main sense is about looking at something abstract. «Reading a photograph» sounds odd, perhaps because the photograph is not printed matter but also because it represents a framed perspective rather directly, with little abstraction. It is much more usual to read a diagram or map, because these are abstract representations. The development of software brought code into a legible condition. Cables patched into the ENIAC were not themselves legible, but assembly language for the stored–program EDSAC was. Human readability of programs was further enhanced as high–level programming languages, beginning with FORTRAN, were developed.

int i;main(){for(;i["]<i;++i){--i;}"];read('-'-'-',i+++"hell\
o, world!\n",'/'/'/'));}read(j,i,p){write(j/p+p,i---j,i/i);}

// Figure 1. An anonymous entry to the 1984 International Obfuscated C Code Contest that prints «hello, world!»

In the question and answer period after a lecture, Donald Knuth, the famous computer scientist who is author of The Art of Computer Programming, recalls reading the program SOAP from Stan Poley: «absolutely beautiful. Reading it was just like hearing a symphony, because every instruction was sort of doing two things and everything came together gracefully». He also remembers reading the code to a compiler written by Alan Perlis and others: «plodding and excruciating to read, because it just didn't possess any wit whatsoever. It got the job done, but its use of the computer was very disappointing». Knuth says of the aesthetics of reading programs and the reader's pleasure: «I do think issues of style do come through and make certain programs a genuine pleasure to read. Probably not, however, to the extent that they would give me any transcendental emotions». [6]

This discussion is not about any sentimental effects that code may have on the human reader, but does consider in detail the issues of programming style and the ways in which human readers read code. An aesthetic of code is suggested by Knuth's comments, one that is typified by beauty and grace and is clearly identified by Maurice Black in his dissertation, «The Art of Code»:

Computing culture ... has adopted a traditional model of literary aesthetics as a means of effecting change, finding political utility and social value in the wellcrafted product that is at once entirely usable and wholly beautiful to contemplate. The distinctions are clearly evident in the respective disciplines' discourses: whereas terms such as «elegant» and «beautiful» circulate freely in computer culture to describe wellcrafted code, elegance, beauty, and all their synonyms have been effectively exiled from the vocabulary of literary and cultural theory ... [1]

Black devotes a section to Knuth's aesthetic views and his concept of «literate programming,» and another section to John Lions's book–length commentary on the beautiful, elegant Unix operating system. «The Art of Code» clearly establishes the classical aesthetic of programming as the dominant one in the discourse of software development. More recent articles, such as one entitled «Beautiful Code» that appeared in Dr. Dobbs, show that this aesthetic is still going strong: «Instead of searching for some automated measure ... perhaps we should be striving for beauty in our work because we believe that beautiful things are better.» [3] It is fairly easy to find programmers extolling the beauty of programs and code snippets online, and also easy to find suggestions for writing elegant, clearly–written code in introductory programming textbooks.

There is a dark side to coding, however, one in which, even though a person can see into what would otherwise be the black box of the program, the source code itself is obscure, contrived to foil human legibility rather than enhance it.

3. Hello, Obfuscation

In 1984 Landon Curt Noll and Larry Bassel held the first International Obfuscated C Code Contest. The contest was a success that has been repeated many times; judging of the 18th IOCCC was underway when this article was written. Only small, complete C programs can be entered in the contest, which rewards originality and the aesthetic abuse of the C language. The contest's stated goals include demonstrating the importance of programming style («in an ironic way») and illustrating &laq