Karla News

What is Code Obfuscation?

Patent laws, copyright laws, hackers and plagiarists make it very hard for programmers to exercise their intellectual property rights over the code they have created. Hence the need for obfuscation. Some code is easier to obfuscate than other code. This paper describes and reviews several techniques for the technical protection of software secrets. It also discusses data obfuscation, the class of obfuscation techniques that zero in on the data structures in the software. Finally, it discusses some possible deobfuscation techniques, such as program slicing.

When a programmer spends hours writing code, it is only logical that such code be patented. But copyright laws and patent laws (uspto.gov) do not serve all the needs of a programmer. One way of preventing plagiarism and hacking is to employ code obfuscation.

Defining obfuscation
Here is a definition: obfuscated code (Wikipedia) is code that is (perhaps intentionally) very hard to read and understand. Some languages are more prone to obfuscation than others, and a few are most often cited as easily “obfuscatable”. Macro preprocessors are often used to create hard-to-read code by masking the standard language syntax and grammar from the main body of code.

Code obfuscation: some methods described
Obfuscation methods are classified by the information they target. The basic ones are aimed at the lexical structure of the code. Others, as mentioned before, zero in on the data structures. The control flow is also a major area of focus. Obfuscation methods are further classified by the kind of operation they perform on the targeted information. The different obfuscation methods (Collberg et al.) are:

Layout obfuscation: Targets the layout of the application, such as the formatting of the source code, the names of the variables and the comments.
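As a rough illustration of layout obfuscation, the Python sketch below shows a small function before and after its formatting, names and comments are stripped. Both functions and their names are hypothetical examples, not taken from the cited papers.

```python
def average_price(prices):
    """Return the mean of a list of prices, or 0.0 for an empty list."""
    if not prices:
        return 0.0
    total = sum(prices)
    return total / len(prices)


# Layout-obfuscated version: same behaviour, but the docstring is gone,
# the names carry no meaning and the body is squeezed onto one line.
def a(b): return (sum(b) / len(b)) if b else 0.0


if __name__ == "__main__":
    sample = [2.0, 4.0, 9.0]
    print(average_price(sample), a(sample))
```

The logic is untouched, so this kind of transformation costs nothing at run time, but the lost names and comments cannot be recovered from the obfuscated version.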

Data obfuscation: A widely used method, it zeroes in on the data structures used by the code.

Encoding obfuscation: Changes the way in which the machine interprets stored data. For example, consider the replacement of a variable “i” by a derived value c1*i + c2.

Aggregation obfuscation: Alters the way data is grouped by the code. An example is splitting an array into several sub-arrays. Likewise, a variable that does not need to be an array can be made to look as if it is one. This dummy throws reverse engineering efforts off-track.

Ordering obfuscation: Changes the order of the data. For example, reordering the array elements so that element “i” is placed in a new location determined by a function f(i) is one way of obfuscation.
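The three data transformations above can be sketched in Python as follows. This is only an illustration under stated assumptions: the constants C1 and C2, the even/odd split and the permutation f are hypothetical choices, not the ones used in the cited papers.

```python
# Hypothetical constants for the encoding i -> C1*i + C2.
C1, C2 = 8, 3


def encode(i):
    """Store C1*i + C2 instead of i."""
    return C1 * i + C2


def decode(stored):
    """Recover i from its encoded form."""
    return (stored - C2) // C1


def encoded_sum(n):
    """Sum 0..n-1 while the loop counter only ever exists in encoded form."""
    total = 0
    j = encode(0)
    while j < encode(n):
        total += decode(j)
        j += C1  # one step of i corresponds to C1 steps of the encoded counter
    return total


def split_array(values):
    """Aggregation: split one array into even-index and odd-index sub-arrays."""
    return values[0::2], values[1::2]


def split_get(evens, odds, i):
    """Read logical element i from the two sub-arrays."""
    return evens[i // 2] if i % 2 == 0 else odds[i // 2]


def f(i, n):
    """Ordering: a hypothetical permutation of 0..n-1 (valid when gcd(3, n) == 1)."""
    return (3 * i + 1) % n


def reorder(values):
    """Place logical element i at position f(i)."""
    n = len(values)
    out = [None] * n
    for i, v in enumerate(values):
        out[f(i, n)] = v
    return out
```

In each case the rest of the program has to go through the matching accessor (decode, split_get, or an index computed with f), which is what makes the stored data harder to interpret on its own.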

Control flow obfuscation: This type of obfuscation aims to hinder the unraveling of code by manipulating its control flow. As with data, some methods alter the aggregation of control, while others influence its ordering.

Aggregation obfuscation: Alters how statements are grouped together. An example is in-lining, which means replacing a function call by the body of the function.
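In-lining can be sketched in Python as below. The functions are hypothetical examples; a real obfuscator would apply the same idea automatically across a whole program.

```python
def square(x):
    return x * x


def sum_of_squares(values):
    """Original: the helper call documents the intent."""
    return sum(square(v) for v in values)


def sum_of_squares_inlined(values):
    """After in-lining: the call to square() is replaced by its body."""
    total = 0
    for v in values:
        total += v * v
    return total
```

Once every call site is in-lined, the helper and its descriptive name can be removed, so the abstraction it expressed no longer appears anywhere in the program.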

Ordering obfuscation: Changes the order in which statements are executed. An example is setting up a loop so that it iterates backwards (S. Chow et al.). This is not watertight, however, since it is very easy to reverse.

Computational obfuscation: The control flow of the code is altered, for example by inserting object-level code that has no source-code equivalent (Low). Meaningless chunks of code inserted strategically can confound potential plagiarists.
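A minimal Python sketch of these two ideas follows. It is a source-level analogue only: the bogus statement stands in for the object-level code described above, and the function names and constants are hypothetical.

```python
def dot(xs, ys):
    """Original: iterate forwards."""
    total = 0
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total


def dot_obfuscated(xs, ys):
    """Same result: the loop runs backwards and carries a bogus computation."""
    total = 0
    noise = 17  # bogus value, never used for the result
    for i in range(len(xs) - 1, -1, -1):
        noise = (noise * 31 + i) % 1009  # dead code inserted to distract a reader
        total += xs[i] * ys[i]
    return total
```

Reversing the loop is safe here because integer addition does not depend on order; with floating-point values the two versions could differ slightly in rounding.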

Preventive transformation: This method of obfuscation deters the plagiarist or the hacker by making the code very hard to break with a de-obfuscator. Its main goal is not to obscure the code but to make it more difficult for de-obfuscators to break (Chang).

Targeted: Tries to make automatic de-obfuscation techniques more difficult.

Inherent: Until now, the methods reviewed equip the code itself so that it is hard to understand or de-obfuscate. Inherent obfuscation depends instead on exploiting the weaknesses of de-obfuscating programs.

A list of types of obfuscation:
Inherent obfuscation
Layout obfuscation
Data obfuscation
Control obfuscation
Preventive transformation
Ordering obfuscation
Storage obfuscation
Aggregation obfuscation
Computation obfuscation
Encoding obfuscation

Parameters for evaluating quality of an obfuscation method
The object of obfuscation is to discourage code piracy. The effectiveness of an obfuscation strategy therefore grows with the complexity of the obfuscated code. To study obfuscation methods in detail, an appraisal of the quality of the transformation is necessary. The efficacy of an obfuscation method is determined by a combination of its potency, resilience, stealth and cost. These are the elements of the framework for appraising the quality of an obfuscation method.

Potency: Potency defines the extent to which the transformed code is more obscure than the original code. Software complexity metrics define various measures of intricacy. Some of these metrics are the number of predicates the code contains, the depth of its inheritance tree and its nesting levels. The aim here is to maximize intricacy.
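Two of the metrics mentioned above, the number of predicates and the nesting level, can be approximated for Python source with the standard ast module. This is a sketch under assumptions: which node types count as predicates or as nesting constructs is a choice made here for illustration, and the two sample programs are hypothetical.

```python
import ast

# Node types treated as predicates in this sketch (an assumption, not a standard).
PREDICATE_NODES = (ast.If, ast.While, ast.IfExp)
# Node types that open a new nesting level in this sketch.
NESTING_NODES = (ast.If, ast.While, ast.For, ast.With, ast.Try)


def count_predicates(source):
    """Count conditional constructs in the given Python source."""
    tree = ast.parse(source)
    return sum(isinstance(node, PREDICATE_NODES) for node in ast.walk(tree))


def max_nesting(source):
    """Return the deepest nesting level of block statements."""
    def depth(node, current):
        if isinstance(node, NESTING_NODES):
            current += 1
        deepest = current
        for child in ast.iter_child_nodes(node):
            deepest = max(deepest, depth(child, current))
        return deepest

    return depth(ast.parse(source), 0)


ORIGINAL = """
def clamp(x):
    if x < 0:
        return 0
    return x
"""

OBFUSCATED = """
def clamp(x):
    y = x
    while y == x:
        if (x * x + x) % 2 == 0:
            if x < 0:
                return 0
            return x
        y = y + 1
    return x
"""
```

Comparing count_predicates and max_nesting on ORIGINAL and OBFUSCATED gives one rough way to put a number on how much a transformation has increased intricacy.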

Resilience: Resilience is the strength of a transformation method. It measures how well the transformed code withstands automated deobfuscation. It is an amalgam of the programmer effort needed to create a de-obfuscator and the time and space that the de-obfuscator takes up. Resilience is greatest for a one-way transformation that no de-obfuscator can undo. A transformation is one-way, and hence irreversible, when it removes information from the program, such as the code formatting.

A transformation is said to be potent if it can throw a human reader off track, whereas it is resilient if an automated, machine-based process (such as a de-obfuscator tool) cannot undo it. This is how potency differs from resilience as far as obfuscation is concerned.

Cost: Following an obfuscation procedure, the time and space requirements of a program may change. The additional execution time and space required after transformation are the costs associated with obfuscation. Cost changes with the context. Note also that a transformation can be very potent, since it increases the complexity, and yet have no resilience if it can easily be undone by an automated de-obfuscator.

Stealth: When reverse engineers come across portions of code that stand out, it is easy for them to focus their de-obfuscation methods on the sections that require work. However, if the areas that have been obfuscated blend in with the rest of the code, it is hard for reverse engineers to employ de-obfuscation methods, and automated de-obfuscators are more likely to fail. This ability to blend seamlessly into the rest of the code and remain undetectable is known as “stealth”.

One group of obfuscation techniques targets the different parts of a program, i.e., data structures such as arrays, classes or variables. It is important to list the different methods of obfuscation with examples and to analyze their quality against the parameters mentioned above: a) potency, b) resilience and c) cost (see Table 1 for an evaluation of these methods).

De-obfuscation
A reverse engineer can use knowledge of the strategies employed by known obfuscators to identify opaque predicates, for instance via pattern matching. To thwart attempts at pattern matching, the obfuscator should avoid using canned opaque constructs (Collberg et al., ACM). It is also important to choose opaque constructs that are syntactically similar to the constructs used in the real application.
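For readers unfamiliar with the term, an opaque predicate is a condition whose outcome the obfuscator knows in advance but which is hard for a reader to evaluate by inspection. The Python sketch below uses a well-known arithmetic identity; the function names and the discount figures are hypothetical.

```python
def always_true(x):
    """Opaque predicate: x*(x+1) is a product of consecutive integers, so it is even."""
    return (x * (x + 1)) % 2 == 0


def discount(price, x):
    """Guard real code with the opaque predicate; the else branch is bogus."""
    if always_true(x):
        return price * 90 // 100   # real computation
    else:
        return price * 37 // 100   # never executed
```

A predicate this familiar is exactly the kind of canned construct that pattern matching could spot, which is why the text recommends constructs that resemble the application's own code.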

Table 1. Evaluation of obfuscation techniques

Identification by Program Slicing
The basic premise of this paper is that a programmer will find the obfuscated version of a program more obscure, and thus harder to understand and reverse engineer, than the original one.
The main reasons are that in the obfuscated program
a) Live real code will be interspersed with dead bogus code and
b) Logically related pieces of code will have been broken up and dispersed over the program.

Program slicing tools can be used by a reverse engineer to counter these obfuscations. Such tools can interactively help the engineer decompose a program into manageable chunks called “slices”. A slice of a program P with respect to a point “p” and a variable “v” consists of all the statements of P that could have contributed to the value of “v” at p. Hence a program slicer would be able to extract from the obfuscated program the statements of the algorithm that computes an opaque variable “v”, even if the obfuscator has dispersed these statements over the entire program.
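The idea of a slice can be sketched with a toy backward slicer in Python. It is a deliberately simplified illustration: it handles only straight-line assignments, each described by a hypothetical (target, used names) pair, and ignores branches, loops and aliasing that a real slicing tool must handle.

```python
def backward_slice(statements, variable):
    """Return indices of statements that may affect `variable` at the end.

    `statements` is a list of (target, used_names) pairs describing
    straight-line assignments; branches and loops are not modelled.
    """
    relevant = {variable}
    kept = []
    for index in range(len(statements) - 1, -1, -1):
        target, used = statements[index]
        if target in relevant:
            kept.append(index)
            relevant.discard(target)
            relevant.update(used)
    return sorted(kept)


PROGRAM = [
    ("a", set()),        # 0: a = 5
    ("junk", set()),     # 1: junk = 99        (bogus)
    ("b", {"a"}),        # 2: b = a + 1
    ("junk", {"junk"}),  # 3: junk = junk * 2  (bogus)
    ("v", {"a", "b"}),   # 4: v = a * b
    ("w", {"junk"}),     # 5: w = junk - 1     (bogus)
]
```

Calling backward_slice(PROGRAM, "v") keeps only statements 0, 2 and 4, so the bogus statements interleaved with the real computation drop out of the slice.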

Obfuscation: some disadvantages discussed.
Monolayer security (as opposed to multi-layer)
One must understand that obfuscators do not come with any guarantees. They give no assurance about how difficult it will be to reverse engineer the obfuscated code. Obfuscation does not offer the kind of rigorous protection that today’s encryption schemes do. Therefore, if security is a big concern, one should not depend on obfuscation alone.

Code debugging, maintenance
The expectations and the output requirements for a piece of code may change over time, and all code, at one point or another, requires debugging. Either way, obfuscation makes it difficult to change any part of the code. After obfuscation, the control flow, the structure, the variable names and even the byte code (code morphing) change beyond recognition. Faced with this, most developers keep the un-obfuscated code for their own use and release only the obfuscated code to the public. Having two separate builds ensures that debugging and maintenance are still possible. Of course, the output from both builds should be identical. Java obfuscators work on the compiled intermediate code rather than on the source code, so the source stays readable. Hence Java programmers do not face this limitation during obfuscation.

Conclusion
While there are ways of reverse engineering obfuscated code, such methods are not cost-effective. This makes code obfuscation a very effective tool against piracy and plagiarism.

References
H. Chang and M. Atallah. Protecting Software Code by Guards. In Sander [San02].
C. Collberg, C. Thomborson, and D. Low. Taxonomy of Obfuscating Transformations. Technical Report, Department of Computer Science, The University of Auckland, New Zealand, July.
S. Chow, Y. Gu, H. Johnson, and V. Zakharov. An Approach to the Obfuscation of Control-Flow of Sequential Computer Programs. In G. Davida and Y. Frankel, editors, Information Security (ISC), Lecture Notes in Computer Science (LNCS). Springer-Verlag.
Douglas Low. Protecting Java Code via Code Obfuscation. Department of Computer Science, University of Auckland. ACM Crossroads, Spring issue.
C. Collberg, C. Thomborson, and D. Low. Breaking Abstraction and Unstructuring Data Structures. In IEEE International Conference on Computer Languages (ICCL).
C. Collberg, C. Thomborson, and D. Low. Manufacturing Cheap, Resilient, and Stealthy Opaque Constructs. ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Diego, CA.
USPTO website, uspto.gov.
