Meme (Dumb) Ideas for Java Bytecode Constant Obfuscation

Not too long ago, I took a quick look at some of the activity that has been going on in the Java bytecode obfuscation/deobfuscation communities. So far, I have noticed that most of the same ideas since I have went inactive have remained the same:

  • Flow obfuscation:
    • opaque predicates;
    • reordering blocks through goto;
    • weird try-catch block flow;
    • callstack-sensitive keys used for branching;
    • CFG flattening;
    • complicate existing jumps;
    • etc.
  • Constant obfuscation:
    • encrypt strings via context-sensitive keys;
    • split numerical constants into a ton of arithmetic;
    • abuse constantdynamic;
    • etc.
  • Reference obfuscation:
    • abuse the Reflection API;
    • abuse invokedynamics;
    • proxying of method and field invocations;
    • changing all parameter types to java.lang.Object;
    • etc.
  • Exploits:
    • HTML-injection of vulnerable tools;
    • fake directories;
    • tool-specific crashers;
    • etc.
  • And the usual other stuff like class encryption and math obfuscation and whatnot.

Now by no means am I saying that the work done in these areas isn’t impressive. In fact, some of the recent work done in obfuscation and deobfuscation of Java bytecode is pretty impressive to me, especially considering how annoying (for both parties) it is to have to work with JVM verification. What did surprise me, however, I noticed that not a lot more effort has gone into novelizing ideas in obfuscation of constants (or maybe there has but I just didn’t look hard enough…). In any case, my goal in this post is to give some (meme) ideas for obfuscation of constants in hopes that people push that area a little bit more.

Current Methods of Constant Obfuscation (to my knowledge)

At this point, string encryption has realistically gotten about as advanced as we’re gonna have asides from people coming up with transformers that dynamically generate encryption/decryption routines. This is actually not that difficult to do, however, and there is plenty of example tooling that exists for this exact purpose. What has not really advanced, at least from my brief look, is obfuscation of numerical constants.

Generally speaking, the current approach of numerical constant obfuscation is to take the constant and replace it with a gigantic mess of arithmetic and bitwise operations that evaluate to the constant. Of course, there are reasons why people do not push this much further:

  • For the most part, arithmetic and especially bitwise are decently fast but there is visible slowdown prior to JIT-optimization (which may or may not happen) if the code is constant-heavy. More sophisticated approaches will result in worse slowdown.

  • Coming up with novel obfuscation techniques is actually difficult.

  • Constant obfuscation does not fair well against emulation and dynamic reverse-engineering attempts. This is also another reason why it’s not really worth the effort coming up with novel obfuscation techniques for constant obfuscation.

Proposed Idea

All three of the points I gave above are valid reasons not to invest (i.e. waste) time on constant obfuscation. Nonetheless, I want to propose something that, when applying sparingly and appropriately, may avoid just enough overhead to be useable. My proposition is to incorporate results from the mathematical field of numerical analysis to approximate solutions to problems where the approximate solutions are the constants of the problem.

Many problems in mathematics are really difficult and many numerical approximations are not the most obvious things to come up with. This makes static analysis more involved than usual because finding the constants by hand essentially involves knowing how to solve the problem (either symbolically or numerically). Furthermore, identifying numerical algorithms by their code is not necessarily an easy thing to do. The advantage of this is that one no longer really has to spend time coming up with a novel obfuscation idea but rather just looking at something like a partial differential equation and ways to numerically approximate its solution and boom, there’s a new obfuscation technique.

Example 1

Suppose we want to obfuscate the constants $3$, $200$, and $700$. A simple problem we might do is make them the eigenvalues of a matrix. One way one might do this is by starting with the matrix

\[\begin{aligned} \begin{bmatrix} 3 \\ & 200 \\ & & 700 \end{bmatrix}. \end{aligned}\]

Let us call this matrix $D$. We will then conjugate it by a nonsingular matrix that is not nearly singular (for numerical stability). For instance, I decided to make my nonsingular matrix $V$ be

\[\begin{aligned} \begin{bmatrix} 1 & 3 & 9\\ 8 & 2 & 1\\ 1 & 2 & 3 \end{bmatrix}. \end{aligned}\]

Then, in some way or another, we will save the result $A = VDV^{-1}$ in Java bytecode. For the record

\[\begin{aligned} A \approx 1e{+}03 \cdot \begin{bmatrix} 1.2199 & 0.0447 & -1.5745 \\ 0.0114 & -0.0243 & 0.2072 \\ 0.3313 & -0.0045 & -0.2925. \end{bmatrix}. \end{aligned}\]

By definition, the eigenvalues of this matrix are the constants we are looking for. The question, then is how do we find the eigenvalues of $A$. One thing to note is that $A$ is nonsymmetric, so we might consider using a method as described in Demmel’s Applied Numerical Linear Algebra chapter 4. The routines there are explained pretty well and programming them into Java should not be difficult. The intent is also not likely to be obvious to those not trained in numerical linear algebra.

Example 2

The Poisson equation is the elliptic PDE

\[\begin{aligned} L(u) = f \end{aligned}\]

where $L$ is the Laplacian, and $u$ and $f$ are real or complex-valued on an appropriate manifold. For simplicity, we will stick with $u$ and $f$ being real-valued and setting the domain to something simple like $[-1, 1]^2$ as a subset of $\RR^2$. Then the PDE becomes

\[\begin{aligned} \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = f(x, y), \end{aligned}\]

along with some appropriate boundary condition. Let’s say the homogeneous Dirichlet condition. That is,

\[\begin{aligned} \left. u \right|_{\partial[-1, 1]} \equiv 0. \end{aligned}\]

In order to utilize this for obfuscation of constants, we somehow need to incorporate the constants into a solution $u(x, y)$ of the PDE above with our specified boundary condition. Clever choices of $u(x, y)$, therefore producing $f(x, y)$ produces a mathematical problem that is tedious to solve by hand but not too bad by numerics. According to Iserles in A First Course in the Numerical Analysis of Differential Equations, sections 8.2 and 15.1, we can discretize the Laplacian and utilize the Fast Fourier Transform along with Hockey’s method to produce a fast solver of the PDE above. As none of these things are trivial, especially in undocumented code (as would be the case with obfuscated code), static analysis becomes much less obvious.

Example 3

In numerical analysis, significant attention is given to techniques of approximating definite integrals. That is, given continuous $f:[a, b] \to \RR$, can you find a way to approximate

\[\begin{aligned} \int_a^b f(x) \, dx? \end{aligned}\]

How quickly does the error of your method converge to $0$? These are questions asked in the study of quadrature. This is hardly a trivial question! One way of going about this is to approximate by sampling $f$ at points and weighting accordingly:

\[\begin{aligned} \int_a^b f(x) \, dx \approx \sum_{n=1}^N w_n f(x_n). \end{aligned}\]

Obviously, then, one must figure out reasonable $w_n$ to use. Typically, people come up with these by coming up with specific kinds of choices of $f(x)$ for which the approximation should be exact (look up Newton-Cotes, for instance) or the error in approximation should be as close to $0$ as possible (equivalently, minimizing the error). One way to go about this is to minimize the square of the error using Newton’s method (or simplex method combined with Newton’s) on the derivative of the squared error function. As reading this in undocumented code is far from trivial, the intent is not obvious if implemented in obfuscation. Thus, what one can do is turn constants into simple definite integrals to compute via quadrature whose rule is found by the method above (ideally, the rule is computed from within the obfuscated program itself at runtime).

Wrapping Up!

The examples given above are simple examples (though, hardly simple to implement) of using ideas of approximation as means of constant obfuscation. Personally, I believe that approaching obfuscation in this way also gives a red-herring effect in the sense that it pads the program with code that is possibly important-looking when in reality it is an algorithm from numerical analysis designed to reliably approximate a solution to a mathematical problem. The obvious tradeoff is that appropximation errors can happen which I leave as an exercise to work around. One will also run into performance issues but that is moreso a matter of using obfuscation where appropriate. Other places to look for similar inspiration are:

  • Generating functions (see Genertingfunctionology by Wilf)
  • Numerical solutions of IVPs (see A First Course in the Numerical Analysis of Differential Equations by Iserles)
  • Condition numbers of matrices (see Applied Numerical Linear Algebra by Demmel)
  • Krull dimension of rings
  • Harmonic function theory (see Harmonic Function Theory by Axler, Bourdon, and Ramey)
  • Harmonic Analysis (see any Fourier/Harmonic analysis book)
  • Betti numbers (see An Introduction to Algebraic Topology by Rotman)
  • Topological Data Analysis
  • A number of results from complex analysis

Comments