"Practical Android Software Protection in the Wild"

"Practical Android Software Protection in the Wild" - An Appetizer
2026-06-03 22:00:00 · Author: blog.quarkslab.com

This article describes the main software protection techniques used in Android applications, organized around a taxonomy covering environment checks, obfuscation, and program loading abuse. It presents the results of a large-scale analysis of nearly 2.5 million Android apps, studying how widely these protections are adopted across different markets, app categories, and malware samples.

If you work in Android analysis, you have probably gotten your hands dirty with APK reversing: unzip the package, decompile it with JADX, browse the recovered Java code, and maybe pair it with some dynamic analysis using Frida. Most of the time this works smoothly, but occasionally you run into something that fights back: a packer that prevents access to the DEX, an obfuscator that makes the code unreadable, or a protector that actively blocks dynamic analysis. This post reviews the anti-analysis mechanisms and software present in the Android ecosystem, covering the threat model, the protection techniques, a taxonomy with a naming convention that analysts can use as a reference, and finally an analysis of the protections found in a dataset of around 2.5 million Android apps.

This post is based on my last PhD paper at Universidad Carlos III de Madrid: Practical Android Software Protection in the Wild, published in 2025 together with my supervisor Professor Juan Tapiador. The paper is a 32-page academic survey; here I try to cover the same ground in a more readable way, going deep enough on the technical parts while leaving aside the most academic aspects.

Introduction

Developers of mobile applications face the threat of attackers who want to extract or tamper with sensitive data and code embedded in their apps. By inspecting the application internals, attackers can achieve things like removing licensing checks, generating software clones, or altering the app to provide cheating capabilities, for example, giving players elevated privileges in online games. These attacks can have a direct negative impact on the app's business model and put its users at risk.

In the case of traditional desktop software, decades of work have produced a multitude of protection techniques and commercial solutions, and over the years this knowledge has gradually been transferred to the mobile world. This is particularly relevant for Android, a platform that has become an essential part of daily life, with over 2.5 billion active users and around 3.5 million apps on Google Play alone.

Research Goals and Contributions

This study has two main goals:

Provide a comprehensive description of existing software protection techniques for Android.
Study the prevalence of software protection in Android in the wild.

For the first goal, a review of existing work was conducted and the results are organized around a new taxonomy, which can serve as a common analysis framework and naming convention for analysts and researchers.

For the second goal, four research questions were defined:

What percentage of Android apps use some form of protection?
Is protection more prevalent in certain Google Play categories?
Is protection becoming more or less prevalent over time?
How does Android malware compare to regular market apps in terms of protection use?

To answer these questions, signatures for 28 Android software protection products were collected and applied to a dataset of nearly 2.5 million Android apps, spanning Google Play and 18 alternative distribution markets, old and recent malware samples, and around 1.4 million pre-installed apps.

Key Findings

Among the nearly 2.5 million apps analyzed, only 96,169 employed packers, obfuscators, or protectors. Software protection is concentrated in categories such as finance, gaming, and comics on Google Play, likely driven by stronger requirements around intellectual property and sensitive data. The analysis also reveals a steady increase in the adoption of packers, obfuscators, and protectors over time. Furthermore, software protection is significantly more prevalent in Chinese app markets, where up to 40% of applications use packers and up to 20% use obfuscators. These numbers reflect only what APKiD was able to detect; since it relies on YARA rules, tools not covered by its ruleset go undetected, so the actual prevalence of software protection across the dataset, although not validated, is likely higher than reported.

The Threat Model: Man-At-The-End Attacks

Before explaining the techniques, it is worth being precise about who developers are defending against, because the threat model is slightly different from most of what gets called security in the mobile space. In this case we are not talking about a network attacker exploiting a vulnerability remotely. Software protection for Android is about a specific adversary that the literature calls the Man-At-The-End (MATE) attacker.

A MATE attacker is an adversary with full physical or logical control over the end of the communication, in this case, the device running the app. They have the compiled application. They can run it on a device they fully control, which may be rooted, instrumented, or emulated. They have unlimited time and access to every standard analysis tool: static disassemblers and decompilers (e.g. IDA Pro, Ghidra, jadx, JEB), dynamic instrumentation frameworks (e.g. Frida, Xposed), debuggers (e.g. JDWP for Java-level debugging, or gdb and lldb for native code), and emulation environments (e.g. QEMU, Nox). They can observe the application's behavior, patch its bytecode and repackage it, and replay executions as many times as they want.

Figure 1. Representation of MATE attacks, showing the attacker's access to the app through static and dynamic analysis tools, and the three main attack goals: tampering, reverse engineering, and cloning.

This threat model was formalized by Falcarin et al. [1] and has become the standard reference frame for software protection research. It is worth understanding how powerful it is. The attacker does not need to break any cryptography or find a vulnerability. They can simply run the app, pause it at the right moment, and read a decrypted key out of memory. They can modify the bytecode to skip a license check and repackage the app. They can trace function calls and reconstruct proprietary algorithms from their inputs and outputs.

Three concrete attack goals motivate MATE attacks in practice, each addressed by different protection techniques:

Tampering is an integrity attack. The attacker modifies the app to change its behavior, removing license checks, unlocking premium features without paying, or modifying a game client to gain capabilities over other players. This requires understanding the code well enough to find the right instructions to patch, then repackaging and resigning the APK with an attacker-controlled key. The primary barrier here is the cost of analysis: if the code is sufficiently obfuscated, finding the right patch point becomes significantly more expensive.
Malicious reverse engineering is a confidentiality attack. The attacker wants to extract something the developer intended to keep secret: API keys or tokens granting access to a backend service, cryptographic keys used for data encryption, proprietary algorithms representing competitive IP, or backend endpoint URLs that should not be public. Attackers do not necessarily need to understand the whole codebase; they can target just the part containing the information they are after. In Android apps, this is a particularly common attack because developers routinely embed credentials and keys directly in the APK, often incorrectly assuming the code is not accessible.
Cloning is redistribution. The attacker removes licensing mechanisms or vendor-specific network endpoints and publishes a modified copy of the app on an alternative market, often for free or with their own monetization injected (e.g. a replacement advertising SDK). This directly harms the developer's revenue and can put users at risk if the clone contains malicious functionality.

As established by Collberg et al.'s foundational work [2], virtually all software protection techniques accept that a sufficiently motivated attacker with unlimited time and resources will eventually succeed. This is an uncomfortable truth for software security. The realistic goal is not prevention, which, as we have seen, is effectively impossible, but delay: making an attack expensive enough in time and effort that the attacker gives up, or slow enough that a new version of the app can be shipped before the attack completes, potentially rendering the attacker's work useless.

A Taxonomy of Android Software Protection Techniques

In the paper, we propose a taxonomy that organizes protection techniques into four major families, each grouped by protection goal. We think that this taxonomy can be used by tools or researchers in order to provide a common naming convention for anti-analysis techniques. Let's go through each one in depth.

Figure 2. Taxonomy of Android software protection techniques, organized into four families: adversarial execution environment checks, anti-disassembly and anti-decompilation, code and data obfuscation, and program loading abuse.

Family 1: Adversarial Execution Environment Checks

This family covers techniques that detect whether the app is running in an analysis environment and react accordingly, commonly refusing to run, crashing deliberately, returning incorrect results to mislead the analyst, or triggering an alert to a backend server. Analysis tools commonly leave artifacts in the execution environment: files in known locations, open ports, loaded shared libraries, system properties, unmodified device identifiers, etc.

These checks are a cat-and-mouse game. Every artifact that a framework introduces becomes a detection target. Every detection technique has a corresponding bypass. Analysts have developed standard approaches for defeating common checks, and protection developers continuously improve their signatures to detect newer tools and bypass attempts. Let's quickly review the sub-techniques.

Anti-Dynamic Binary Instrumentation

Dynamic Binary Instrumentation (DBI) is a technique that involves injecting additional code into a running process to observe or modify its behavior. It is the foundation of modern Android dynamic analysis, and the two dominant frameworks are Frida and Xposed.

How Frida works. Frida operates by injecting a JavaScript-controlled agent into the target process. A Frida client connects to a Frida server on the device and instructs it to inject the agent. The agent runs inside the target process's address space and can use Frida's JavaScript API to intercept any Java method, replace its implementation, read and write memory, call native functions, and communicate results back to the client. Hooking internal Java functions is particularly useful for extracting runtime information, such as the arguments and return values of the hooked methods.

The injection mechanism leaves artifacts. Frida injects a shared library (frida-agent-*.so) into the target process, which is visible in /proc/self/maps. It opens a local socket or TCP port for communication between the agent and the client. The injected library also has recognizable byte sequences that can be scanned for in process memory. All these artifacts can be detected by the protected application to avoid being analyzed.

How Xposed works. Xposed (now typically deployed as a Magisk module) works differently: instead of injecting into individual processes at analysis time, it modifies the Android Runtime globally so that any installed Xposed module can hook any method in any application. It hooks the handleBindApplication entry point in the Zygote process, which is the parent of all Android app processes, ensuring its hooks are inherited by every forked process. Xposed leaves several traces: its libraries show up in the loaded library list of every process, its bridge library (XposedBridge.jar) appears as a loaded dex file, and the Xposed installer app can be detected by checking for its package name.

Detection approaches. Anti-DBI checks look for these artifacts through multiple channels: looking for known libraries in /proc/self/maps, checking for certain open ports, scanning process memory, etc.
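As a rough illustration of the /proc/self/maps check, the sketch below scans a maps dump for the two artifact names mentioned above. It is written in Python for readability; an on-device check would normally live in Java, Kotlin, or native code. The marker list, the function name, and the sample dump are assumptions made for the example, and real protections use larger and frequently updated marker sets.

```python
# Hypothetical marker list built from the artifact names discussed in the text.
DBI_MARKERS = ("frida-agent", "XposedBridge.jar")

def find_dbi_artifacts(maps_text, markers=DBI_MARKERS):
    """Return the sorted list of markers found in a /proc/<pid>/maps dump."""
    hits = set()
    for line in maps_text.splitlines():
        # The backing file path, when present, is the sixth field of each line.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a backing file
        path = fields[5]
        for marker in markers:
            if marker in path:
                hits.add(marker)
    return sorted(hits)

# Made-up sample dump; on a device this text would come from /proc/self/maps.
SAMPLE_MAPS = """\
7a1c000000-7a1c021000 r-xp 00000000 fd:00 1234 /system/lib64/libc.so
7a1d000000-7a1d400000 r-xp 00000000 fd:03 5678 /data/local/tmp/frida-agent-64.so
7a1e000000-7a1e001000 rw-p 00000000 00:00 0
"""

print(find_dbi_artifacts(SAMPLE_MAPS))
```

A string match like this is easy to defeat by renaming the injected library, which is one reason protections combine it with port checks and memory scans.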

Anti-Emulation

Emulators provide a rooted and instrumented software environment in which an analyst can run a target application. Emulators scale better than physical Android devices in analysis pipelines because they can be snapshotted, restored, cloned, etc. Running thousands of apps through a dynamic analysis sandbox is only practical with emulators.

How emulators are detected. Similarly to DBI detection, the idea is to discover artifacts introduced by the emulator in the system, or values that are absent or differ from those on a real device.

System properties. QEMU, which underlies the Android emulator distributed with Android Studio, sets system properties including ro.kernel.qemu=1, qemu.hw.mainkeys, ro.kernel.android.qemud, and ro.kernel.qemu.gles. These can be read via android.os.SystemProperties.get() (a hidden API, usually reached through reflection) or by parsing /system/build.prop. Other emulators have their own property sets: KoPlayer sets ro.ttvmd.caps.bat, ro.ttvmd.caps.cam, and ttvmd.gps.latitude. BlueStacks has its own distinguishing properties.

Known files and paths. Emulators leave files on the filesystem that are absent on real devices. QEMU creates /dev/socket/qemud. Andy (no longer a recommended emulator) creates fstab.andy. Nox creates fstab.nox. Genymotion, another popular emulator used in security research, is associated with /dev/socket/genyd. The presence of any of these files is a strong signal that the app is running in an emulator.

Phone numbers and device identifiers. The Android emulator uses fixed phone numbers for the simulated telephony subsystem: 15555215554, 15555215556, 15555215574, and similar values. Checking the device's phone number against a list of these known values is a simple but effective check.

CPU and hardware signatures. Emulators often have CPU identifiers or hardware model strings that differ from any real device. Reading /proc/cpuinfo and checking for anomalous values is a common technique. The absence of a GPU or the presence of a software renderer can also be a signal.
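The property, file, and phone-number checks above can be combined into a single scoring routine. The following Python sketch only models the decision logic: the caller is assumed to have already collected the system properties, a list of existing paths, and the phone number (on a device these would come from the Android APIs). The function name and the signal labels are hypothetical.

```python
# Indicator values taken from the text; names of constants are illustrative.
QEMU_PROPERTY_NAMES = ("qemu.hw.mainkeys", "ro.kernel.android.qemud", "ro.kernel.qemu.gles")
KOPLAYER_PROPERTY_NAMES = ("ro.ttvmd.caps.bat", "ro.ttvmd.caps.cam", "ttvmd.gps.latitude")
EMULATOR_FILE_SUFFIXES = ("/dev/socket/qemud", "/dev/socket/genyd", "fstab.andy", "fstab.nox")
EMULATOR_PHONE_NUMBERS = frozenset({"15555215554", "15555215556", "15555215574"})

def emulator_signals(props, paths, phone_number):
    """Return a list of human-readable emulator indicators found in the inputs."""
    signals = []
    if props.get("ro.kernel.qemu") == "1":
        signals.append("prop:ro.kernel.qemu")
    for name in QEMU_PROPERTY_NAMES + KOPLAYER_PROPERTY_NAMES:
        if name in props:
            signals.append("prop:" + name)
    for path in paths:
        # str.endswith accepts a tuple, so one call covers every known file.
        if path.endswith(EMULATOR_FILE_SUFFIXES):
            signals.append("file:" + path)
    if phone_number in EMULATOR_PHONE_NUMBERS:
        signals.append("phone:" + phone_number)
    return signals

print(emulator_signals({"ro.kernel.qemu": "1"}, ["/dev/socket/qemud"], "15555215554"))
```

Returning the list of signals rather than a single boolean lets the caller decide how many indicators are needed before reacting, which helps limit false positives.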

Anti-Debugging

Debugging gives an analyst fine-grained control over execution: they can pause the app at any point, inspect and modify all memory, step through individual instructions, and set conditional breakpoints. It is the most powerful single tool in an analyst's dynamic analysis toolkit, which makes anti-debugging a high-priority protection technique.

Android supports two independent debugging mechanisms that require separate detection approaches.

Java-level debugging via JDWP. The Java Debug Wire Protocol allows a Java debugger (like IntelliJ IDEA or Android Studio's debugger) to connect to a running Android process and debug its Java code. Only apps marked as debuggable in their AndroidManifest.xml can be attached with a JDWP debugger by default; in production builds, this flag should be absent. The android.os.Debug.isDebuggerConnected() method returns true if a JDWP debugger is currently attached, providing the most direct detection. An additional check is Debug.waitingForDebugger(), which can detect if the app is paused waiting for a debugger to attach at startup.

A more aggressive approach modifies the internal JdwpState structure used by the ART runtime to manage the JDWP connection. By corrupting or zeroing specific fields in this structure, a protection can prevent a debugger from attaching successfully even to a debuggable build, without the app needing to crash or exit explicitly.

Native debugging via ptrace. At the Linux layer, debugging is implemented through the ptrace system call. A debugger process uses ptrace(PTRACE_ATTACH, target_pid, ...) to attach to the target and gain control over its execution. The most straightforward detection technique is reading /proc/self/status and checking the TracerPid field: a value of 0 means no debugger is attached; any other value means a tracer is present.
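A minimal sketch of the status-file check follows, in Python for brevity (protections typically do this in native code). The kernel spells the field `TracerPid`. The function names and the sample text are made up for the example; on Linux or Android the input would be the contents of /proc/self/status.

```python
def tracer_pid(status_text):
    """Extract the TracerPid value from /proc/<pid>/status text, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("TracerPid:"):
            return int(line.split(":", 1)[1].strip())
    return None

def is_traced(status_text):
    """True when a tracer (debugger, strace, ...) is attached to the process."""
    pid = tracer_pid(status_text)
    return pid is not None and pid != 0

# Made-up excerpts of a status file.
NOT_TRACED = "Name:\tapp_process64\nState:\tS (sleeping)\nTracerPid:\t0\n"
TRACED = "Name:\tapp_process64\nState:\tt (tracing stop)\nTracerPid:\t4242\n"

print(is_traced(NOT_TRACED), is_traced(TRACED))
```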

A more proactive technique exploits the exclusivity of the ptrace relationship: only one process can ptrace another at a time. If the app calls ptrace(PTRACE_TRACEME, 0, 0, 0) on itself at startup, this registers the app as willingly being traced and prevents any external debugger from attaching subsequently, since a second ptrace attach would fail. Apps often do this very early in the initialization sequence.

A third approach has the app spawn a child process that ptraces the parent. Since the parent is now being traced by the child, any external debugger attach attempt fails. The child process acts as a watchdog: if any strange behavior is detected, it can kill the parent process.

Timing-based detection. Debugging slows execution. If the time between two points in the code is significantly longer than expected (e.g. the analyst is tracing execution step by step), a timing check can detect this. System.nanoTime() or native clock_gettime() calls can measure elapsed time with nanosecond precision. The challenge is calibrating the expected time correctly across different device speeds; too tight a threshold produces false positives on slow devices.
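The timing idea can be sketched as follows, in Python as a stand-in for System.nanoTime() or clock_gettime(). The function names and the injectable clock are assumptions added so that the example is deterministic; the budget value is arbitrary and would need per-device calibration, as noted above.

```python
import time

def exceeded_budget(work, budget_ns, clock=time.monotonic_ns):
    """Run work() and report whether it took longer than budget_ns nanoseconds."""
    start = clock()
    work()
    return clock() - start > budget_ns

def fake_clock(*readings):
    """Deterministic clock for demonstration: returns the given readings in order."""
    it = iter(readings)
    return lambda: next(it)

# Normal run: 500 ns elapsed against a 1 ms budget.
print(exceeded_budget(lambda: None, 1_000_000, fake_clock(0, 500)))
# Single-stepped run: 3 s elapsed against the same budget.
print(exceeded_budget(lambda: None, 1_000_000, fake_clock(0, 3_000_000_000)))
```

A monotonic clock is used deliberately: wall-clock time can jump when the system time is adjusted, which would produce spurious detections.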

Root Checks

A rooted device gives an analyst capabilities that are unavailable on a stock Android device: the ability to read and write files in the app's private data directory (/data/data/<package>), to dump and modify process memory without needing a debugger, to remount system partitions as writable and replace system libraries, and to run arbitrary code as the root user. Most serious Android analysis workflows involve a rooted device.

Root detection is therefore an important barrier to raise the cost of analysis. The core techniques are:

su binary detection. The most common rooting method involves installing the su binary in a location where apps can find and execute it. Protection code checks for the su binary in a predefined list of paths: /system/bin/su, /system/xbin/su, /sbin/su, /data/local/xbin/su, /data/local/bin/su, and so on. A more robust approach attempts to actually execute su and checks the return code.

Magisk detection. Magisk is the dominant modern rooting framework and uses a "systemless" approach that modifies the boot image rather than the system partition. However, Magisk still leaves artifacts: the Magisk Manager app (detectable by package name), Magisk-specific files in /data, Magisk's socket interface, and modifications to the device's SELinux policy that differ from stock.

Build property checks. Rooting often modifies system properties. ro.secure=0 indicates that the root shell is enabled. ro.debuggable=1 indicates a debug build. android.os.Build.TAGS contains release-keys on production devices and test-keys or dev-keys on custom ROMs and rooted builds.
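The su path list and the build property checks can be sketched together. This Python version models only the logic: the property dictionary and the tags string are assumed to have been read already (for example from the system properties and android.os.Build.TAGS), and the function name and signal labels are hypothetical.

```python
import os

# Paths listed in the text; real products check many more locations.
SU_PATHS = (
    "/system/bin/su",
    "/system/xbin/su",
    "/sbin/su",
    "/data/local/xbin/su",
    "/data/local/bin/su",
)

def root_signals(props, build_tags, path_exists=os.path.exists):
    """Collect root indicators from su paths, system properties, and build tags."""
    signals = ["su:" + path for path in SU_PATHS if path_exists(path)]
    if props.get("ro.secure") == "0":
        signals.append("prop:ro.secure=0")
    if props.get("ro.debuggable") == "1":
        signals.append("prop:ro.debuggable=1")
    for tag in build_tags.split(","):
        if tag in ("test-keys", "dev-keys"):
            signals.append("tags:" + tag)
    return signals

# The filesystem lookup is injected so the example does not depend on the host.
print(root_signals({"ro.secure": "0"}, "test-keys", path_exists=lambda p: p == "/system/xbin/su"))
```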

Anti-Bot

Automated bots and testing frameworks are used in large-scale app analysis to drive UI interactions. The Android Debug Bridge (ADB) provides the input command for injecting synthetic input events. The Android monkey tool generates pseudo-random UI events. UI Automator provides a higher-level API for scripted UI interaction. From a protection perspective, these bots are a concern because they can be used in automated analysis pipelines, or abused for content scraping, fake ad clicks, credential stuffing, or generating fraudulent accounts.

Android provides an API to check for the monkey tool: the method ActivityManager.isUserAMonkey(), which returns true if the current input is being generated by the monkey tool. App interaction patterns can also be analyzed to distinguish real human interaction from scripted input. Finally, CAPTCHAs, widely used on websites, can also be applied here.

Anti-Tampering

The bytecode of Android applications can be modified using tools like apktool. Anti-tampering techniques try to detect that this has happened and respond accordingly.

Hash-based integrity checking. The most direct approach computes a cryptographic hash over some portion of the application binary and compares it against an expected value. The problem is that Java code cannot read its own bytecode the way native code can; solutions to this involve techniques like DEX loading, which we will cover later.
Signature verification. When an attacker repackages an APK they must re-sign it with their own certificate. The app can call PackageManager.getPackageInfo() with the GET_SIGNATURES flag (deprecated since API level 28 in favor of GET_SIGNING_CERTIFICATES) and compare the signing certificate against a hardcoded expected value. This is straightforward to bypass once you know it is there, but it raises the bar against naive repackaging tools.
Installer verification. PackageManager.getInstallerPackageName(packageName) returns the package name of the app that installed this app. For apps installed from Google Play this is com.android.vending. When an analyst sideloads an APK via ADB, the installer package name is typically null. Checking for an expected source is a lightweight integrity signal.
ART-level bytecode verification. The most sophisticated anti-tampering approaches hook the Android Runtime to intercept each method invocation, hash the bytecode of the method about to execute, and compare it against an expected value before allowing execution to proceed. This technique is not widely used, and only a few papers have analyzed it.
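The signature and installer checks from the list above reduce to two comparisons, sketched here in Python. The certificate bytes are placeholders, and the function names are hypothetical; on Android the certificate and installer name would be obtained through PackageManager, which this sketch does not model.

```python
import hashlib
import hmac

def certificate_matches(cert_bytes, expected_sha256_hex):
    """Compare the SHA-256 digest of the signing certificate with a pinned value."""
    digest = hashlib.sha256(cert_bytes).hexdigest()
    return hmac.compare_digest(digest, expected_sha256_hex.lower())

def installer_trusted(installer, allowed=("com.android.vending",)):
    """Accept only known installer package names; a sideloaded app reports None."""
    return installer in allowed

# Placeholder bytes standing in for the developer's and the attacker's certificates.
GENUINE_CERT = b"developer certificate bytes"
ATTACKER_CERT = b"attacker certificate bytes"
PINNED_DIGEST = hashlib.sha256(GENUINE_CERT).hexdigest()  # embedded at build time

print(certificate_matches(GENUINE_CERT, PINNED_DIGEST))
print(certificate_matches(ATTACKER_CERT, PINNED_DIGEST))
print(installer_trusted(None))
```

Pinning a digest rather than the full certificate keeps the embedded constant short, but it is still a plain constant in the binary, so in practice it is usually combined with the obfuscation techniques described later.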

Family 2: Anti-Disassembly and Anti-Decompilation

Static analysis tools are the first line of attack for most reverse engineers. Before running the app, an analyst will try to recover information with a disassembler or decompiler, either from the Java code (using tools like jadx) or from the bundled native libraries (using tools like IDA Pro or Ghidra). Anti-disassembly and anti-decompilation techniques target these tools directly.

Anti-Disassembly

Disassembly converts raw bytes into a human-readable representation of machine instructions. Mostly two algorithms are used: linear sweep (process bytes sequentially from start to end) and recursive disassembly (start from known entry points and follow control flow transitions). Each has known weaknesses.

Linear sweep is vulnerable to inserted junk bytes that look like valid instructions but are never executed. A single byte inserted between two real instructions that happens to be a valid opcode with a multi-byte encoding will cause the linear sweep disassembler to incorrectly decode everything that follows until the next re-synchronization point. Since the byte is never executed, the functional behavior is unchanged.
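The desynchronization effect is easy to reproduce on a toy instruction set. The ISA below is entirely hypothetical (five opcodes with fixed lengths), and the second function is a deliberately simplified control-flow-following disassembler that only handles unconditional forward jumps. The program jumps over a single junk byte whose value happens to be the opcode of a 5-byte instruction.

```python
# Toy ISA (hypothetical): opcode -> (mnemonic, total instruction length in bytes)
ISA = {
    0x00: ("RET", 1),
    0x10: ("PUSH", 2),    # PUSH imm8
    0x20: ("ADD", 1),
    0x30: ("JMP", 2),     # JMP rel8, relative to the next instruction
    0x50: ("LOAD32", 5),  # LOAD32 imm32
}

def linear_sweep(code):
    """Decode bytes sequentially from offset 0, ignoring control flow."""
    out, pc = [], 0
    while pc < len(code):
        name, size = ISA.get(code[pc], ("???", 1))
        out.append((pc, name))
        pc += size
    return out

def follow_control_flow(code, entry=0):
    """Simplified recursive-style decoding: follow unconditional jumps, stop at RET."""
    out, pc = [], entry
    while pc < len(code):
        name, size = ISA.get(code[pc], ("???", 1))
        out.append((pc, name))
        if name == "RET":
            break
        pc = pc + size + code[pc + 1] if name == "JMP" else pc + size
    return out

# JMP +1 | junk byte 0x50 | PUSH 7 | PUSH 5 | ADD | RET
CODE = bytes([0x30, 0x01, 0x50, 0x10, 0x07, 0x10, 0x05, 0x20, 0x00])

print(linear_sweep(CODE))         # the two PUSH instructions are swallowed by LOAD32
print(follow_control_flow(CODE))  # the junk byte is skipped, both PUSHes are visible
```

Linear sweep decodes the junk byte as LOAD32 and consumes both PUSH instructions as its operand, resynchronizing only at ADD, while the control-flow-following decoder recovers the instructions that actually execute.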

Recursive disassembly follows control flow, making it immune to unreachable bytes. But it can be confused by indirect jumps whose targets cannot be determined statically, by self-modifying code, or by jump instructions with constant conditions; a conditional jump that always branches is disassembled as a conditional jump (producing a false-path basic block) even though it functions as an unconditional jump.

In the Android DEX format, anti-disassembly is harder to apply than in native code because the DEX format explicitly encodes method boundaries and instruction lengths, and the code must be correctly verified by the virtual machine before execution. Anti-disassembly is primarily a concern for the native libraries (.so files) that are increasingly common in Android apps.

Anti-Decompilation

Decompilation lifts an instruction-level representation into high-level pseudo-code using control flow analysis, data flow analysis, and pattern matching. Anti-decompilation techniques introduce code patterns that violate the assumptions these heuristics make (e.g., irreducible control flow, hand-crafted instruction sequences that no compiler would ever generate, or bytecode that is technically valid but defeats specific decompiler implementations).

At the DEX level, specifically crafted bytecode can cause jadx to crash or produce Java code that does not accurately represent the actual behavior. Since Android requires bytecode to pass the DEX verifier before execution, anti-decompilation at the DEX level must produce bytecode that is valid enough to pass verification while still defeating specific decompilers.

Family 3: Code and Data Obfuscation

Code obfuscation transforms a program into one that is functionally equivalent but significantly harder to understand. Unlike the environment checks described above, obfuscation is passive: it operates on the binary itself and raises the cost of analysis regardless of what tools the analyst uses.

Control-Flow Flattening

Control-flow flattening is one of the most powerful structural obfuscation techniques. To understand it, consider what a normal method's control flow graph (CFG) looks like: a series of basic blocks connected by conditional and unconditional branches, forming a graph whose shape reflects the logical structure of the code.

Control-flow flattening, introduced by Wang et al., destroys this structure. The transformation extracts all basic blocks from the method and places them as cases in a single large switch statement. A state variable determines which block executes next. At the end of each block, the code updates the state variable and jumps back to the switch dispatcher. The result is a completely flat CFG: every block connects back to the same dispatcher, and the dispatcher connects to every block.

From a static analysis perspective, the CFG goes from a structured graph that reflects the algorithm's logic to a nearly complete graph where any block can potentially follow any other.

Some implementations make the state variable computation itself opaque (the next state is computed through an expression that is difficult to evaluate statically). Although the CFG and its edges remain unchanged at runtime, the complexity of the transitions between basic blocks increases, making it difficult for a static analyzer (e.g., a disassembler) to resolve which transitions are actually reachable.
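To make the transformation concrete, here is a small hand-flattened function in Python. The example function and the state numbering are arbitrary choices for illustration, and an if/elif chain stands in for the switch dispatcher; real obfuscators apply this at the bytecode or IR level and usually hide the state updates as well.

```python
def collatz_steps(n):
    """Original, structured version: the loop and the branch are visible."""
    steps = 0
    while n != 1:
        if n % 2 == 0:
            n //= 2
        else:
            n = 3 * n + 1
        steps += 1
    return steps

def collatz_steps_flat(n):
    """Flattened version: every basic block is a case of one dispatcher loop."""
    steps = 0
    state = 0
    while True:                      # dispatcher
        if state == 0:               # block: loop condition
            state = 1 if n != 1 else 4
        elif state == 1:             # block: parity test
            state = 2 if n % 2 == 0 else 3
        elif state == 2:             # block: even case
            n //= 2
            steps += 1
            state = 0
        elif state == 3:             # block: odd case
            n = 3 * n + 1
            steps += 1
            state = 0
        elif state == 4:             # block: exit
            return steps

print(collatz_steps(6), collatz_steps_flat(6))
```

Both functions compute the same result, but in the flattened one the order of blocks can only be recovered by tracking how `state` evolves.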

Opaque Predicates

An opaque predicate is a conditional expression that appears to have a non-constant output from a static analysis perspective, but always evaluates to the same value at runtime. The simplest examples use mathematical identities. An expression like (x * (x + 1)) % 2 == 0 is always true because the product of any integer and its successor is always even, but a static analyzer that does not know the value of x cannot determine this without reasoning about number-theoretic properties. A conditional branch on this expression always takes the true branch, but the false branch (containing junk or misleading code) inflates the CFG and must be analyzed by any tool attempting to understand the code.
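A minimal sketch of this predicate guarding a real computation, in Python; the function names and the junk branch are invented for the example.

```python
def opaque_true(x):
    """Always True: x and x + 1 have different parity, so their product is even."""
    return (x * (x + 1)) % 2 == 0

def check_license(license_ok, x):
    """The real logic sits in the branch that is always taken."""
    if opaque_true(x):
        return license_ok
    # Dead branch: never executed, present only to mislead static analysis.
    return not license_ok

print(check_license(True, 12345), check_license(False, -7))
```

Here `x` can be any value the analyzer cannot easily pin down, such as a user input or a timestamp; the result of the check does not depend on it.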

Mixed-Boolean Arithmetic

Mixed-Boolean Arithmetic (MBA) is a well-known obfuscation technique that replaces arithmetic and boolean expressions with semantically equivalent but syntactically complex expressions that mix bitwise operations with arithmetic operations.

A simple expression like x + y using MBA protection can be replaced by (x ^ y) + 2 * (x & y), a well-known identity from computer arithmetic. The resulting expression is significantly harder to simplify automatically, particularly for SMT solvers. The power of MBA comes from the combination of bitwise and arithmetic operations: purely arithmetic expressions can be simplified using algebraic reasoning, purely boolean expressions can be handled by boolean solvers, but the mixing of the two domains defeats solvers designed for one or the other.
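The identity can be checked mechanically. The Python sketch below encodes the addition rewrite from the text and, as an additional illustration that is not taken from the article, an analogous identity for subtraction: x - y = (x ^ y) - 2 * (~x & y). Python integers are unbounded, so the check covers the mathematical identities rather than 32-bit wraparound behavior.

```python
def add_mba(x, y):
    """x + y rewritten as XOR (sum without carries) plus twice AND (the carries)."""
    return (x ^ y) + 2 * (x & y)

def sub_mba(x, y):
    """x - y rewritten with a similar mix of bitwise and arithmetic operators."""
    return (x ^ y) - 2 * (~x & y)

print(add_mba(1337, 42), sub_mba(1337, 42))
```

Obfuscators typically apply such rewrites recursively, so that each operator in the output is itself replaced, which is what makes the final expressions hard to simplify.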

MBA deobfuscation has been an active research area for nearly a decade. Early foundational work includes the PhD thesis by Eyrolles [3], which provided reconstruction and simplification tools for MBA expressions. Another key milestone was the synthesis-based deobfuscation approach by David et al. [16], which inspired subsequent work including approaches based on program synthesis like the one by Blazytko et al. [4]. Other approaches include neural networks [5] and mathematical transformations [6]. More recent work has pushed the state of the art further, with tools such as SiMBA [7], GAMBA [8], and CoBRA [9] improving simplification coverage and performance.

Code Virtualization

Code virtualization is one of the most widespread obfuscation techniques today. In standard compilation, source code is compiled to bytecode or machine code for a real instruction set. A disassembler understands the instruction set and can decode the binary. A decompiler understands common patterns in compiled output and can lift it back toward the source.

Code virtualization eliminates this by defining a completely custom instruction set, an architecture that exists only for this specific protected build. The protected method's logic is translated into bytecode for this custom ISA (called v-code). At runtime, an interpreter embedded in the application reads and executes that v-code.

From a static analysis perspective, what you see when you disassemble a virtualized method is the interpreter. The actual logic of the method is completely absent from any standard instruction set. To understand what the method does, an analyst must reverse-engineer the interpreter to understand the custom ISA, extract the v-code for the specific method, and disassemble and analyze it using the recovered ISA. This is a substantial investment of time and effort, particularly because the custom ISA changes between protection products and often between protected builds.
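The sketch below shows the idea on a very small scale: a four-instruction stack machine whose opcode numbering is derived from a per-build key, so the same program yields different v-code for different builds. Everything here (the mnemonics, the key-to-opcode formula, the function names) is hypothetical and far simpler than what commercial virtualizers generate.

```python
MNEMONICS = ("PUSH", "ADD", "MUL", "RET")

def build_vm(build_key):
    """Return (assemble, run) for a stack VM whose opcode values depend on build_key."""
    # The stride of 11 keeps the four opcodes distinct for any key.
    opcode = {name: (build_key * 37 + i * 11) % 256 for i, name in enumerate(MNEMONICS)}
    decode = {value: name for name, value in opcode.items()}

    def assemble(program):
        vcode = bytearray()
        for instruction in program:
            vcode.append(opcode[instruction[0]])
            vcode.extend(instruction[1:])  # one-byte operands, if any
        return bytes(vcode)

    def run(vcode):
        stack, pc = [], 0
        while True:  # fetch, decode, execute
            op = decode[vcode[pc]]
            pc += 1
            if op == "PUSH":
                stack.append(vcode[pc])
                pc += 1
            elif op == "ADD":
                stack.append(stack.pop() + stack.pop())
            elif op == "MUL":
                stack.append(stack.pop() * stack.pop())
            else:  # RET
                return stack.pop()

    return assemble, run

# (3 + 4) * 5 expressed for the custom machine
PROGRAM = [("PUSH", 3), ("PUSH", 4), ("ADD",), ("PUSH", 5), ("MUL",), ("RET",)]

assemble_a, run_a = build_vm(1)
assemble_b, run_b = build_vm(2)
print(run_a(assemble_a(PROGRAM)), assemble_a(PROGRAM).hex(), assemble_b(PROGRAM).hex())
```

An analyst who only sees the `run` loop and a byte string has to recover the opcode table before the protected logic becomes readable, and that work does not carry over to a build with a different key.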

White-Box Cryptography

Cryptographic operations often need to be executed without revealing confidential information. White-box cryptography aims to achieve this by providing obfuscated algorithms, where operations on the secret key are combined with random data and code.

Symbol Renaming

Java and Android bytecode retain the names of all classes, methods, and fields in the binary, and also the names of local variables when debug information is kept. A class named CreditCardProcessor with a method validateLuhnChecksum already reveals much of its purpose before any of its code is examined. Symbol renaming replaces all of these with short, meaningless identifiers (a, b, aa, ab, etc.).

Android made ProGuard part of the Android build toolchain, and its successor R8 is the default code shrinker, which also performs symbol renaming as part of the release build process. These modifications are purely cosmetic; nothing in the code logic changes.
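A naming scheme of the kind described above (a, b, ..., z, aa, ab, ...) can be sketched in a few lines of Python. This is an illustration of the idea only, not the algorithm ProGuard or R8 actually implements, and the function names are made up.

```python
from itertools import count, islice, product
from string import ascii_lowercase

def short_names():
    """Yield a, b, ..., z, aa, ab, ... indefinitely."""
    for length in count(1):
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)

def rename(symbols):
    """Map each original symbol to the next unused short name."""
    names = short_names()
    return {symbol: next(names) for symbol in symbols}

print(rename(["CreditCardProcessor", "validateLuhnChecksum"]))
print(list(islice(short_names(), 25, 28)))
```

The mapping is normally written to a separate file at build time so that developers can still decode stack traces from the renamed build.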

Data and Resource Obfuscation

Data obfuscation targets the constant values and resources embedded in the app. The most important form is string encryption: strings are encrypted at compile time and a decryption routine recovers the original string at runtime just before use. The binary contains only ciphertext, plus the decryption code. URLs and API endpoints, error messages, hardcoded credentials, and API keys are all invisible to simple string searches.
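String encryption can be illustrated with a repeating-key XOR, which is enough to defeat a plain string search even though it offers no real cryptographic strength. The key, the URL, and the function names below are placeholders; commercial tools tend to use per-string keys and stronger ciphers.

```python
from itertools import cycle

def xor_bytes(data, key):
    """XOR data with a repeating key; applying it twice restores the input."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

KEY = b"\x5a\xc3\x17"  # placeholder key

# Build-time step: only the ciphertext would be shipped in the binary.
ENC_ENDPOINT = xor_bytes(b"https://api.example.com/v1/login", KEY)

def endpoint():
    """Runtime step: decrypt just before use."""
    return xor_bytes(ENC_ENDPOINT, KEY).decode("utf-8")

print(ENC_ENDPOINT.hex())
print(endpoint())
```

Because the decryption routine and key ship with the app, an analyst can recover every string by hooking the equivalent of `endpoint()` at runtime, which is why string encryption is usually paired with the environment checks from Family 1.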

Resource obfuscation applies similar ideas to non-code assets: layout files, images, and raw data files can be stored encrypted and decrypted at runtime. Opaque constant expressions replace literal values with expressions that are difficult to evaluate statically, hiding constants from pattern-matching tools.

Family 4: Program Loading Abuse

This family covers techniques that hide code by exploiting Android's dynamic class loading to defer the presence of the real application code until runtime, avoiding static analysis.

DEX Loading

The Android API provides DexClassLoader and BaseDexClassLoader for loading DEX files, JAR files, or APK files at runtime. A packer uses this mechanism as follows: the APK ships the loader code (a stub built around DexClassLoader), while the actual application logic is included as an encrypted blob in the resources or assets. At runtime, the loader decrypts the payload and loads it using DexClassLoader methods. Static analysis tools can only see the loader's code. Other observed methods use BaseDexClassLoader and DexPathList.
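The packer workflow can be mimicked in Python as an analogy: an encrypted blob is decrypted at runtime and loaded as a module. This is not Android code. On Android the decrypted bytes would be a DEX file handed to a class loader such as DexClassLoader, whereas here `exec` into a fresh module plays that role. The key, the payload, and all names are invented for the example.

```python
import types
from itertools import cycle

def xor_bytes(data, key):
    """Repeating-key XOR, used here as a stand-in for the packer's cipher."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

KEY = b"k3y"  # placeholder key

# Build-time step: the real logic is encrypted and stored as an opaque asset.
PAYLOAD_SOURCE = b"def premium_feature():\n    return 'unlocked'\n"
ENCRYPTED_ASSET = xor_bytes(PAYLOAD_SOURCE, KEY)

def load_payload(blob, key):
    """Loader stub: decrypt the blob and load it as a module at runtime."""
    module = types.ModuleType("payload")
    exec(xor_bytes(blob, key).decode("utf-8"), module.__dict__)
    return module

payload = load_payload(ENCRYPTED_ASSET, KEY)
print(payload.premium_feature())
```

A static tool that inspects only the stub and the encrypted asset sees `load_payload` and an opaque byte string, which mirrors what a decompiler shows for a packed APK.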

Multi-DEX Abuse

The DEX format has a limit of 65,536 method references per file, because methods are addressed through a 16-bit index. Android's multi-DEX mechanism addresses this by allowing apps to split code across multiple DEX files. Multi-DEX abuse stores additional DEX files as encrypted resources or assets rather than as proper DEX files that static tools would parse. The primary DEX decrypts and loads them at runtime. The FluBot banking malware family is a notable real-world example of this technique.
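The per-file limit applies to the method identifier table, whose size is stored in the DEX header. The sketch below reads that field (method_ids_size, at offset 0x58 of the standard little-endian header) for every classesN.dex in an APK. The helper names are made up for this example. A primary DEX with a small method count next to large opaque assets is a hint, not proof, that the real code is loaded later.

```python
import io
import re
import struct
import zipfile

DEX_HEADER_SIZE = 0x70
METHOD_IDS_SIZE_OFFSET = 0x58  # method_ids_size field in the DEX header


def method_ref_count(dex_bytes: bytes) -> int:
    """Return the number of method references declared in a DEX header."""
    if len(dex_bytes) < DEX_HEADER_SIZE or not dex_bytes.startswith(b"dex\n"):
        raise ValueError("not a DEX file")
    (method_ids_size,) = struct.unpack_from("<I", dex_bytes, METHOD_IDS_SIZE_OFFSET)
    return method_ids_size


def dex_method_counts(apk_bytes: bytes) -> dict:
    """Map each top-level classes*.dex entry of an APK to its method count."""
    counts = {}
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        for name in apk.namelist():
            if re.fullmatch(r"classes\d*\.dex", name):
                counts[name] = method_ref_count(apk.read(name))
    return counts
```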

ART Hooking

ART hooking is the most sophisticated loading technique and operates at a level that defeats some dynamic analysis approaches. The Android Runtime itself is hooked so that when the runtime calls internal functions to load or invoke a method, the protection's code runs first. This makes it possible to intercept method dispatch and, when a protected method is about to be called, inject the actual bytecode into the method's in-memory representation just in time for execution.

Android Code Protectors

In our ACM paper, we survey a total of 28 protection tools that APKiD [15] can detect, grouped as 16 packers, 7 obfuscators, and 5 protectors. APKiD's detection is based on YARA rules manually written by analysts.

Prevalence of the Techniques

We identified the presence of six different protection techniques that are highly common across the 28 solutions detected by APKiD, as shown in Table 1.

Table 1. Common Techniques Found in the 28 Android Software Protectors Studied in This Work.

From the table we can see that code obfuscation is highly prevalent, present in 23 of the 28 tools. Anti-debugging, anti-emulation, and anti-DBI are common across all three categories. DEX loading is strongly correlated with packers and almost entirely absent from obfuscators.

The 28 tools come from a range of actors. Chinese security companies dominate the packer space: Jiagu (360 Security), Tencent, Ijiami, Bangcle, Baidu, and Qihoo 360's packer are all widely deployed in Chinese markets, provided as SDKs or online services where a developer uploads an APK and receives a protected version back. Cross-platform open-source tools like O-LLVM appear across both legitimate and malicious apps. O-LLVM is a fork of the LLVM compiler infrastructure that adds obfuscation passes at the IR level before native code generation. Commercial RASP products from Western security vendors like DexGuard (Guardsquare), DexProtector, Arxan, PromonShield (Promon), WhiteCryption, Appdome, Virbox, and Vkey are the tools of choice for financial and enterprise applications, combining multiple protection techniques into comprehensive SDKs with support contracts and compliance documentation.

Android Software Protection in the Wild

In this section, we try to answer the following questions:

RQ1. How prevalent are different protection techniques in Android applications?
RQ2. Is protection more commonly found in certain categories from Google Play?
RQ3. What protections are typically used in Android malware?
RQ4. How has the use of Android software protection evolved over time?

For this analysis, we used a dataset of nearly 2.5 million Android applications, including market apps, pre-installed apps, and malware. The market apps include applications from the Google Play Store and 18 alternative markets. The pre-installed apps come from a dataset of 1,452,762 Android applications crowdsourced from 40k users across 184 countries. Finally, malicious applications were collected from the Argus collection [11] obtained from VX-Underground, as well as older malware datasets such as Malgenome [13], PRAGuard [12], and VirusShare.

All applications were run through a pipeline with APKiD and combined with additional metadata about each application and its source.
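The paper's pipeline itself is not shown in this post. As a rough sketch of what the aggregation step could look like, the code below reduces one per-APK report to the three categories studied here. It assumes a JSON report shaped like APKiD's JSON output (a "files" list whose entries carry a "matches" dictionary keyed by category); the sample report, file names, and the function name summarize_report are invented for illustration.

```python
from collections import defaultdict

TRACKED = ("packer", "obfuscator", "protector")


def summarize_report(report: dict) -> dict:
    """Collect the packer/obfuscator/protector names found anywhere in an APK."""
    found = defaultdict(set)
    for entry in report.get("files", []):
        for category, names in entry.get("matches", {}).items():
            if category in TRACKED:
                found[category].update(names)
    return {category: sorted(names) for category, names in found.items()}


# Invented example: matches are reported per file inside the APK.
sample_report = {
    "files": [
        {"filename": "app.apk!classes.dex",
         "matches": {"compiler": ["r8"], "obfuscator": ["DexGuard"]}},
        {"filename": "app.apk!lib/arm64-v8a/libexample.so",
         "matches": {"obfuscator": ["O-LLVM"], "anti_debug": ["example check"]}},
    ]
}
summary = summarize_report(sample_report)
```

Here summary is {"obfuscator": ["DexGuard", "O-LLVM"]}: an app counts once per category regardless of how many of its files matched.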

Table 2. Datasets of Market, Preinstalled, and Malicious Apps Used in This Work

Global Prevalence of Software Protection

We successfully analyzed 2,450,338 APKs (99% of the dataset). The results show that 96,169 apps (~4%) used at least one packer, obfuscator, or protector for which there exists a rule in APKiD, broken down into: 50,664 apps with packers, 45,320 with obfuscators, and 185 with protectors (with overlap, since apps can use multiple tools).
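The percentages follow directly from the counts quoted above; a quick check of the arithmetic, using only those numbers:

```python
analyzed = 2_450_338
protected = 96_169
detections = {"packers": 50_664, "obfuscators": 45_320, "protectors": 185}

# Share of analyzed apps with at least one detected tool, in percent.
overall_rate = protected / analyzed * 100

# Share of analyzed apps per tool category, in percent.
category_rates = {name: n / analyzed * 100 for name, n in detections.items()}
```

overall_rate comes out just above 3.9%, which is the "~4%" in the text; packers and obfuscators each sit around 2%, and protectors below 0.01%.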

Table 3. Prevalence of Anti-Analysis Techniques and Protection Software Detected in Different App Sources

The distribution across markets is the most significant pattern in the data. The split between Chinese markets and everything else is striking: Huawei AppGallery sits at 43% packed and 21% obfuscated, Qihoo 360 at 40% packed and 22% obfuscated, Baidu at 25% packed and an extraordinary 57% obfuscated. Compare that to Google Play at 0.59% packed and 2.65% obfuscated, an order of magnitude lower on every measure. The reasons behind this disparity are not completely clear. Chinese developers may face stronger pressures around IP protection within a more mature ecosystem of protection tooling. However, since APKiD relies on YARA rules for detection, anti-analysis techniques not covered by its ruleset may go undetected, potentially lowering the results. Whether the gap reflects a genuine difference in protection practices, a detection bias, or both, remains an open question.

F-Droid, the open-source Android app repository, shows 0% on both counts: open-source apps have no IP to protect through obfuscation, and their developers tend not to use commercial protection tools. Pre-installed apps, the 1.4 million APKs that ship with device firmware, are almost as unprotected: 0.03% packed and 0.25% obfuscated, despite many handling sensitive device functionality.

Distribution of Anti-Analysis Software

Figure 3. Analysis of detected packers, obfuscators, and protectors across the full dataset. Note that the x-axis is in logarithmic scale.

Figure 3 illustrates the prevalence of different anti-analysis software across the three main categories. Among packers, Jiagu (27,730 apps) accounts for more detected packed apps than the next three packers combined. Among obfuscators, O-LLVM (21,696 apps) and Arxan (15,046 apps) together account for the large majority of detections. Protectors are detected in very low numbers, with WhiteCryption being the most used protector at just 116 apps. The dominance of Chinese tools in the packer rankings is consistent with the market distribution: most packed apps come from Chinese markets, and Chinese markets predominantly use Chinese packer services.

Finance and Games: The Protected Categories

Table 4. Packer, Obfuscator, and Protector by Category from Google Play Store

Within Google Play, Table 4 shows that protection is not uniformly low and is concentrated in specific categories. Games show an obfuscation rate of 5.71% and Finance 7.07%, both significantly above the Google Play average, driven by obvious economic incentives: preventing game client modification for cheating and piracy, and meeting the security requirements around payment processing and mobile banking.

Figure 4. Analysis of detected anti-analysis software in the Finance category from Google Play Store. Note that the x-axis is in logarithmic scale.

In the Finance category, the specific tools used are more interesting than the rates. Figure 4 shows that the dominant packer is PromonShield rather than Jiagu, and the dominant obfuscator is DexGuard rather than O-LLVM. This likely reflects a deliberate procurement decision: financial institutions buy enterprise-grade commercial RASP products with compliance documentation and vendor support, rather than reaching for free Chinese packer services or open-source obfuscators.

Protection in Android Malware

Figure 5. Distribution of detected packers, obfuscators, and protectors across all malware datasets. Note that the x-axis is in logarithmic scale.

Figure 5 shows the use of different anti-analysis software in malware. The distribution of packers and obfuscators in malware is broadly similar to the legitimate app distribution: Jiagu, Tencent, Bangcle, and Ijiami account for 81% of packed malware, the same set of Chinese packers that dominates in Chinese markets generally. O-LLVM, DexGuard, and Arxan account for 97% of obfuscated malware.

The most telling difference is what is absent: PromonShield and Kony, which are highly prevalent in Finance apps, are completely absent from malware, presumably because they are sold to enterprises under license and are not readily available to malware authors. DexGuard appears much more in Finance apps than in malware, while O-LLVM goes the other way. The commercial/open-source divide in tooling access draws a visible line in the data.

The old Malgenome dataset (samples from 2010–2011) shows essentially no protection. More recent samples sit around 10% for packing. This relatively modest figure may be due to the use of in-house protections or more aggressive protections that hide the patterns detected by APKiD.

Longitudinal Analysis of Software Protection

Figure 6. Usage of packers, obfuscators, and protectors grouped by targetSdkVersion across the full dataset.

Figure 7. Usage of packers, obfuscators, and protectors grouped by targetSdkVersion in malware apps.

The longitudinal analysis in Figures 6 and 7 tracks protection adoption by targetSdkVersion as a proxy for app age and finds a consistent upward trend over the past decade. Two factors drive this growth: R8/ProGuard integration into Android Studio has made basic symbol renaming and dead code elimination essentially automatic in release builds, raising the baseline level of obfuscation in new apps; and the growth of alternative markets and piracy, particularly in Chinese markets, has created stronger incentives for developers to invest in more substantial protection.
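The grouping behind this kind of longitudinal view can be sketched as follows. The record layout (a "target_sdk" integer and a "matches" list of detected tools per app) and the sample data are assumptions made for this example, not the paper's actual schema or results.

```python
from collections import defaultdict


def protection_rate_by_sdk(apps):
    """Fraction of apps with at least one detected tool, per targetSdkVersion."""
    totals = defaultdict(int)
    protected = defaultdict(int)
    for app in apps:
        sdk = app["target_sdk"]
        totals[sdk] += 1
        if app["matches"]:
            protected[sdk] += 1
    return {sdk: protected[sdk] / totals[sdk] for sdk in sorted(totals)}


# Invented records, only to show the shape of the input and output.
apps = [
    {"target_sdk": 19, "matches": []},
    {"target_sdk": 19, "matches": []},
    {"target_sdk": 28, "matches": ["Jiagu"]},
    {"target_sdk": 28, "matches": []},
    {"target_sdk": 33, "matches": ["DexGuard", "O-LLVM"]},
]
rates = protection_rate_by_sdk(apps)
```

With these made-up records, rates is {19: 0.0, 28: 0.5, 33: 1.0}. Keep in mind that targetSdkVersion is only a proxy for app age, since developers can target an old API level in a recent build.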

The pattern differs between malware and Finance apps. In malware, packers are the most prevalent form of protection across all API levels, with obfuscators following. In Finance apps, obfuscators show a stronger and more consistent upward trend, reflecting the shift toward commercial RASP products as the category has matured. In both cases, protectors remain rare throughout.

Practical Implications

For Android reverse engineers and security researchers, the main practical takeaway is that most apps analyzed will not have meaningful protection beyond R8 symbol renaming. Systematic resistance such as DEX loading, control-flow flattening, or anti-debug checks is concentrated in Chinese market apps, Google Play Finance apps, and Games. With the data analyzed in the paper, it is possible to form a reasonable expectation of what kind of protection an application may have based on its source and category.

For developers building apps that handle sensitive data, intellectual property, or payment flows, the paper makes a case that MATE attacks deserve more attention in threat modeling. A developer who embeds an API key in the APK without any protection is relying on security through obscurity, and in the case of Android bytecode, this obscurity can easily be bypassed through open-source decompilation tools. More substantial protection should be used to protect sensitive information, and this should be part of any secure programming guide for Android.

For the security research community, the measurement methodology itself is a contribution. Signature-based detection via APKiD is fast enough to scale to millions of apps but has unknown and likely significant false-negative rates, particularly for custom or modified protectors. Dynamic analysis would improve recall but does not scale. Better tooling for large-scale dynamic analysis of protected apps remains an open problem, as does automated deobfuscation of the techniques described in this post.

Conclusion

Android software protection is a technically rich area. The techniques described in this post, from basic symbol renaming to multi-layered obfuscation or full code virtualization, and from emulator detection to ART hooking, represent a sophisticated arms race between protection developers and the analyst community.

The central empirical finding of the paper is that despite this sophistication, most of the Android ecosystem remains unprotected. Even allowing for a likely slight underestimation, the overall 4% protection rate, concentrated heavily in Chinese markets and a few app categories, suggests that awareness and adoption of software protection techniques among Android developers lag significantly behind the actual threat. The tools are available, the techniques are well-understood, and the threat is real. The gap is in the threat modeling: MATE attacks are not part of how most Android developers think about their app's security surface.

The companion website for the paper includes annotated code examples for every technique in the taxonomy: android-obfuscation.github.io. The dataset is publicly available via Zenodo.

As stated at the beginning, this post is based on the last research I published during my PhD together with my supervisor Dr. Juan Tapiador. The full reference of the paper is:

Blazquez, E. and Tapiador, J. (2025). Practical Android Software Protection in the Wild. ACM Computing Surveys, 58(2), Article 36.

Acknowledgments

I would like to thank everyone in QShield, the team where I have been able to improve my knowledge in the area of software protection, writing my first obfuscations for Java bytecode. In the team I have been able to learn everything from complex topics like properly managing the Java memory model when obfuscating a Java method, to simpler things like properly working with git. Thanks to Jean-François, who once again gave me the opportunity to write a blog post for Quarkslab's blog. Back in 2018, when I first discovered Quarkslab, I never imagined I would end up writing a blog post for them too. Thank you very much!

References

[1] Paolo Falcarin, Christian Collberg, Mikhail Atallah, and Mariusz Jakubowski. 2011. Guest editors' introduction: Software protection. IEEE Software 28 (03 2011), 24–27. DOI: 10.1109/MS.2011.34

[2] Christian Collberg, Clark Thomborson, and Douglas Low. 1997. A taxonomy of obfuscating transformations. Retrieved from http://www.cs.auckland.ac.nz/staff-cgi-bin/mjd/csTRcgi.pl?serial

[3] Ninon Eyrolles. 2017. Obfuscation with Mixed Boolean-Arithmetic Expressions: reconstruction, analysis and simplification tools. Ph.D. Thesis, Université Paris-Saclay. Retrieved from https://tel.archives-ouvertes.fr/tel-01623849

[4] Tim Blazytko, Moritz Contag, Cornelius Aschermann, and Thorsten Holz. 2017. Syntia: Synthesizing the Semantics of Obfuscated Code. In Proceedings of the 26th USENIX Security Symposium (SEC'17), 643–659. USENIX Association.

[5] Weijie Feng, Binbin Liu, Dongpeng Xu, Qilong Zheng, and Yun Xu. 2020. NeuReduce: Reducing Mixed Boolean-Arithmetic Expressions by Recurrent Neural Network. In Findings of the Association for Computational Linguistics: EMNLP 2020, 635–644. DOI: 10.18653/v1/2020.findings-emnlp.56

[6] Binbin Liu, Junfu Shen, Jiang Ming, Qilong Zheng, Jing Li, and Dongpeng Xu. 2021. MBA-Blast: Unveiling and Simplifying Mixed Boolean-Arithmetic Obfuscation. In 30th USENIX Security Symposium (USENIX Security 21), 1701–1718. USENIX Association. Retrieved from https://www.usenix.org/conference/usenixsecurity21/presentation/liu-binbin

[7] Benjamin Reichenwallner and Peter Meerwald-Stadler. 2022. Efficient Deobfuscation of Linear Mixed Boolean-Arithmetic Expressions. In Proceedings of the 2022 ACM Workshop on Robust Malware Analysis (Checkmate '22), 19–28. ACM. DOI: 10.1145/3560831.3564256

[8] Benjamin Reichenwallner and Peter Meerwald-Stadler. 2023. Simplification of General Mixed Boolean-Arithmetic Expressions: GAMBA. In Proceedings of the 2nd Workshop on Robust Malware Analysis (WORMA'23), 427–438. IEEE. DOI: 10.1109/EuroSPW59978.2023.00053

[9] Kyle Elliott. 2026. Simplifying MBA obfuscation with CoBRA. Trail of Bits Blog. Retrieved from https://blog.trailofbits.com/2026/04/03/simplifying-mba-obfuscation-with-cobra/

[11] Argus Collection. (n.d.). Retrieved March 29, 2023 from https://vx-underground.org/Samples/Argus%20Collection

[12] Davide Maiorca, Davide Ariu, Igino Corona, Marco Aresu, and Giorgio Giacinto. 2015. Stealth attacks: An extended insight into the obfuscation effects on android malware. Computers and Security 51, C (Jun 2015), 16–31. DOI: 10.1016/j.cose.2015.02.007

[13] Yajin Zhou and Xuxian Jiang. 2012. Dissecting android malware: Characterization and evolution. In Proceedings of the 2012 IEEE Symposium on Security and Privacy, 95–109. DOI: 10.1109/SP.2012.16

[15] Android Application Identifier for Packers, Protectors, Obfuscators and Oddities - PEiD for Android. (n.d.). Retrieved May 15, 2023 from https://github.com/rednaga/APKiD

[16] Robin David, Luigi Coniglio, and Mariano Ceccato. 2020. QSynth - A Program Synthesis based approach for Binary Code Deobfuscation. In Proceedings of the Workshop on Binary Analysis Research (BAR'20). NDSS. DOI: 10.14722/bar.2020.23009


Source: http://blog.quarkslab.com/practical-android-software-protection-in-the-wild-an-appetizer.html