Using Doop to Identify Tainted Information Flows in the Apache Commons Text Vulnerability

I. Overview of the Doop Static Analysis Framework

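Doop is an open-source static program analysis framework developed by Yannis Smaragdakis's PLAST Lab team at the University of Athens, Greece. Its main characteristics are:

Architecture:
- A driver written in Groovy acts as the "glue"
- Two core components: the fact generator and the Datalog analyzer
- The driver invokes these components to automate the analysis
Analysis capabilities:
- Supports analysis of both Java source code and bytecode (bytecode is recommended)
- Offers analysis strategies at several levels of precision, from coarse to fine:
  - Context-insensitive
  - 1-call-site-sensitive
  - 1-call-site-sensitive+heap
- Supports the following through add-ons:
  - Information flow analysis
  - Frameworks such as Spring
  - Java reflection
Workflow:
- The fact generator parses the input (for example, a jar file)
- Front ends such as Soot or WALA lower the input to an IR (Jimple, in the case of Soot)
- The facts files required by the analysis engine are generated
- The Soufflé engine runs the analysis over the facts and the Datalog rules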

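II. Overview of the Apache Commons Text RCE Vulnerability (CVE-2022-42889)

Background:
- Apache Commons Text is an open-source library for working with strings and blocks of text
- The vulnerability works in a way similar to the Log4j2 vulnerability, but its impact is narrower
Root cause:
- Some interpolators (such as ScriptStringLookup) can lead to code execution
- Input strings are not sufficiently validated before interpolation

Vulnerable call stack: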

StringSubstitutor.replace()
→ ScriptStringLookup.lookup()
→ ScriptEngine.eval()

PoC verification:
- Uses commons-text version 1.9
- The string replacement function is used to trigger the lookup method of ScriptStringLookup
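
The call chain above can be illustrated with a small self-contained toy model. This is only a sketch of the dispatch pattern and not the commons-text implementation: the class ToySubstitutor, its lookup table, and the "script" lookup are hypothetical stand-ins, and the stand-in for the script engine only records what it would have evaluated instead of executing anything.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of prefix-based interpolation (hypothetical, not commons-text code).
class ToySubstitutor {
    // Matches ${prefix:argument}; nested expressions are not handled in this sketch.
    private static final Pattern VAR = Pattern.compile("\\$\\{([a-z]+):([^}]*)\\}");

    // Stand-in for ScriptEngine.eval(): records its input instead of executing it.
    final List<String> evaluated = new ArrayList<>();
    private final Map<String, Function<String, String>> lookups = new HashMap<>();

    ToySubstitutor() {
        lookups.put("upper", arg -> arg.toUpperCase(Locale.ROOT));
        // Stand-in for ScriptStringLookup.lookup(): forwards the argument to the "engine".
        lookups.put("script", arg -> {
            evaluated.add(arg);
            return "<result of " + arg + ">";
        });
    }

    // Stand-in for StringSubstitutor.replace(): the whole input is attacker-controlled.
    String replace(String input) {
        Matcher m = VAR.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Function<String, String> lookup = lookups.get(m.group(1));
            // Unknown prefixes are left in the output unchanged.
            String value = lookup == null ? m.group(0) : lookup.apply(m.group(2));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Calling replace with a string that contains a ${script:...} expression hands the attacker-chosen argument to the recording "engine". That data flow from the replace parameter to the eval argument is what the taint analysis in the next section is meant to find.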

III. Identifying the Tainted Information Flow with Doop

1. Defining the analysis targets

Source: org.apache.commons.text.StringSubstitutor.replace()
Sink: ScriptEngine.eval()

2. Doop configuration

Command-line arguments:

-a context-insensitive \
--app-only \
--information-flow spring \
--fact-gen-cores 4 \
-i commons-text.jar \
--platform java_8 \
--stats none

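Characteristics of app-only mode:

Adds every method satisfying !ApplicationMethod(?signature) to isOpaqueMethod(?signature)
Skips analysis of JDK classes, which improves efficiency
Strikes a balance between precision, speed, and usability

3. Custom annotations and recompilation

Annotations defined:
- TestctxTaintedClassAnnotation
- TestctxTaintedParamAnnotation

Annotation usage: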

@TestctxTaintedClassAnnotation
public class StringSubstitutor {
    // Annotate the parameter itself so that a Param_Annotation fact is generated for it.
    public String replace(@TestctxTaintedParamAnnotation String str) { ... }
}
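
The two annotations themselves are not shown in the text. A minimal sketch of how they might be declared follows. It rests on several assumptions: RUNTIME retention is chosen only so that the annotations are certain to survive into the compiled class file (SOURCE retention would be discarded by the compiler), no @Target restriction is declared, and AnnotatedDemo is a hypothetical stand-in for the annotated StringSubstitutor. In the real build the annotations would be public types in the org.apache.commons.text package, since that is the package name the Datalog rules refer to.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Marks a class whose methods should be treated as analysis entry points.
// (Assumed to live in package org.apache.commons.text in the real build.)
@Retention(RetentionPolicy.RUNTIME)
@interface TestctxTaintedClassAnnotation {}

// Marks a method parameter that should be treated as a taint source.
@Retention(RetentionPolicy.RUNTIME)
@interface TestctxTaintedParamAnnotation {}

// Hypothetical stand-in for the annotated StringSubstitutor class.
@TestctxTaintedClassAnnotation
class AnnotatedDemo {
    public String replace(@TestctxTaintedParamAnnotation String str) {
        return str;
    }
}
```

Both are marker annotations without members, because the rules in section 4.2 match on the fully qualified annotation name only.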

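Recompiling and packaging:
- Download the Commons Text source code
- Add the custom annotations
- Recompile it into a jar file

4. Adapting the Doop rules

4.1 Basic rules

Doop uses Soufflé to evaluate Datalog rules. The key concepts are:

Relation: defines a logical relationship between data items

Example rule: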

.decl equivalence(a: symbol, b: symbol)
equivalence(a, b) :- rel1(a), rel2(b).

4.2 Key rule modifications

Adding a taint transfer method:

BaseToRetTaintTransferMethod("<java.lang.String: java.lang.String[] split(java.lang.String,int)>").

Recognizing the custom class annotation:

EntryPointClass(?type) :-
   Type_Annotation(?type, "org.apache.commons.text.TestctxTaintedClassAnnotation").

MockObject(?mockObj, ?type) :-
   Type_Annotation(?type, "org.apache.commons.text.TestctxTaintedClassAnnotation").

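Recognizing the parameter taint annotation (the rule is cut off in the source after the annotation disjunction; the literals that bind ?type, ?ctx and ?hctx are not shown):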

mainAnalysis.VarPointsTo(?hctx, cat(cat(cat(cat(?to, "::: "), ?type), "::: "), "ASSIGN"), ?ctx, ?to) :-
  FormalParam(?idx, ?meth, ?to),
  (Param_Annotation(?meth, ?idx, "org.springframework.web.bind.annotation.RequestParam");
  Param_Annotation(?meth, ?idx, "org.springframework.web.bind.annotation.RequestBody");
  Param_Annotation(?meth, ?idx, "org.apache.commons.text.TestctxTaintedParamAnnotation");

Ensuring method reachability:

ImplicitReachable("<org.apache.commons.text.StringSubstitutor: java.lang.String replace(char[])>") :- 
  isMethod("<org.apache.commons.text.StringSubstitutor: java.lang.String replace(char[])>").

Defining the sink:

LeakingSinkMethodArg("default", 0, method) :- 
  isMethod(method), 
  match("<javax.script.ScriptEngine: java.lang.Object eval[(].*[)]>", method).
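
To check which method signatures the sink pattern above selects, the same regular expression can be tried out in Java. This is a sketch under one assumption: Soufflé's match is taken to be a whole-string match, which is also what Matcher.matches() does. The class name SinkPattern is hypothetical.

```java
import java.util.regex.Pattern;

class SinkPattern {
    // Same pattern as in the LeakingSinkMethodArg rule; [(] and [)] match literal parentheses.
    static final Pattern EVAL =
        Pattern.compile("<javax.script.ScriptEngine: java.lang.Object eval[(].*[)]>");

    // True if the whole signature matches the pattern.
    static boolean isSink(String signature) {
        return EVAL.matcher(signature).matches();
    }
}
```

Every eval overload of ScriptEngine matches, for example the one taking (java.lang.String) and the one taking (java.io.Reader,javax.script.Bindings), while other ScriptEngine methods such as get do not. The index 0 in the rule then marks the first argument of each matched method as the sink argument.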

4.3 Taint transfer analysis

Defining the base relation:

.decl OptTaintedtransMethodInvocationBase(?invocation:MethodInvocation,?method:Method,?ctx:configuration.Context,?base:Var)

OptTaintedtransMethodInvocationBase(?invocation,?tomethod,?ctx,?base) :-
  ReachableContext(?ctx, ?inmethod),
  Instruction_Method(?invocation, ?inmethod),
  (
    _VirtualMethodInvocation(?invocation, _, ?tomethod, ?base, _);
    _SpecialMethodInvocation(?invocation, _, ?tomethod, ?base, _)
  ).

Return type analysis:

.decl MaytaintedInvocationInfo(?invocation:MethodInvocation,?type:Type,?ret:Var)

MaytaintedInvocationInfo(?invocation, ?type, ?ret) :-
  Method_ReturnType(?method, ?type),
  MethodInvocation_Method(?invocation, ?method),
  AssignReturnValue(?invocation, ?ret).

.decl MaytaintedTypeForReturnValue(?type:Type, ?ret:Var, ?invocation:MethodInvocation)

MaytaintedTypeForReturnValue(?type, ?ret, ?invocation) :-
  MaytaintedInvocationInfo(?invocation, ?type, ?ret),
  !VarIsCast(?ret).

Taint transfer rule for variables:

VarIsTaintedFromVar(?type, ?ctx, ?ret, ?ctx, ?base) :-
  mainAnalysis.OptTaintedtransMethodInvocationBase(?invocation,?method,?ctx,?base),
  MaytaintedTypeForReturnValue(?type, ?ret, ?invocation),
  BaseToRetTaintTransferMethod(?method).
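
The effect of the transfer rule above can be sketched in plain Java as a naive fixpoint: the return variable of a call becomes tainted when the receiver (base) is tainted and the called method is listed as a base-to-return transfer method. This is a deliberately simplified model and not Doop's implementation. Contexts and return types are ignored, and the names BaseToRetTaint, Call and propagate are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BaseToRetTaint {
    // One call site "ret = base.method(...)"; all fields are plain strings in this sketch.
    static final class Call {
        final String method, base, ret;
        Call(String method, String base, String ret) {
            this.method = method;
            this.base = base;
            this.ret = ret;
        }
    }

    // Applies "ret is tainted if base is tainted and method is a transfer method"
    // repeatedly until no new variable is added.
    static Set<String> propagate(Set<String> seeds, List<Call> calls, Set<String> transferMethods) {
        Set<String> tainted = new HashSet<>(seeds);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Call c : calls) {
                if (transferMethods.contains(c.method) && tainted.contains(c.base)) {
                    changed |= tainted.add(c.ret);
                }
            }
        }
        return tainted;
    }
}
```

Soufflé evaluates such rules with a more efficient semi-naive strategy, but the set of derived facts is the same as with this repeated application.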

Identifying sink variables:

LeakingSinkVariable(?label, ?invocation, ?ctx, ?var) :-
  LeakingSinkMethodArg(?label, ?index, ?tomethod),
  mainAnalysis.OptTaintedtransMethodInvocationBase(?invocation,?tomethod,?ctx,?base),
  ActualParam(?index, ?invocation, ?var).

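5. Interpreting the analysis results

Key result files:
- LeakingTaintedInformation.csv: complete flows from a taint source to a sink
- AppTaintedVar.csv: information about tainted variables
- CallGraphEdge.csv: call graph edges (useful as supporting information)

Example result: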

default default <<immutable-context>> 
<org.apache.commons.text.lookup.ScriptStringLookup: java.lang.String lookup(java.lang.String)>/javax.script.ScriptEngine.eval/0 
<org.apache.commons.text.StringSubstitutor: java.lang.String replace(java.lang.String)>/@parameter0
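
A result row like the one above can be split into its parts for further processing. The sketch below assumes that each row in LeakingTaintedInformation.csv is a single tab-separated line with five columns (source label, sink label, context, sink invocation, source value). That layout is inferred from the example and is not confirmed by the text, and the class LeakRow and its field names are hypothetical.

```java
class LeakRow {
    final String fromLabel, toLabel, context, sinkInvocation, sourceValue;

    private LeakRow(String[] f) {
        fromLabel = f[0];
        toLabel = f[1];
        context = f[2];
        sinkInvocation = f[3];
        sourceValue = f[4];
    }

    // Parses one tab-separated row; the five-column layout is an assumption.
    static LeakRow parse(String line) {
        String[] fields = line.split("\t", -1);
        if (fields.length != 5) {
            throw new IllegalArgumentException("expected 5 columns, got " + fields.length);
        }
        return new LeakRow(fields);
    }

    // Returns the signature of the method that contains the sink call,
    // i.e. the invocation id up to and including the '>' that precedes "/".
    String enclosingMethod() {
        return sinkInvocation.substring(0, sinkInvocation.indexOf(">/") + 1);
    }
}
```

For the example row, enclosingMethod() yields the signature of ScriptStringLookup.lookup, and sourceValue identifies parameter 0 of StringSubstitutor.replace as the taint source.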

Examples of tainted variables:
- In the resolveVariable method:
  - the parameter variableName
  - buf
  - resolver
  - stack variables generated by SSA optimization, such as $stack7

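IV. Summary and Evaluation

Strengths of Doop:
- Advanced and varied analysis algorithms
- Open-source rules that can be customized extensively
- A design philosophy similar to CodeQL (program analysis expressed as data queries)
Challenges in practice:
- Analyzing large projects requires substantial computing resources
- Customizing rules and inspecting results is not very intuitive
- Better suited as a research framework than for direct use in production scanning
Possible extensions:
- Incorporating algorithms from more program analysis papers
- Wrapping it into an easier-to-use vulnerability scanning tool
- Adding support for more frameworks and language features
Directions for improvement:
- Better visualization of results
- Stronger support for modern Java features
- Higher efficiency for large-scale analysis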
