CodeQL Python Type Tracking

2025-06-10

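While analyzing CodeQL's Python taint analysis, I noticed that it uses a new shared module, typetracking. From the earlier analysis of the Python extractor we know that its DB contains no type information. So how does taint tracking work without types?

Java builds its DB at compile time, so the type information for each AST node is available during extraction. A call node, for example, can be linked directly to its callable, so we know which function is being called. But how can this be done with a pure AST?

This reminded me of an earlier patch for Java reflection, which as far as I recall also needed type tracking. I implemented it myself back then and the changes were substantial. I am not sure whether typetracking solves the problem more elegantly, so let's analyze it.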

1. Abstract modules

shared/typetracking/internal/TypeTracking.qll defines the following abstract modules:

CallGraphConstruction::Simple::InputSig
CallGraphConstruction::Simple::Make

InputSig

InputSig can be loosely understood as the Config in dataflow. It is a signature module, a concept similar to an interface.

/** The input to call graph construction. */
signature module InputSig {
  /** A state to track during type tracking. */
  class State;

  /** Holds if type tracking should start at `start` in state `state`. */
  predicate start(Node start, State state);

  /** Holds if type tracking should stop at `n`. */
  predicate filter(Node n);
}

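Besides the State class, it declares two predicates:

start: the start nodes. Depending on the scenario, the starting point has to be converted to the Node type, for example:
- for the classInstance case, the class is ClassInstance, which needs …
filter: similar to a sanitizer; tracking stops at the nodes it matches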
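To make the start/filter contract concrete, here is a toy model in Python. It is only a sketch under my own assumptions: the node names, the EDGES graph, the TrackFoo class and the track function are all hypothetical, and the real CodeQL library works on type-tracking steps rather than on a plain reachability graph.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Protocol, Set, Tuple

Node = str  # hypothetical: nodes are plain strings in this toy model


class InputSig(Protocol):
    """Hypothetical Python stand-in for the QL signature module."""

    def start(self) -> Iterable[Tuple[Node, Hashable]]:
        """Yield (start node, state) pairs."""
        ...

    def filter(self, node: Node) -> bool:
        """Return True if tracking must stop at this node."""
        ...


def track(sig: InputSig, edges: Dict[Node, List[Node]], state: Hashable) -> Set[Node]:
    """Collect the nodes reachable from the start nodes of `state`.

    Assumption of this toy model: the filter is only checked when
    stepping to a successor, never on the start nodes themselves.
    """
    seen: Set[Node] = set()
    work = deque(n for n, s in sig.start() if s == state)
    while work:
        node = work.popleft()
        if node in seen:
            continue
        seen.add(node)
        for succ in edges.get(node, []):
            if not sig.filter(succ):
                work.append(succ)
    return seen


class TrackFoo:
    """Toy input: start at the definition of Foo, stop at self parameters."""

    def start(self):
        yield ("ClassExpr Foo", "Foo")

    def filter(self, node):
        return node.startswith("self")


# Hypothetical flow edges between program locations.
EDGES = {
    "ClassExpr Foo": ["name Foo"],
    "name Foo": ["alias = Foo", "self (param)"],
    "alias = Foo": ["alias()"],
}

reached = track(TrackFoo(), EDGES, "Foo")
```

The state ("Foo") plays the role of the State class: every reached node is labelled with the class it may refer to, which is the information the extractor did not record.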

2. Python built-in Trackers

There are in fact many built-in trackers. Taking Python as an example, it ships with the following implementations:

classTracker
classInstanceTracker
selfTracker
clsArgumentTracker
superCallNoArgumentTracker
superCallTwoArgumentTracker

2.1 ClassTracker

2.1.1 TrackClassInput

start

private module TrackClassInput implements CallGraphConstruction::Simple::InputSig {
  class State = Class;

  predicate start(Node start, Class cls) {
    start.asExpr() = cls.getParent()
    or
    // when a class is decorated, it's the result of the (last) decorator call that
    // is used
    start.asExpr() = cls.getParent().getADecoratorCall()
    or
    // `type(obj)`, where obj is an instance of this class
    start = getTypeCall() and
    start.(CallCfgNode).getArg(0) = classInstanceTracker(cls)
  }
}

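Explanation:

For an ordinary class, the Node of its class expression is the start:

class Foo:  # the whole class definition is the start Expr
    ...

When a class is decorated, the result of the decorator call is the start. Python applies decorators bottom-up, so the last call, whose result is finally bound to the name Foo, is the topmost decorator:

@decorator2  # applied last: decorator2(decorator1(Foo)) is what Foo is bound to
@decorator1  # applied first: decorator1(Foo)
class Foo:
    ...

For an expression type(obj), where obj is an instance of the class, the whole type(obj) call is the start:

class Foo:
    ...
foo = Foo()
type(foo)  # type(foo) is the start Expr

This case uses classInstanceTracker, which is explained in detail below.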
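The Python-side facts behind these three start cases can be checked directly, independent of CodeQL. The following is a small sketch; decorator1, decorator2 and the applied list are names I made up for the demonstration.

```python
applied = []  # records the order in which the decorators run


def decorator1(cls):
    applied.append("decorator1")
    return cls


def decorator2(cls):
    applied.append("decorator2")
    return cls


@decorator2
@decorator1
class Foo:
    pass


foo = Foo()
alias = type(foo)  # type(obj) evaluates to the class of obj
```

After this runs, applied shows that decorator1 was called before decorator2, so the value bound to Foo is the result of the decorator2 call, and alias is the very same object as Foo.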

filter

predicate filter(Node n) {
  ignoreForCallGraph(n.getLocation().getFile())
  or
  n.(ParameterNodeImpl).isParameterOf(_, any(ParameterPosition pp | pp.isSelf()))
}

2.1.2 Make API

Node classTracker(Class cls) {
  CallGraphConstruction::Simple::Make<TrackClassInput>::track(cls).(LocalSourceNode).flowsTo(result)
}

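Explanation: starting from the class definition (or a type(obj) call), track where a reference to the class may flow.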

2.2 classInstanceTracker

Starts from a class instantiation and tracks where the instance may propagate.

2.2.1 TrackClassInstanceInput

start

private module TrackClassInstanceInput implements CallGraphConstruction::Simple::InputSig {
  class State = Class;

  predicate start(Node start, Class cls) {
    resolveClassCall(start.(CallCfgNode).asCfgNode(), cls)
    or
    // result of `super().__new__` as used in a `__new__` method implementation
    exists(Class classUsedInSuper |
      fromSuperNewCall(start.(CallCfgNode).asCfgNode(), classUsedInSuper, _, _) and
      classUsedInSuper = getADirectSuperclass*(cls)
    )
  }
}

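Explanation:

resolveClassCall: a call to the class itself

class Foo:
    ...
bar = Foo()  # Foo() is the start

getADirectSuperclass gets a direct superclass of cls. With the `*` closure it matches cls itself or any of its ancestors, so a super().__new__ call written in a base class is also a start for its subclasses.

class Foo:
    def __new__(cls):
        bar = super().__new__(cls)  # super().__new__(cls) is the start
        return bar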
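The reason the super().__new__ case relates classUsedInSuper to cls through the superclass closure shows up in plain Python: a single super().__new__(cls) call in a base class creates instances of whatever subclass is being instantiated. A minimal sketch, with Base, Foo and created_by as hypothetical names:

```python
class Base:
    def __new__(cls, *args, **kwargs):
        # object.__new__ returns a fresh, uninitialised instance of `cls`,
        # which is the subclass whenever a subclass is being instantiated
        instance = super().__new__(cls)
        instance.created_by = "Base.__new__"
        return instance


class Foo(Base):
    pass


bar = Foo()
```

Here bar is an instance of Foo even though the only allocation site in the source is inside Base.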

resolveClassCall

// -------------------------------------
// class call resolution
// -------------------------------------
/**
 * Holds when `call` is a call to the class `cls`.
 *
 * NOTE: We have this predicate mostly to be able to compare with old point-to
 * call-graph resolution. So it could be removed in the future.
 */
predicate resolveClassCall(CallNode call, Class cls) {
  call.getFunction() = classTracker(cls).asCfgNode()
  or
  // `cls()` inside a classmethod (which also contains `type(self)()` inside a method)
  exists(Class classWithMethod |
    call.getFunction() = clsArgumentTracker(classWithMethod).asCfgNode() and
    getADirectSuperclass*(cls) = classWithMethod
  )
}

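Note:

CallNode.getFunction does not do what its name suggests. It does not resolve the Function that the call targets; it only returns the callee expression of the call. For example: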

CallNode: foo.bar(arg)

getFunction: foo.bar

/** A control flow node corresponding to a call expression, such as `func(...)` */
class CallNode extends ControlFlowNode {
  CallNode() { toAst(this) instanceof Call }

  /** Gets the flow node corresponding to the function expression for the call corresponding to this flow node */
  ControlFlowNode getFunction() {
    exists(Call c |
      this.getNode() = c and
      c.getFunc() = result.getNode() and
      result.getBasicBlock().dominates(this.getBasicBlock())
    )
  }
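Python's own ast module makes the same distinction, which may help as a mental model: the func field of a Call node is just the callee expression, not a resolved function. A short sketch (ast.unparse needs Python 3.9 or later):

```python
import ast

# Parse the example call as a single expression.
tree = ast.parse("foo.bar(arg)", mode="eval")
call = tree.body

# `func` is the callee expression; nothing is resolved to a definition here.
callee = ast.unparse(call.func)
args = [ast.unparse(a) for a in call.args]
```

callee is the text foo.bar and args holds arg; finding out which def this actually reaches is exactly the job left to the trackers.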

filter

predicate filter(Node n) {
  ignoreForCallGraph(n.getLocation().getFile())
  or
  n.(ParameterNodeImpl).isParameterOf(_, any(ParameterPosition pp | pp.isSelf()))
}

2.2.2 Make API

/**
 * Gets a reference to an instance of the class `cls`.
 */
Node classInstanceTracker(Class cls) {
  CallGraphConstruction::Simple::Make<TrackClassInstanceInput>::track(cls)
      .(LocalSourceNode)
      .flowsTo(result)
}

2.3 SelfTracker

2.3.1 TrackSelfInput

start

private module TrackSelfInput implements CallGraphConstruction::Simple::InputSig {
  class State = Class;

  predicate start(Node start, Class classWithMethod) {
    exists(Function func |
      func = classWithMethod.getAMethod() and
      not isStaticmethod(func) and
      not isClassmethod(func)
    |
      start.asExpr() = func.getArg(0)
    )
  }
}

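Explanation:

There is a Function that is a method of the current class and is neither a staticmethod nor a classmethod; the first parameter of that method, self, is the start.

class Foo:
    def bar(self):  # the self parameter is the start Expr
        ...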
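The selection rule in start can be imitated on a pure AST with Python's ast module. This is only a rough analog under my own assumptions (decorators are matched by bare name, positional-only parameters are ignored); SOURCE, decorator_names and self_parameters are hypothetical helpers, not part of CodeQL.

```python
import ast

SOURCE = '''
class Foo:
    def bar(self):
        pass

    @staticmethod
    def helper(x):
        pass

    @classmethod
    def make(cls):
        pass
'''


def decorator_names(func):
    """Names of decorators written as bare identifiers, e.g. @staticmethod."""
    return {d.id for d in func.decorator_list if isinstance(d, ast.Name)}


def self_parameters(source):
    """Return (class, method, first parameter) for every plain method."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        for item in node.body:
            if not isinstance(item, ast.FunctionDef):
                continue
            # Skip staticmethods and classmethods, as TrackSelfInput does.
            if decorator_names(item) & {"staticmethod", "classmethod"}:
                continue
            if item.args.args:
                found.append((node.name, item.name, item.args.args[0].arg))
    return found


result = self_parameters(SOURCE)
```

Only bar survives the two exclusions, so its first parameter is the single start node for the state Foo.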

filter

predicate filter(Node n) {
  ignoreForCallGraph(n.getLocation().getFile())
  or
  n.(ParameterNodeImpl).isParameterOf(_, any(ParameterPosition pp | pp.isSelf()))
}

2.3.2 Make API

/**
 * Gets a reference to the `self` argument of a method on class `classWithMethod`.
 * The method cannot be a `staticmethod` or `classmethod`.
 */
Node selfTracker(Class classWithMethod) {
  CallGraphConstruction::Simple::Make<TrackSelfInput>::track(classWithMethod)
      .(LocalSourceNode)
      .flowsTo(result)
}

2.4 ClassArgumentTracker

Starts from the first parameter of a classmethod (cls), or from type(self), and tracks where it may propagate.

2.4.1 TrackClsArgumentInput

start

private module TrackClsArgumentInput implements CallGraphConstruction::Simple::InputSig {
  class State = Class;

  predicate start(Node start, Class classWithMethod) {
    exists(Function func |
      func = classWithMethod.getAMethod() and
      isClassmethod(func)
    |
      start.asExpr() = func.getArg(0)
    )
    or
    // type(self)
    start = getTypeCall() and
    start.(CallCfgNode).getArg(0) = selfTracker(classWithMethod)
  }
}

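Explanation:

There is a Function func that is a method of the class and is a classmethod; the start node is then the first parameter of that method.

PS: a gripe about the naming in the Python API: getArg makes no distinction between parameters and arguments.

class Foo:
    @classmethod
    def func(cls):  # start = cls
        ...

The type(self) case:

class Foo:
    def bar(self):
        type(self)  # start = type(self)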
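Both start cases refer to "the class of the receiver", which at runtime may be a subclass of the class that contains the method. That is consistent with resolveClassCall relating classWithMethod to cls through getADirectSuperclass*. A small sketch with hypothetical names (Foo, Sub, create, clone):

```python
class Foo:
    @classmethod
    def create(cls):
        return cls()  # `cls` is the start node in the classmethod case

    def clone(self):
        return type(self)()  # `type(self)` is the start node in the second case


class Sub(Foo):
    pass


a = Sub.create()   # cls is Sub here, although create is defined in Foo
b = Sub().clone()  # type(self) is Sub here as well
c = Foo.create()   # cls is Foo
```

So a call cls() inside Foo.create may instantiate Foo or any subclass of Foo, depending on the receiver.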

filter

predicate filter(Node n) {
  ignoreForCallGraph(n.getLocation().getFile())
  or
  n.(ParameterNodeImpl).isParameterOf(_, any(ParameterPosition pp | pp.isSelf()))
}

2.4.2 Make API

/**
 * Gets a reference to the enclosing class `classWithMethod` from within one of its
 * methods, either through the `cls` argument from a `classmethod` or from `type(self)`
 * from a normal method.
 */
Node clsArgumentTracker(Class classWithMethod) {
  CallGraphConstruction::Simple::Make<TrackClsArgumentInput>::track(classWithMethod)
      .(LocalSourceNode)
      .flowsTo(result)
}

3. Custom Trackers

3.1 AttrReadTracker

While analyzing the resolveCall predicate, whose job is to relate a Call to the Function definition it targets, I ran into AttrReadTracker. The call chain is as follows:

resolveCall(CallNode call, Function target, CallType type)
  resolveMethodCall
    directCall
      findFunctionAccordingToMroKnownStartingClass
      directCall_join
        attrReadTracker
          CallGraphConstruction::Simple::Make<TrackAttrReadInput>

Let's look at its implementation in detail.

As the name suggests, it tracks attribute reads. It takes two steps:

1. Build TrackAttrReadInput, an implementation of CallGraphConstruction::Simple::InputSig
- implement start/filter
2. Make attrReadTracker by calling the track predicate of CallGraphConstruction::Simple::Make<TrackAttrReadInput>

3.1.1 TrackAttrReadInput

start

// =============================================================================
// attribute trackers
// =============================================================================
private module TrackAttrReadInput implements CallGraphConstruction::Simple::InputSig {
  class State = AttrRead;

  predicate start(Node start, AttrRead attr) {
    start = attr and
    pragma[only_bind_into](attr.getObject()) in [
        classTracker(_), classInstanceTracker(_), selfTracker(_), clsArgumentTracker(_),
        superCallNoArgumentTracker(_), superCallTwoArgumentTracker(_, _)
      ]
  }

  predicate filter(Node n) {
    ignoreForCallGraph(n.getLocation().getFile())
    or
    n.(ParameterNodeImpl).isParameterOf(_, any(ParameterPosition pp | pp.isSelf()))
  }
}

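Explanation:

The start is an AttrRead whose object (attr.getObject()) can be traced back to a class by one of the trackers above.

class Foo:
    bar = 'test'
foo = Foo()
foo.bar  # foo.bar is the AttrRead; its object, foo, is what can be tracked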
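Why an attribute read needs a tracker of its own becomes clear once the read and the call are separated. In the sketch below (Foo, bar and method are hypothetical names), the call site never mentions foo or bar, so resolving it requires following the value of the attribute read.

```python
class Foo:
    def bar(self):
        return "Foo.bar"


foo = Foo()
method = foo.bar   # attribute read: produces a bound method, no call yet
result = method()  # the call site only mentions `method`
```

The bound method remembers both the function and the receiver, which is the pairing a call graph has to recover statically.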

In terms of the result

filter

The filter predicate is already part of the TrackAttrReadInput module shown above, and it is identical to the one used by the built-in trackers.

Make API

/** Gets a reference to the attribute read `attr` */
Node attrReadTracker(AttrRead attr) {
  CallGraphConstruction::Simple::Make<TrackAttrReadInput>::track(attr)
      .(LocalSourceNode)
      .flowsTo(result)
}