Understanding Swift Closures Through SIL and IR

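“To understand functions and closures in Swift, you really need to understand three things, listed roughly in order of importance:

  1. Functions can be assigned to variables, just like an Int or a String, and can also be used as an input parameter or as the return value of another function.
  2. Functions can capture variables that exist outside of their local scope.
  3. There are two ways to create a function: with the func keyword, or with { }. In Swift, the latter is called a closure expression.” Excerpt from Advanced Swift (Swift 进阶), Chris Eidhof https://itunes.apple.com/WebObjects/MZStore.woa/wa/viewBook?id=0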
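
The three points in the quote can be sketched in a few lines. This is only an illustrative example; the names double, triple, apply, and makeAdder are made up here and do not come from the book.

```swift
// A named function declared with `func`.
func double(_ x: Int) -> Int {
    return x * 2
}

// A closure expression with the same type, (Int) -> Int.
let triple: (Int) -> Int = { x in x * 3 }

// 1. Functions can be stored in variables...
let f: (Int) -> Int = double

// ...passed as arguments to other functions...
func apply(_ transform: (Int) -> Int, to value: Int) -> Int {
    return transform(value)
}

// ...and returned from functions.
// 2. The returned closure captures `n` from the enclosing scope.
func makeAdder(_ n: Int) -> (Int) -> Int {
    return { x in x + n }
}

let addFive = makeAdder(5)
print(f(4), apply(triple, to: 4), addFive(4))
```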

Closures in Swift

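  • Like a function, a closure has a body, parameters, and a return value.
  • A closure can capture variables from the surrounding scope, which extends the lifetime of those variables.
  • A closure can be assigned to a variable, passed as an argument, and returned from a function.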
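
As a small sketch of the second point, the hypothetical makeCounter below returns a closure that keeps the local variable count alive after the function has returned.

```swift
func makeCounter() -> () -> Int {
    var count = 0          // local variable; it would normally die when makeCounter returns
    return {
        count += 1         // captured by the closure, so it outlives the call
        return count
    }
}

let next = makeCounter()
let first = next()
let second = next()
print(first, second)
```

Each call to next() sees the value left by the previous call, which is only possible because the captured variable no longer lives in the stack frame of makeCounter.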

How closures capture values

Use

swiftc -emit-sil main.swift > ./main_SIL.swift

to compile the following code to SIL:

class TestClosure {
    func test() {
        var a: Int = 10
        let closure: (Int)->(Int) = { i in
            return 0
        }
        closure(8)
    }
}

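This produces the following (the listing comes from a variant in which the closure takes no argument, which is why its type appears as () -> Int):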

// TestClosure.test()
sil hidden @$s4main11TestClosureC4testyyF : $@convention(method) (@guaranteed TestClosure) -> () {
// %0 "self"                                      // user: %1
bb0(%0 : $TestClosure):
  debug_value %0 : $TestClosure, let, name "self", argno 1, implicit // id: %1
  %2 = alloc_stack [lexical] $Int, var, name "a"  // users: %5, %10
  %3 = integer_literal $Builtin.Int64, 10         // user: %4
  %4 = struct $Int (%3 : $Builtin.Int64)          // user: %5
  store %4 to %2 : $*Int                          // id: %5
  // function_ref closure #1 in TestClosure.test()
  %6 = function_ref @$s4main11TestClosureC4testyyFSiycfU_ : $@convention(thin) () -> Int // user: %7
  %7 = thin_to_thick_function %6 : $@convention(thin) () -> Int to $@callee_guaranteed () -> Int // users: %9, %8
  debug_value %7 : $@callee_guaranteed () -> Int, let, name "closure" // id: %8
  %9 = apply %7() : $@callee_guaranteed () -> Int
  dealloc_stack %2 : $*Int                        // id: %10
  %11 = tuple ()                                  // user: %12
  return %11 : $()                                // id: %12
} // end sil function '$s4main11TestClosureC4testyyF'

Function types

From https://github.com/apple/swift/blob/main/docs/SIL.rst#owned

  • @convention(thin) indicates a “thin” function reference, which uses the Swift calling convention with no special “self” or “context” parameters.
  • @convention(thick) indicates a “thick” function reference, which uses the Swift calling convention and carries a reference-counted context object used to represent captures or other state required by the function. This attribute is implied by @callee_owned or @callee_guaranteed.
  • If it is @callee_guaranteed, the context value is treated as a direct parameter. This implies @convention(thick). If the function type is also @noescape, then the context value is unowned, otherwise it is guaranteed.
  • If it is @callee_owned, the context value is treated as an owned direct parameter. This implies @convention(thick) and is mutually exclusive with @noescape.

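  • A thin function is a function that carries no "self" or other context.
  • A thick function carries a reference-counted context object that represents captured values and other state.
  • callee_owned and callee_guaranteed both mean the function is thick; they differ only in ownership of the context.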
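
The thin/thick distinction can be observed from Swift source, at least indirectly. The sketch below compares an ordinary closure type with a @convention(c) function pointer, which cannot capture context. Using @convention(c) as a stand-in for SIL's @convention(thin) is an analogy made here, not something the SIL documentation states, and the sizes assume a typical platform where a function value is one or two pointers wide.

```swift
// An ordinary Swift function value is "thick": function pointer plus context pointer.
typealias ThickFunction = (Int) -> Int

// A C-convention function pointer has no context, so it cannot capture anything.
typealias CFunction = @convention(c) (Int) -> Int

let thick: ThickFunction = { $0 + 1 }
let thin: CFunction = { $0 + 1 }

let thickSize = MemoryLayout<ThickFunction>.size
let thinSize = MemoryLayout<CFunction>.size
let pointerSize = MemoryLayout<UnsafeRawPointer>.size

// Expected relationship: thick is two pointers wide, thin is one.
print(thickSize, thinSize, pointerSize)
print(thick(1), thin(1))
```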

The three kinds of ownership

From https://stackoverflow.com/questions/39839746/what-does-the-guaranteed-attribute-in-swift-do

  • unowned – Neither the caller or callee assert ownership of the passed value, but it is guaranteed to be valid at the time of the call (unless the callee does something to invalidate this).
  • owned – The callee has ownership of the value. The caller will retain the value before passing it, and then it’s the responsibility of the callee to release it once done with it.
  • guaranteed – The caller asserts ownership of the value, allowing the callee to have the guarantee that it will be valid at the time of call.

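owned and unowned describe which side, caller or callee, owns the passed value. unowned means neither side owns it. owned means the callee owns it and is responsible for releasing it when the call is done. guaranteed means the caller asserts ownership of the value and guarantees that it stays valid for the duration of the call.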

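One question remains.

We will look at it in detail later, in the IR section.

Next, we add a line inside the closure that references a variable from the enclosing scope.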

class TestClosure {
    func test() {
        var a: Int = 10
        let closure: (Int)->(Int) = { i in
            a += 1
            return 0
        }
        closure(8)
    }
}

Recompiling to SIL gives:

// TestClosure.test()
sil hidden @$s4main11TestClosureC4testyyF : $@convention(method) (@guaranteed TestClosure) -> () {
// %0 "self"                                      // user: %1
bb0(%0 : $TestClosure):
  debug_value %0 : $TestClosure, let, name "self", argno 1, implicit // id: %1
  %2 = alloc_box ${ var Int }, var, name "a"      // users: %17, %9, %8, %3
  %3 = project_box %2 : ${ var Int }, 0           // user: %6
  %4 = integer_literal $Builtin.Int64, 10         // user: %5
  %5 = struct $Int (%4 : $Builtin.Int64)          // user: %6
  store %5 to %3 : $*Int                          // id: %6
  // function_ref closure #1 in TestClosure.test()
  %7 = function_ref @$s4main11TestClosureC4testyyFS2icfU_ : $@convention(thin) (Int, @guaranteed { var Int }) -> Int // user: %9
  strong_retain %2 : ${ var Int }                 // id: %8
  %9 = partial_apply [callee_guaranteed] %7(%2) : $@convention(thin) (Int, @guaranteed { var Int }) -> Int // users: %16, %15, %14, %11, %10
  debug_value %9 : $@callee_guaranteed (Int) -> Int, let, name "closure" // id: %10
  strong_retain %9 : $@callee_guaranteed (Int) -> Int // id: %11
  %12 = integer_literal $Builtin.Int64, 8         // user: %13
  %13 = struct $Int (%12 : $Builtin.Int64)        // user: %14
  %14 = apply %9(%13) : $@callee_guaranteed (Int) -> Int
  strong_release %9 : $@callee_guaranteed (Int) -> Int // id: %15
  strong_release %9 : $@callee_guaranteed (Int) -> Int // id: %16
  strong_release %2 : ${ var Int }                // id: %17
  %18 = tuple ()                                  // user: %19
  return %18 : $()                                // id: %19
} // end sil function '$s4main11TestClosureC4testyyF'

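Changes:

  • a is no longer allocated on the stack (alloc_stack); it is now allocated in a heap box (alloc_box).
  • Before the closure is created, the box holding a is retained with strong_retain.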

project_box

sil-instruction ::= 'project_box' sil-operand

%1 = project_box %0 : $@box T

// %1 has type $*T

Given a @box T reference, produces the address of the value inside the box.

project_box yields the address of the value stored inside the reference-counted box.
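
A rough source-level model of alloc_box and project_box is a small heap-allocated wrapper class. The Box type below is a hypothetical stand-in written for illustration; the compiler does not generate a Swift class like this.

```swift
// Hypothetical stand-in for SIL's `${ var Int }` box: a heap object holding one value.
final class Box<T> {
    var value: T
    init(_ value: T) { self.value = value }
}

func test() -> Int {
    let box = Box(10)                  // roughly: alloc_box, then store 10
    let closure: (Int) -> Int = { _ in
        box.value += 1                 // roughly: project_box, then mutate through the address
        return 0
    }
    _ = closure(8)
    return box.value                   // the mutation is visible outside the closure
}

print(test())
```

Because both the enclosing function and the closure hold a reference to the same box, the box has to be reference counted, which matches the strong_retain and strong_release calls in the SIL listing.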

https://github.com/apple/swift/blob/main/docs/SIL.rst#partial-apply

partial_apply

sil-instruction ::= 'partial_apply' callee-ownership-attr? on-stack-attr? sil-value
                      sil-apply-substitution-list?
                      '(' (sil-value (',' sil-value)*)? ')'
                      ':' sil-type
callee-ownership-attr ::= '[callee_guaranteed]'
on-stack-attr ::= '[on_stack]'

%c = partial_apply %0(%1, %2, ...) : $(Z..., A, B, ...) -> R
// Note that the type of the callee '%0' is specified *after* the arguments
// %0 must be of a concrete function type $(Z..., A, B, ...) -> R
// %1, %2, etc. must be of the argument types $A, $B, etc.,
//   of the tail part of the argument tuple of %0
// %c will be of the partially-applied thick function type (Z...) -> R

%c = partial_apply %0<A, B>(%1, %2, ...) : $(Z..., T, U, ...) -> R
// %0 must be of a polymorphic function type $<T, U>(T, U, ...) -> R
// %1, %2, etc. must be of the argument types after substitution $A, $B, etc.
//   of the tail part of the argument tuple of %0
// %r will be of the substituted thick function type $(Z'...) -> R'

Creates a closure by partially applying the function %0 to a partial sequence of its arguments. This instruction is used to implement closures.

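Put simply, a closure is created by partially applying a function: the captured values are bound in advance as the trailing arguments of that function.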
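
The same idea can be written by hand in Swift. In this sketch, closureBody mirrors the thin function type (Int, @guaranteed { var Int }) -> Int from the SIL listing, and partialApply is a hypothetical helper that imitates what the partial_apply instruction does. IntBox is likewise an illustrative stand-in for the box.

```swift
// Illustrative stand-in for the heap box that stores `a`.
final class IntBox {
    var value: Int
    init(_ value: Int) { self.value = value }
}

// The closure body as a plain function: the captured box is its last parameter.
func closureBody(_ i: Int, _ box: IntBox) -> Int {
    box.value += 1
    return 0
}

// Hypothetical helper that mimics `partial_apply`: it binds the trailing
// argument now and returns a function of the remaining argument.
func partialApply<A, B, R>(_ fn: @escaping (A, B) -> R, _ bound: B) -> (A) -> R {
    return { a in fn(a, bound) }
}

let box = IntBox(10)
let closure: (Int) -> Int = partialApply(closureBody, box)
let result = closure(8)
print(result, box.value)
```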

IR

https://llvm.org/docs/LangRef.html#llvm-memset-intrinsics

First, a quick look at the data types in IR syntax:

// Integer
iN
// Struct
%T = type {<type list>}
// Array
[<elementnumber> x <elementtype>]
// Pointer
<type>*

Instructions

// Compute the address of a member of a struct or array from an offset
<result> = getelementptr <ty>, <ty>* <ptrval>{, [inrange] <ty> <idx>}*

<result> = getelementptr inbounds <ty>, <ty>* <ptrval>{, [inrange] <ty> <idx>}*

// Type conversion
<result> = bitcast <ty> <value> to <ty2>             ; yields ty2
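
For readers more at home in Swift than in LLVM IR, the following sketch shows a loose analogy: MemoryLayout.offset(of:) gives the byte offset that a getelementptr with indices i32 0, i32 1 would compute, and loading typed data from raw bytes plays the role of bitcast followed by load. The Pair struct is made up for the example.

```swift
struct Pair {
    var first: Int64
    var second: Int64
}

let pair = Pair(first: 10, second: 20)

// Byte offset of the second field, analogous to GEP indices `i32 0, i32 1`.
let offset = MemoryLayout<Pair>.offset(of: \Pair.second)!

let loaded = withUnsafeBytes(of: pair) { raw in
    // Reinterpreting raw bytes at an offset is analogous to `bitcast` + `load`.
    raw.load(fromByteOffset: offset, as: Int64.self)
}

print(offset, loaded)
```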

Use

swiftc -emit-ir main.swift > ./main_IR.swift

to convert the file to IR:

define hidden swiftcc void @"$s4main11TestClosureC4testyyF"(%T4main11TestClosureC* swiftself %0) #0 {
entry:
  %self.debug = alloca %T4main11TestClosureC*, align 8
  %1 = bitcast %T4main11TestClosureC** %self.debug to i8*
  call void @llvm.memset.p0i8.i64(i8* align 8 %1, i8 0, i64 8, i1 false)
  %a.debug = alloca %TSi*, align 8
  %2 = bitcast %TSi** %a.debug to i8*
  call void @llvm.memset.p0i8.i64(i8* align 8 %2, i8 0, i64 8, i1 false)
  %closure.debug = alloca %swift.function, align 8
  %3 = bitcast %swift.function* %closure.debug to i8*
  call void @llvm.memset.p0i8.i64(i8* align 8 %3, i8 0, i64 16, i1 false)
  store %T4main11TestClosureC* %0, %T4main11TestClosureC** %self.debug, align 8
  %4 = call noalias %swift.refcounted* @swift_allocObject(%swift.type* getelementptr inbounds (%swift.full_boxmetadata, %swift.full_boxmetadata* @metadata, i32 0, i32 2), i64 24, i64 7) #2
  %5 = bitcast %swift.refcounted* %4 to <{ %swift.refcounted, [8 x i8] }>*
  %6 = getelementptr inbounds <{ %swift.refcounted, [8 x i8] }>, <{ %swift.refcounted, [8 x i8] }>* %5, i32 0, i32 1
  %7 = bitcast [8 x i8]* %6 to %TSi*
  store %TSi* %7, %TSi** %a.debug, align 8
  %._value = getelementptr inbounds %TSi, %TSi* %7, i32 0, i32 0
  store i64 10, i64* %._value, align 8
  %8 = call %swift.refcounted* @swift_retain(%swift.refcounted* returned %4) #2
  %9 = bitcast %swift.function* %closure.debug to i8*
  call void @llvm.lifetime.start.p0i8(i64 16, i8* %9)
  %closure.debug.fn = getelementptr inbounds %swift.function, %swift.function* %closure.debug, i32 0, i32 0
  store i8* bitcast (i64 (i64, %swift.refcounted*)* @"$s4main11TestClosureC4testyyFS2icfU_TA" to i8*), i8** %closure.debug.fn, align 8
  %closure.debug.data = getelementptr inbounds %swift.function, %swift.function* %closure.debug, i32 0, i32 1
  store %swift.refcounted* %4, %swift.refcounted** %closure.debug.data, align 8
  %10 = call %swift.refcounted* @swift_retain(%swift.refcounted* returned %4) #2
  %11 = call swiftcc i64 @"$s4main11TestClosureC4testyyFS2icfU_TA"(i64 8, %swift.refcounted* swiftself %4)
  call void @swift_release(%swift.refcounted* %4) #2
  call void @swift_release(%swift.refcounted* %4) #2
  call void @swift_release(%swift.refcounted* %4) #2
  ret void
}
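  • %x.debug is the variable the debugger uses to obtain debug information; it refers to the same memory as %x.
  • i8* usually plays the role of void* (https://zhuanlan.zhihu.com/p/103674744)
  • When getelementptr accesses a struct member, the first index is always i32 0; it steps through the pointer to the struct itself (https://llvm.org/docs/GetElementPtr.html#why-is-the-extra-0-index-required)

Line-by-line analysis: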

%self.debug = alloca %T4main11TestClosureC*, align 8
  %1 = bitcast %T4main11TestClosureC** %self.debug to i8*
  call void @llvm.memset.p0i8.i64(i8* align 8 %1, i8 0, i64 8, i1 false)

This allocates a stack slot for self and zeroes it, roughly equivalent to

var self: TestClosure

%a.debug = alloca %TSi*, align 8
  %2 = bitcast %TSi** %a.debug to i8*
  call void @llvm.memset.p0i8.i64(i8* align 8 %2, i8 0, i64 8, i1 false)

This allocates a stack slot for a and sets it to 0, roughly equivalent to

var a: Int

Likewise,

%closure.debug = alloca %swift.function, align 8
  %3 = bitcast %swift.function* %closure.debug to i8*
  call void @llvm.memset.p0i8.i64(i8* align 8 %3, i8 0, i64 16, i1 false)

is roughly equivalent to

var closure: (Int)->(Int)

Finally, self is assigned:

store %T4main11TestClosureC* %0, %T4main11TestClosureC** %self.debug, align 8

This completes the setup of the test method.

%4 = call noalias %swift.refcounted* @swift_allocObject(%swift.type* getelementptr inbounds (%swift.full_boxmetadata, %swift.full_boxmetadata* @metadata, i32 0, i32 2), i64 24, i64 7) #2
  %5 = bitcast %swift.refcounted* %4 to <{ %swift.refcounted, [8 x i8] }>*
  %6 = getelementptr inbounds <{ %swift.refcounted, [8 x i8] }>, <{ %swift.refcounted, [8 x i8] }>* %5, i32 0, i32 1
  %7 = bitcast [8 x i8]* %6 to %TSi*
  store %TSi* %7, %TSi** %a.debug, align 8
  %._value = getelementptr inbounds %TSi, %TSi* %7, i32 0, i32 0
  store i64 10, i64* %._value, align 8
  %8 = call %swift.refcounted* @swift_retain(%swift.refcounted* returned %4) #2

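call denotes a function call; the function called here is

swift_allocObject

It returns a %swift.refcounted*, a pointer to a reference-counted heap object. The arguments i64 24 and i64 7 are the object's size in bytes and its alignment mask.

The pointer is then cast to

<{ %swift.refcounted, [8 x i8] }>*

which is the layout of the box: a reference-counting header followed by 8 bytes of storage for the Int.

Next, getelementptr takes the field at index 1 (the second field), which is the address of the Int value.

The value 10 is stored there, and the box is retained.

So the sequence above is roughly equivalent to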

a = 10
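
The size 24 and alignment mask 7 passed to swift_allocObject can be reproduced with a little arithmetic. The sketch assumes that a Swift heap object header is two pointer-sized words (a metadata pointer plus reference counts), which holds on 64-bit platforms; that header size is an assumption stated here, not something shown in the IR listing.

```swift
// Assumption: the heap object header is two pointer-sized words
// (metadata pointer + reference counts), i.e. 16 bytes on 64-bit platforms.
let headerSize = 2 * MemoryLayout<UnsafeRawPointer>.size

// The `[8 x i8]` slot that stores `a`.
let payloadSize = MemoryLayout<Int>.size

// Corresponds to the `i64 24` argument on a 64-bit target.
let boxSize = headerSize + payloadSize

// Corresponds to the `i64 7` argument: alignment minus one.
let alignmentMask = MemoryLayout<Int>.alignment - 1

print(boxSize, alignmentMask)
```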

Next, the closure's initialization:

  %9 = bitcast %swift.function* %closure.debug to i8*
  call void @llvm.lifetime.start.p0i8(i64 16, i8* %9)
  %closure.debug.fn = getelementptr inbounds %swift.function, %swift.function* %closure.debug, i32 0, i32 0
  store i8* bitcast (i64 (i64, %swift.refcounted*)* @"$s4main11TestClosureC4testyyFS2icfU_TA" to i8*), i8** %closure.debug.fn, align 8
  %closure.debug.data = getelementptr inbounds %swift.function, %swift.function* %closure.debug, i32 0, i32 1
  store %swift.refcounted* %4, %swift.refcounted** %closure.debug.data, align 8
  %10 = call %swift.refcounted* @swift_retain(%swift.refcounted* returned %4) #2
  %11 = call swiftcc i64 @"$s4main11TestClosureC4testyyFS2icfU_TA"(i64 8, %swift.refcounted* swiftself %4)

Unlike the initialization of a, this one explicitly uses

call void @llvm.lifetime.start.p0i8(i64 16, i8* %9)

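to mark the start of the lifetime of the closure's stack slot.

llvm.lifetime.start is not written by the developer; the compiler emits it automatically. It tells LLVM that the 16-byte stack slot holding closure is live from this point on, so the optimizer knows exactly when the slot is in use and never treats its earlier contents as meaningful.

Here the closure's two fields are assigned:

  • fn: the function part of the closure, corresponding to the @convention(thin) function in SIL
  • data: the closure's captured context; here data holds the address of the box that stores a

Finally, a quick look at the definition of the closure function: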
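
The two-field layout can be modelled in Swift as a struct. ThickFunction and Context below are hypothetical types written only to mirror %swift.function and the heap box; note that the fn field in this model is itself a Swift closure value, whereas in the real IR it is a bare code pointer.

```swift
// Stand-in for the heap box that holds the captured variable `a`.
final class Context {
    var a: Int
    init(a: Int) { self.a = a }
}

// Hypothetical model of %swift.function: a code pointer plus a context pointer.
struct ThickFunction {
    let fn: (Int, Context) -> Int   // plays the role of the `fn` field
    let data: Context               // plays the role of the `data` field

    // Calling the closure passes `data` as the extra context argument,
    // just as the IR passes %4 with the `swiftself` attribute.
    func callAsFunction(_ i: Int) -> Int {
        return fn(i, data)
    }
}

let context = Context(a: 10)
let closure = ThickFunction(fn: { _, ctx in
    ctx.a += 1
    return 0
}, data: context)

let result = closure(8)
print(result, context.a)
```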

define internal swiftcc i64 @"$s4main11TestClosureC4testyyFS2icfU_TA"(i64 %0, %swift.refcounted* swiftself %1) #0 {
entry:
  %2 = tail call swiftcc i64 @"$s4main11TestClosureC4testyyFS2icfU_"(i64 %0, %swift.refcounted* %1)
  ret i64 %2
}

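We can see that the function the closure calls, $s4main11TestClosureC4testyyFS2icfU_TA (the TA suffix marks a partial apply forwarder), tail-calls another function, $s4main11TestClosureC4testyyFS2icfU_, and that second function is the real body of our closure.

So why is this extra layer of wrapping needed?

Closures have an important feature, the capture list. Let's add one: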

class TestClosure {
    func test() {
        var a: Int = 10
        let closure: (Int)->(Int) = {[a] i in
            print(a)
            return 0
        }
        a += 1
        closure(8)
    }
}

Compile to IR again.

This time the code that creates the closure looks no different from before, but the closure function itself has changed:

define internal swiftcc i64 @"$s4main11TestClosureC4testyyFS2icfU_TA"(i64 %0, %swift.refcounted* swiftself %1) #0 {
entry:
  %2 = bitcast %swift.refcounted* %1 to <{ %swift.refcounted, %TSi }>*
  %3 = getelementptr inbounds <{ %swift.refcounted, %TSi }>, <{ %swift.refcounted, %TSi }>* %2, i32 0, i32 1
  %._value = getelementptr inbounds %TSi, %TSi* %3, i32 0, i32 0
  %4 = load i64, i64* %._value, align 8
  %5 = tail call swiftcc i64 @"$s4main11TestClosureC4testyyFS2icfU_"(i64 %0, i64 %4)
  ret i64 %5
}

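Before the tail call, the wrapper loads the captured Int out of the context object and passes it to the real closure body by value (i64 %4), so the closure works on a copy of the captured value.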
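
The observable difference between the two kinds of capture can be checked directly in Swift. The function compareCaptures below is a small illustrative example, not code from the listings above.

```swift
func compareCaptures() -> (byReference: Int, byValue: Int) {
    var a = 10
    let byReference = { a }        // captures the variable itself
    let byValue = { [a] in a }     // capture list: copies the current value of `a`
    a += 1                         // only the by-reference closure sees this change
    return (byReference(), byValue())
}

let captures = compareCaptures()
print(captures.byReference, captures.byValue)
```

The closure without a capture list observes the later mutation, while the one with [a] keeps the value that a had when the closure was created.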

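This also confirms the design behind convention(thin) and convention(thick) in SIL.

Splitting thin from thick separates concerns:
the thin function focuses on its own implementation,
the thick function handles the context.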

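At this point we have established three things:

  • Closures are reference types.
  • Variables captured by a closure are stored in a heap box managed by reference counting.
  • In essence, a closure is a function wrapped together with its context.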
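
The first conclusion can be seen from ordinary Swift code: copying a closure value copies the function and context pointers, not the captured state. The sketch below uses a hypothetical makeCounter to show two copies sharing one context while a separately created closure gets its own.

```swift
func makeCounter() -> () -> Int {
    var count = 0
    return {
        count += 1
        return count
    }
}

let counterA = makeCounter()
let counterB = counterA            // copies the closure value; the context is shared
let independent = makeCounter()    // a fresh context with its own `count`

let a1 = counterA()
let b1 = counterB()                // continues from the state left by counterA
let c1 = independent()

print(a1, b1, c1)
```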

The best time to plant a tree was ten years ago; the second best time is now.
