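A Survey of Picture-in-Picture Solutions for iOS Live Streaming

Introduction to Picture in Picture

Picture in Picture (PiP) is a system-level floating-window capability that Apple opened up on iPad in iOS 9 and on iPhone in iOS 14. Once PiP is active, live streams and videos keep playing seamlessly across the whole system.

Usage Restrictions

AVPictureInPictureController originally supported only one initializer: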

init?(playerLayer: AVPlayerLayer)

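In other words, it requires AVPlayer. In most cases, however, we do not use the native player; for example, as mentioned earlier, AVPlayer supports only HLS for live streams. To use PiP we therefore need a workaround.

Option 1: RTMP -> HLS

The server converts the RTMP stream to HLS and feeds it to an AVPlayer running in the background, while the original RTMP stream is fed in parallel to the third-party player for on-screen playback. The one thing to watch is keeping the playback positions of the two players in sync.

Pro: broad compatibility

This option reaches its goal by staying compatible with AVPlayer, so it works on every device that supports PiP.

Cons

1. Extra performance overhead

Whether the conversion service runs on an Nginx server or on the mobile device, it unavoidably adds overhead.

2. Higher latency

As mentioned earlier, HLS latency comes from startup buffering, connection setup, and similar factors. This option effectively stacks HLS latency on top of RTMP latency, so the delay is worse than with the original stream alone.

3. Extra adaptation cost

Streams in different protocols each have to be converted to HLS separately.

Option 2: AVSampleBufferDisplayLayer

In iOS 15, Apple introduced a new initializer for PiP: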

init(contentSource: AVPictureInPictureController.ContentSource)

This initializer allows PiP to be set up from a custom content source, including an AVSampleBufferDisplayLayer.

Here is a brief introduction to AVSampleBufferDisplayLayer.

AVSampleBufferDisplayLayer is a subclass of CALayer that renders CMSampleBuffers.

CMSampleBuffer & CVPixelBuffer

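In short, a CVPixelBuffer holds uncompressed picture data, that is, video before encoding or after decoding. A CMSampleBuffer is a container that carries media samples together with their timing and format description; it can hold compressed data (after encoding, before decoding) or wrap an uncompressed CVPixelBuffer.

So as long as we can obtain the CVPixelBuffer of each frame to be played, we can wrap it in a CMSampleBuffer and display it in PiP.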

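Pros:

No extra latency

Because this option directly uses the CVPixelBuffer produced by the player's decoder, in theory the only added delay is the cost of wrapping it, which is negligible.

No extra adaptation

Because we work with decoded content, the format of the video before decoding does not matter. The player only needs to support handing back CVPixelBuffers.

Easier to get third-party support

We often rely on a third-party player or streaming service. A CVPixelBuffer is plain picture data and exposes nothing else about the vendor, so vendors are more likely to support it.

Cons:

The API is available only on iOS 15 and later, so coverage is somewhat smaller than with Option 1. This drawback will shrink over time and eventually disappear.

In summary, Option 2 is more forward-looking and more practical, so this survey focuses on it. The following sections use ijkplayer as an example to walk through the implementation.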

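Implementation Using ijkplayer as an Example

Getting the CVPixelBuffer from ijkplayer

ijkplayer is an open-source player widely used in the industry, but the current public version does not hand back CVPixelBuffers, so it needs some modification.

This part follows another developer's implementation, described in this series: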

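Getting Real-Time Data from IJKPlayer (Part 1): Adding an External Interface
Getting Real-Time Data from IJKPlayer (Part 2): Adding Soft-Decoding Texture Output
Getting Real-Time Data from IJKPlayer (Part 3): Adding Hard-Decoding Texture Output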

Note that a CVPixelBuffer obtained with that implementation cannot be displayed on screen. The cause is that the CVPixelBuffer is created without the IOSurface properties key, so the buffer is not backed by an IOSurface.

See:

AVSampleBufferDisplayLayer not rendering frames anymore in iOS10

Therefore, in this function in ff_ffplay.c from Part 2 (adding soft-decoding texture output):

 createCVPixelBuffer(FFPlayer *ffp, AVCodecContext* avctx, AVFrame* frame, CVPixelBufferRef* cvImage)

we need to change

status = CVPixelBufferCreate(
                                     kCFAllocatorDefault,
                                     frame->width,
                                     frame->height,
                                     kCVPixelFormatType_32BGRA,
                                     NULL,
                                     cvImage
                                     );

to

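        // An empty IOSurface properties dictionary is enough to request
        // an IOSurface-backed pixel buffer.
        CFDictionaryRef emptyDict = CFDictionaryCreate(kCFAllocatorDefault, NULL, NULL, 0,
                                                       &kCFTypeDictionaryKeyCallBacks,
                                                       &kCFTypeDictionaryValueCallBacks);

        CFDictionaryRef pixelBufferAttributes = CFDictionaryCreate(kCFAllocatorDefault,
            (const void**)&kCVPixelBufferIOSurfacePropertiesKey,
            (const void**)&emptyDict,
            1,
            &kCFTypeDictionaryKeyCallBacks,
            &kCFTypeDictionaryValueCallBacks);

        status = CVPixelBufferCreate(
                                     kCFAllocatorDefault,
                                     frame->width,
                                     frame->height,
                                     kCVPixelFormatType_32BGRA,
                                     pixelBufferAttributes,
                                     cvImage
                                     );

        // The dictionaries were created here, so release them to avoid leaking one pair per frame.
        CFRelease(pixelBufferAttributes);
        CFRelease(emptyDict);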

Wrapping the CVPixelBuffer in a CMSampleBuffer

- (CMSampleBufferRef)sampleBufferFromPixelBuffer:(CVPixelBufferRef)pixelBuffer {
    if (pixelBuffer == NULL) {
        return NULL;
    }

    // Describe the pixel buffer (dimensions, pixel format) for the sample buffer.
    CMVideoFormatDescriptionRef formatDesc = NULL;
    OSStatus err = CMVideoFormatDescriptionCreateForImageBuffer(kCFAllocatorDefault, pixelBuffer, &formatDesc);
    if (err != noErr || formatDesc == NULL) {
        return NULL;
    }

    // No timing information is supplied; the frame is marked for immediate display below.
    CMSampleTimingInfo sampleTimingInfo = kCMTimingInfoInvalid;
    CMSampleBufferRef sampleBuffer = NULL;
    err = CMSampleBufferCreateReadyWithImageBuffer(kCFAllocatorDefault, pixelBuffer, formatDesc, &sampleTimingInfo, &sampleBuffer);

    // The sample buffer keeps its own reference to the format description.
    CFRelease(formatDesc);

    if (err != noErr || sampleBuffer == NULL) {
        return NULL;
    }

    // Ask the display layer to render this frame as soon as it is enqueued.
    CFArrayRef attachments = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, YES);
    if (attachments != NULL && CFArrayGetCount(attachments) > 0) {
        CFMutableDictionaryRef dict = (CFMutableDictionaryRef)CFArrayGetValueAtIndex(attachments, 0);
        CFDictionarySetValue(dict, kCMSampleAttachmentKey_DisplayImmediately, kCFBooleanTrue);
    }

    // The caller owns the returned sample buffer and must CFRelease it after enqueuing.
    return sampleBuffer;
}

Flowchart

Demo Links

ijkplayer fork that can provide the PixelBuffer

https://github.com/Lavanille777/ijkplayer

PIPDemo

https://github.com/Lavanille777/PIPDemo

Support Status at Zhihu Education

ZVVideo (KSMediaPlayer)

ZVVideoPlayer provides a delegate of type ZVPlayerRenderProtocol:

@protocol ZVPlayerRenderProtocol <NSObject>
@optional
- (void)displayPixelbuffer:(CVPixelBufferRef _Nullable )pixelBuffer;
@end

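This looks like a CVPixelBufferRef callback, but in practice it is never called.

Tracing down into KSMediaPlayer turned up no callback related to video textures.

Agora

CCLive also depends on Agora RTC. The AgoraRTCKit version currently in use is 3.7.2.

It provides a VideoFrame callback, but the frame carries only the separate Y, U, and V component data: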

__attribute__((visibility("default"))) @interface AgoraVideoDataFrame : NSObject
/** The color video format. See AgoraVideoFrameType.
 */
@property(assign, nonatomic) AgoraVideoFrameType frameType;
/** The width (px) of the video.
 */
@property(assign, nonatomic) NSInteger width;  // width of video frame
/** The height (px) of the video.
 */
@property(assign, nonatomic) NSInteger height;  // height of video frame
/** For YUV data, the line span of the Y buffer; for RGBA data, the total
 data length.
 */
@property(assign, nonatomic) NSInteger yStride;  // stride of Y data buffer
/** For YUV data, the line span of the U buffer; for RGBA data, the value is 0.
 */
@property(assign, nonatomic) NSInteger uStride;  // stride of U data buffer
/** For YUV data, the line span of the V buffer; for RGBA data, the value is 0.
 */
@property(assign, nonatomic) NSInteger vStride;  // stride of V data buffer
/** For YUV data, the pointer to the Y buffer; for RGBA data, the data buffer.
 */
@property(assign, nonatomic) void* _Nullable yBuffer;  // Y data buffer
/** For YUV data, the pointer to the U buffer; for RGBA data, the value is 0.
 */
@property(assign, nonatomic) void* _Nullable uBuffer;  // U data buffer
/** For YUV data, the pointer to the V buffer; for RGBA data, the value is 0.
 */
@property(assign, nonatomic) void* _Nullable vBuffer;  // V data buffer
/** The clockwise rotation angle of the video frame.
 See AgoraVideoRotation.
 */
@property(assign, nonatomic) AgoraVideoRotation rotation;  // rotation of this frame (0, 90, 180, 270)
/** The Unix timestamp (ms) when the video frame is rendered. This timestamp
 can be used to guide the rendering of the video frame. This parameter is
 required.
 */
@property(assign, nonatomic) int64_t renderTimeMs;
/** Reserved parameter.
 */
@property(assign, nonatomic) NSInteger avsync_type;

@end

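In testing, the data actually delivered was in YUV 420 format.

An attempt to assemble the YUV planes into a CVPixelBuffer produced a picture with incorrect colors: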

func pixelBuffer(fromYUV yBuffer: UnsafeMutableRawPointer,
                     uBuffer: UnsafeMutableRawPointer,
                     vBuffer: UnsafeMutableRawPointer,
                     width: Int,
                     height: Int,
                     videoFrame: AgoraVideoDataFrame) -> CVPixelBuffer? {
        
        let pixelAttributes: [CFString : Any] = [
            kCVPixelBufferIOSurfacePropertiesKey: [:]
        ]
        
        var pixelBuffer: CVPixelBuffer?
        
        let result = CVPixelBufferCreate(kCFAllocatorDefault,
                                         width,
                                         height,
                                         kCVPixelFormatType_420YpCbCr8BiPlanarFullRange,
                                         pixelAttributes as CFDictionary,
                                         &pixelBuffer)
        
        guard result == kCVReturnSuccess, let pixelBuffer = pixelBuffer else {
            return nil
        }
        
        CVPixelBufferLockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: 0))
        defer {
            CVPixelBufferUnlockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: 0))
        }
        
        let uPlaneSize = width * height / 4
        let vPlaneSize = width * height / 4
        let numberOfElementsForChroma = uPlaneSize + vPlaneSize
        
        let uvPlane = UnsafeMutableRawPointer.allocate(byteCount: Int(numberOfElementsForChroma), alignment: MemoryLayout<UInt8>.alignment)
        defer {
          uvPlane.deallocate()
        }

        memcpy(uvPlane, uBuffer, uPlaneSize)
        memcpy(uvPlane.advanced(by: Int(uPlaneSize)), vBuffer, vPlaneSize)

        let yDestPlane = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0)
        let uvDestPlane = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 1)
        
       
        
        var yPerRowOfPlane = Int(ceil(Double(width) / 64) * 64)
        
        if videoFrame.yStride == yPerRowOfPlane {
            memcpy(yDestPlane!, yBuffer, videoFrame.yStride * height)
            memcpy(uvDestPlane, uBuffer, videoFrame.uStride * height / 2)
            memcpy(uvDestPlane!.advanced(by: Int(videoFrame.uStride * height / 2)), vBuffer, videoFrame.vStride * height / 2)
        } else {
            for i in 0 ..< videoFrame.height {
                memcpy(yDestPlane!.advanced(by: i * yPerRowOfPlane), yBuffer + i * videoFrame.yStride, yPerRowOfPlane)
                if i < videoFrame.height / 2 {
                    memcpy(uvDestPlane!.advanced(by: i * yPerRowOfPlane / 2), uvPlane + i * videoFrame.yStride, yPerRowOfPlane)
                }
            }
        }
        
        return pixelBuffer
    }

This problem is unresolved for now and may need Agora's assistance.

AgoraRTCSDK 4.0.0 and later, by contrast, provides a CVPixelBuffer directly.
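One plausible cause of the color problem, not confirmed in the original investigation: kCVPixelFormatType_420YpCbCr8BiPlanarFullRange is a bi-planar format whose second plane stores Cb and Cr interleaved (U0 V0 U1 V1 ...), whereas the attempt above copies the whole U plane followed by the whole V plane. The sketch below shows the interleaving in plain C with per-row stride handling. The function names are hypothetical; on iOS the destination pointers and strides would come from CVPixelBufferGetBaseAddressOfPlane and CVPixelBufferGetBytesPerRowOfPlane on the locked buffer, and the source strides from yStride, uStride, and vStride of the frame. A full-range versus video-range mismatch could also affect colors and is not addressed here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the luma (Y) plane row by row, so the source and destination
 * strides may differ. Only `width` bytes per row carry pixel data. */
void copy_luma_plane(uint8_t *dst, size_t dst_stride,
                     const uint8_t *src, size_t src_stride,
                     size_t width, size_t height)
{
    for (size_t row = 0; row < height; row++) {
        memcpy(dst + row * dst_stride, src + row * src_stride, width);
    }
}

/* Interleave planar U and V into one CbCr plane: U0 V0 U1 V1 ...
 * In 4:2:0 the chroma planes are half the luma size in both dimensions. */
void interleave_chroma_planes(uint8_t *dst_uv, size_t dst_stride,
                              const uint8_t *src_u, size_t u_stride,
                              const uint8_t *src_v, size_t v_stride,
                              size_t width, size_t height)
{
    size_t chroma_width = (width + 1) / 2;
    size_t chroma_height = (height + 1) / 2;

    for (size_t row = 0; row < chroma_height; row++) {
        uint8_t *dst_row = dst_uv + row * dst_stride;
        const uint8_t *u_row = src_u + row * u_stride;
        const uint8_t *v_row = src_v + row * v_stride;
        for (size_t col = 0; col < chroma_width; col++) {
            dst_row[2 * col] = u_row[col];     /* Cb */
            dst_row[2 * col + 1] = v_row[col]; /* Cr */
        }
    }
}

/* Convert one planar I420 frame into the bi-planar (NV12-style) layout. */
void i420_to_nv12(uint8_t *dst_y, size_t dst_y_stride,
                  uint8_t *dst_uv, size_t dst_uv_stride,
                  const uint8_t *src_y, size_t src_y_stride,
                  const uint8_t *src_u, size_t src_u_stride,
                  const uint8_t *src_v, size_t src_v_stride,
                  size_t width, size_t height)
{
    copy_luma_plane(dst_y, dst_y_stride, src_y, src_y_stride, width, height);
    interleave_chroma_planes(dst_uv, dst_uv_stride,
                             src_u, src_u_stride,
                             src_v, src_v_stride,
                             width, height);
}
```

Reading the destination strides from the pixel buffer, rather than assuming a fixed 64-byte alignment, keeps the copy correct even if Core Video pads rows differently for a given width.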

ijkplayer

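Because ijkplayer is open source, dependency conflicts can be resolved by renaming symbols, and the player is easy to customize. The risk is that it may need a dedicated maintainer.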