How to encode resampled PCM audio to AAC with the FFmpeg API when the input PCM sample count is not 1024

I ran into a similar problem: I was encoding PCM packets to AAC, and the PCM packets were sometimes shorter than 1024 samples.

If I encoded a packet shorter than 1024 samples, the audio came out slow; if I dropped it instead, the audio sped up. From what I observed, swr_convert does not do any automatic buffering.
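
Whether a short frame is acceptable at all depends on the encoder: libavcodec only accepts frames with fewer than frame_size samples from codecs that advertise AV_CODEC_CAP_VARIABLE_FRAME_SIZE (or as the very last frame of the stream). A minimal check along those lines, as a sketch (the helper name is mine; enc_ctx is assumed to be an opened AAC encoder context):

extern "C" {
#include <libavcodec/avcodec.h>
}

// returns true when every frame except the last must carry exactly
// enc_ctx->frame_size samples (1024 for AAC-LC)
static bool encoderNeedsFixedFrames(const AVCodecContext *enc_ctx) {
  return enc_ctx->frame_size > 0 &&
         !(enc_ctx->codec->capabilities & AV_CODEC_CAP_VARIABLE_FRAME_SIZE);
}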

I ended up with a buffering scheme: incoming samples are appended to a 1024-sample buffer, and every time the buffer is full it gets encoded and cleared.
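
For reference, libavutil also ships av_audio_fifo, which does the same repacking into fixed-size chunks. A rough sketch of that alternative (the function and variable names here are illustrative, not from the original code), assuming the FIFO was allocated with av_audio_fifo_alloc(enc_ctx->sample_fmt, enc_ctx->channels, enc_ctx->frame_size):

extern "C" {
#include <libavcodec/avcodec.h>
#include <libavutil/audio_fifo.h>
#include <libavutil/frame.h>
}

// push one decoded frame into the FIFO, then drain it in frame_size chunks;
// encodeFixedFrame stands in for whatever encode call you already have
static int repackWithFifo(AVAudioFifo *fifo, AVCodecContext *enc_ctx,
                          const AVFrame *decoded, AVFrame *outFrame,
                          int (*encodeFixedFrame)(AVCodecContext *, AVFrame *)) {
  int ret = av_audio_fifo_write(fifo, (void **)decoded->data, decoded->nb_samples);
  if (ret < 0)
    return ret;

  // pop exactly frame_size samples per encoded frame; leftovers stay buffered
  while (av_audio_fifo_size(fifo) >= enc_ctx->frame_size) {
    ret = av_audio_fifo_read(fifo, (void **)outFrame->data, enc_ctx->frame_size);
    if (ret < 0)
      return ret;
    outFrame->nb_samples = enc_ctx->frame_size;
    ret = encodeFixedFrame(enc_ctx, outFrame);
    if (ret < 0)
      return ret;
  }
  return 0;
}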

The function that fills the buffer is below:

// put frame data into buffer of fixed size
bool ffmpegHelper::putAudioBuffer(const AVFrame *pAvFrameIn, AVFrame **pAvFrameBuffer, AVCodecContext *dec_ctx, int frame_size, int &k0) {
  // prepare pFrameAudio
  if (!(*pAvFrameBuffer)) {
    if (!(*pAvFrameBuffer = av_frame_alloc())) {
      av_log(NULL, AV_LOG_ERROR, "Alloc frame failed\n");
      return false;
    } else {
      (*pAvFrameBuffer)->format = dec_ctx->sample_fmt;
      (*pAvFrameBuffer)->channels = dec_ctx->channels;
      (*pAvFrameBuffer)->sample_rate = dec_ctx->sample_rate;
      (*pAvFrameBuffer)->nb_samples = frame_size;
      int ret = av_frame_get_buffer(*pAvFrameBuffer, 0);
      if (ret < 0) {
        char err[500];
        av_log(NULL, AV_LOG_ERROR, "get audio buffer failed: %s\n",
          av_make_error_string(err, AV_ERROR_MAX_STRING_SIZE, ret));
        return false;
      }
      (*pAvFrameBuffer)->nb_samples = 0;  // nb_samples doubles as the current fill level
      (*pAvFrameBuffer)->pts = pAvFrameIn->pts;
    }
  }

  // copy input data to buffer: limited by the samples left in the input
  // frame (from offset k0) and by the free space remaining in the buffer
  int n_channels = pAvFrameIn->channels;
  int new_samples = FFMIN(pAvFrameIn->nb_samples - k0, frame_size - (*pAvFrameBuffer)->nb_samples);
  int k1 = (*pAvFrameBuffer)->nb_samples;

  if (pAvFrameIn->format == AV_SAMPLE_FMT_S16) {  // packed (interleaved) 16-bit only
    int16_t *d_in = (int16_t *)pAvFrameIn->data[0];
    d_in += n_channels * k0;
    int16_t *d_out = (int16_t *)(*pAvFrameBuffer)->data[0];
    d_out += n_channels * k1;

    for (int i = 0; i < new_samples; ++i) {
      for (int j = 0; j < pAvFrameIn->channels; ++j) {
        *d_out++ = *d_in++;
      }
    }
  } else {
    printf("not handled format for audio buffer\n");
    return false;
  }

  (*pAvFrameBuffer)->nb_samples += new_samples;
  k0 += new_samples;

  return true;
}
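
Note that the copy loop above only handles packed S16. If your decoder outputs a planar format such as AV_SAMPLE_FMT_FLTP (common for AAC decoders), each channel sits in its own plane, so the copy has to be done per plane. A sketch of such a helper (not part of the original code; it assumes the buffer frame uses the same planar float format):

#include <cstring>
extern "C" {
#include <libavutil/frame.h>
}

// hypothetical helper: copy new_samples samples from a planar-float input
// frame (starting at sample k0) into a planar-float buffer frame
// (starting at sample k1); both frames must use AV_SAMPLE_FMT_FLTP
static void copyPlanarFloatSlice(const AVFrame *in, AVFrame *out,
                                 int k0, int k1, int new_samples) {
  for (int j = 0; j < in->channels; ++j) {
    const float *d_in = (const float *)in->data[j] + k0;
    float *d_out = (float *)out->data[j] + k1;
    std::memcpy(d_out, d_in, new_samples * sizeof(float));
  }
}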

And the loop that fills the buffer and encodes is below:

// transcoding needed
int got_frame;
AVMediaType stream_type;
// decode the packet (do it your self)
decodePacket(packet, dec_ctx, &pAvFrame_, got_frame);

if (enc_ctx->codec_type == AVMEDIA_TYPE_AUDIO) {
    ret = 0;
    // break audio packet down to buffer
    if (enc_ctx->frame_size > 0) {
        int k = 0;
        while (k < pAvFrame_->nb_samples) {
            if (!putAudioBuffer(pAvFrame_, &pFrameAudio_, dec_ctx, enc_ctx->frame_size, k))
                return false;
            if (pFrameAudio_->nb_samples == enc_ctx->frame_size) {
                // the buffer is full, encode it (do it yourself)
                ret = encodeFrame(pFrameAudio_, stream_index, got_frame, false);
                if (ret < 0)
                    return false;
                pFrameAudio_->pts += enc_ctx->frame_size;
                pFrameAudio_->nb_samples = 0;
            }
        }
    } else {
        ret = encodeFrame(pAvFrame_, stream_index, got_frame, false);
    }
} else {
    // encode packet directly
    ret = encodeFrame(pAvFrame_, stream_index, got_frame, false);
}
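
One detail the loop above does not show: when the input ends, pFrameAudio_ can still hold a partial frame. Since libavcodec allows the very last frame to be shorter than frame_size, that tail can simply be encoded as-is. A sketch of that end-of-stream step (same names as above, but this part is an addition, not the original code):

// at end of stream: encode whatever is left in the 1024-sample buffer;
// the last frame is allowed to be shorter than enc_ctx->frame_size
if (pFrameAudio_ && pFrameAudio_->nb_samples > 0) {
    ret = encodeFrame(pFrameAudio_, stream_index, got_frame, false);
    if (ret < 0)
        return false;
    pFrameAudio_->pts += pFrameAudio_->nb_samples;
    pFrameAudio_->nb_samples = 0;
}
// then flush the encoder itself (send a NULL/flush frame, do it yourself)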