=====================================================

HEVC source code analysis article list:

[Decoding: libavcodec HEVC decoder]

A Simple Analysis of the FFmpeg HEVC Decoder Source Code: Overview

A Simple Analysis of the FFmpeg HEVC Decoder Source Code: Parser

A Simple Analysis of the FFmpeg HEVC Decoder Source Code: Decoder Main Flow

A Simple Analysis of the FFmpeg HEVC Decoder Source Code: CTU Decoding (CTU Decode), PU

A Simple Analysis of the FFmpeg HEVC Decoder Source Code: CTU Decoding (CTU Decode), TU

A Simple Analysis of the FFmpeg HEVC Decoder Source Code: Loop Filtering (LoopFilter)

=====================================================

This article analyzes the source code of the CTU decoding (CTU Decode) part of the HEVC decoder in FFmpeg's libavcodec. The FFmpeg HEVC decoder calls hls_decode_entry() to decode a slice, and hls_decode_entry() in turn calls hls_coding_quadtree() to decode each CTU. Because CTU decoding involves a lot of code, it is split across two articles: one covers PU decoding and the other covers TU decoding. This article covers the TU decoding process.

Function call graph

The position of the CTU decoding (CTU Decode) part within the overall FFmpeg HEVC decoder is shown in the figure below.

The function call relationships within the CTU decoding (CTU Decode) part are shown in the figure below.

As the diagram shows, the function that corresponds to the CTU decoding module is hls_coding_quadtree(). It is a recursive function that parses a CTU according to the quadtree syntax and obtains the CUs it contains. hls_coding_unit() is called to decode each CU.
hls_coding_unit() calls hls_prediction_unit() to process the PUs in a CU. hls_prediction_unit() calls luma_mc_uni() to perform motion compensation on uni-predicted luma blocks, chroma_mc_uni() to perform motion compensation on uni-predicted chroma blocks, and luma_mc_bi() to perform motion compensation on bi-predicted luma blocks.
hls_coding_unit() calls hls_transform_tree() to process the TUs in a CU. hls_transform_tree() is a recursive function that parses the transform tree according to the quadtree syntax and obtains the TUs it contains. hls_transform_unit() is called to decode each TU. hls_transform_unit() performs intra prediction and calls ff_hevc_hls_residual_coding() to decode the DCT residual data.

hls_decode_entry()

hls_decode_entry() is the entry function for slice decoding in the FFmpeg HEVC decoder. It is defined as follows.

// Entry function for slice decoding
static int hls_decode_entry(AVCodecContext *avctxt, void *isFilterThread)
{
HEVCContext *s = avctxt->priv_data;
// CTB size in luma samples
int ctb_size = 1 << s->sps->log2_ctb_size;
int more_data = 1;
int x_ctb = 0;
int y_ctb = 0;
int ctb_addr_ts = s->pps->ctb_addr_rs_to_ts[s->sh.slice_ctb_addr_rs];
if (!ctb_addr_ts && s->sh.dependent_slice_segment_flag) {
av_log(s->avctx, AV_LOG_ERROR, "Impossible initial tile.\n");
return AVERROR_INVALIDDATA;
}
if (s->sh.dependent_slice_segment_flag) {
int prev_rs = s->pps->ctb_addr_ts_to_rs[ctb_addr_ts - 1];
if (s->tab_slice_address[prev_rs] != s->sh.slice_addr) {
av_log(s->avctx, AV_LOG_ERROR, "Previous slice segment missing\n");
return AVERROR_INVALIDDATA;
}
}
while (more_data && ctb_addr_ts < s->sps->ctb_size) {
int ctb_addr_rs = s->pps->ctb_addr_ts_to_rs[ctb_addr_ts];
// x and y position of the CTB in luma samples
x_ctb = (ctb_addr_rs % ((s->sps->width + ctb_size - 1) >> s->sps->log2_ctb_size)) << s->sps->log2_ctb_size;
y_ctb = (ctb_addr_rs / ((s->sps->width + ctb_size - 1) >> s->sps->log2_ctb_size)) << s->sps->log2_ctb_size;
// Initialize the neighbour availability parameters
hls_decode_neighbour(s, x_ctb, y_ctb, ctb_addr_ts);
// Initialize CABAC
ff_hevc_cabac_init(s, ctb_addr_ts);
// Parse the SAO (sample adaptive offset) parameters
hls_sao_param(s, x_ctb >> s->sps->log2_ctb_size, y_ctb >> s->sps->log2_ctb_size);
s->deblock[ctb_addr_rs].beta_offset = s->sh.beta_offset;
s->deblock[ctb_addr_rs].tc_offset = s->sh.tc_offset;
s->filter_slice_edges[ctb_addr_rs] = s->sh.slice_loop_filter_across_slices_enabled_flag;
/*
* CU partition diagram
*
* 64x64 block, depth d=0
* When split_flag=1 it is split into four 32x32 blocks
*
* +----------------+----------------+
* |                |                |
* |     32x32      |     32x32      |
* |                |                |
* +----------------+----------------+
* |                |                |
* |     32x32      |     32x32      |
* |                |                |
* +----------------+----------------+
*
* 32x32 block, depth d=1
* When split_flag=1 it is split into four 16x16 blocks
*
* +--------+--------+
* |        |        |
* | 16x16  | 16x16  |
* |        |        |
* +--------+--------+
* |        |        |
* | 16x16  | 16x16  |
* |        |        |
* +--------+--------+
*
* 16x16 block, depth d=2
* When split_flag=1 it is split into four 8x8 blocks
*
* +-----+-----+
* | 8x8 | 8x8 |
* +-----+-----+
* | 8x8 | 8x8 |
* +-----+-----+
*
* 8x8 block, depth d=3
* When split_flag=1 it is split into four 4x4 blocks
*
* +-----+-----+
* | 4x4 | 4x4 |
* +-----+-----+
* | 4x4 | 4x4 |
* +-----+-----+
*
*/
/*
* Parse the quadtree structure and decode it
*
* Parameters of hls_coding_quadtree(HEVCContext *s, int x0, int y0, int log2_cb_size, int cb_depth):
* s: HEVCContext context structure
* x0: x coordinate of the CB (x_ctb here)
* y0: y coordinate of the CB (y_ctb here)
* log2_cb_size: log2 of the CB size
* cb_depth: depth
*
*/
more_data = hls_coding_quadtree(s, x_ctb, y_ctb, s->sps->log2_ctb_size, 0);
if (more_data < 0) {
s->tab_slice_address[ctb_addr_rs] = -1;
return more_data;
}
ctb_addr_ts++;
// Save the decoding state for later use
ff_hevc_save_states(s, ctb_addr_ts);
// In-loop filtering (deblocking and SAO)
ff_hevc_hls_filters(s, x_ctb, y_ctb, ctb_size);
}
if (x_ctb + ctb_size >= s->sps->width &&
y_ctb + ctb_size >= s->sps->height)
ff_hevc_hls_filter(s, x_ctb, y_ctb, ctb_size);
return ctb_addr_ts;
}

As the source code shows, hls_decode_entry() mainly calls two functions to do the decoding:

(1) hls_coding_quadtree() decodes a CTU, which includes decoding its PUs and TUs.
(2) ff_hevc_hls_filters() performs in-loop filtering, which includes deblocking filtering and SAO filtering.

This article analyzes the first step, the CTU decoding process.
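The x_ctb / y_ctb computation inside the while loop of hls_decode_entry() can be isolated as a small sketch. The helper below is hypothetical and not part of FFmpeg: the names CtbPos and ctb_addr_rs_to_pos() are assumptions made for illustration. It only reproduces the arithmetic of the listing, turning a raster-scan CTB address into the luma position of the CTB's top-left corner.

```c
#include <assert.h>

/* Hypothetical helper type, not part of FFmpeg. */
typedef struct CtbPos {
    int x; /* luma x coordinate of the CTB's top-left corner */
    int y; /* luma y coordinate of the CTB's top-left corner */
} CtbPos;

/* Maps a raster-scan CTB address to a luma position, using the same
 * arithmetic as the x_ctb / y_ctb lines in the listing above. */
static CtbPos ctb_addr_rs_to_pos(int ctb_addr_rs, int pic_width, int log2_ctb_size)
{
    CtbPos pos;
    int ctb_size = 1 << log2_ctb_size;
    /* CTBs per row, rounded up so a partial CTB at the right edge counts. */
    int ctbs_per_row = (pic_width + ctb_size - 1) >> log2_ctb_size;

    assert(ctb_addr_rs >= 0 && pic_width > 0);

    pos.x = (ctb_addr_rs % ctbs_per_row) << log2_ctb_size;
    pos.y = (ctb_addr_rs / ctbs_per_row) << log2_ctb_size;
    return pos;
}
```

For a 1920-sample-wide picture with 64x64 CTBs there are 30 CTBs per row, so raster address 31 is the second CTB of the second row, at luma position (64, 64).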

hls_coding_quadtree()

hls_coding_quadtree() parses the quadtree syntax structure of a CTU. It is defined as follows.

/*
* Parse the quadtree structure and decode it
* Note that this function is called recursively
* Annotated and organized by: LeiXiaoHua
*
*
* s: HEVCContext context structure
* x0: x coordinate of the CB
* y0: y coordinate of the CB
* log2_cb_size: log2 of the CB size
* cb_depth: depth
*
*/
static int hls_coding_quadtree(HEVCContext *s, int x0, int y0,
int log2_cb_size, int cb_depth)
{
HEVCLocalContext *lc = s->HEVClc;
// CB size (when split flag = 0)
// log2_cb_size is the log2 of the CB size
const int cb_size = 1 << log2_cb_size;
int ret;
int qp_block_mask = (1<<(s->sps->log2_ctb_size - s->pps->diff_cu_qp_delta_depth)) - 1;
int split_cu;
lc->ct_depth = cb_depth;
if (x0 + cb_size <= s->sps->width &&
y0 + cb_size <= s->sps->height &&
log2_cb_size > s->sps->log2_min_cb_size) {
split_cu = ff_hevc_split_coding_unit_flag_decode(s, cb_depth, x0, y0);
} else {
split_cu = (log2_cb_size > s->sps->log2_min_cb_size);
}
if (s->pps->cu_qp_delta_enabled_flag &&
log2_cb_size >= s->sps->log2_ctb_size - s->pps->diff_cu_qp_delta_depth) {
lc->tu.is_cu_qp_delta_coded = 0;
lc->tu.cu_qp_delta = 0;
}
if (s->sh.cu_chroma_qp_offset_enabled_flag &&
log2_cb_size >= s->sps->log2_ctb_size - s->pps->diff_cu_chroma_qp_offset_depth) {
lc->tu.is_cu_chroma_qp_offset_coded = 0;
}
if (split_cu) {
// If the CU is split further, continue parsing the resulting sub-CUs
// Note that this is a recursive call
// CB size when split flag = 1
const int cb_size_split = cb_size >> 1;
/*
* (x0, y0)  (x1, y0)
*     +--------+--------+
*     |        |        |
*     |        |        |
* (x0, y1)  (x1, y1)    |
*     +--------+--------+
*     |        |        |
*     |        |        |
*     +--------+--------+
*
*/
const int x1 = x0 + cb_size_split;
const int y1 = y0 + cb_size_split;
int more_data = 0;
// Note:
// the CU size is halved: log2_cb_size - 1
// the depth d increases by 1: cb_depth + 1
more_data = hls_coding_quadtree(s, x0, y0, log2_cb_size - 1, cb_depth + 1);
if (more_data < 0)
return more_data;
if (more_data && x1 < s->sps->width) {
more_data = hls_coding_quadtree(s, x1, y0, log2_cb_size - 1, cb_depth + 1);
if (more_data < 0)
return more_data;
}
if (more_data && y1 < s->sps->height) {
more_data = hls_coding_quadtree(s, x0, y1, log2_cb_size - 1, cb_depth + 1);
if (more_data < 0)
return more_data;
}
if (more_data && x1 < s->sps->width &&
y1 < s->sps->height) {
more_data = hls_coding_quadtree(s, x1, y1, log2_cb_size - 1, cb_depth + 1);
if (more_data < 0)
return more_data;
}
if(((x0 + (1<<log2_cb_size)) & qp_block_mask) == 0 &&
((y0 + (1<<log2_cb_size)) & qp_block_mask) == 0)
lc->qPy_pred = lc->qp_y;
if (more_data)
return ((x1 + cb_size_split) < s->sps->width ||
(y1 + cb_size_split) < s->sps->height);
else
return 0;
} else {
/*
* (x0, y0)
*     +--------+--------+
*     |                 |
*     |                 |
*     |                 |
*     +                 +
*     |                 |
*     |                 |
*     |                 |
*     +--------+--------+
*
*/
// Note: this branch handles a CU that is not split any further
// Process the CU: this is where the actual decoding happens
ret = hls_coding_unit(s, x0, y0, log2_cb_size);
if (ret < 0)
return ret;
if ((!((x0 + cb_size) %
(1 << (s->sps->log2_ctb_size))) ||
(x0 + cb_size >= s->sps->width)) &&
(!((y0 + cb_size) %
(1 << (s->sps->log2_ctb_size))) ||
(y0 + cb_size >= s->sps->height))) {
int end_of_slice_flag = ff_hevc_end_of_slice_flag_decode(s);
return !end_of_slice_flag;
} else {
return 1;
}
}
return 0;
}

As the source code shows, hls_coding_quadtree() first calls ff_hevc_split_coding_unit_flag_decode() to determine whether the current CU needs to be split (when the CU crosses the picture boundary or is already the minimum size, the flag is inferred rather than read). If the CU is split, hls_coding_quadtree() is called recursively up to 4 times to continue quadtree parsing on the 4 sub-blocks, skipping sub-blocks that lie outside the picture; if it is not split, hls_coding_unit() is called to decode the CU. In short, hls_coding_quadtree() parses out all the CUs in a CTU and calls hls_coding_unit() on each of them in turn. The decoding order of the CUs within a CTU is shown in the figure below, where a, b, c, ... indicate the order.
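The recursion can be illustrated with a stripped-down sketch that keeps only the split decision and the visiting order. Everything here is an assumption made for illustration and none of it is FFmpeg code: CuRect, CuList, SplitFn, walk_quadtree() and demo_split() are hypothetical names, and the callback stands in for the CABAC-decoded split flag.

```c
#include <assert.h>

#define MAX_CUS 256

typedef struct CuRect { int x, y, size; } CuRect;
typedef struct CuList { CuRect cu[MAX_CUS]; int count; } CuList;

/* Hypothetical stand-in for the split flag read from the bitstream. */
typedef int (*SplitFn)(int x0, int y0, int log2_cb_size);

/* Visits the leaf CUs of one CTU in the same order as the listing above:
 * top-left, top-right, bottom-left, bottom-right. */
static void walk_quadtree(CuList *out, SplitFn split_flag,
                          int x0, int y0, int log2_cb_size,
                          int log2_min_cb_size, int pic_w, int pic_h)
{
    int cb_size = 1 << log2_cb_size;
    int split;

    if (x0 + cb_size <= pic_w && y0 + cb_size <= pic_h &&
        log2_cb_size > log2_min_cb_size)
        split = split_flag(x0, y0, log2_cb_size); /* flag is coded */
    else
        split = log2_cb_size > log2_min_cb_size;  /* flag is inferred */

    if (split) {
        int half = cb_size >> 1;
        int x1 = x0 + half, y1 = y0 + half;

        walk_quadtree(out, split_flag, x0, y0, log2_cb_size - 1,
                      log2_min_cb_size, pic_w, pic_h);
        if (x1 < pic_w) /* skip sub-blocks outside the picture */
            walk_quadtree(out, split_flag, x1, y0, log2_cb_size - 1,
                          log2_min_cb_size, pic_w, pic_h);
        if (y1 < pic_h)
            walk_quadtree(out, split_flag, x0, y1, log2_cb_size - 1,
                          log2_min_cb_size, pic_w, pic_h);
        if (x1 < pic_w && y1 < pic_h)
            walk_quadtree(out, split_flag, x1, y1, log2_cb_size - 1,
                          log2_min_cb_size, pic_w, pic_h);
    } else {
        assert(out->count < MAX_CUS);
        out->cu[out->count++] = (CuRect){ x0, y0, cb_size };
    }
}

/* Demo rule: split the 64x64 CTU and only its top-left 32x32 block. */
static int demo_split(int x0, int y0, int log2_cb_size)
{
    return log2_cb_size == 6 || (log2_cb_size == 5 && x0 == 0 && y0 == 0);
}
```

Calling walk_quadtree() with demo_split() on a single 64x64 CTU yields seven leaf CUs: the four 16x16 blocks of the top-left quadrant first, followed by the three remaining 32x32 blocks. The sketch leaves out the end_of_slice_flag handling and the QP bookkeeping of the real function.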

hls_coding_unit()

hls_coding_unit() decodes a single CU. It is defined as follows.

// Process the CU: this is where the actual decoding happens
// Annotated and organized by: LeiXiaoHua
static int hls_coding_unit(HEVCContext *s, int x0, int y0, int log2_cb_size)
{
// CB size in luma samples
int cb_size = 1 << log2_cb_size;
HEVCLocalContext *lc = s->HEVClc;
int log2_min_cb_size = s->sps->log2_min_cb_size;
int length = cb_size >> log2_min_cb_size;
int min_cb_width = s->sps->min_cb_width;
// Position (x and y coordinates) of the current CB, measured in units of the minimum CB size (for example 8x8)
int x_cb = x0 >> log2_min_cb_size;
int y_cb = y0 >> log2_min_cb_size;
int idx = log2_cb_size - 2;
int qp_block_mask = (1<<(s->sps->log2_ctb_size - s->pps->diff_cu_qp_delta_depth)) - 1;
int x, y, ret;
// Set the default CU attribute values
lc->cu.x = x0;
lc->cu.y = y0;
lc->cu.pred_mode = MODE_INTRA;
lc->cu.part_mode = PART_2Nx2N;
lc->cu.intra_split_flag = 0;
SAMPLE_CTB(s->skip_flag, x_cb, y_cb) = 0;
for (x = 0; x < 4; x++)
lc->pu.intra_pred_mode[x] = 1;
if (s->pps->transquant_bypass_enable_flag) {
lc->cu.cu_transquant_bypass_flag = ff_hevc_cu_transquant_bypass_flag_decode(s);
if (lc->cu.cu_transquant_bypass_flag)
set_deblocking_bypass(s, x0, y0, log2_cb_size);
} else
lc->cu.cu_transquant_bypass_flag = 0;
if (s->sh.slice_type != I_SLICE) {
// Skip flag
uint8_t skip_flag = ff_hevc_skip_flag_decode(s, x0, y0, x_cb, y_cb);
// Store it in the skip_flag cache
x = y_cb * min_cb_width + x_cb;
for (y = 0; y < length; y++) {
memset(&s->skip_flag[x], skip_flag, length);
x += min_cb_width;
}
lc->cu.pred_mode = skip_flag ? MODE_SKIP : MODE_INTER;
} else {
x = y_cb * min_cb_width + x_cb;
for (y = 0; y < length; y++) {
memset(&s->skip_flag[x], 0, length);
x += min_cb_width;
}
}
if (SAMPLE_CTB(s->skip_flag, x_cb, y_cb)) {
hls_prediction_unit(s, x0, y0, cb_size, cb_size, log2_cb_size, 0, idx);
intra_prediction_unit_default_value(s, x0, y0, log2_cb_size);
if (!s->sh.disable_deblocking_filter_flag)
ff_hevc_deblocking_boundary_strengths(s, x0, y0, log2_cb_size);
} else {
int pcm_flag = 0;
// Read the prediction mode (non-I slices only)
if (s->sh.slice_type != I_SLICE)
lc->cu.pred_mode = ff_hevc_pred_mode_decode(s);
// Either the CU is not in intra prediction mode,
// or the CB is already the minimum size
if (lc->cu.pred_mode != MODE_INTRA ||
log2_cb_size == s->sps->log2_min_cb_size) {
// Read the CU partition mode
lc->cu.part_mode = ff_hevc_part_mode_decode(s, log2_cb_size);
lc->cu.intra_split_flag = lc->cu.part_mode == PART_NxN &&
lc->cu.pred_mode == MODE_INTRA;
}
if (lc->cu.pred_mode == MODE_INTRA) {
// Intra prediction mode
// PCM coding, rarely used
if (lc->cu.part_mode == PART_2Nx2N && s->sps->pcm_enabled_flag &&
log2_cb_size >= s->sps->pcm.log2_min_pcm_cb_size &&
log2_cb_size <= s->sps->pcm.log2_max_pcm_cb_size) {
pcm_flag = ff_hevc_pcm_flag_decode(s);
}
if (pcm_flag) {
intra_prediction_unit_default_value(s, x0, y0, log2_cb_size);
ret = hls_pcm_sample(s, x0, y0, log2_cb_size);
if (s->sps->pcm.loop_filter_disable_flag)
set_deblocking_bypass(s, x0, y0, log2_cb_size);
if (ret < 0)
return ret;
} else {
// Intra prediction
intra_prediction_unit(s, x0, y0, log2_cb_size);
}
} else {
// Inter prediction mode
intra_prediction_unit_default_value(s, x0, y0, log2_cb_size);
// There are 8 partition modes in total
switch (lc->cu.part_mode) {
case PART_2Nx2N:
/*
* PART_2Nx2N:
* +--------+--------+
* |                 |
* |                 |
* |                 |
* +        +        +
* |                 |
* |                 |
* |                 |
* +--------+--------+
*/
// Process the PU: motion compensation
hls_prediction_unit(s, x0, y0, cb_size, cb_size, log2_cb_size, 0, idx);
break;
case PART_2NxN:
/*
* PART_2NxN:
* +--------+--------+
* |                 |
* |                 |
* |                 |
* +--------+--------+
* |                 |
* |                 |
* |                 |
* +--------+--------+
*
*/
/*
* Parameters of hls_prediction_unit():
* x0 : x coordinate of the PU's top-left corner
* y0 : y coordinate of the PU's top-left corner
* nPbW : PU width
* nPbH : PU height
* log2_cb_size : log2 of the CB size
* partIdx : PU index; 0-3 when the CU is split into four PUs, 0 and 1 when it is split into two
*/
// Top
hls_prediction_unit(s, x0, y0, cb_size, cb_size / 2, log2_cb_size, 0, idx);
// Bottom
hls_prediction_unit(s, x0, y0 + cb_size / 2, cb_size, cb_size / 2, log2_cb_size, 1, idx);
break;
case PART_Nx2N:
/*
* PART_Nx2N:
* +--------+--------+
* |        |        |
* |        |        |
* |        |        |
* +        +        +
* |        |        |
* |        |        |
* |        |        |
* +--------+--------+
*
*/
// Left
hls_prediction_unit(s, x0, y0, cb_size / 2, cb_size, log2_cb_size, 0, idx - 1);
// Right
hls_prediction_unit(s, x0 + cb_size / 2, y0, cb_size / 2, cb_size, log2_cb_size, 1, idx - 1);
break;
case PART_2NxnU:
/*
* PART_2NxnU (Upper):
* +--------+--------+
* |                 |
* +--------+--------+
* |                 |
* +        +        +
* |                 |
* |                 |
* |                 |
* +--------+--------+
*
*/
// Top
hls_prediction_unit(s, x0, y0, cb_size, cb_size / 4, log2_cb_size, 0, idx);
// Bottom
hls_prediction_unit(s, x0, y0 + cb_size / 4, cb_size, cb_size * 3 / 4, log2_cb_size, 1, idx);
break;
case PART_2NxnD:
/*
* PART_2NxnD (Down):
* +--------+--------+
* |                 |
* |                 |
* |                 |
* +        +        +
* |                 |
* +--------+--------+
* |                 |
* +--------+--------+
*
*/
// Top
hls_prediction_unit(s, x0, y0, cb_size, cb_size * 3 / 4, log2_cb_size, 0, idx);
// Bottom
hls_prediction_unit(s, x0, y0 + cb_size * 3 / 4, cb_size, cb_size / 4, log2_cb_size, 1, idx);
break;
case PART_nLx2N:
/*
* PART_nLx2N (Left):
* +----+---+--------+
* |    |            |
* |    |            |
* |    |            |
* +    +   +        +
* |    |            |
* |    |            |
* |    |            |
* +----+---+--------+
*
*/
// Left
hls_prediction_unit(s, x0, y0, cb_size / 4, cb_size, log2_cb_size, 0, idx - 2);
// Right
hls_prediction_unit(s, x0 + cb_size / 4, y0, cb_size * 3 / 4, cb_size, log2_cb_size, 1, idx - 2);
break;
case PART_nRx2N:
/*
* PART_nRx2N (Right):
* +--------+---+----+
* |            |    |
* |            |    |
* |            |    |
* +        +   +    +
* |            |    |
* |            |    |
* |            |    |
* +--------+---+----+
*
*/
// Left
hls_prediction_unit(s, x0, y0, cb_size * 3 / 4, cb_size, log2_cb_size, 0, idx - 2);
// Right
hls_prediction_unit(s, x0 + cb_size * 3 / 4, y0, cb_size / 4, cb_size, log2_cb_size, 1, idx - 2);
break;
case PART_NxN:
/*
* PART_NxN:
* +--------+--------+
* |        |        |
* |        |        |
* |        |        |
* +--------+--------+
* |        |        |
* |        |        |
* |        |        |
* +--------+--------+
*
*/
hls_prediction_unit(s, x0, y0, cb_size / 2, cb_size / 2, log2_cb_size, 0, idx - 1);
hls_prediction_unit(s, x0 + cb_size / 2, y0, cb_size / 2, cb_size / 2, log2_cb_size, 1, idx - 1);
hls_prediction_unit(s, x0, y0 + cb_size / 2, cb_size / 2, cb_size / 2, log2_cb_size, 2, idx - 1);
hls_prediction_unit(s, x0 + cb_size / 2, y0 + cb_size / 2, cb_size / 2, cb_size / 2, log2_cb_size, 3, idx - 1);
break;
}
}
if (!pcm_flag) {
int rqt_root_cbf = 1;
if (lc->cu.pred_mode != MODE_INTRA &&
!(lc->cu.part_mode == PART_2Nx2N && lc->pu.merge_flag)) {
rqt_root_cbf = ff_hevc_no_residual_syntax_flag_decode(s);
}
if (rqt_root_cbf) {
const static int cbf[2] = { 0 };
lc->cu.max_trafo_depth = lc->cu.pred_mode == MODE_INTRA ?
s->sps->max_transform_hierarchy_depth_intra + lc->cu.intra_split_flag :
s->sps->max_transform_hierarchy_depth_inter;
// Parse the TU quadtree
ret = hls_transform_tree(s, x0, y0, x0, y0, x0, y0,
log2_cb_size,
log2_cb_size, 0, 0, cbf, cbf);
if (ret < 0)
return ret;
} else {
if (!s->sh.disable_deblocking_filter_flag)
ff_hevc_deblocking_boundary_strengths(s, x0, y0, log2_cb_size);
}
}
}
if (s->pps->cu_qp_delta_enabled_flag && lc->tu.is_cu_qp_delta_coded == 0)
ff_hevc_set_qPy(s, x0, y0, log2_cb_size);
x = y_cb * min_cb_width + x_cb;
for (y = 0; y < length; y++) {
memset(&s->qp_y_tab[x], lc->qp_y, length);
x += min_cb_width;
}
if(((x0 + (1<<log2_cb_size)) & qp_block_mask) == 0 &&
((y0 + (1<<log2_cb_size)) & qp_block_mask) == 0) {
lc->qPy_pred = lc->qp_y;
}
set_ct_depth(s, x0, y0, log2_cb_size, lc->ct_depth);
return 0;
}

As the source code shows, hls_coding_unit() mainly does two things:

(1) It calls hls_prediction_unit() to process the PUs.
(2) It calls hls_transform_tree() to process the TU tree.

This article analyzes the code related to the second function, hls_transform_tree().
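The eight partition modes handled by the switch statement in hls_coding_unit() differ only in the position and size passed to hls_prediction_unit(). The sketch below collects that geometry in one place. It is not FFmpeg code: PuRect and pu_rects() are hypothetical names, and the enum only reuses the identifiers from the listing (its numeric values are an assumption and are not relied on).

```c
#include <assert.h>

/* Identifiers mirror the listing; numeric values are an assumption. */
enum PartMode {
    PART_2Nx2N, PART_2NxN, PART_Nx2N, PART_NxN,
    PART_2NxnU, PART_2NxnD, PART_nLx2N, PART_nRx2N
};

typedef struct PuRect { int x, y, w, h; } PuRect;

/* Fills pu[] with the PU rectangles of a CU at (x0, y0) with side cb_size
 * and returns how many PUs the partition mode produces. */
static int pu_rects(enum PartMode mode, int x0, int y0, int cb_size, PuRect pu[4])
{
    int half = cb_size / 2;
    int quarter = cb_size / 4;
    int rest = cb_size - quarter; /* equals cb_size * 3 / 4 */

    /* cb_size must be a power of two, at least 8 */
    assert(cb_size >= 8 && (cb_size & (cb_size - 1)) == 0);

    switch (mode) {
    case PART_2Nx2N:
        pu[0] = (PuRect){ x0, y0, cb_size, cb_size };
        return 1;
    case PART_2NxN:
        pu[0] = (PuRect){ x0, y0, cb_size, half };
        pu[1] = (PuRect){ x0, y0 + half, cb_size, half };
        return 2;
    case PART_Nx2N:
        pu[0] = (PuRect){ x0, y0, half, cb_size };
        pu[1] = (PuRect){ x0 + half, y0, half, cb_size };
        return 2;
    case PART_2NxnU:
        pu[0] = (PuRect){ x0, y0, cb_size, quarter };
        pu[1] = (PuRect){ x0, y0 + quarter, cb_size, rest };
        return 2;
    case PART_2NxnD:
        pu[0] = (PuRect){ x0, y0, cb_size, rest };
        pu[1] = (PuRect){ x0, y0 + rest, cb_size, quarter };
        return 2;
    case PART_nLx2N:
        pu[0] = (PuRect){ x0, y0, quarter, cb_size };
        pu[1] = (PuRect){ x0 + quarter, y0, rest, cb_size };
        return 2;
    case PART_nRx2N:
        pu[0] = (PuRect){ x0, y0, rest, cb_size };
        pu[1] = (PuRect){ x0 + rest, y0, quarter, cb_size };
        return 2;
    case PART_NxN:
        pu[0] = (PuRect){ x0, y0, half, half };
        pu[1] = (PuRect){ x0 + half, y0, half, half };
        pu[2] = (PuRect){ x0, y0 + half, half, half };
        pu[3] = (PuRect){ x0 + half, y0 + half, half, half };
        return 4;
    }
    return 0;
}
```

For a 32x32 CU at the origin, PART_2NxnU gives a 32x8 PU on top of a 32x24 PU, which matches the cb_size / 4 and cb_size * 3 / 4 arguments in the listing. The index into pu[] corresponds to the partIdx argument.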

hls_transform_tree()

hls_transform_tree() parses the TU quadtree syntax. It is defined as follows.

// Parse the TU quadtree
static int hls_transform_tree(HEVCContext *s, int x0, int y0,
int xBase, int yBase, int cb_xBase, int cb_yBase,
int log2_cb_size, int log2_trafo_size,
int trafo_depth, int blk_idx,
const int *base_cbf_cb, const int *base_cbf_cr)
{
HEVCLocalContext *lc = s->HEVClc;
uint8_t split_transform_flag;
int cbf_cb[2];
int cbf_cr[2];
int ret;
cbf_cb[0] = base_cbf_cb[0];
cbf_cb[1] = base_cbf_cb[1];
cbf_cr[0] = base_cbf_cr[0];
cbf_cr[1] = base_cbf_cr[1];
if (lc->cu.intra_split_flag) {
if (trafo_depth == 1) {
lc->tu.intra_pred_mode = lc->pu.intra_pred_mode[blk_idx];
if (s->sps->chroma_format_idc == 3) {
lc->tu.intra_pred_mode_c = lc->pu.intra_pred_mode_c[blk_idx];
lc->tu.chroma_mode_c = lc->pu.chroma_mode_c[blk_idx];
} else {
lc->tu.intra_pred_mode_c = lc->pu.intra_pred_mode_c[0];
lc->tu.chroma_mode_c = lc->pu.chroma_mode_c[0];
}
}
} else {
lc->tu.intra_pred_mode = lc->pu.intra_pred_mode[0];
lc->tu.intra_pred_mode_c = lc->pu.intra_pred_mode_c[0];
lc->tu.chroma_mode_c = lc->pu.chroma_mode_c[0];
}
if (log2_trafo_size <= s->sps->log2_max_trafo_size &&
log2_trafo_size > s->sps->log2_min_tb_size &&
trafo_depth < lc->cu.max_trafo_depth &&
!(lc->cu.intra_split_flag && trafo_depth == 0)) {
split_transform_flag = ff_hevc_split_transform_flag_decode(s, log2_trafo_size);
} else {
int inter_split = s->sps->max_transform_hierarchy_depth_inter == 0 &&
lc->cu.pred_mode == MODE_INTER &&
lc->cu.part_mode != PART_2Nx2N &&
trafo_depth == 0;
// split_transform_flag indicates whether the current TU is split by the quadtree:
// 1 means it is split into 4 equal-sized blocks, 0 means it is not split any further
split_transform_flag = log2_trafo_size > s->sps->log2_max_trafo_size ||
(lc->cu.intra_split_flag && trafo_depth == 0) ||
inter_split;
}
if (log2_trafo_size > 2 || s->sps->chroma_format_idc == 3) {
if (trafo_depth == 0 || cbf_cb[0]) {
cbf_cb[0] = ff_hevc_cbf_cb_cr_decode(s, trafo_depth);
if (s->sps->chroma_format_idc == 2 && (!split_transform_flag || log2_trafo_size == 3)) {
cbf_cb[1] = ff_hevc_cbf_cb_cr_decode(s, trafo_depth);
}
}
if (trafo_depth == 0 || cbf_cr[0]) {
cbf_cr[0] = ff_hevc_cbf_cb_cr_decode(s, trafo_depth);
if (s->sps->chroma_format_idc == 2 && (!split_transform_flag || log2_trafo_size == 3)) {
cbf_cr[1] = ff_hevc_cbf_cb_cr_decode(s, trafo_depth);
}
}
}
// If the current TU is split by the quadtree
if (split_transform_flag) {
const int trafo_size_split = 1 << (log2_trafo_size - 1);
const int x1 = x0 + trafo_size_split;
const int y1 = y0 + trafo_size_split;
#define SUBDIVIDE(x, y, idx) \
do { \
ret = hls_transform_tree(s, x, y, x0, y0, cb_xBase, cb_yBase, log2_cb_size, \
log2_trafo_size - 1, trafo_depth + 1, idx, \
cbf_cb, cbf_cr); \
if (ret < 0) \
return ret; \
} while (0)
// Recursive calls
SUBDIVIDE(x0, y0, 0);
SUBDIVIDE(x1, y0, 1);
SUBDIVIDE(x0, y1, 2);
SUBDIVIDE(x1, y1, 3);
#undef SUBDIVIDE
} else {
int min_tu_size = 1 << s->sps->log2_min_tb_size;
int log2_min_tu_size = s->sps->log2_min_tb_size;
int min_tu_width = s->sps->min_tb_width;
int cbf_luma = 1;
if (lc->cu.pred_mode == MODE_INTRA || trafo_depth != 0 ||
cbf_cb[0] || cbf_cr[0] ||
(s->sps->chroma_format_idc == 2 && (cbf_cb[1] || cbf_cr[1]))) {
cbf_luma = ff_hevc_cbf_luma_decode(s, trafo_depth);
}
// Process the TU: intra prediction and inverse DCT
ret = hls_transform_unit(s, x0, y0, xBase, yBase, cb_xBase, cb_yBase,
log2_cb_size, log2_trafo_size,
blk_idx, cbf_luma, cbf_cb, cbf_cr);
if (ret < 0)
return ret;
// TODO: store cbf_luma somewhere else
if (cbf_luma) {
int i, j;
for (i = 0; i < (1 << log2_trafo_size); i += min_tu_size)
for (j = 0; j < (1 << log2_trafo_size); j += min_tu_size) {
int x_tu = (x0 + j) >> log2_min_tu_size;
int y_tu = (y0 + i) >> log2_min_tu_size;
s->cbf_luma[y_tu * min_tu_width + x_tu] = 1;
}
}
if (!s->sh.disable_deblocking_filter_flag) {
ff_hevc_deblocking_boundary_strengths(s, x0, y0, log2_trafo_size);
if (s->pps->transquant_bypass_enable_flag &&
lc->cu.cu_transquant_bypass_flag)
set_deblocking_bypass(s, x0, y0, log2_trafo_size);
}
}
return 0;
}

As the source code shows, hls_transform_tree() first determines whether the current TU needs to be split: it calls ff_hevc_split_transform_flag_decode() when the flag is present in the bitstream and infers the value otherwise. If the TU is split, hls_transform_tree() is called recursively 4 times to continue quadtree parsing on the 4 sub-blocks; if it is not split, hls_transform_unit() is called to decode the TU. In short, hls_transform_tree() parses out all the TUs in a TU tree and calls hls_transform_unit() on each of them in turn.
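The rule that decides whether split_transform_flag is read or inferred can be written as two small predicates. This is a sketch under stated assumptions, not FFmpeg code: TuSplitParams, split_flag_is_coded() and inferred_split_flag() are hypothetical names, and the field inter_split_candidate folds together the three conditions of the listing's inter_split variable that do not depend on trafo_depth.

```c
#include <assert.h>

/* Hypothetical parameter bundle; field names follow the listing. */
typedef struct TuSplitParams {
    int log2_max_trafo_size;
    int log2_min_tb_size;
    int max_trafo_depth;
    int intra_split_flag;
    /* 1 when max_transform_hierarchy_depth_inter == 0, the CU is inter
     * coded and its partition mode is not PART_2Nx2N */
    int inter_split_candidate;
} TuSplitParams;

/* Returns 1 when split_transform_flag is present in the bitstream. */
static int split_flag_is_coded(const TuSplitParams *p,
                               int log2_trafo_size, int trafo_depth)
{
    assert(p && trafo_depth >= 0);
    return log2_trafo_size <= p->log2_max_trafo_size &&
           log2_trafo_size >  p->log2_min_tb_size &&
           trafo_depth     <  p->max_trafo_depth &&
           !(p->intra_split_flag && trafo_depth == 0);
}

/* Value used for split_transform_flag when it is not coded. */
static int inferred_split_flag(const TuSplitParams *p,
                               int log2_trafo_size, int trafo_depth)
{
    assert(p && trafo_depth >= 0);
    return log2_trafo_size > p->log2_max_trafo_size ||
           (p->intra_split_flag && trafo_depth == 0) ||
           (p->inter_split_candidate && trafo_depth == 0);
}
```

With a maximum transform size of 32x32 (log2_max_trafo_size = 5), a 64x64 CU cannot be covered by a single transform, so the flag is not coded at depth 0 and the split is inferred as 1. A block that is already at the minimum transform size has the flag inferred as 0.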

hls_transform_unit()

hls_transform_unit() decodes a single TU. It is defined as follows.

// Process the TU: intra prediction and inverse DCT
static int hls_transform_unit(HEVCContext *s, int x0, int y0,
int xBase, int yBase, int cb_xBase, int cb_yBase,
int log2_cb_size, int log2_trafo_size,
int blk_idx, int cbf_luma, int *cbf_cb, int *cbf_cr)
{
HEVCLocalContext *lc = s->HEVClc;
const int log2_trafo_size_c = log2_trafo_size - s->sps->hshift[1];
int i;
if (lc->cu.pred_mode == MODE_INTRA) {
int trafo_size = 1 << log2_trafo_size;
ff_hevc_set_neighbour_available(s, x0, y0, trafo_size, trafo_size);
// Note: intra prediction is also performed here
// Intra prediction
// log2_trafo_size is the log2 of the current TU size
s->hpc.intra_pred[log2_trafo_size - 2](s, x0, y0, 0);
}
if (cbf_luma || cbf_cb[0] || cbf_cr[0] ||
(s->sps->chroma_format_idc == 2 && (cbf_cb[1] || cbf_cr[1]))) {
int scan_idx = SCAN_DIAG;
int scan_idx_c = SCAN_DIAG;
int cbf_chroma = cbf_cb[0] || cbf_cr[0] ||
(s->sps->chroma_format_idc == 2 &&
(cbf_cb[1] || cbf_cr[1]));
if (s->pps->cu_qp_delta_enabled_flag && !lc->tu.is_cu_qp_delta_coded) {
lc->tu.cu_qp_delta = ff_hevc_cu_qp_delta_abs(s);
if (lc->tu.cu_qp_delta != 0)
if (ff_hevc_cu_qp_delta_sign_flag(s) == 1)
lc->tu.cu_qp_delta = -lc->tu.cu_qp_delta;
lc->tu.is_cu_qp_delta_coded = 1;
if (lc->tu.cu_qp_delta < -(26 + s->sps->qp_bd_offset / 2) ||
lc->tu.cu_qp_delta > (25 + s->sps->qp_bd_offset / 2)) {
av_log(s->avctx, AV_LOG_ERROR,
"The cu_qp_delta %d is outside the valid range "
"[%d, %d].\n",
lc->tu.cu_qp_delta,
-(26 + s->sps->qp_bd_offset / 2),
(25 + s->sps->qp_bd_offset / 2));
return AVERROR_INVALIDDATA;
}
ff_hevc_set_qPy(s, cb_xBase, cb_yBase, log2_cb_size);
}
if (s->sh.cu_chroma_qp_offset_enabled_flag && cbf_chroma &&
!lc->cu.cu_transquant_bypass_flag && !lc->tu.is_cu_chroma_qp_offset_coded) {
int cu_chroma_qp_offset_flag = ff_hevc_cu_chroma_qp_offset_flag(s);
if (cu_chroma_qp_offset_flag) {
int cu_chroma_qp_offset_idx = 0;
if (s->pps->chroma_qp_offset_list_len_minus1 > 0) {
cu_chroma_qp_offset_idx = ff_hevc_cu_chroma_qp_offset_idx(s);
av_log(s->avctx, AV_LOG_ERROR,
"cu_chroma_qp_offset_idx not yet tested.\n");
}
lc->tu.cu_qp_offset_cb = s->pps->cb_qp_offset_list[cu_chroma_qp_offset_idx];
lc->tu.cu_qp_offset_cr = s->pps->cr_qp_offset_list[cu_chroma_qp_offset_idx];
} else {
lc->tu.cu_qp_offset_cb = 0;
lc->tu.cu_qp_offset_cr = 0;
}
lc->tu.is_cu_chroma_qp_offset_coded = 1;
}
if (lc->cu.pred_mode == MODE_INTRA && log2_trafo_size < 4) {
if (lc->tu.intra_pred_mode >= 6 &&
lc->tu.intra_pred_mode <= 14) {
scan_idx = SCAN_VERT;
} else if (lc->tu.intra_pred_mode >= 22 &&
lc->tu.intra_pred_mode <= 30) {
scan_idx = SCAN_HORIZ;
}
if (lc->tu.intra_pred_mode_c >= 6 &&
lc->tu.intra_pred_mode_c <= 14) {
scan_idx_c = SCAN_VERT;
} else if (lc->tu.intra_pred_mode_c >= 22 &&
lc->tu.intra_pred_mode_c <= 30) {
scan_idx_c = SCAN_HORIZ;
}
}
lc->tu.cross_pf = 0;
// Read the residual data, dequantize it, and apply the inverse DCT
// Luma (Y)
if (cbf_luma)
ff_hevc_hls_residual_coding(s, x0, y0, log2_trafo_size, scan_idx, 0);// The last parameter is the colour component index
if (log2_trafo_size > 2 || s->sps->chroma_format_idc == 3) {
int trafo_size_h = 1 << (log2_trafo_size_c + s->sps->hshift[1]);
int trafo_size_v = 1 << (log2_trafo_size_c + s->sps->vshift[1]);
lc->tu.cross_pf = (s->pps->cross_component_prediction_enabled_flag && cbf_luma &&
(lc->cu.pred_mode == MODE_INTER ||
(lc->tu.chroma_mode_c == 4)));
if (lc->tu.cross_pf) {
hls_cross_component_pred(s, 0);
}
// Chroma Cb
for (i = 0; i < (s->sps->chroma_format_idc == 2 ? 2 : 1); i++) {
if (lc->cu.pred_mode == MODE_INTRA) {
ff_hevc_set_neighbour_available(s, x0, y0 + (i << log2_trafo_size_c), trafo_size_h, trafo_size_v);
s->hpc.intra_pred[log2_trafo_size_c - 2](s, x0, y0 + (i << log2_trafo_size_c), 1);
}
if (cbf_cb[i])
ff_hevc_hls_residual_coding(s, x0, y0 + (i << log2_trafo_size_c),
log2_trafo_size_c, scan_idx_c, 1);// The last parameter is the colour component index
else
if (lc->tu.cross_pf) {
ptrdiff_t stride = s->frame->linesize[1];
int hshift = s->sps->hshift[1];
int vshift = s->sps->vshift[1];
int16_t *coeffs_y = (int16_t*)lc->edge_emu_buffer;
int16_t *coeffs = (int16_t*)lc->edge_emu_buffer2;
int size = 1 << log2_trafo_size_c;
uint8_t *dst = &s->frame->data[1][(y0 >> vshift) * stride +
((x0 >> hshift) << s->sps->pixel_shift)];
for (i = 0; i < (size * size); i++) {
coeffs[i] = ((lc->tu.res_scale_val * coeffs_y[i]) >> 3);
}
// Add the residual data to the prediction
s->hevcdsp.transform_add[log2_trafo_size_c-2](dst, coeffs, stride);
}
}
if (lc->tu.cross_pf) {
hls_cross_component_pred(s, 1);
}
// Chroma Cr
for (i = 0; i < (s->sps->chroma_format_idc == 2 ? 2 : 1); i++) {
if (lc->cu.pred_mode == MODE_INTRA) {
ff_hevc_set_neighbour_available(s, x0, y0 + (i << log2_trafo_size_c), trafo_size_h, trafo_size_v);
s->hpc.intra_pred[log2_trafo_size_c - 2](s, x0, y0 + (i << log2_trafo_size_c), 2);
}
// chroma Cr
if (cbf_cr[i])
ff_hevc_hls_residual_coding(s, x0, y0 + (i << log2_trafo_size_c),
log2_trafo_size_c, scan_idx_c, 2);
else
if (lc->tu.cross_pf) {
ptrdiff_t stride = s->frame->linesize[2];
int hshift = s->sps->hshift[2];
int vshift = s->sps->vshift[2];
int16_t *coeffs_y = (int16_t*)lc->edge_emu_buffer;
int16_t *coeffs = (int16_t*)lc->edge_emu_buffer2;
int size = 1 << log2_trafo_size_c;
uint8_t *dst = &s->frame->data[2][(y0 >> vshift) * stride +
((x0 >> hshift) << s->sps->pixel_shift)];
for (i = 0; i < (size * size); i++) {
coeffs[i] = ((lc->tu.res_scale_val * coeffs_y[i]) >> 3);
}
s->hevcdsp.transform_add[log2_trafo_size_c-2](dst, coeffs, stride);
}
}
} else if (blk_idx == 3) {
int trafo_size_h = 1 << (log2_trafo_size + 1);
int trafo_size_v = 1 << (log2_trafo_size + s->sps->vshift[1]);
for (i = 0; i < (s->sps->chroma_format_idc == 2 ? 2 : 1); i++) {
if (lc->cu.pred_mode == MODE_INTRA) {
ff_hevc_set_neighbour_available(s, xBase, yBase + (i << log2_trafo_size),
trafo_size_h, trafo_size_v);
s->hpc.intra_pred[log2_trafo_size - 2](s, xBase, yBase + (i << log2_trafo_size), 1);
}
if (cbf_cb[i])
ff_hevc_hls_residual_coding(s, xBase, yBase + (i << log2_trafo_size),
log2_trafo_size, scan_idx_c, 1);
}
for (i = 0; i < (s->sps->chroma_format_idc == 2 ? 2 : 1); i++) {
if (lc->cu.pred_mode == MODE_INTRA) {
ff_hevc_set_neighbour_available(s, xBase, yBase + (i << log2_trafo_size),
trafo_size_h, trafo_size_v);
s->hpc.intra_pred[log2_trafo_size - 2](s, xBase, yBase + (i << log2_trafo_size), 2);
}
if (cbf_cr[i])
ff_hevc_hls_residual_coding(s, xBase, yBase + (i << log2_trafo_size),
log2_trafo_size, scan_idx_c, 2);
}
}
} else if (lc->cu.pred_mode == MODE_INTRA) {
if (log2_trafo_size > 2 || s->sps->chroma_format_idc == 3) {
int trafo_size_h = 1 << (log2_trafo_size_c + s->sps->hshift[1]);
int trafo_size_v = 1 << (log2_trafo_size_c + s->sps->vshift[1]);
ff_hevc_set_neighbour_available(s, x0, y0, trafo_size_h, trafo_size_v);
s->hpc.intra_pred[log2_trafo_size_c - 2](s, x0, y0, 1);
s->hpc.intra_pred[log2_trafo_size_c - 2](s, x0, y0, 2);
if (s->sps->chroma_format_idc == 2) {
ff_hevc_set_neighbour_available(s, x0, y0 + (1 << log2_trafo_size_c),
trafo_size_h, trafo_size_v);
s->hpc.intra_pred[log2_trafo_size_c - 2](s, x0, y0 + (1 << log2_trafo_size_c), 1);
s->hpc.intra_pred[log2_trafo_size_c - 2](s, x0, y0 + (1 << log2_trafo_size_c), 2);
}
} else if (blk_idx == 3) {
int trafo_size_h = 1 << (log2_trafo_size + 1);
int trafo_size_v = 1 << (log2_trafo_size + s->sps->vshift[1]);
ff_hevc_set_neighbour_available(s, xBase, yBase,
trafo_size_h, trafo_size_v);
s->hpc.intra_pred[log2_trafo_size - 2](s, xBase, yBase, 1);
s->hpc.intra_pred[log2_trafo_size - 2](s, xBase, yBase, 2);
if (s->sps->chroma_format_idc == 2) {
ff_hevc_set_neighbour_available(s, xBase, yBase + (1 << (log2_trafo_size)),
trafo_size_h, trafo_size_v);
s->hpc.intra_pred[log2_trafo_size - 2](s, xBase, yBase + (1 << (log2_trafo_size)), 1);
s->hpc.intra_pred[log2_trafo_size - 2](s, xBase, yBase + (1 << (log2_trafo_size)), 2);
}
}
}
return 0;
}

As the source code shows, if the CU is intra-coded, hls_transform_unit() calls the intra_pred[]() function pointers of HEVCPredContext to perform intra prediction. Then, for both intra-predicted and inter-predicted CUs, it calls ff_hevc_hls_residual_coding() to decode the residual data and add it to the prediction.
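Before the residual is read, hls_transform_unit() also picks the coefficient scan order from the intra prediction mode. The sketch below isolates that choice. It is illustrative only: DemoScan and pick_scan() are hypothetical names, while the mode ranges (6 to 14 and 22 to 30) and the log2_trafo_size < 4 condition are copied from the listing.

```c
#include <assert.h>

/* Hypothetical enum; the listing uses SCAN_DIAG, SCAN_HORIZ and SCAN_VERT. */
enum DemoScan { DEMO_SCAN_DIAG, DEMO_SCAN_HORIZ, DEMO_SCAN_VERT };

/* Chooses the scan order the way the listing does: only small intra TUs
 * deviate from the default diagonal scan. */
static enum DemoScan pick_scan(int is_intra, int log2_trafo_size, int intra_pred_mode)
{
    /* HEVC defines intra prediction modes 0..34 */
    assert(intra_pred_mode >= 0 && intra_pred_mode <= 34);

    if (is_intra && log2_trafo_size < 4) {
        if (intra_pred_mode >= 6 && intra_pred_mode <= 14)
            return DEMO_SCAN_VERT;  /* modes around horizontal (10) */
        if (intra_pred_mode >= 22 && intra_pred_mode <= 30)
            return DEMO_SCAN_HORIZ; /* modes around vertical (26) */
    }
    return DEMO_SCAN_DIAG;
}
```

An 8x8 intra TU predicted with mode 10 is scanned vertically, whereas the same mode on a 16x16 TU, or any inter TU, keeps the diagonal scan. In the listing the same test is applied separately to intra_pred_mode and intra_pred_mode_c to obtain scan_idx and scan_idx_c.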

ff_hevc_hls_residual_coding()

ff_hevc_hls_residual_coding() reads the residual data and applies the inverse DCT. It is defined as follows.

// Read the residual data and apply the inverse DCT
void ff_hevc_hls_residual_coding(HEVCContext *s, int x0, int y0,
int log2_trafo_size, enum ScanType scan_idx,
int c_idx)
{
#define GET_COORD(offset, n) \
do { \
x_c = (x_cg << 2) + scan_x_off[n]; \
y_c = (y_cg << 2) + scan_y_off[n]; \
} while (0)
HEVCLocalContext *lc = s->HEVClc;
int transform_skip_flag = 0;
int last_significant_coeff_x, last_significant_coeff_y;
int last_scan_pos;
int n_end;
int num_coeff = 0;
int greater1_ctx = 1;
int num_last_subset;
int x_cg_last_sig, y_cg_last_sig;
const uint8_t *scan_x_cg, *scan_y_cg, *scan_x_off, *scan_y_off;
ptrdiff_t stride = s->frame->linesize[c_idx];
int hshift = s->sps->hshift[c_idx];
int vshift = s->sps->vshift[c_idx];
uint8_t *dst = &s->frame->data[c_idx][(y0 >> vshift) * stride +
((x0 >> hshift) << s->sps->pixel_shift)];
int16_t *coeffs = (int16_t*)(c_idx ? lc->edge_emu_buffer2 : lc->edge_emu_buffer);
uint8_t significant_coeff_group_flag[8][8] = {{0}};
int explicit_rdpcm_flag = 0;
int explicit_rdpcm_dir_flag;
int trafo_size = 1 << log2_trafo_size;
int i;
int qp,shift,add,scale,scale_m;
const uint8_t level_scale[] = { 40, 45, 51, 57, 64, 72 };
const uint8_t *scale_matrix = NULL;
uint8_t dc_scale;
int pred_mode_intra = (c_idx == 0) ? lc->tu.intra_pred_mode :
lc->tu.intra_pred_mode_c;
memset(coeffs, 0, trafo_size * trafo_size * sizeof(int16_t));
// Derive QP for dequant
if (!lc->cu.cu_transquant_bypass_flag) {
static const int qp_c[] = { 29, 30, 31, 32, 33, 33, 34, 34, 35, 35, 36, 36, 37, 37 };
static const uint8_t rem6[51 + 4 * 6 + 1] = {
0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2,
3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5,
0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3,
4, 5, 0, 1, 2, 3, 4, 5, 0, 1
};
static const uint8_t div6[51 + 4 * 6 + 1] = {
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3,
3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6,
7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10,
10, 10, 11, 11, 11, 11, 11, 11, 12, 12
};
int qp_y = lc->qp_y;
if (s->pps->transform_skip_enabled_flag &&
log2_trafo_size <= s->pps->log2_max_transform_skip_block_size) {
transform_skip_flag = ff_hevc_transform_skip_flag_decode(s, c_idx);
}
if (c_idx == 0) {
qp = qp_y + s->sps->qp_bd_offset;
} else {
int qp_i, offset;
if (c_idx == 1)
offset = s->pps->cb_qp_offset + s->sh.slice_cb_qp_offset +
lc->tu.cu_qp_offset_cb;
else
offset = s->pps->cr_qp_offset + s->sh.slice_cr_qp_offset +
lc->tu.cu_qp_offset_cr;
qp_i = av_clip(qp_y + offset, - s->sps->qp_bd_offset, 57);
if (s->sps->chroma_format_idc == 1) {
if (qp_i < 30)
qp = qp_i;
else if (qp_i > 43)
qp = qp_i - 6;
else
qp = qp_c[qp_i - 30];
} else {
if (qp_i > 51)
qp = 51;
else
qp = qp_i;
}
qp += s->sps->qp_bd_offset;
}
shift = s->sps->bit_depth + log2_trafo_size - 5;
add = 1 << (shift-1);
scale = level_scale[rem6[qp]] << (div6[qp]);
scale_m = 16; // default when no custom scaling lists.
dc_scale = 16;
if (s->sps->scaling_list_enable_flag && !(transform_skip_flag && log2_trafo_size > 2)) {
const ScalingList *sl = s->pps->scaling_list_data_present_flag ?
&s->pps->scaling_list : &s->sps->scaling_list;
int matrix_id = lc->cu.pred_mode != MODE_INTRA;
matrix_id = 3 * matrix_id + c_idx;
scale_matrix = sl->sl[log2_trafo_size - 2][matrix_id];
if (log2_trafo_size >= 4)
dc_scale = sl->sl_dc[log2_trafo_size - 4][matrix_id];
}
} else {
shift = 0;
add = 0;
scale = 0;
dc_scale = 0;
}
if (lc->cu.pred_mode == MODE_INTER && s->sps->explicit_rdpcm_enabled_flag &&
(transform_skip_flag || lc->cu.cu_transquant_bypass_flag)) {
explicit_rdpcm_flag = explicit_rdpcm_flag_decode(s, c_idx);
if (explicit_rdpcm_flag) {
explicit_rdpcm_dir_flag = explicit_rdpcm_dir_flag_decode(s, c_idx);
}
}
last_significant_coeff_xy_prefix_decode(s, c_idx, log2_trafo_size,
&last_significant_coeff_x, &last_significant_coeff_y);
if (last_significant_coeff_x > 3) {
int suffix = last_significant_coeff_suffix_decode(s, last_significant_coeff_x);
last_significant_coeff_x = (1 << ((last_significant_coeff_x >> 1) - 1)) *
(2 + (last_significant_coeff_x & 1)) +
suffix;
}
if (last_significant_coeff_y > 3) {
int suffix = last_significant_coeff_suffix_decode(s, last_significant_coeff_y);
last_significant_coeff_y = (1 << ((last_significant_coeff_y >> 1) - 1)) *
(2 + (last_significant_coeff_y & 1)) +
suffix;
}
if (scan_idx == SCAN_VERT)
FFSWAP(int, last_significant_coeff_x, last_significant_coeff_y);
x_cg_last_sig = last_significant_coeff_x >> 2;
y_cg_last_sig = last_significant_coeff_y >> 2;
switch (scan_idx) {
case SCAN_DIAG: {
int last_x_c = last_significant_coeff_x & 3;
int last_y_c = last_significant_coeff_y & 3;
scan_x_off = ff_hevc_diag_scan4x4_x;
scan_y_off = ff_hevc_diag_scan4x4_y;
num_coeff = diag_scan4x4_inv[last_y_c][last_x_c];
if (trafo_size == 4) {
scan_x_cg = scan_1x1;
scan_y_cg = scan_1x1;
} else if (trafo_size == 8) {
num_coeff += diag_scan2x2_inv[y_cg_last_sig][x_cg_last_sig] << 4;
scan_x_cg = diag_scan2x2_x;
scan_y_cg = diag_scan2x2_y;
} else if (trafo_size == 16) {
num_coeff += diag_scan4x4_inv[y_cg_last_sig][x_cg_last_sig] << 4;
scan_x_cg = ff_hevc_diag_scan4x4_x;
scan_y_cg = ff_hevc_diag_scan4x4_y;
} else { // trafo_size == 32
num_coeff += diag_scan8x8_inv[y_cg_last_sig][x_cg_last_sig] << 4;
scan_x_cg = ff_hevc_diag_scan8x8_x;
scan_y_cg = ff_hevc_diag_scan8x8_y;
}
break;
}
case SCAN_HORIZ:
scan_x_cg = horiz_scan2x2_x;
scan_y_cg = horiz_scan2x2_y;
scan_x_off = horiz_scan4x4_x;
scan_y_off = horiz_scan4x4_y;
num_coeff = horiz_scan8x8_inv[last_significant_coeff_y][last_significant_coeff_x];
break;
default: //SCAN_VERT
scan_x_cg = horiz_scan2x2_y;
scan_y_cg = horiz_scan2x2_x;
scan_x_off = horiz_scan4x4_y;
scan_y_off = horiz_scan4x4_x;
num_coeff = horiz_scan8x8_inv[last_significant_coeff_x][last_significant_coeff_y];
break;
}
num_coeff++;
num_last_subset = (num_coeff - 1) >> 4;
for (i = num_last_subset; i >= 0; i--) {
int n, m;
int x_cg, y_cg, x_c, y_c, pos;
int implicit_non_zero_coeff = 0;
int64_t trans_coeff_level;
int prev_sig = 0;
int offset = i << 4;
int rice_init = 0;
uint8_t significant_coeff_flag_idx[16];
uint8_t nb_significant_coeff_flag = 0;
x_cg = scan_x_cg[i];
y_cg = scan_y_cg[i];
if ((i < num_last_subset) && (i > 0)) {
int ctx_cg = 0;
if (x_cg < (1 << (log2_trafo_size - 2)) - 1)
ctx_cg += significant_coeff_group_flag[x_cg + 1][y_cg];
if (y_cg < (1 << (log2_trafo_size - 2)) - 1)
ctx_cg += significant_coeff_group_flag[x_cg][y_cg + 1];
significant_coeff_group_flag[x_cg][y_cg] =
significant_coeff_group_flag_decode(s, c_idx, ctx_cg);
implicit_non_zero_coeff = 1;
} else {
significant_coeff_group_flag[x_cg][y_cg] =
((x_cg == x_cg_last_sig && y_cg == y_cg_last_sig) ||
(x_cg == 0 && y_cg == 0));
}
last_scan_pos = num_coeff - offset - 1;
if (i == num_last_subset) {
n_end = last_scan_pos - 1;
significant_coeff_flag_idx[0] = last_scan_pos;
nb_significant_coeff_flag = 1;
} else {
n_end = 15;
}
if (x_cg < ((1 << log2_trafo_size) - 1) >> 2)
prev_sig = !!significant_coeff_group_flag[x_cg + 1][y_cg];
if (y_cg < ((1 << log2_trafo_size) - 1) >> 2)
prev_sig += (!!significant_coeff_group_flag[x_cg][y_cg + 1] << 1);
if (significant_coeff_group_flag[x_cg][y_cg] && n_end >= 0) {
static const uint8_t ctx_idx_map[] = {
0, 1, 4, 5, 2, 3, 4, 5, 6, 6, 8, 8, 7, 7, 8, 8, // log2_trafo_size == 2
1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, // prev_sig == 0
2, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, // prev_sig == 1
2, 1, 0, 0, 2, 1, 0, 0, 2, 1, 0, 0, 2, 1, 0, 0, // prev_sig == 2
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2 // default
};
const uint8_t *ctx_idx_map_p;
int scf_offset = 0;
if (s->sps->transform_skip_context_enabled_flag &&
(transform_skip_flag || lc->cu.cu_transquant_bypass_flag)) {
ctx_idx_map_p = (uint8_t*) &ctx_idx_map[4 * 16];
if (c_idx == 0) {
scf_offset = 40;
} else {
scf_offset = 14 + 27;
}
} else {
if (c_idx != 0)
scf_offset = 27;
if (log2_trafo_size == 2) {
ctx_idx_map_p = (uint8_t*) &ctx_idx_map[0];
} else {
ctx_idx_map_p = (uint8_t*) &ctx_idx_map[(prev_sig + 1) << 4];
if (c_idx == 0) {
if ((x_cg > 0 || y_cg > 0))
scf_offset += 3;
if (log2_trafo_size == 3) {
scf_offset += (scan_idx == SCAN_DIAG) ? 9 : 15;
} else {
scf_offset += 21;
}
} else {
if (log2_trafo_size == 3)
scf_offset += 9;
else
scf_offset += 12;
}
}
}
for (n = n_end; n > 0; n--) {
x_c = scan_x_off[n];
y_c = scan_y_off[n];
if (significant_coeff_flag_decode(s, x_c, y_c, scf_offset, ctx_idx_map_p)) {
significant_coeff_flag_idx[nb_significant_coeff_flag] = n;
nb_significant_coeff_flag++;
implicit_non_zero_coeff = 0;
}
}
if (implicit_non_zero_coeff == 0) {
if (s->sps->transform_skip_context_enabled_flag &&
(transform_skip_flag || lc->cu.cu_transquant_bypass_flag)) {
if (c_idx == 0) {
scf_offset = 42;
} else {
scf_offset = 16 + 27;
}
} else {
if (i == 0) {
if (c_idx == 0)
scf_offset = 0;
else
scf_offset = 27;
} else {
scf_offset = 2 + scf_offset;
}
}
if (significant_coeff_flag_decode_0(s, c_idx, scf_offset) == 1) {
significant_coeff_flag_idx[nb_significant_coeff_flag] = 0;
nb_significant_coeff_flag++;
}
} else {
significant_coeff_flag_idx[nb_significant_coeff_flag] = 0;
nb_significant_coeff_flag++;
}
}
n_end = nb_significant_coeff_flag;
if (n_end) {
int first_nz_pos_in_cg;
int last_nz_pos_in_cg;
int c_rice_param = 0;
int first_greater1_coeff_idx = -1;
uint8_t coeff_abs_level_greater1_flag[8];
uint16_t coeff_sign_flag;
int sum_abs = 0;
int sign_hidden;
int sb_type;
// initialize first elem of coeff_bas_level_greater1_flag
int ctx_set = (i > 0 && c_idx == 0) ? 2 : 0;
if (s->sps->persistent_rice_adaptation_enabled_flag) {
if (!transform_skip_flag && !lc->cu.cu_transquant_bypass_flag)
sb_type = 2 * (c_idx == 0 ? 1 : 0);
else
sb_type = 2 * (c_idx == 0 ? 1 : 0) + 1;
c_rice_param = lc->stat_coeff[sb_type] / 4;
}
if (!(i == num_last_subset) && greater1_ctx == 0)
ctx_set++;
greater1_ctx = 1;
last_nz_pos_in_cg = significant_coeff_flag_idx[0];
for (m = 0; m < (n_end > 8 ? 8 : n_end); m++) {
int inc = (ctx_set << 2) + greater1_ctx;
coeff_abs_level_greater1_flag[m] =
coeff_abs_level_greater1_flag_decode(s, c_idx, inc);
if (coeff_abs_level_greater1_flag[m]) {
greater1_ctx = 0;
if (first_greater1_coeff_idx == -1)
first_greater1_coeff_idx = m;
} else if (greater1_ctx > 0 && greater1_ctx < 3) {
greater1_ctx++;
}
}
first_nz_pos_in_cg = significant_coeff_flag_idx[n_end - 1];
if (lc->cu.cu_transquant_bypass_flag ||
(lc->cu.pred_mode == MODE_INTRA &&
s->sps->implicit_rdpcm_enabled_flag && transform_skip_flag &&
(pred_mode_intra == 10 || pred_mode_intra == 26 )) ||
explicit_rdpcm_flag)
sign_hidden = 0;
else
sign_hidden = (last_nz_pos_in_cg - first_nz_pos_in_cg >= 4);
if (first_greater1_coeff_idx != -1) {
coeff_abs_level_greater1_flag[first_greater1_coeff_idx] += coeff_abs_level_greater2_flag_decode(s, c_idx, ctx_set);
}
if (!s->pps->sign_data_hiding_flag || !sign_hidden ) {
coeff_sign_flag = coeff_sign_flag_decode(s, nb_significant_coeff_flag) << (16 - nb_significant_coeff_flag);
} else {
coeff_sign_flag = coeff_sign_flag_decode(s, nb_significant_coeff_flag - 1) << (16 - (nb_significant_coeff_flag - 1));
}
for (m = 0; m < n_end; m++) {
n = significant_coeff_flag_idx[m];
GET_COORD(offset, n);
if (m < 8) {
trans_coeff_level = 1 + coeff_abs_level_greater1_flag[m];
if (trans_coeff_level == ((m == first_greater1_coeff_idx) ? 3 : 2)) {
int last_coeff_abs_level_remaining = coeff_abs_level_remaining_decode(s, c_rice_param);
trans_coeff_level += last_coeff_abs_level_remaining;
if (trans_coeff_level > (3 << c_rice_param))
c_rice_param = s->sps->persistent_rice_adaptation_enabled_flag ? c_rice_param + 1 : FFMIN(c_rice_param + 1, 4);
if (s->sps->persistent_rice_adaptation_enabled_flag && !rice_init) {
int c_rice_p_init = lc->stat_coeff[sb_type] / 4;
if (last_coeff_abs_level_remaining >= (3 << c_rice_p_init))
lc->stat_coeff[sb_type]++;
else if (2 * last_coeff_abs_level_remaining < (1 << c_rice_p_init))
if (lc->stat_coeff[sb_type] > 0)
lc->stat_coeff[sb_type]--;
rice_init = 1;
}
}
} else {
int last_coeff_abs_level_remaining = coeff_abs_level_remaining_decode(s, c_rice_param);
trans_coeff_level = 1 + last_coeff_abs_level_remaining;
if (trans_coeff_level > (3 << c_rice_param))
c_rice_param = s->sps->persistent_rice_adaptation_enabled_flag ? c_rice_param + 1 : FFMIN(c_rice_param + 1, 4);
if (s->sps->persistent_rice_adaptation_enabled_flag && !rice_init) {
int c_rice_p_init = lc->stat_coeff[sb_type] / 4;
if (last_coeff_abs_level_remaining >= (3 << c_rice_p_init))
lc->stat_coeff[sb_type]++;
else if (2 * last_coeff_abs_level_remaining < (1 << c_rice_p_init))
if (lc->stat_coeff[sb_type] > 0)
lc->stat_coeff[sb_type]--;
rice_init = 1;
}
}
if (s->pps->sign_data_hiding_flag && sign_hidden) {
sum_abs += trans_coeff_level;
if (n == first_nz_pos_in_cg && (sum_abs&1))
trans_coeff_level = -trans_coeff_level;
}
if (coeff_sign_flag >> 15)
trans_coeff_level = -trans_coeff_level;
coeff_sign_flag <<= 1;
if(!lc->cu.cu_transquant_bypass_flag) {
if (s->sps->scaling_list_enable_flag && !(transform_skip_flag && log2_trafo_size > 2)) {
if(y_c || x_c || log2_trafo_size < 4) {
switch(log2_trafo_size) {
case 3: pos = (y_c << 3) + x_c; break;
case 4: pos = ((y_c >> 1) << 3) + (x_c >> 1); break;
case 5: pos = ((y_c >> 2) << 3) + (x_c >> 2); break;
default: pos = (y_c << 2) + x_c; break;
}
scale_m = scale_matrix[pos];
} else {
scale_m = dc_scale;
}
}
trans_coeff_level = (trans_coeff_level * (int64_t)scale * (int64_t)scale_m + add) >> shift;
if(trans_coeff_level < 0) {
if((~trans_coeff_level) & 0xFffffffffff8000)
trans_coeff_level = -32768;
} else {
if(trans_coeff_level & 0xffffffffffff8000)
trans_coeff_level = 32767;
}
}
coeffs[y_c * trafo_size + x_c] = trans_coeff_level;
}
}
}
if (lc->cu.cu_transquant_bypass_flag) {
if (explicit_rdpcm_flag || (s->sps->implicit_rdpcm_enabled_flag &&
(pred_mode_intra == 10 || pred_mode_intra == 26))) {
int mode = s->sps->implicit_rdpcm_enabled_flag ? (pred_mode_intra == 26) : explicit_rdpcm_dir_flag;
s->hevcdsp.transform_rdpcm(coeffs, log2_trafo_size, mode);
}
} else {
if (transform_skip_flag) {
int rot = s->sps->transform_skip_rotation_enabled_flag &&
log2_trafo_size == 2 &&
lc->cu.pred_mode == MODE_INTRA;
if (rot) {
for (i = 0; i < 8; i++)
FFSWAP(int16_t, coeffs[i], coeffs[16 - i - 1]);
}
s->hevcdsp.transform_skip(coeffs, log2_trafo_size);
if (explicit_rdpcm_flag || (s->sps->implicit_rdpcm_enabled_flag &&
lc->cu.pred_mode == MODE_INTRA &&
(pred_mode_intra == 10 || pred_mode_intra == 26))) {
int mode = explicit_rdpcm_flag ? explicit_rdpcm_dir_flag : (pred_mode_intra == 26);
s->hevcdsp.transform_rdpcm(coeffs, log2_trafo_size, mode);
}
} else if (lc->cu.pred_mode == MODE_INTRA && c_idx == 0 && log2_trafo_size == 2) {
// 4x4 DST for intra 4x4 luma blocks
s->hevcdsp.idct_4x4_luma(coeffs);
} else {
int max_xy = FFMAX(last_significant_coeff_x, last_significant_coeff_y);
if (max_xy == 0)
s->hevcdsp.idct_dc[log2_trafo_size-2](coeffs); // Faster IDCT path for blocks that contain only a DC coefficient
else {
int col_limit = last_significant_coeff_x + last_significant_coeff_y + 4;
if (max_xy < 4)
col_limit = FFMIN(4, col_limit);
else if (max_xy < 8)
col_limit = FFMIN(8, col_limit);
else if (max_xy < 12)
col_limit = FFMIN(24, col_limit);
s->hevcdsp.idct[log2_trafo_size-2](coeffs, col_limit); // Ordinary IDCT
}
}
}
if (lc->tu.cross_pf) {
int16_t *coeffs_y = (int16_t*)lc->edge_emu_buffer;
for (i = 0; i < (trafo_size * trafo_size); i++) {
coeffs[i] = coeffs[i] + ((lc->tu.res_scale_val * coeffs_y[i]) >> 3);
}
}
// Add the IDCT result to the prediction data
s->hevcdsp.transform_add[log2_trafo_size-2](dst, coeffs, stride);
}

The large first half of ff_hevc_hls_residual_coding() appears to parse the residual data (I have not studied it in detail yet); the second half applies the inverse DCT to the residual data. The inverse transform stage calls the following assembly functions:

HEVCDSPContext->idct_4x4_luma(): 4x4 inverse DST
HEVCDSPContext->idct_dc[X](): special inverse DCT for blocks that contain only a DC coefficient
HEVCDSPContext->idct[X](): ordinary inverse DCT
HEVCDSPContext->transform_add[X](): adds the residual to the pixel data

The value of [X] selects the size of the coefficient block:

[0]: 4x4
[1]: 8x8
[2]: 16x16
[3]: 32x32

These assembly functions are analyzed in detail later.
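The [X] index listed above is simply log2 of the block side length minus 2, which is why the source indexes the tables with log2_trafo_size - 2. A minimal sketch of that dispatch pattern follows; block_fn, block_table, size_index() and dispatch() are hypothetical names with a simplified signature, not the real HEVCDSPContext members.

```c
/* Hypothetical, simplified signature: the real DSP pointers take
 * coefficient and pixel buffers instead of returning a value. */
typedef int (*block_fn)(void);

static int block_4x4(void)   { return 4; }
static int block_8x8(void)   { return 8; }
static int block_16x16(void) { return 16; }
static int block_32x32(void) { return 32; }

/* Index 0..3 corresponds to 4x4, 8x8, 16x16, 32x32. */
static const block_fn block_table[4] = {
    block_4x4, block_8x8, block_16x16, block_32x32
};

/* Map log2 of the side length (2..5) to a table index, or -1 if invalid. */
static int size_index(int log2_trafo_size)
{
    if (log2_trafo_size < 2 || log2_trafo_size > 5)
        return -1;
    return log2_trafo_size - 2;
}

/* Call the size-specific function; returns -1 for unsupported sizes. */
static int dispatch(int log2_trafo_size)
{
    int idx = size_index(log2_trafo_size);
    return idx < 0 ? -1 : block_table[idx]();
}
```

Keeping one function per size lets each implementation be specialized (or replaced by an optimized version) without a size check inside the inner loops.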

Background on intra prediction and the inverse DCT

In HEVC, both intra prediction and the inverse DCT operate on TUs, so the two topics are covered together here.

Intra prediction background

HEVC intra prediction has 35 prediction modes in total, as shown in the following table:

Mode number    Mode name
0              Planar
1              DC
2-34           33 angular prediction modes

The angles of prediction modes 2 to 34 are shown in the figure below.

 

Compared with H.264, HEVC increases the number of angular prediction directions to 33. This represents the texture of the image more effectively and improves prediction accuracy. Modes 2 to 17 are horizontal-class angular modes and modes 18 to 34 are vertical-class angular modes. Mode 10 is pure horizontal prediction and mode 26 is pure vertical prediction.
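The mode numbering described above can be captured in a small classifier. This is only an illustrative sketch; the enum and function names are hypothetical and do not appear in FFmpeg.

```c
/* Hypothetical classification of the 35 HEVC intra prediction modes. */
enum intra_mode_class {
    CLASS_PLANAR,             /* mode 0 */
    CLASS_DC,                 /* mode 1 */
    CLASS_ANGULAR_HORIZONTAL, /* modes 2..17 */
    CLASS_ANGULAR_VERTICAL,   /* modes 18..34 */
    CLASS_INVALID             /* anything else */
};

static enum intra_mode_class classify_intra_mode(int mode)
{
    if (mode == 0)
        return CLASS_PLANAR;
    if (mode == 1)
        return CLASS_DC;
    if (mode >= 2 && mode <= 17)
        return CLASS_ANGULAR_HORIZONTAL;
    if (mode >= 18 && mode <= 34)
        return CLASS_ANGULAR_VERTICAL;
    return CLASS_INVALID;
}

/* Mode 10 predicts straight from the left column. */
static int is_pure_horizontal(int mode) { return mode == 10; }

/* Mode 26 predicts straight from the top row. */
static int is_pure_vertical(int mode) { return mode == 26; }
```

The constants 10 and 26 are the same ones that ff_hevc_hls_residual_coding() compares pred_mode_intra against when it decides whether implicit RDPCM applies.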
The Planar mode is computed as shown in the figure below.

 

As the figure shows, Planar mode first copies the value of the bottom pixel of the left column horizontally across a row, and copies the value of the rightmost pixel of the upper row vertically down a column. It then obtains the prediction with a method similar to bilinear interpolation. This mode combines the characteristics of horizontal and vertical prediction.
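A minimal 8-bit sketch of this interpolation follows. It assumes that top and left each hold size + 1 reference samples, with top[size] and left[size] being the reference samples just past the top-right and bottom-left corners of the block; planar_predict() is a hypothetical name and the code is a simplified illustration of the planar formula, not the FFmpeg implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Planar prediction sketch for a (1 << log2_size) square block.
 * Each sample is the average of a horizontal interpolation
 * (between left[y] and top[size]) and a vertical interpolation
 * (between top[x] and left[size]). */
static void planar_predict(uint8_t *dst, ptrdiff_t stride,
                           const uint8_t *top, const uint8_t *left,
                           int log2_size)
{
    int size = 1 << log2_size;

    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++) {
            int horizontal = (size - 1 - x) * left[y] + (x + 1) * top[size];
            int vertical   = (size - 1 - y) * top[x]  + (y + 1) * left[size];
            /* The weights sum to 2 * size; "+ size" rounds to nearest. */
            dst[y * stride + x] =
                (uint8_t)((horizontal + vertical + size) >> (log2_size + 1));
        }
    }
}
```

Because the four weights always sum to 2 * size, a block whose reference samples are all equal is predicted as a flat block of that same value.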
The DC mode is computed as shown in the figure below.

 

As the figure shows, the DC mode is very simple: the pixels in the row above and the column to the left of the current block are averaged, and the average is assigned to every pixel in the current block.
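The averaging step can be sketched as below for 8-bit samples. dc_predict() is a hypothetical name, and the sketch deliberately omits the extra smoothing of the first row and column that HEVC applies to some luma blocks.

```c
#include <stddef.h>
#include <stdint.h>

/* DC prediction sketch: average the size samples above and the size
 * samples to the left, then fill the whole block with that average. */
static void dc_predict(uint8_t *dst, ptrdiff_t stride,
                       const uint8_t *top, const uint8_t *left,
                       int log2_size)
{
    int size = 1 << log2_size;
    int sum = size; /* rounding offset: half of the 2 * size divisor */

    for (int i = 0; i < size; i++)
        sum += top[i] + left[i];

    /* Divide by 2 * size with a shift. */
    uint8_t dc = (uint8_t)(sum >> (log2_size + 1));

    for (int y = 0; y < size; y++)
        for (int x = 0; x < size; x++)
            dst[y * stride + x] = dc;
}
```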

DCT transform

H.264 uses a 4x4 integer DCT. HEVC also uses integer transforms, with the following main differences:

(1) The transform size is no longer limited to 4x4; the supported sizes are 4x4, 8x8, 16x16, and 32x32.
(2) The transform coefficients are much larger, which brings the result of the integer DCT closer to that of a floating-point DCT. Note that after the transform, the enlarged result is corrected by multiplying by a correction factor (1/128 for 4x4; in general 1/(64*sqrt(N)) for size N).
(3) The luma residual of Intra 4x4 blocks uses a special 4x4 DST (discrete sine transform; the "S" stands for sine), which is described later.

HEVC supports DCT sizes up to 32x32. The coefficients of the transform matrix are shown in the figures below: the first figure shows the left 16 columns and the second figure shows the right 16 columns.

 

The coefficients of the 4x4 DCT are the first 4 elements of rows 0, 8, 16, and 24 of the 32x32 coefficient matrix, marked with a red box in the figure. The 4x4 DCT coefficient matrix is therefore:

64  64  64  64
83  36 -36 -83
64 -64 -64  64
36 -83  83 -36
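A quick numerical check helps explain the 1/128 correction factor quoted above for the 4x4 case. The sketch below (row_dot() is a hypothetical helper) computes dot products between rows of the 4x4 matrix: distinct rows are orthogonal, and each row has a squared norm of 16384 or 16370, that is, a norm very close to 128 = 64*sqrt(4).

```c
/* 4x4 HEVC DCT coefficient matrix, copied from the text above. */
static const int dct4[4][4] = {
    { 64,  64,  64,  64 },
    { 83,  36, -36, -83 },
    { 64, -64, -64,  64 },
    { 36, -83,  83, -36 },
};

/* Dot product of rows a and b of the matrix. */
static int row_dot(int a, int b)
{
    int sum = 0;
    for (int i = 0; i < 4; i++)
        sum += dct4[a][i] * dct4[b][i];
    return sum;
}
```

Since 128 * 128 = 16384, dividing by 128 brings every basis vector back to (almost exactly) unit length, which is what an orthonormal floating-point DCT would have.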

The coefficients of the 8x8 DCT are the first 8 elements of rows 0, 4, 8, 12, 16, 20, 24, and 28 of the 32x32 coefficient matrix, marked with a yellow box. The 8x8 DCT coefficient matrix is therefore:

 64  64  64  64  64  64  64  64
 89  75  50  18 -18 -50 -75 -89
 83  36 -36 -83 -83 -36  36  83
 75 -18 -89 -50  50  89  18 -75
 64 -64 -64  64  64 -64 -64  64
 50 -89  18  75 -75 -18  89 -50
 36 -83  83 -36 -36  83 -83  36
 18 -50  75 -89  89 -75  50 -18
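The same nesting holds between the smaller matrices: since the 4x4 rows are rows 0, 8, 16, 24 of the 32x32 matrix and the 8x8 rows are rows 0, 4, ..., 28, the 4x4 matrix is also the first 4 elements of every second row of the 8x8 matrix. The sketch below illustrates this; subsample_dct4() is a hypothetical helper.

```c
/* 8x8 HEVC DCT coefficient matrix, copied from the text above. */
static const int dct8[8][8] = {
    { 64,  64,  64,  64,  64,  64,  64,  64 },
    { 89,  75,  50,  18, -18, -50, -75, -89 },
    { 83,  36, -36, -83, -83, -36,  36,  83 },
    { 75, -18, -89, -50,  50,  89,  18, -75 },
    { 64, -64, -64,  64,  64, -64, -64,  64 },
    { 50, -89,  18,  75, -75, -18,  89, -50 },
    { 36, -83,  83, -36, -36,  83, -83,  36 },
    { 18, -50,  75, -89,  89, -75,  50, -18 },
};

/* Build the 4x4 matrix from rows 0, 2, 4, 6 of the 8x8 matrix,
 * keeping only the first 4 elements of each row. */
static void subsample_dct4(int out[4][4])
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            out[i][j] = dct8[2 * i][j];
}
```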

The coefficients of the 16x16 DCT are the first 16 elements of rows 0, 2, 4, ..., 28, 30 of the 32x32 coefficient matrix, marked with a green box. Because there are so many coefficients, they are not listed here.

The residual of Intra 4x4 blocks is coded with a special 4x4 DST, whose coefficient matrix is given below. Related experiments indicate that using the 4x4 DST for Intra 4x4 blocks improves coding efficiency by about 0.8%.

 29  55  74  84
 74  74   0 -74
 84 -29 -74  55
 55 -84  74 -29

Examples of intra prediction

This section uses a small video stream as an example to look at the intra prediction information in an HEVC bitstream.
[Example 1]
The figure below shows a decoded I-frame.

The figure below shows how the frame is partitioned into CTUs. The partitioning is noticeably finer in the complex regions of the picture.

The blue lines in the figure below show the intra prediction directions.

The figure below shows the relationship between the intra prediction directions and the image content. The intra prediction direction is largely consistent with the direction of the image texture.

The figure below shows the intra-predicted picture before the residual is added.

The figure below shows the residual information of the frame.

 

[Example 2]
The figure below shows a decoded I-frame.

The figure below shows how the frame is partitioned into CTUs.

The blue lines in the figure below show the intra prediction directions.

The figure below shows the relationship between the intra prediction directions and the image content.

The figure below shows the intra-predicted picture before the residual is added.

The figure below shows the residual information of the frame.

 

[Example 3: intra filter information]
This section uses a clip from the animated film "Sintel" as an example to look at the detailed intra filtering information in an HEVC bitstream. The figure below shows a decoded I-frame.

The figure below shows the intra prediction result before the residual data is added. Here we pick one 8x8 CU (marked with a purple box in the figure) and look at its details. This CU uses intra prediction mode 19 (an angular prediction mode).

The intra prediction information of this 8x8 CU is shown in the figure below.

 

[Example 4: inverse DCT]
This section uses the "Sintel" animation stream as an example to look at the detailed inverse DCT information in an HEVC bitstream. The figure below shows a decoded frame.

The figure below shows the residual data of the frame. Here we pick one 8x8 CU (marked with a purple box in the figure) and look at its details.

The inverse DCT information of this 8x8 CU is shown in the figure below. The figure shows the concrete steps of inverse quantization and inverse transform.
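The inverse quantization step mentioned here corresponds to the scale, add, and shift computation in ff_hevc_hls_residual_coding(). The sketch below restates that arithmetic in isolation. dequantize_level() is a hypothetical helper; it replaces the rem6/div6 lookup tables with % and /, assumes that qp is non-negative and already includes qp_bd_offset, and assumes that right-shifting a negative value is an arithmetic shift (true on common compilers, but implementation-defined in C).

```c
#include <stdint.h>

/* Same six-entry table as level_scale[] in the source above. */
static const uint8_t level_scale[6] = { 40, 45, 51, 57, 64, 72 };

/* Scale one decoded coefficient level back to a transform coefficient.
 * scale_m is 16 when no custom scaling list is in use. */
static int16_t dequantize_level(int level, int qp, int bit_depth,
                                int log2_trafo_size, int scale_m)
{
    int shift     = bit_depth + log2_trafo_size - 5;
    int64_t add   = (int64_t)1 << (shift - 1);               /* rounding */
    int64_t scale = (int64_t)level_scale[qp % 6] << (qp / 6);
    int64_t v     = ((int64_t)level * scale * scale_m + add) >> shift;

    /* Clamp to the int16_t range used by the coefficient buffer. */
    if (v < -32768)
        v = -32768;
    if (v > 32767)
        v = 32767;
    return (int16_t)v;
}
```

For example, with 8-bit video and a 4x4 block, shift is 5; a level of 1 at qp 22 gives scale = 64 << 3 = 512 and a dequantized coefficient of (512 * 16 + 16) >> 5 = 256. The scale doubles every time qp increases by 6.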

Source code of the intra prediction assembly functions

The assembly functions related to intra prediction are located in HEVCPredContext. HEVCPredContext is initialized by ff_hevc_pred_init(), which assigns the function pointers in the HEVCPredContext structure. At run time, the FFmpeg HEVC decoder only needs to call the function pointers in HEVCPredContext to perform the corresponding operation.

ff_hevc_pred_init()

ff_hevc_pred_init() initializes the assembly function pointers in the HEVCPredContext structure. The function is defined as follows.

// Initialize the intra prediction functions
void ff_hevc_pred_init(HEVCPredContext *hpc, int bit_depth)
{
#undef FUNC
#define FUNC(a, depth) a ## _ ## depth
#define HEVC_PRED(depth) \
hpc->intra_pred[0] = FUNC(intra_pred_2, depth); \
hpc->intra_pred[1] = FUNC(intra_pred_3, depth); \
hpc->intra_pred[2] = FUNC(intra_pred_4, depth); \
hpc->intra_pred[3] = FUNC(intra_pred_5, depth); \
hpc->pred_planar[0] = FUNC(pred_planar_0, depth); \
hpc->pred_planar[1] = FUNC(pred_planar_1, depth); \
hpc->pred_planar[2] = FUNC(pred_planar_2, depth); \
hpc->pred_planar[3] = FUNC(pred_planar_3, depth); \
hpc->pred_dc = FUNC(pred_dc, depth); \
hpc->pred_angular[0] = FUNC(pred_angular_0, depth); \
hpc->pred_angular[1] = FUNC(pred_angular_1, depth); \
hpc->pred_angular[2] = FUNC(pred_angular_2, depth); \
hpc->pred_angular[3] = FUNC(pred_angular_3, depth);
switch (bit_depth) {
case 9:
HEVC_PRED(9);
break;
case 10:
HEVC_PRED(10);
break;
case 12:
HEVC_PRED(12);
break;
default:
HEVC_PRED(8);
break;
}
}

As the source code shows, ff_hevc_pred_init() contains a long macro named "HEVC_PRED(depth)". The macro contains the initialization code for the C versions of the intra prediction functions. ff_hevc_pred_init() initializes the C intra prediction functions that match the bit depth bit_depth. Taking a bit depth of 8 bits as an example, "HEVC_PRED(8)" expands as follows.

hpc->intra_pred[0] = intra_pred_2_8;
hpc->intra_pred[1] = intra_pred_3_8;
hpc->intra_pred[2] = intra_pred_4_8;
hpc->intra_pred[3] = intra_pred_5_8;
hpc->pred_planar[0] = pred_planar_0_8;
hpc->pred_planar[1] = pred_planar_1_8;
hpc->pred_planar[2] = pred_planar_2_8;
hpc->pred_planar[3] = pred_planar_3_8;
hpc->pred_dc = pred_dc_8;
hpc->pred_angular[0] = pred_angular_0_8;
hpc->pred_angular[1] = pred_angular_1_8;
hpc->pred_angular[2] = pred_angular_2_8;
hpc->pred_angular[3] = pred_angular_3_8;
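The expansion relies on the preprocessor token-pasting operator ##. The following self-contained sketch reproduces the pattern on a toy scale: FUNC is defined as in the source, while DemoContext, demo_init(), DEMO_INIT and the max_pixel_* functions are hypothetical names used only for illustration.

```c
/* Same definition as in ff_hevc_pred_init(): FUNC(foo, 8) becomes foo_8. */
#define FUNC(a, depth) a ## _ ## depth

/* Two "bit depth specific" functions: max_pixel_8 and max_pixel_10. */
static int FUNC(max_pixel, 8)(void)  { return (1 << 8) - 1; }
static int FUNC(max_pixel, 10)(void) { return (1 << 10) - 1; }

/* Hypothetical context holding one function pointer. */
typedef struct DemoContext {
    int (*max_pixel)(void);
} DemoContext;

/* Pick the implementation matching bit_depth, defaulting to 8 bits. */
static void demo_init(DemoContext *c, int bit_depth)
{
#define DEMO_INIT(depth) c->max_pixel = FUNC(max_pixel, depth)
    switch (bit_depth) {
    case 10:
        DEMO_INIT(10);
        break;
    default:
        DEMO_INIT(8);
        break;
    }
#undef DEMO_INIT
}
```

After demo_init(), callers use c->max_pixel() without knowing which bit depth variant is behind it, which is the same indirection the decoder gets from HEVCPredContext.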

As can be seen, "HEVC_PRED(8)" initializes the C versions of the intra prediction functions. HEVCPredContext is defined as follows.

typedef struct HEVCPredContext {
void (*intra_pred[4])(struct HEVCContext *s, int x0, int y0, int c_idx);
void (*pred_planar[4])(uint8_t *src, const uint8_t *top,
const uint8_t *left, ptrdiff_t stride);
void (*pred_dc)(uint8_t *src, const uint8_t *top, const uint8_t *left,
ptrdiff_t stride, int log2_size, int c_idx);
void (*pred_angular[4])(uint8_t *src, const uint8_t *top,
const uint8_t *left, ptrdiff_t stride,
int c_idx, int mode);
} HEVCPredContext;

As the source code shows, HEVCPredContext stores 4 assembly function pointers (or pointer arrays):

intra_pred[4](): the entry functions for intra prediction; during execution they call the other 3 sets of function pointers. The 4 functions in the array handle 4x4, 8x8, 16x16, and 32x32 blocks respectively.
pred_planar[4](): the Planar prediction mode functions. The 4 functions in the array handle 4x4, 8x8, 16x16, and 32x32 blocks respectively.
pred_dc(): the DC prediction mode function.
pred_angular[4](): the angular prediction mode functions. The 4 functions in the array handle 4x4, 8x8, 16x16, and 32x32 blocks respectively.

These functions are described in order below.

HEVCPredContext->intra_pred[4]()

intra_pred[4]() holds the entry functions for intra prediction. During execution, each of them calls the Planar, DC, or angular prediction function. The 4 elements of the array handle 4x4, 8x8, 16x16, and 32x32 blocks respectively. The concrete functions for these block sizes are:

intra_pred_2_8(): 4x4 blocks
intra_pred_3_8(): 8x8 blocks
intra_pred_4_8(): 16x16 blocks
intra_pred_5_8(): 32x32 blocks
PS: the number in the middle of each function name is log2 of the block's side length.

The functions above are defined as follows.

#define INTRA_PRED(size) \
static void FUNC(intra_pred_ ## size)(HEVCContext *s, int x0, int y0, int c_idx) \
{ \
FUNC(intra_pred)(s, x0, y0, size, c_idx); \
}
/* Intra prediction functions for blocks of several sizes.
* The macro argument is log2 of the block's side length in pixels.
* For example, "INTRA_PRED(2)" is the intra prediction function for 4x4 blocks.
*
* "INTRA_PRED(2)" expands to:
* static void intra_pred_2_8(HEVCContext *s, int x0, int y0, int c_idx)
* {
* intra_pred_8(s, x0, y0, 2, c_idx);
* }
*/
INTRA_PRED(2)
INTRA_PRED(3)
INTRA_PRED(4)
INTRA_PRED(5)

As the source code shows, intra_pred_2_8(), intra_pred_3_8(), and the others are all defined through the "INTRA_PRED()" macro. Internally they all call the same function, intra_pred_8(). The only difference between them is the value of the fourth argument, size, passed to intra_pred_8().
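The wrapper pattern can be reproduced in a few lines. In the sketch below, the one-argument FUNC macro is a simplified assumption of how a bit depth suffix might be appended (the two-level FUNC2/FUNC3 indirection is needed so that BIT_DEPTH is expanded before pasting); demo_pred and DEMO_PRED are hypothetical stand-ins for intra_pred and INTRA_PRED.

```c
#define BIT_DEPTH 8

/* Two-level expansion so that BIT_DEPTH becomes 8 before ## is applied. */
#define FUNC3(a, b) a ## _ ## b
#define FUNC2(a, b) FUNC3(a, b)
#define FUNC(a)     FUNC2(a, BIT_DEPTH)

/* Shared worker (here: demo_pred_8). As a stand-in for real prediction
 * work it just returns the block side length. */
static inline int FUNC(demo_pred)(int log2_size)
{
    return 1 << log2_size;
}

/* Generate one thin wrapper per block size; each passes a constant size,
 * so the compiler can specialize the inlined worker for that size. */
#define DEMO_PRED(size)                       \
static int FUNC(demo_pred_ ## size)(void)     \
{                                             \
    return FUNC(demo_pred)(size);             \
}

DEMO_PRED(2) /* defines demo_pred_2_8(), 4x4   */
DEMO_PRED(3) /* defines demo_pred_3_8(), 8x8   */
DEMO_PRED(4) /* defines demo_pred_4_8(), 16x16 */
DEMO_PRED(5) /* defines demo_pred_5_8(), 32x32 */
```

Because the worker is inlined into each wrapper with a compile-time constant, the four generated functions can be stored in a table such as intra_pred[4] and called with no size parameter at all.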

intra_pred_8()
intra_pred_8() performs the preparation required before intra prediction, such as filtering, and then calls a different intra prediction function depending on the prediction type (Planar, DC, or angular). The function is defined as follows.

static av_always_inline void FUNC(intra_pred)(HEVCContext *s, int x0, int y0,
int log2_size, int c_idx)
{
#define PU(x) \
((x) >> s->sps->log2_min_pu_size)
#define MVF(x, y) \
(s->ref->tab_mvf[(x) + (y) * min_pu_width])
#define MVF_PU(x, y) \
MVF(PU(x0 + ((x) << hshift)), PU(y0 + ((y) << vshift)))
#define IS_INTRA(x, y) \
(MVF_PU(x, y).pred_flag == PF_INTRA)
#define MIN_TB_ADDR_ZS(x, y) \
s->pps->min_tb_addr_zs[(y) * (s->sps->tb_mask+2) + (x)]
#define EXTEND(ptr, val, len) \
do { \
pixel4 pix = PIXEL_SPLAT_X4(val); \
for (i = 0; i < (len); i += 4) \
AV_WN4P(ptr + i, pix); \
} while (0)
#define EXTEND_RIGHT_CIP(ptr, start, length) \
for (i = start; i < (start) + (length); i += 4) \
if (!IS_INTRA(i, -1)) \
AV_WN4P(&ptr[i], a); \
else \
a = PIXEL_SPLAT_X4(ptr[i+3])
#define EXTEND_LEFT_CIP(ptr, start, length) \
for (i = start; i > (start) - (length); i--) \
if (!IS_INTRA(i - 1, -1)) \
ptr[i - 1] = ptr[i]
#define EXTEND_UP_CIP(ptr, start, length) \
for (i = (start); i > (start) - (length); i -= 4) \
if (!IS_INTRA(-1, i - 3)) \
AV_WN4P(&ptr[i - 3], a); \
else \
a = PIXEL_SPLAT_X4(ptr[i - 3])
#define EXTEND_DOWN_CIP(ptr, start, length) \
for (i = start; i < (start) + (length); i += 4) \
if (!IS_INTRA(-1, i)) \
AV_WN4P(&ptr[i], a); \
else \
a = PIXEL_SPLAT_X4(ptr[i + 3])
HEVCLocalContext *lc = s->HEVClc;
int i;
int hshift = s->sps->hshift[c_idx];
int vshift = s->sps->vshift[c_idx];
int size = (1 << log2_size);
int size_in_luma_h = size << hshift;
int size_in_tbs_h = size_in_luma_h >> s->sps->log2_min_tb_size;
int size_in_luma_v = size << vshift;
int size_in_tbs_v = size_in_luma_v >> s->sps->log2_min_tb_size;
int x = x0 >> hshift;
int y = y0 >> vshift;
int x_tb = (x0 >> s->sps->log2_min_tb_size) & s->sps->tb_mask;
int y_tb = (y0 >> s->sps->log2_min_tb_size) & s->sps->tb_mask;
int cur_tb_addr = MIN_TB_ADDR_ZS(x_tb, y_tb);
// Note: c_idx identifies the color component
ptrdiff_t stride = s->frame->linesize[c_idx] / sizeof(pixel);
pixel *src = (pixel*)s->frame->data[c_idx] + x + y * stride;
int min_pu_width = s->sps->min_pu_width;
enum IntraPredMode mode = c_idx ? lc->tu.intra_pred_mode_c :
lc->tu.intra_pred_mode;
pixel4 a;
pixel left_array[2 * MAX_TB_SIZE + 1];
pixel filtered_left_array[2 * MAX_TB_SIZE + 1];
pixel top_array[2 * MAX_TB_SIZE + 1];
pixel filtered_top_array[2 * MAX_TB_SIZE + 1];
pixel *left = left_array + 1;
pixel *top = top_array + 1;
pixel *filtered_left = filtered_left_array + 1;
pixel *filtered_top = filtered_top_array + 1;
int cand_bottom_left = lc->na.cand_bottom_left && cur_tb_addr > MIN_TB_ADDR_ZS( x_tb - 1, (y_tb + size_in_tbs_v) & s->sps->tb_mask);
int cand_left = lc->na.cand_left;
int cand_up_left = lc->na.cand_up_left;
int cand_up = lc->na.cand_up;
int cand_up_right = lc->na.cand_up_right && cur_tb_addr > MIN_TB_ADDR_ZS((x_tb + size_in_tbs_h) & s->sps->tb_mask, y_tb - 1);
int bottom_left_size = (FFMIN(y0 + 2 * size_in_luma_v, s->sps->height) -
(y0 + size_in_luma_v)) >> vshift;
int top_right_size = (FFMIN(x0 + 2 * size_in_luma_h, s->sps->width) -
(x0 + size_in_luma_h)) >> hshift;
if (s->pps->constrained_intra_pred_flag == 1) {
int size_in_luma_pu_v = PU(size_in_luma_v);
int size_in_luma_pu_h = PU(size_in_luma_h);
int on_pu_edge_x = !(x0 & ((1 << s->sps->log2_min_pu_size) - 1));
int on_pu_edge_y = !(y0 & ((1 << s->sps->log2_min_pu_size) - 1));
if (!size_in_luma_pu_h)
size_in_luma_pu_h++;
if (cand_bottom_left == 1 && on_pu_edge_x) {
int x_left_pu = PU(x0 - 1);
int y_bottom_pu = PU(y0 + size_in_luma_v);
int max = FFMIN(size_in_luma_pu_v, s->sps->min_pu_height - y_bottom_pu);
cand_bottom_left = 0;
for (i = 0; i < max; i += 2)
cand_bottom_left |= (MVF(x_left_pu, y_bottom_pu + i).pred_flag == PF_INTRA);
}
if (cand_left == 1 && on_pu_edge_x) {
int x_left_pu = PU(x0 - 1);
int y_left_pu = PU(y0);
int max = FFMIN(size_in_luma_pu_v, s->sps->min_pu_height - y_left_pu);
cand_left = 0;
for (i = 0; i < max; i += 2)
cand_left |= (MVF(x_left_pu, y_left_pu + i).pred_flag == PF_INTRA);
}
if (cand_up_left == 1) {
int x_left_pu = PU(x0 - 1);
int y_top_pu = PU(y0 - 1);
cand_up_left = MVF(x_left_pu, y_top_pu).pred_flag == PF_INTRA;
}
if (cand_up == 1 && on_pu_edge_y) {
int x_top_pu = PU(x0);
int y_top_pu = PU(y0 - 1);
int max = FFMIN(size_in_luma_pu_h, s->sps->min_pu_width - x_top_pu);
cand_up = 0;
for (i = 0; i < max; i += 2)
cand_up |= (MVF(x_top_pu + i, y_top_pu).pred_flag == PF_INTRA);
}
if (cand_up_right == 1 && on_pu_edge_y) {
int y_top_pu = PU(y0 - 1);
int x_right_pu = PU(x0 + size_in_luma_h);
int max = FFMIN(size_in_luma_pu_h, s->sps->min_pu_width - x_right_pu);
cand_up_right = 0;
for (i = 0; i < max; i += 2)
cand_up_right |= (MVF(x_right_pu + i, y_top_pu).pred_flag == PF_INTRA);
}
memset(left, 128, 2 * MAX_TB_SIZE*sizeof(pixel));
memset(top , 128, 2 * MAX_TB_SIZE*sizeof(pixel));
top[-1] = 128;
}
if (cand_up_left) {
left[-1] = POS(-1, -1);
top[-1] = left[-1];
}
if (cand_up)
memcpy(top, src - stride, size * sizeof(pixel));
if (cand_up_right) {
memcpy(top + size, src - stride + size, size * sizeof(pixel));
EXTEND(top + size + top_right_size, POS(size + top_right_size - 1, -1),
size - top_right_size);
}
if (cand_left)
for (i = 0; i < size; i++)
left[i] = POS(-1, i);
if (cand_bottom_left) {
for (i = size; i < size + bottom_left_size; i++)
left[i] = POS(-1, i);
EXTEND(left + size + bottom_left_size, POS(-1, size + bottom_left_size - 1),
size - bottom_left_size);
}
if (s->pps->constrained_intra_pred_flag == 1) {
if (cand_bottom_left || cand_left || cand_up_left || cand_up || cand_up_right) {
int size_max_x = x0 + ((2 * size) << hshift) < s->sps->width ?
2 * size : (s->sps->width - x0) >> hshift;
int size_max_y = y0 + ((2 * size) << vshift) < s->sps->height ?
2 * size : (s->sps->height - y0) >> vshift;
int j = size + (cand_bottom_left? bottom_left_size: 0) -1;
if (!cand_up_right) {
size_max_x = x0 + ((size) << hshift) < s->sps->width ?
size : (s->sps->width - x0) >> hshift;
}
if (!cand_bottom_left) {
size_max_y = y0 + (( size) << vshift) < s->sps->height ?
size : (s->sps->height - y0) >> vshift;
}
if (cand_bottom_left || cand_left || cand_up_left) {
while (j > -1 && !IS_INTRA(-1, j))
j--;
if (!IS_INTRA(-1, j)) {
j = 0;
while (j < size_max_x && !IS_INTRA(j, -1))
j++;
EXTEND_LEFT_CIP(top, j, j + 1);
left[-1] = top[-1];
}
} else {
j = 0;
while (j < size_max_x && !IS_INTRA(j, -1))
j++;
if (j > 0)
if (x0 > 0) {
EXTEND_LEFT_CIP(top, j, j + 1);
} else {
EXTEND_LEFT_CIP(top, j, j);
top[-1] = top[0];
}
left[-1] = top[-1];
}
left[-1] = top[-1];
if (cand_bottom_left || cand_left) {
a = PIXEL_SPLAT_X4(left[-1]);
EXTEND_DOWN_CIP(left, 0, size_max_y);
}
if (!cand_left)
EXTEND(left, left[-1], size);
if (!cand_bottom_left)
EXTEND(left + size, left[size - 1], size);
if (x0 != 0 && y0 != 0) {
a = PIXEL_SPLAT_X4(left[size_max_y - 1]);
EXTEND_UP_CIP(left, size_max_y - 1, size_max_y);
if (!IS_INTRA(-1, - 1))
left[-1] = left[0];
} else if (x0 == 0) {
EXTEND(left, 0, size_max_y);
} else {
a = PIXEL_SPLAT_X4(left[size_max_y - 1]);
EXTEND_UP_CIP(left, size_max_y - 1, size_max_y);
}
top[-1] = left[-1];
if (y0 != 0) {
a = PIXEL_SPLAT_X4(left[-1]);
EXTEND_RIGHT_CIP(top, 0, size_max_x);
}
}
}
// Infer the unavailable samples
if (!cand_bottom_left) {
if (cand_left) {
EXTEND(left + size, left[size - 1], size);
} else if (cand_up_left) {
EXTEND(left, left[-1], 2 * size);
cand_left = 1;
} else if (cand_up) {
left[-1] = top[0];
EXTEND(left, left[-1], 2 * size);
cand_up_left = 1;
cand_left = 1;
} else if (cand_up_right) {
EXTEND(top, top[size], size);
left[-1] = top[size];
EXTEND(left, left[-1], 2 * size);
cand_up = 1;
cand_up_left = 1;
cand_left = 1;
} else { // No samples available
left[-1] = (1 << (BIT_DEPTH - 1));
EXTEND(top, left[-1], 2 * size);
EXTEND(left, left[-1], 2 * size);
}
}
if (!cand_left)
EXTEND(left, left[size], size);
if (!cand_up_left) {
left[-1] = left[0];
}
if (!cand_up)
EXTEND(top, left[-1], size);
if (!cand_up_right)
EXTEND(top + size, top[size - 1], size);
top[-1] = left[-1];
// Filtering process: smooth the reference samples
if (!s->sps->intra_smoothing_disabled_flag && (c_idx == 0 || s->sps->chroma_format_idc == 3)) {
if (mode != INTRA_DC && size != 4){
int intra_hor_ver_dist_thresh[] = { 7, 1, 0 };
int min_dist_vert_hor = FFMIN(FFABS((int)(mode - 26U)),
FFABS((int)(mode - 10U)));
if (min_dist_vert_hor > intra_hor_ver_dist_thresh[log2_size - 3]) {
int threshold = 1 << (BIT_DEPTH - 5);
if (s->sps->sps_strong_intra_smoothing_enable_flag && c_idx == 0 &&
log2_size == 5 &&
FFABS(top[-1] + top[63] - 2 * top[31]) < threshold &&
FFABS(left[-1] + left[63] - 2 * left[31]) < threshold) {
// We can't just overwrite values in top because it could be
// a pointer into src
filtered_top[-1] = top[-1];
filtered_top[63] = top[63];
for (i = 0; i < 63; i++)
filtered_top[i] = ((64 - (i + 1)) * top[-1] +
(i + 1) * top[63] + 32) >> 6;
for (i = 0; i < 63; i++)
left[i] = ((64 - (i + 1)) * left[-1] +
(i + 1) * left[63] + 32) >> 6;
top = filtered_top;
} else {
filtered_left[2 * size - 1] = left[2 * size - 1];
filtered_top[2 * size - 1] = top[2 * size - 1];
for (i = 2 * size - 2; i >= 0; i--)
filtered_left[i] = (left[i + 1] + 2 * left[i] +
left[i - 1] + 2) >> 2;
filtered_top[-1] =
filtered_left[-1] = (left[0] + 2 * left[-1] + top[0] + 2) >> 2;
for (i = 2 * size - 2; i >= 0; i--)
filtered_top[i] = (top[i + 1] + 2 * top[i] +
top[i - 1] + 2) >> 2;
left = filtered_left;
top = filtered_top;
}
}
}
}
/*
 * Call a different handler function for each intra prediction mode.
 * The "[4]" in pred_planar[4] and pred_angular[4] covers square blocks of four sizes:
 *   [0]: 4x4 block
 *   [1]: 8x8 block
 *   [2]: 16x16 block
 *   [3]: 32x32 block
 *
 * log2_size is the base-2 logarithm of the block's side length:
 *   4x4 block:   log2_size = log2(4)  = 2
 *   8x8 block:   log2_size = log2(8)  = 3
 *   16x16 block: log2_size = log2(16) = 4
 *   32x32 block: log2_size = log2(32) = 5
 * so the array index is log2_size - 2.
 */
switch (mode) {
case INTRA_PLANAR:
s->hpc.pred_planar[log2_size - 2]((uint8_t *)src, (uint8_t *)top,
(uint8_t *)left, stride);
break;
case INTRA_DC:
s->hpc.pred_dc((uint8_t *)src, (uint8_t *)top,
(uint8_t *)left, stride, log2_size, c_idx);
break;
default:
s->hpc.pred_angular[log2_size - 2]((uint8_t *)src, (uint8_t *)top,
(uint8_t *)left, stride, c_idx,
mode);
break;
}
}

I have not yet studied the first part of intra_pred_8() in detail. It prepares for intra prediction: as the listing shows, it collects the neighboring reference samples into the left and top arrays, infers the samples that are unavailable, and optionally filters them. At the end of the function there is a switch() statement that selects the processing according to the intra prediction mode:

(1) Planar mode: calls HEVCPredContext -> pred_planar[4]()
(2) DC mode: calls HEVCPredContext -> pred_dc()
(3) All other modes (the rest are angular modes): calls HEVCPredContext -> pred_angular[4]()
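
The dispatch above can be tried in isolation. The following is a minimal sketch, not FFmpeg code: PredCtx, the stub predictors and the MODE_* constants are hypothetical names invented for illustration. Only the idea of indexing the size-specific tables with log2_size - 2 is taken from the listing.

```c
#include <stdint.h>

enum { MODE_PLANAR = 0, MODE_DC = 1 };   /* hypothetical; angular modes are 2..34 */

typedef void (*pred_fn)(uint8_t *dst, int size);

/* Stub predictors: each fills the block with a tag so the chosen path is visible. */
static void fill(uint8_t *dst, int size, uint8_t tag)
{
    for (int i = 0; i < size * size; i++)
        dst[i] = tag;
}
static void stub_planar(uint8_t *dst, int size)  { fill(dst, size, 1); }
static void stub_dc(uint8_t *dst, int size)      { fill(dst, size, 2); }
static void stub_angular(uint8_t *dst, int size) { fill(dst, size, 3); }

typedef struct PredCtx {
    pred_fn planar[4];    /* [0]: 4x4 ... [3]: 32x32 */
    pred_fn dc;           /* one function handles every size */
    pred_fn angular[4];   /* [0]: 4x4 ... [3]: 32x32 */
} PredCtx;

static void pred_ctx_init(PredCtx *c)
{
    for (int i = 0; i < 4; i++) {
        c->planar[i]  = stub_planar;
        c->angular[i] = stub_angular;
    }
    c->dc = stub_dc;
}

/* Select the predictor by mode; size-specific tables are indexed by log2_size - 2. */
static void predict_block(const PredCtx *c, uint8_t *dst, int log2_size, int mode)
{
    int size = 1 << log2_size;
    switch (mode) {
    case MODE_PLANAR:
        c->planar[log2_size - 2](dst, size);
        break;
    case MODE_DC:
        c->dc(dst, size);
        break;
    default:
        c->angular[log2_size - 2](dst, size);
        break;
    }
}
```

Calling predict_block() with log2_size = 3 and mode 26, for example, goes through angular[1], the 8x8 entry.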

The intra prediction modes of the HEVC decoder are defined in the IntraPredMode enum, shown below.

enum IntraPredMode {
INTRA_PLANAR = 0,
INTRA_DC,
INTRA_ANGULAR_2,
INTRA_ANGULAR_3,
INTRA_ANGULAR_4,
INTRA_ANGULAR_5,
INTRA_ANGULAR_6,
INTRA_ANGULAR_7,
INTRA_ANGULAR_8,
INTRA_ANGULAR_9,
INTRA_ANGULAR_10,
INTRA_ANGULAR_11,
INTRA_ANGULAR_12,
INTRA_ANGULAR_13,
INTRA_ANGULAR_14,
INTRA_ANGULAR_15,
INTRA_ANGULAR_16,
INTRA_ANGULAR_17,
INTRA_ANGULAR_18,
INTRA_ANGULAR_19,
INTRA_ANGULAR_20,
INTRA_ANGULAR_21,
INTRA_ANGULAR_22,
INTRA_ANGULAR_23,
INTRA_ANGULAR_24,
INTRA_ANGULAR_25,
INTRA_ANGULAR_26,
INTRA_ANGULAR_27,
INTRA_ANGULAR_28,
INTRA_ANGULAR_29,
INTRA_ANGULAR_30,
INTRA_ANGULAR_31,
INTRA_ANGULAR_32,
INTRA_ANGULAR_33,
INTRA_ANGULAR_34,
};

Let's look at the three intra prediction functions in turn.

HEVCPredContext -> pred_planar[4]()

HEVCPredContext -> pred_planar[4]() points to the assembly functions for the Planar intra prediction mode. The four array elements handle 4x4, 8x8, 16x16 and 32x32 blocks respectively. The corresponding C-language functions are:

pred_planar_0_8(): 4x4 blocks
pred_planar_1_8(): 8x8 blocks
pred_planar_2_8(): 16x16 blocks
pred_planar_3_8(): 32x32 blocks

These four functions are defined as follows.

#define PRED_PLANAR(size)\
static void FUNC(pred_planar_ ## size)(uint8_t *src, const uint8_t *top, \
                                       const uint8_t *left, ptrdiff_t stride) \
{ \
    FUNC(pred_planar)(src, top, left, stride, size + 2); \
}
/* Planar prediction functions for square blocks of different sizes.
 * The larger the argument, the larger the block:
 *   [0]: 4x4 block
 *   [1]: 8x8 block
 *   [2]: 16x16 block
 *   [3]: 32x32 block
 *
 * "PRED_PLANAR(0)" expands to:
 *   static void pred_planar_0_8(uint8_t *src, const uint8_t *top,
 *                               const uint8_t *left, ptrdiff_t stride)
 *   {
 *       pred_planar_8(src, top, left, stride, 0 + 2);
 *   }
 */
PRED_PLANAR(0)
PRED_PLANAR(1)
PRED_PLANAR(2)
PRED_PLANAR(3)

As the source shows, pred_planar_0_8(), pred_planar_1_8() and the others are all defined through the PRED_PLANAR() macro, and they all call the same function, pred_planar_8(). The only difference between them is the value of the fifth argument, trafo_size, that they pass to pred_planar_8().
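
The macro technique can be reproduced in a few lines. This is a sketch under assumptions: the block_side names and the BLOCK_SIDE macro are hypothetical, and the FUNC definition here is a simplified stand-in for FFmpeg's bit-depth template, written only to show how a suffix such as _8 can be attached by token pasting.

```c
#define BIT_DEPTH 8

/* Two-level expansion so that BIT_DEPTH is replaced by 8 before pasting. */
#define FUNC3(a, b) a ## _ ## b
#define FUNC2(a, b) FUNC3(a, b)
#define FUNC(a)     FUNC2(a, BIT_DEPTH)

/* Shared worker, expands to block_side_8(). */
static int FUNC(block_side)(int log2_size)
{
    return 1 << log2_size;
}

/* Generates block_side_0_8() .. block_side_3_8(); each wrapper differs
 * only in the constant it forwards, like PRED_PLANAR(size). */
#define BLOCK_SIDE(idx) \
static int FUNC(block_side_ ## idx)(void) \
{ \
    return FUNC(block_side)(idx + 2); \
}

BLOCK_SIDE(0)
BLOCK_SIDE(1)
BLOCK_SIDE(2)
BLOCK_SIDE(3)
```

Because the forwarded value is a compile-time constant, a compiler can inline the worker and specialize each wrapper for its block size.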

pred_planar_8()

pred_planar_8() implements the Planar intra prediction mode. It is defined as follows.

#define POS(x, y) src[(x) + stride * (y)]
// Planar prediction mode
static av_always_inline void FUNC(pred_planar)(uint8_t *_src, const uint8_t *_top,
                                               const uint8_t *_left, ptrdiff_t stride,
                                               int trafo_size)
{
    int x, y;
    pixel *src = (pixel *)_src;
    // The row of pixels above the block
    const pixel *top = (const pixel *)_top;
    // The column of pixels to the left of the block
    const pixel *left = (const pixel *)_left;
    int size = 1 << trafo_size;
    // Bilinear-style interpolation
    // Note: [size] is the last element used, the sample just past the block edge
    for (y = 0; y < size; y++)
        for (x = 0; x < size; x++)
            POS(x, y) = ((size - 1 - x) * left[y] + (x + 1) * top[size] +
                         (size - 1 - y) * top[x] + (y + 1) * left[size] + size) >> (trafo_size + 1);
}

As the source shows, pred_planar_8() fills in the prediction data with a method similar to bilinear interpolation. src points to the pixel area of the block, left points to the column of pixels to the left of the block, and top points to the row of pixels above the block. The Planar calculation is illustrated in the figure below.

As the figure shows, Planar mode first replicates the sample at the bottom of the left column (left[size]) horizontally into a row, and the sample at the right end of the top row (top[size]) vertically into a column. It then obtains the prediction data with something like bilinear interpolation between these and the left and top reference samples.
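
To make the arithmetic concrete, here is a standalone sketch of the same formula for 8-bit samples. The function name planar_predict and its parameter layout are assumptions made for this example; the expression itself is the one from the listing above.

```c
#include <stdint.h>

/* Planar prediction for a square block of side (1 << log2_size), 8-bit samples.
 * top and left must each hold size + 1 samples: top[size] is the top-right
 * reference sample and left[size] is the bottom-left reference sample. */
static void planar_predict(uint8_t *dst, int stride,
                           const uint8_t *top, const uint8_t *left,
                           int log2_size)
{
    int size = 1 << log2_size;
    for (int y = 0; y < size; y++)
        for (int x = 0; x < size; x++)
            dst[x + y * stride] = (uint8_t)(
                ((size - 1 - x) * left[y] + (x + 1) * top[size] +
                 (size - 1 - y) * top[x]  + (y + 1) * left[size] + size)
                >> (log2_size + 1));
}
```

The four weights always sum to 2 * size, which is why the result is shifted right by log2_size + 1, with size added beforehand for rounding. With all top samples 0 and all left samples 80 in a 4x4 block, the top-right pixel becomes (80 + 4) >> 3 = 10 and the bottom-left pixel becomes (240 + 320 + 4) >> 3 = 70.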

HEVCPredContext -> pred_dc()

HEVCPredContext -> pred_dc() points to the assembly function for the DC intra prediction mode. The corresponding C-language function is pred_dc_8(), which is defined as follows.

#define POS(x, y) src[(x) + stride * (y)]
// DC prediction mode
static void FUNC(pred_dc)(uint8_t *_src, const uint8_t *_top,
                          const uint8_t *_left,
                          ptrdiff_t stride, int log2_size, int c_idx)
{
    int i, j, x, y;
    int size = (1 << log2_size);
    pixel *src = (pixel *)_src;
    const pixel *top = (const pixel *)_top;
    const pixel *left = (const pixel *)_left;
    int dc = size;
    // pixel4 is uint32_t here; it packs 4 pixels
    pixel4 a;
    // Sum the left column and the top row
    for (i = 0; i < size; i++)
        dc += left[i] + top[i];
    // Average
    dc >>= log2_size + 1;
    // Replicate the DC value into all 4 pixels of a pixel4
    a = PIXEL_SPLAT_X4(dc);
    // Assign it to every pixel of the block
    for (i = 0; i < size; i++)
        for (j = 0; j < size; j+=4)
            AV_WN4P(&POS(j, i), a);
    // For luma blocks smaller than 32x32, smooth the first row and first column
    if (c_idx == 0 && size < 32) {
        POS(0, 0) = (left[0] + 2 * dc + top[0] + 2) >> 2;
        for (x = 1; x < size; x++)
            POS(x, 0) = (top[x] + 3 * dc + 2) >> 2;
        for (y = 1; y < size; y++)
            POS(0, y) = (left[y] + 3 * dc + 2) >> 2;
    }
}

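As the source shows, pred_dc_8() first computes the average of the left column of pixels and the top row of pixels, then assigns that value to the whole block as the prediction data. For luma blocks smaller than 32x32 (c_idx == 0 && size < 32) it additionally blends the first row and first column of the block with the neighboring reference samples.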
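
A standalone version for 8-bit samples makes the two steps easy to check by hand. This is a sketch: dc_predict and its is_luma flag are names chosen for the example (is_luma plays the role of c_idx == 0), and the block is filled pixel by pixel instead of four pixels at a time.

```c
#include <stdint.h>

/* DC prediction for a square block of side (1 << log2_size), 8-bit samples. */
static void dc_predict(uint8_t *dst, int stride,
                       const uint8_t *top, const uint8_t *left,
                       int log2_size, int is_luma)
{
    int size = 1 << log2_size;
    int dc = size;                        /* rounding offset: half of 2 * size */

    for (int i = 0; i < size; i++)
        dc += left[i] + top[i];
    dc >>= log2_size + 1;                 /* divide by 2 * size */

    /* Fill the whole block with the average. */
    for (int y = 0; y < size; y++)
        for (int x = 0; x < size; x++)
            dst[x + y * stride] = (uint8_t)dc;

    /* Edge smoothing, applied only to luma blocks smaller than 32x32. */
    if (is_luma && size < 32) {
        dst[0] = (uint8_t)((left[0] + 2 * dc + top[0] + 2) >> 2);
        for (int x = 1; x < size; x++)
            dst[x] = (uint8_t)((top[x] + 3 * dc + 2) >> 2);
        for (int y = 1; y < size; y++)
            dst[y * stride] = (uint8_t)((left[y] + 3 * dc + 2) >> 2);
    }
}
```

For a 4x4 block with every top sample 10 and every left sample 30, dc = (4 + 4 * 40) >> 3 = 20. With smoothing enabled, the first row (except the corner) becomes (10 + 60 + 2) >> 2 = 18 and the first column becomes (30 + 60 + 2) >> 2 = 23, while the interior stays 20.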

HEVCPredContext -> pred_angular[4]()

HEVCPredContext -> pred_angular[4]() points to the assembly functions for the angular (Angular) intra prediction modes. The four array elements handle 4x4, 8x8, 16x16 and 32x32 blocks respectively. The corresponding C-language functions are:

pred_angular_0_8(): 4x4 blocks
pred_angular_1_8(): 8x8 blocks
pred_angular_2_8(): 16x16 blocks
pred_angular_3_8(): 32x32 blocks

These four functions are defined as follows.

/* Angular prediction functions for square blocks of different sizes.
 * The larger the number, the larger the block:
 *   [0]: 4x4 block
 *   [1]: 8x8 block
 *   [2]: 16x16 block
 *   [3]: 32x32 block
 *
 */
static void FUNC(pred_angular_0)(uint8_t *src, const uint8_t *top,
const uint8_t *left,
ptrdiff_t stride, int c_idx, int mode)
{
FUNC(pred_angular)(src, top, left, stride, c_idx, mode, 1 << 2);
}
static void FUNC(pred_angular_1)(uint8_t *src, const uint8_t *top,
const uint8_t *left,
ptrdiff_t stride, int c_idx, int mode)
{
FUNC(pred_angular)(src, top, left, stride, c_idx, mode, 1 << 3);
}
static void FUNC(pred_angular_2)(uint8_t *src, const uint8_t *top,
const uint8_t *left,
ptrdiff_t stride, int c_idx, int mode)
{
FUNC(pred_angular)(src, top, left, stride, c_idx, mode, 1 << 4);
}
static void FUNC(pred_angular_3)(uint8_t *src, const uint8_t *top,
const uint8_t *left,
ptrdiff_t stride, int c_idx, int mode)
{
FUNC(pred_angular)(src, top, left, stride, c_idx, mode, 1 << 5);
}

As the source shows, pred_angular_0_8(), pred_angular_1_8() and the others all call the same function, pred_angular_8(). They differ only in the value of the last argument, size, that they pass to pred_angular_8().

pred_angular_8()
pred_angular_8() implements the angular (Angular) intra prediction modes. It is defined as follows.

#define POS(x, y) src[(x) + stride * (y)]
static av_always_inline void FUNC(pred_angular)(uint8_t *_src,
const uint8_t *_top,
const uint8_t *_left,
ptrdiff_t stride, int c_idx,
int mode, int size)
{
int x, y;
pixel *src = (pixel *)_src;
const pixel *top = (const pixel *)_top;
const pixel *left = (const pixel *)_left;
// Prediction angle of each angular mode, indexed by mode - 2
static const int intra_pred_angle[] = {
32, 26, 21, 17, 13, 9, 5, 2, 0, -2, -5, -9, -13, -17, -21, -26, -32,
-26, -21, -17, -13, -9, -5, -2, 0, 2, 5, 9, 13, 17, 21, 26, 32
};
static const int inv_angle[] = {
-4096, -1638, -910, -630, -482, -390, -315, -256, -315, -390, -482,
-630, -910, -1638, -4096
};
// Modes 0 and 1 are Planar and DC, not angular prediction, hence the index mode - 2
int angle = intra_pred_angle[mode - 2];
pixel ref_array[3 * MAX_TB_SIZE + 4];
pixel *ref_tmp = ref_array + size;
const pixel *ref;
int last = (size * angle) >> 5;
if (mode >= 18) {
// Vertical-class modes
ref = top - 1;
if (angle < 0 && last < -1) {
for (x = 0; x <= size; x += 4)
AV_WN4P(&ref_tmp[x], AV_RN4P(&top[x - 1]));
for (x = last; x <= -1; x++)
ref_tmp[x] = left[-1 + ((x * inv_angle[mode - 11] + 128) >> 8)];
ref = ref_tmp;
}
for (y = 0; y < size; y++) {
int idx = ((y + 1) * angle) >> 5;
int fact = ((y + 1) * angle) & 31;
if (fact) {
for (x = 0; x < size; x += 4) {
POS(x , y) = ((32 - fact) * ref[x + idx + 1] +
fact * ref[x + idx + 2] + 16) >> 5;
POS(x + 1, y) = ((32 - fact) * ref[x + 1 + idx + 1] +
fact * ref[x + 1 + idx + 2] + 16) >> 5;
POS(x + 2, y) = ((32 - fact) * ref[x + 2 + idx + 1] +
fact * ref[x + 2 + idx + 2] + 16) >> 5;
POS(x + 3, y) = ((32 - fact) * ref[x + 3 + idx + 1] +
fact * ref[x + 3 + idx + 2] + 16) >> 5;
}
} else {
for (x = 0; x < size; x += 4)
AV_WN4P(&POS(x, y), AV_RN4P(&ref[x + idx + 1]));
}
}
if (mode == 26 && c_idx == 0 && size < 32) {
for (y = 0; y < size; y++)
POS(0, y) = av_clip_pixel(top[0] + ((left[y] - left[-1]) >> 1));
}
} else {
// Horizontal-class modes
ref = left - 1;
if (angle < 0 && last < -1) {
for (x = 0; x <= size; x += 4)
AV_WN4P(&ref_tmp[x], AV_RN4P(&left[x - 1]));
for (x = last; x <= -1; x++)
ref_tmp[x] = top[-1 + ((x * inv_angle[mode - 11] + 128) >> 8)];
ref = ref_tmp;
}
for (x = 0; x < size; x++) {
int idx = ((x + 1) * angle) >> 5;
int fact = ((x + 1) * angle) & 31;
if (fact) {
for (y = 0; y < size; y++) {
POS(x, y) = ((32 - fact) * ref[y + idx + 1] +
fact * ref[y + idx + 2] + 16) >> 5;
}
} else {
for (y = 0; y < size; y++)
POS(x, y) = ref[y + idx + 1];
}
}
if (mode == 10 && c_idx == 0 && size < 32) {
for (x = 0; x < size; x += 4) {
POS(x, 0) = av_clip_pixel(left[0] + ((top[x ] - top[-1]) >> 1));
POS(x + 1, 0) = av_clip_pixel(left[0] + ((top[x + 1] - top[-1]) >> 1));
POS(x + 2, 0) = av_clip_pixel(left[0] + ((top[x + 2] - top[-1]) >> 1));
POS(x + 3, 0) = av_clip_pixel(left[0] + ((top[x + 3] - top[-1]) >> 1));
}
}
}
}

I have not yet studied the code of pred_angular_8() in detail and will analyze it later.
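
As a starting point, the core interpolation step of the listing can be isolated. The sketch below is an assumption-laden simplification rather than the full algorithm: it covers only vertical-class modes with a non-negative angle (modes 26 to 34), for 8-bit samples, and it leaves out both the reference projection through inv_angle for negative angles and the edge filter applied for mode 26. The function name angular_vertical_predict is hypothetical.

```c
#include <stdint.h>

/* Vertical-class angular prediction with angle >= 0, 8-bit samples.
 * top must be readable from top[-1] (the top-left corner sample)
 * up to top[2 * size - 1]. */
static void angular_vertical_predict(uint8_t *dst, int stride,
                                     const uint8_t *top,
                                     int size, int angle)
{
    const uint8_t *ref = top - 1;            /* ref[0] is the corner sample */

    for (int y = 0; y < size; y++) {
        int idx  = ((y + 1) * angle) >> 5;   /* whole-sample displacement */
        int fact = ((y + 1) * angle) & 31;   /* fraction in 1/32 units */

        for (int x = 0; x < size; x++) {
            if (fact)
                /* Linear interpolation between two neighboring reference samples. */
                dst[x + y * stride] = (uint8_t)(
                    ((32 - fact) * ref[x + idx + 1] +
                     fact * ref[x + idx + 2] + 16) >> 5);
            else
                /* The projection hits a sample exactly: copy it. */
                dst[x + y * stride] = ref[x + idx + 1];
        }
    }
}
```

With angle 0 (mode 26) every row is a copy of the top row. With angle 32 (mode 34) each row is the top row shifted one more sample to the left, which gives a 45-degree diagonal. With intermediate angles such as 13 (mode 30) the two nearest samples are blended according to fact.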
This basically completes the analysis of the intra prediction source code. Next we continue with the source code related to the inverse DCT.

Inverse DCT assembly function source code

The assembly functions related to the inverse DCT are located in HEVCDSPContext. The initialization function of HEVCDSPContext is ff_hevc_dsp_init(), which assigns the function pointers in the HEVCDSPContext struct. At run time the FFmpeg HEVC decoder only needs to call through these function pointers to perform the corresponding operations.

ff_hevc_dsp_init()

ff_hevc_dsp_init() initializes the assembly function pointers in the HEVCDSPContext struct. It is defined as follows.

void ff_hevc_dsp_init(HEVCDSPContext *hevcdsp, int bit_depth)
{
#undef FUNC
#define FUNC(a, depth) a ## _ ## depth
#undef PEL_FUNC
#define PEL_FUNC(dst1, idx1, idx2, a, depth) \
for(i = 0 ; i < 10 ; i++) \
{ \
hevcdsp->dst1[i][idx1][idx2] = a ## _ ## depth; \
}
#undef EPEL_FUNCS
#define EPEL_FUNCS(depth) \
PEL_FUNC(put_hevc_epel, 0, 0, put_hevc_pel_pixels, depth); \
PEL_FUNC(put_hevc_epel, 0, 1, put_hevc_epel_h, depth); \
PEL_FUNC(put_hevc_epel, 1, 0, put_hevc_epel_v, depth); \
PEL_FUNC(put_hevc_epel, 1, 1, put_hevc_epel_hv, depth)
#undef EPEL_UNI_FUNCS
#define EPEL_UNI_FUNCS(depth) \
PEL_FUNC(put_hevc_epel_uni, 0, 0, put_hevc_pel_uni_pixels, depth); \
PEL_FUNC(put_hevc_epel_uni, 0, 1, put_hevc_epel_uni_h, depth); \
PEL_FUNC(put_hevc_epel_uni, 1, 0, put_hevc_epel_uni_v, depth); \
PEL_FUNC(put_hevc_epel_uni, 1, 1, put_hevc_epel_uni_hv, depth); \
PEL_FUNC(put_hevc_epel_uni_w, 0, 0, put_hevc_pel_uni_w_pixels, depth); \
PEL_FUNC(put_hevc_epel_uni_w, 0, 1, put_hevc_epel_uni_w_h, depth); \
PEL_FUNC(put_hevc_epel_uni_w, 1, 0, put_hevc_epel_uni_w_v, depth); \
PEL_FUNC(put_hevc_epel_uni_w, 1, 1, put_hevc_epel_uni_w_hv, depth)
#undef EPEL_BI_FUNCS
#define EPEL_BI_FUNCS(depth) \
PEL_FUNC(put_hevc_epel_bi, 0, 0, put_hevc_pel_bi_pixels, depth); \
PEL_FUNC(put_hevc_epel_bi, 0, 1, put_hevc_epel_bi_h, depth); \
PEL_FUNC(put_hevc_epel_bi, 1, 0, put_hevc_epel_bi_v, depth); \
PEL_FUNC(put_hevc_epel_bi, 1, 1, put_hevc_epel_bi_hv, depth); \
PEL_FUNC(put_hevc_epel_bi_w, 0, 0, put_hevc_pel_bi_w_pixels, depth); \
PEL_FUNC(put_hevc_epel_bi_w, 0, 1, put_hevc_epel_bi_w_h, depth); \
PEL_FUNC(put_hevc_epel_bi_w, 1, 0, put_hevc_epel_bi_w_v, depth); \
PEL_FUNC(put_hevc_epel_bi_w, 1, 1, put_hevc_epel_bi_w_hv, depth)
#undef QPEL_FUNCS
#define QPEL_FUNCS(depth) \
PEL_FUNC(put_hevc_qpel, 0, 0, put_hevc_pel_pixels, depth); \
PEL_FUNC(put_hevc_qpel, 0, 1, put_hevc_qpel_h, depth); \
PEL_FUNC(put_hevc_qpel, 1, 0, put_hevc_qpel_v, depth); \
PEL_FUNC(put_hevc_qpel, 1, 1, put_hevc_qpel_hv, depth)
#undef QPEL_UNI_FUNCS
#define QPEL_UNI_FUNCS(depth) \
PEL_FUNC(put_hevc_qpel_uni, 0, 0, put_hevc_pel_uni_pixels, depth); \
PEL_FUNC(put_hevc_qpel_uni, 0, 1, put_hevc_qpel_uni_h, depth); \
PEL_FUNC(put_hevc_qpel_uni, 1, 0, put_hevc_qpel_uni_v, depth); \
PEL_FUNC(put_hevc_qpel_uni, 1, 1, put_hevc_qpel_uni_hv, depth); \
PEL_FUNC(put_hevc_qpel_uni_w, 0, 0, put_hevc_pel_uni_w_pixels, depth); \
PEL_FUNC(put_hevc_qpel_uni_w, 0, 1, put_hevc_qpel_uni_w_h, depth); \
PEL_FUNC(put_hevc_qpel_uni_w, 1, 0, put_hevc_qpel_uni_w_v, depth); \
PEL_FUNC(put_hevc_qpel_uni_w, 1, 1, put_hevc_qpel_uni_w_hv, depth)
#undef QPEL_BI_FUNCS
#define QPEL_BI_FUNCS(depth) \
PEL_FUNC(put_hevc_qpel_bi, 0, 0, put_hevc_pel_bi_pixels, depth); \
PEL_FUNC(put_hevc_qpel_bi, 0, 1, put_hevc_qpel_bi_h, depth); \
PEL_FUNC(put_hevc_qpel_bi, 1, 0, put_hevc_qpel_bi_v, depth); \
PEL_FUNC(put_hevc_qpel_bi, 1, 1, put_hevc_qpel_bi_hv, depth); \
PEL_FUNC(put_hevc_qpel_bi_w, 0, 0, put_hevc_pel_bi_w_pixels, depth); \
PEL_FUNC(put_hevc_qpel_bi_w, 0, 1, put_hevc_qpel_bi_w_h, depth); \
PEL_FUNC(put_hevc_qpel_bi_w, 1, 0, put_hevc_qpel_bi_w_v, depth); \
PEL_FUNC(put_hevc_qpel_bi_w, 1, 1, put_hevc_qpel_bi_w_hv, depth)
#define HEVC_DSP(depth) \
hevcdsp->put_pcm = FUNC(put_pcm, depth); \
hevcdsp->transform_add[0] = FUNC(transform_add4x4, depth); \
hevcdsp->transform_add[1] = FUNC(transform_add8x8, depth); \
hevcdsp->transform_add[2] = FUNC(transform_add16x16, depth); \
hevcdsp->transform_add[3] = FUNC(transform_add32x32, depth); \
hevcdsp->transform_skip = FUNC(transform_skip, depth); \
hevcdsp->transform_rdpcm = FUNC(transform_rdpcm, depth); \
hevcdsp->idct_4x4_luma = FUNC(transform_4x4_luma, depth); \
hevcdsp->idct[0] = FUNC(idct_4x4, depth); \
hevcdsp->idct[1] = FUNC(idct_8x8, depth); \
hevcdsp->idct[2] = FUNC(idct_16x16, depth); \
hevcdsp->idct[3] = FUNC(idct_32x32, depth); \
\
hevcdsp->idct_dc[0] = FUNC(idct_4x4_dc, depth); \
hevcdsp->idct_dc[1] = FUNC(idct_8x8_dc, depth); \
hevcdsp->idct_dc[2] = FUNC(idct_16x16_dc, depth); \
hevcdsp->idct_dc[3] = FUNC(idct_32x32_dc, depth); \
\
hevcdsp->sao_band_filter = FUNC(sao_band_filter_0, depth); \
hevcdsp->sao_edge_filter[0] = FUNC(sao_edge_filter_0, depth); \
hevcdsp->sao_edge_filter[1] = FUNC(sao_edge_filter_1, depth); \
\
QPEL_FUNCS(depth); \
QPEL_UNI_FUNCS(depth); \
QPEL_BI_FUNCS(depth); \
EPEL_FUNCS(depth); \
EPEL_UNI_FUNCS(depth); \
EPEL_BI_FUNCS(depth); \
\
hevcdsp->hevc_h_loop_filter_luma = FUNC(hevc_h_loop_filter_luma, depth); \
hevcdsp->hevc_v_loop_filter_luma = FUNC(hevc_v_loop_filter_luma, depth); \
hevcdsp->hevc_h_loop_filter_chroma = FUNC(hevc_h_loop_filter_chroma, depth); \
hevcdsp->hevc_v_loop_filter_chroma = FUNC(hevc_v_loop_filter_chroma, depth); \
hevcdsp->hevc_h_loop_filter_luma_c = FUNC(hevc_h_loop_filter_luma, depth); \
hevcdsp->hevc_v_loop_filter_luma_c = FUNC(hevc_v_loop_filter_luma, depth); \
hevcdsp->hevc_h_loop_filter_chroma_c = FUNC(hevc_h_loop_filter_chroma, depth); \
hevcdsp->hevc_v_loop_filter_chroma_c = FUNC(hevc_v_loop_filter_chroma, depth)
int i = 0;
switch (bit_depth) {
case 9:
HEVC_DSP(9);
break;
case 10:
HEVC_DSP(10);
break;
case 12:
HEVC_DSP(12);
break;
default:
HEVC_DSP(8);
break;
}
if (ARCH_X86)
ff_hevc_dsp_init_x86(hevcdsp, bit_depth);
}

As the source shows, ff_hevc_dsp_init() contains a very long macro named HEVC_DSP(depth), which holds the initialization code for the C-language versions of the various functions. ff_hevc_dsp_init() initializes the C functions that match the bit depth bit_depth passed to it (9, 10 or 12, with 8 as the default case). The end of the function contains the initialization of the assembly functions: on the x86 architecture it calls ff_hevc_dsp_init_x86() to install the assembly-optimized functions for that platform. Taking a bit depth of 8 as an example, here are the DCT-related assignments in the expansion of HEVC_DSP(8).

hevcdsp->transform_add[0] = transform_add4x4_8;
hevcdsp->transform_add[1] = transform_add8x8_8;
hevcdsp->transform_add[2] = transform_add16x16_8;
hevcdsp->transform_add[3] = transform_add32x32_8;
hevcdsp->transform_skip = transform_skip_8;
hevcdsp->transform_rdpcm = transform_rdpcm_8;
hevcdsp->idct_4x4_luma = transform_4x4_luma_8;
hevcdsp->idct[0] = idct_4x4_8;
hevcdsp->idct[1] = idct_8x8_8;
hevcdsp->idct[2] = idct_16x16_8;
hevcdsp->idct[3] = idct_32x32_8;
hevcdsp->idct_dc[0] = idct_4x4_dc_8;
hevcdsp->idct_dc[1] = idct_8x8_dc_8;
hevcdsp->idct_dc[2] = idct_16x16_dc_8;
hevcdsp->idct_dc[3] = idct_32x32_dc_8;
// Remaining assignments omitted ...
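
The initialization pattern of ff_hevc_dsp_init() (pick the C functions by bit depth through token pasting, then let a platform-specific step overwrite some pointers) can be reduced to a small sketch. Everything named here is hypothetical: DspCtx, the clip_pixel functions, the have_fast flag and dsp_init() are stand-ins for illustration and are not part of FFmpeg.

```c
/* Hypothetical stand-in for HEVCDSPContext with a single function pointer. */
typedef struct DspCtx {
    int (*clip_pixel)(int v);
    int optimized;               /* set when the platform step replaced a pointer */
} DspCtx;

/* Plain C versions, one per bit depth. */
static int clip_pixel_8(int v)  { return v < 0 ? 0 : v > 255  ? 255  : v; }
static int clip_pixel_10(int v) { return v < 0 ? 0 : v > 1023 ? 1023 : v; }

/* Stands in for an assembly-optimized 8-bit version. */
static int clip_pixel_8_fast(int v) { return clip_pixel_8(v); }

#define FUNC(a, depth) a ## _ ## depth
#define DSP_INIT(depth)                        \
    c->clip_pixel = FUNC(clip_pixel, depth);   \
    c->optimized  = 0

static void dsp_init(DspCtx *c, int bit_depth, int have_fast)
{
    /* Step 1: install the C functions for the requested bit depth. */
    switch (bit_depth) {
    case 10:
        DSP_INIT(10);
        break;
    default:
        DSP_INIT(8);
        break;
    }
    /* Step 2: overwrite pointers for which a faster version exists. */
    if (have_fast && bit_depth != 10) {
        c->clip_pixel = clip_pixel_8_fast;
        c->optimized  = 1;
    }
}
```

The benefit of this layout is that callers never test the bit depth or the CPU: they call c->clip_pixel() and get whichever implementation was installed.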

From the HEVC_DSP(8) expansion above, the IDCT-related functions (and function arrays) can be summarized as follows:

HEVCDSPContext -> idct[4](): the inverse DCT functions. The four array elements handle 4x4, 8x8, 16x16 and 32x32 blocks respectively.
HEVCDSPContext -> idct_dc[4](): the inverse DCT functions for the case where only the DC coefficient is present (faster than the ordinary inverse DCT). The four array elements handle 4x4, 8x8, 16x16 and 32x32 blocks respectively.
HEVCDSPContext -> idct_4x4_luma(): the special 4x4 inverse DST function. For 4x4 intra blocks HEVC uses a DST instead of the DCT, which slightly improves coding efficiency.
HEVCDSPContext -> transform_add[4](): the residual addition functions, which add the residual pixel data obtained from the IDCT onto the predicted pixel data. The four array elements handle 4x4, 8x8, 16x16 and 32x32 blocks respectively.
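
The residual addition step is simple enough to sketch directly. The code below is an illustration under assumptions, not the FFmpeg implementation: it assumes 8-bit samples, a residual stored row by row as int16_t, and the names residual_add and clip_u8 are made up for this example.

```c
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit sample range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Add a size x size residual block onto the prediction already stored in dst. */
static void residual_add(uint8_t *dst, int stride, const int16_t *res, int size)
{
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++)
            dst[x] = clip_u8(dst[x] + res[x]);
        dst += stride;    /* next row of the picture */
        res += size;      /* next row of the residual block */
    }
}
```

The clamp matters because prediction plus residual can leave the valid sample range, for example 250 + 10 must saturate at 255 rather than wrap around.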

PS: There are several other IDCT-related functions that I have not looked at yet, so they are not listed here.

Let's look at the functions above in turn.

HEVCDSPContext -> idct[4]()

HEVCDSPContext -> idct[4]() points to the assembly functions for the inverse DCT. The four array elements handle 4x4, 8x8, 16x16 and 32x32 blocks respectively. The corresponding C-language functions are:

idct_4x4_8(): 4x4 blocks
idct_8x8_8(): 8x8 blocks
idct_16x16_8(): 16x16 blocks
idct_32x32_8(): 32x32 blocks

These four functions are defined as follows.

#define SET(dst, x) (dst) = (x)
#define SCALE(dst, x) (dst) = av_clip_int16(((x) + add) >> shift)
#define ADD_AND_SCALE(dst, x) \
(dst) = av_clip_pixel((dst) + av_clip_int16(((x) + add) >> shift))
#define IDCT_VAR4(H) \
int limit2 = FFMIN(col_limit + 4, H)
#define IDCT_VAR8(H) \
int limit = FFMIN(col_limit, H); \
int limit2 = FFMIN(col_limit + 4, H)
#define IDCT_VAR16(H) IDCT_VAR8(H)
#define IDCT_VAR32(H) IDCT_VAR8(H)
// "H" takes the values 4, 8, 16 and 32,
// so the macro can generate the different functions
#define IDCT(H) \
static void FUNC(idct_##H ##x ##H )( \
int16_t *coeffs, int col_limit) { \
int i; \
int shift = 7; \
int add = 1 << (shift - 1); \
int16_t *src = coeffs; \
IDCT_VAR ##H(H); \
\
for (i = 0; i < H; i++) { \
TR_ ## H(src, src, H, H, SCALE, limit2); \
if (limit2 < H && i%4 == 0 && !!i) \
limit2 -= 4; \
src++; \
} \
\
shift = 20 - BIT_DEPTH; \
add = 1 << (shift - 1); \
for (i = 0; i < H; i++) { \
TR_ ## H(coeffs, coeffs, 1, 1, SCALE, limit); \
coeffs += H; \
} \
}
// IDCT functions for the different block sizes
IDCT( 4)
IDCT( 8)
IDCT(16)
IDCT(32)

As the source shows, idct_4x4_8(), idct_8x8_8() and the others are defined through the IDCT() macro. The IDCT(H) macro in turn invokes another macro, TR_ ## H(). Depending on the value of H, this is one of:

TR_4(): used for the 4x4 DCT
TR_8(): used for the 8x8 DCT
TR_16(): used for the 16x16 DCT
TR_32(): used for the 32x32 DCT

TR_4(), TR_8(), TR_16() and TR_32() are defined as follows.

/*
 * 4x4 DCT
 *
 *     | 64  64  64  64 |
 * H = | 83  36 -36 -83 |
 *     | 64 -64 -64  64 |
 *     | 36 -83  83 -36 |
 *
 */
#define TR_4(dst, src, dstep, sstep, assign, end) \
do { \
const int e0 = 64 * src[0 * sstep] + 64 * src[2 * sstep]; \
const int e1 = 64 * src[0 * sstep] - 64 * src[2 * sstep]; \
const int o0 = 83 * src[1 * sstep] + 36 * src[3 * sstep]; \
const int o1 = 36 * src[1 * sstep] - 83 * src[3 * sstep]; \
\
assign(dst[0 * dstep], e0 + o0); \
assign(dst[1 * dstep], e1 + o1); \
assign(dst[2 * dstep], e1 - o1); \
assign(dst[3 * dstep], e0 - o0); \
} while (0)
/*
 * 8x8 DCT
 *
 * transform[] stores the 32x32 DCT coefficients.
 * The 8x8 DCT coefficients are the first 8 elements of rows
 * 0, 4, 8, 12, 16, 20, 24 and 28 of the 32x32 coefficient matrix.
 *
 */
#define TR_8(dst, src, dstep, sstep, assign, end) \
do { \
int i, j; \
int e_8[4]; \
int o_8[4] = { 0 }; \
for (i = 0; i < 4; i++) \
for (j = 1; j < end; j += 2) \
o_8[i] += transform[4 * j][i] * src[j * sstep]; \
TR_4(e_8, src, 1, 2 * sstep, SET, 4); \
\
for (i = 0; i < 4; i++) { \
assign(dst[i * dstep], e_8[i] + o_8[i]); \
assign(dst[(7 - i) * dstep], e_8[i] - o_8[i]); \
} \
} while (0)
/*
 * 16x16 DCT
 * The 16x16 DCT coefficients are the first 16 elements of rows
 * 0, 2, 4, ..., 28, 30 of the 32x32 coefficient matrix.
 *
 */
#define TR_16(dst, src, dstep, sstep, assign, end) \
do { \
int i, j; \
int e_16[8]; \
int o_16[8] = { 0 }; \
for (i = 0; i < 8; i++) \
for (j = 1; j < end; j += 2) \
o_16[i] += transform[2 * j][i] * src[j * sstep]; \
TR_8(e_16, src, 1, 2 * sstep, SET, 8); \
\
for (i = 0; i < 8; i++) { \
assign(dst[i * dstep], e_16[i] + o_16[i]); \
assign(dst[(15 - i) * dstep], e_16[i] - o_16[i]); \
} \
} while (0)
/*
 * 32x32 DCT
*
*/
#define TR_32(dst, src, dstep, sstep, assign, end) \
do { \
int i, j; \
int e_32[16]; \
int o_32[16] = { 0 }; \
for (i = 0; i < 16; i++) \
for (j = 1; j < end; j += 2) \
o_32[i] += transform[j][i] * src[j * sstep]; \
TR_16(e_32, src, 1, 2 * sstep, SET, end/2); \
\
for (i = 0; i < 16; i++) { \
assign(dst[i * dstep], e_32[i] + o_32[i]); \
assign(dst[(31 - i) * dstep], e_32[i] - o_32[i]); \
} \
} while (0)

I have not yet studied this part of the source code in detail and will fill it in later. What TR_8(), TR_16() and TR_32() do show is that their DCT coefficients come from an array named transform[32][32].
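
One part that can be checked right away is TR_4(). The sketch below rewrites its butterfly as a plain function (without the scaling done by the assign argument) and compares it with a direct multiplication by the 4x4 matrix H from the comment above. The function names are hypothetical and plain int arrays are used for simplicity.

```c
/* The 4x4 matrix H from the comment above TR_4(). */
static const int H4[4][4] = {
    { 64,  64,  64,  64 },
    { 83,  36, -36, -83 },
    { 64, -64, -64,  64 },
    { 36, -83,  83, -36 },
};

/* Butterfly form: the same arithmetic as TR_4(), unscaled. */
static void inv4_butterfly(const int src[4], int dst[4])
{
    const int e0 = 64 * src[0] + 64 * src[2];   /* even part */
    const int e1 = 64 * src[0] - 64 * src[2];
    const int o0 = 83 * src[1] + 36 * src[3];   /* odd part */
    const int o1 = 36 * src[1] - 83 * src[3];

    dst[0] = e0 + o0;
    dst[1] = e1 + o1;
    dst[2] = e1 - o1;
    dst[3] = e0 - o0;
}

/* Direct form: dst[k] = sum over j of H4[j][k] * src[j]. */
static void inv4_direct(const int src[4], int dst[4])
{
    for (int k = 0; k < 4; k++) {
        dst[k] = 0;
        for (int j = 0; j < 4; j++)
            dst[k] += H4[j][k] * src[j];
    }
}

/* Returns 1 when both forms produce the same output for src. */
static int inv4_forms_agree(const int src[4])
{
    int a[4], b[4];
    inv4_butterfly(src, a);
    inv4_direct(src, b);
    return a[0] == b[0] && a[1] == b[1] && a[2] == b[2] && a[3] == b[3];
}
```

The butterfly needs 6 multiplications by distinct products instead of 16, because even-indexed coefficients contribute symmetrically and odd-indexed ones antisymmetrically. TR_8(), TR_16() and TR_32() apply the same even/odd split recursively, each reusing the next smaller transform for its even part.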

transform[32][32]
transform[32][32] is defined as follows. It stores the coefficients of the 32x32 DCT. The coefficients of the 16x16, 8x8 and 4x4 DCTs can also be derived from this matrix.
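
The nesting can be demonstrated on a small scale. In this sketch, dct8 holds the 8x8 coefficients, copied from the first 8 elements of rows 0, 4, 8, ..., 28 of transform[32][32]; taking every second row of it and keeping the first 4 elements yields the 4x4 matrix used by TR_4(). The names dct8 and derive_dct4 are assumptions made for this example.

```c
#include <stdint.h>

/* 8x8 DCT coefficients: first 8 elements of rows 0, 4, ..., 28 of transform[32][32]. */
static const int8_t dct8[8][8] = {
    { 64,  64,  64,  64,  64,  64,  64,  64 },   /* row  0 */
    { 89,  75,  50,  18, -18, -50, -75, -89 },   /* row  4 */
    { 83,  36, -36, -83, -83, -36,  36,  83 },   /* row  8 */
    { 75, -18, -89, -50,  50,  89,  18, -75 },   /* row 12 */
    { 64, -64, -64,  64,  64, -64, -64,  64 },   /* row 16 */
    { 50, -89,  18,  75, -75, -18,  89, -50 },   /* row 20 */
    { 36, -83,  83, -36, -36,  83, -83,  36 },   /* row 24 */
    { 18, -50,  75, -89,  89, -75,  50, -18 },   /* row 28 */
};

/* 4x4 coefficients: every second row of the 8x8 matrix, first 4 elements. */
static void derive_dct4(int out[4][4])
{
    for (int k = 0; k < 4; k++)
        for (int c = 0; c < 4; c++)
            out[k][c] = dct8[2 * k][c];
}
```

The same rule applied to the full table gives the general pattern: an N-point transform uses the first N elements of every (32 / N)-th row of transform[32][32].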

// 32x32 DCT transform coefficients
static const int8_t transform[32][32] = {
{ 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64 },
{ 90, 90, 88, 85, 82, 78, 73, 67, 61, 54, 46, 38, 31, 22, 13, 4,
-4, -13, -22, -31, -38, -46, -54, -61, -67, -73, -78, -82, -85, -88, -90, -90 },
{ 90, 87, 80, 70, 57, 43, 25, 9, -9, -25, -43, -57, -70, -80, -87, -90,
-90, -87, -80, -70, -57, -43, -25, -9, 9, 25, 43, 57, 70, 80, 87, 90 },
{ 90, 82, 67, 46, 22, -4, -31, -54, -73, -85, -90, -88, -78, -61, -38, -13,
13, 38, 61, 78, 88, 90, 85, 73, 54, 31, 4, -22, -46, -67, -82, -90 },
{ 89, 75, 50, 18, -18, -50, -75, -89, -89, -75, -50, -18, 18, 50, 75, 89,
89, 75, 50, 18, -18, -50, -75, -89, -89, -75, -50, -18, 18, 50, 75, 89 },
{ 88, 67, 31, -13, -54, -82, -90, -78, -46, -4, 38, 73, 90, 85, 61, 22,
-22, -61, -85, -90, -73, -38, 4, 46, 78, 90, 82, 54, 13, -31, -67, -88 },
{ 87, 57, 9, -43, -80, -90, -70, -25, 25, 70, 90, 80, 43, -9, -57, -87,
-87, -57, -9, 43, 80, 90, 70, 25, -25, -70, -90, -80, -43, 9, 57, 87 },
{ 85, 46, -13, -67, -90, -73, -22, 38, 82, 88, 54, -4, -61, -90, -78, -31,
31, 78, 90, 61, 4, -54, -88, -82, -38, 22, 73, 90, 67, 13, -46, -85 },
{ 83, 36, -36, -83, -83, -36, 36, 83, 83, 36, -36, -83, -83, -36, 36, 83,
83, 36, -36, -83, -83, -36, 36, 83, 83, 36, -36, -83, -83, -36, 36, 83 },
{ 82, 22, -54, -90, -61, 13, 78, 85, 31, -46, -90, -67, 4, 73, 88, 38,
-38, -88, -73, -4, 67, 90, 46, -31, -85, -78, -13, 61, 90, 54, -22, -82 },
{ 80, 9, -70, -87, -25, 57, 90, 43, -43, -90, -57, 25, 87, 70, -9, -80,
-80, -9, 70, 87, 25, -57, -90, -43, 43, 90, 57, -25, -87, -70, 9, 80 },
{ 78, -4, -82, -73, 13, 85, 67, -22, -88, -61, 31, 90, 54, -38, -90, -46,
46, 90, 38, -54, -90, -31, 61, 88, 22, -67, -85, -13, 73, 82, 4, -78 },
{ 75, -18, -89, -50, 50, 89, 18, -75, -75, 18, 89, 50, -50, -89, -18, 75,
75, -18, -89, -50, 50, 89, 18, -75, -75, 18, 89, 50, -50, -89, -18, 75 },
{ 73, -31, -90, -22, 78, 67, -38, -90, -13, 82, 61, -46, -88, -4, 85, 54,
-54, -85, 4, 88, 46, -61, -82, 13, 90, 38, -67, -78, 22, 90, 31, -73 },
{ 70, -43, -87, 9, 90, 25, -80, -57, 57, 80, -25, -90, -9, 87, 43, -70,
-70, 43, 87, -9, -90, -25, 80, 57, -57, -80, 25, 90, 9, -87, -43, 70 },
{ 67, -54, -78, 38, 85, -22, -90, 4, 90, 13, -88, -31, 82, 46, -73, -61,
61, 73, -46, -82, 31, 88, -13, -90, -4, 90, 22, -85, -38, 78, 54, -67 },
{ 64, -64, -64, 64, 64, -64, -64, 64, 64, -64, -64, 64, 64, -64, -64, 64,
64, -64, -64, 64, 64, -64, -64, 64, 64, -64, -64, 64, 64, -64, -64, 64 },
{ 61, -73, -46, 82, 31, -88, -13, 90, -4, -90, 22, 85, -38, -78, 54, 67,
-67, -54, 78, 38, -85, -22, 90, 4, -90, 13, 88, -31, -82, 46, 73, -61 },
{ 57, -80, -25, 90, -9, -87, 43, 70, -70, -43, 87, 9, -90, 25, 80, -57,
-57, 80, 25, -90, 9, 87, -43, -70, 70, 43, -87, -9, 90, -25, -80, 57 },
{ 54, -85, -4, 88, -46, -61, 82, 13, -90, 38, 67, -78, -22, 90, -31, -73,
73, 31, -90, 22, 78, -67, -38, 90, -13, -82, 61, 46, -88, 4, 85, -54 },
{ 50, -89, 18, 75, -75, -18, 89, -50, -50, 89, -18, -75, 75, 18, -89, 50,
50, -89, 18, 75, -75, -18, 89, -50, -50, 89, -18, -75, 75, 18, -89, 50 },
{ 46, -90, 38, 54, -90, 31, 61, -88, 22, 67, -85, 13, 73, -82, 4, 78,
-78, -4, 82, -73, -13, 85, -67, -22, 88, -61, -31, 90, -54, -38, 90, -46 },
{ 43, -90, 57, 25, -87, 70, 9, -80, 80, -9, -70, 87, -25, -57, 90, -43,
-43, 90, -57, -25, 87, -70, -9, 80, -80, 9, 70, -87, 25, 57, -90, 43 },
{ 38, -88, 73, -4, -67, 90, -46, -31, 85, -78, 13, 61, -90, 54, 22, -82,
82, -22, -54, 90, -61, -13, 78, -85, 31, 46, -90, 67, 4, -73, 88, -38 },
{ 36, -83, 83, -36, -36, 83, -83, 36, 36, -83, 83, -36, -36, 83, -83, 36,
36, -83, 83, -36, -36, 83, -83, 36, 36, -83, 83, -36, -36, 83, -83, 36 },
{ 31, -78, 90, -61, 4, 54, -88, 82, -38, -22, 73, -90, 67, -13, -46, 85,
-85, 46, 13, -67, 90, -73, 22, 38, -82, 88, -54, -4, 61, -90, 78, -31 },
{ 25, -70, 90, -80, 43, 9, -57, 87, -87, 57, -9, -43, 80, -90, 70, -25,
-25, 70, -90, 80, -43, -9, 57, -87, 87, -57, 9, 43, -80, 90, -70, 25 },
{ 22, -61, 85, -90, 73, -38, -4, 46, -78, 90, -82, 54, -13, -31, 67, -88,
88, -67, 31, 13, -54, 82, -90, 78, -46, 4, 38, -73, 90, -85, 61, -22 },
{ 18, -50, 75, -89, 89, -75, 50, -18, -18, 50, -75, 89, -89, 75, -50, 18,
18, -50, 75, -89, 89, -75, 50, -18, -18, 50, -75, 89, -89, 75, -50, 18 },
{ 13, -38, 61, -78, 88, -90, 85, -73, 54, -31, 4, 22, -46, 67, -82, 90,
-90, 82, -67, 46, -22, -4, 31, -54, 73, -85, 90, -88, 78, -61, 38, -13 },
{ 9, -25, 43, -57, 70, -80, 87, -90, 90, -87, 80, -70, 57, -43, 25, -9,
-9, 25, -43, 57, -70, 80, -87, 90, -90, 87, -80, 70, -57, 43, -25, 9 },
{ 4, -13, 22, -31, 38, -46, 54, -61, 67, -73, 78, -82, 85, -88, 90, -90,
90, -90, 88, -85, 82, -78, 73, -67, 61, -54, 46, -38, 31, -22, 13, -4 },
};

HEVCDSPContext -> idct_dc[4]()

HEVCDSPContext -> idct_dc[4]() points to the assembly functions that perform the inverse DCT when only the DC coefficient is present. A block with only a DC coefficient is a special case of the inverse DCT, and in this case idct_dc[4]() is faster than idct[4](). The 4 elements of the array handle 4x4, 8x8, 16x16 and 32x32 blocks respectively. The C language versions of these functions are:

idct_4x4_dc_8(): 4x4 blocks;
idct_8x8_dc_8(): 8x8 blocks;
idct_16x16_dc_8(): 16x16 blocks;
idct_32x32_dc_8(): 32x32 blocks;

These four functions are defined as follows.

#define IDCT_DC(H)                                          \
static void FUNC(idct_##H ##x ##H ##_dc)(                   \
                   int16_t *coeffs) {                       \
    int i, j;                                               \
    int shift = 14 - BIT_DEPTH;                             \
    int add   = 1 << (shift - 1);                           \
    int coeff = (((coeffs[0] + 1) >> 1) + add) >> shift;    \
                                                            \
    for (j = 0; j < H; j++) {                               \
        for (i = 0; i < H; i++) {                           \
            coeffs[i+j*H] = coeff;                          \
        }                                                   \
    }                                                       \
}
// Faster IDCT for blocks that contain only a DC coefficient
IDCT_DC( 4)
IDCT_DC( 8)
IDCT_DC(16)
IDCT_DC(32)

As the source code shows, idct_4x4_dc_8(), idct_8x8_dc_8() and the other functions are all generated by the "IDCT_DC()" macro. "IDCT_DC()" first derives the value coeff from the DC coefficient coeffs[0] (a rounded right shift), and then assigns coeff to every element of the coefficient matrix.
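The arithmetic of the macro can be tried outside FFmpeg with a small standalone sketch. The function name sketch_idct_dc() and the constant SKETCH_BIT_DEPTH are made up for this illustration, and the bit depth is assumed to be 8; the expression itself is copied from the macro above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical standalone version of the DC-only path.
 * Assumption: 8-bit video, so BIT_DEPTH is fixed at 8 here. */
enum { SKETCH_BIT_DEPTH = 8 };

static void sketch_idct_dc(int16_t *coeffs, int size)
{
    int shift = 14 - SKETCH_BIT_DEPTH;   /* 6 for 8-bit video */
    int add   = 1 << (shift - 1);        /* rounding offset, 32 */
    int dc;

    assert(size == 4 || size == 8 || size == 16 || size == 32);

    /* Same expression as in IDCT_DC(): halve with rounding, then
     * apply the rounded right shift. */
    dc = (((coeffs[0] + 1) >> 1) + add) >> shift;

    /* Every residual sample of the block gets the same value. */
    for (int i = 0; i < size * size; i++)
        coeffs[i] = (int16_t)dc;
}
```

For example, with coeffs[0] = 64 and 8-bit video the whole block becomes 1, because ((65 >> 1) + 32) >> 6 = 1. No multiplication is needed, which is why this path is cheaper than the full inverse DCT.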

HEVCDSPContext -> idct_4x4_luma()

HEVCDSPContext -> idct_4x4_luma() points to the inverse DST function used for 4x4 intra-coded luma blocks. Compared with the inverse DCT that is common in video coding, the inverse DST is a special transform. The C language version of the 4x4 inverse DST is transform_4x4_luma_8(), and its definition is shown below.

#define SCALE(dst, x) (dst) = av_clip_int16(((x) + add) >> shift)
/*
 * 4x4 DST
 *
 *     | 29  55  74  84 |
 * H = | 74  74   0 -74 |
 *     | 84 -29 -74  55 |
 *     | 55 -84  74 -29 |
 *
 */
#define TR_4x4_LUMA(dst, src, step, assign)                 \
    do {                                                    \
        int c0 = src[0 * step] + src[2 * step];             \
        int c1 = src[2 * step] + src[3 * step];             \
        int c2 = src[0 * step] - src[3 * step];             \
        int c3 = 74 * src[1 * step];                        \
                                                            \
        assign(dst[2 * step], 74 * (src[0 * step] -         \
                                    src[2 * step] +         \
                                    src[3 * step]));        \
        assign(dst[0 * step], 29 * c0 + 55 * c1 + c3);      \
        assign(dst[1 * step], 55 * c2 - 29 * c1 + c3);      \
        assign(dst[3 * step], 55 * c0 + 29 * c2 - c3);      \
    } while (0)
// 4x4 DST
static void FUNC(transform_4x4_luma)(int16_t *coeffs)
{
    int i;
    int shift    = 7;
    int add      = 1 << (shift - 1);
    int16_t *src = coeffs;
    for (i = 0; i < 4; i++) {
        TR_4x4_LUMA(src, src, 4, SCALE);
        src++;
    }
    shift = 20 - BIT_DEPTH;
    add   = 1 << (shift - 1);
    for (i = 0; i < 4; i++) {
        TR_4x4_LUMA(coeffs, coeffs, 1, SCALE);
        coeffs += 4;
    }
}
#undef TR_4x4_LUMA

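As the source code shows, transform_4x4_luma_8() calls TR_4x4_LUMA() to carry out the 4x4 DST. The first loop applies it to each of the four columns (step 4), and the second loop applies it to each of the four rows (step 1).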
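The relation between the matrix H in the comment and the factored arithmetic in TR_4x4_LUMA() is not obvious at first glance. The following sketch compares the two on a single 4-sample vector. All sketch_* names are made up for this illustration, and the SCALE step (rounding, shifting and clipping) is left out on purpose so that only the transform itself is compared.

```c
#include <assert.h>

/* The 4x4 DST matrix from the comment in the listing. */
static const int sketch_dst_h[4][4] = {
    { 29,  55,  74,  84 },
    { 74,  74,   0, -74 },
    { 84, -29, -74,  55 },
    { 55, -84,  74, -29 },
};

/* Reference form: out[k] = sum over j of H[j][k] * src[j]. */
static void sketch_dst4_matrix(const int src[4], int out[4])
{
    for (int k = 0; k < 4; k++) {
        out[k] = 0;
        for (int j = 0; j < 4; j++)
            out[k] += sketch_dst_h[j][k] * src[j];
    }
}

/* Factored form: same arithmetic as TR_4x4_LUMA(), without SCALE. */
static void sketch_dst4_fast(const int src[4], int out[4])
{
    int c0 = src[0] + src[2];
    int c1 = src[2] + src[3];
    int c2 = src[0] - src[3];
    int c3 = 74 * src[1];

    out[2] = 74 * (src[0] - src[2] + src[3]);
    out[0] = 29 * c0 + 55 * c1 + c3;
    out[1] = 55 * c2 - 29 * c1 + c3;
    out[3] = 55 * c0 + 29 * c2 - c3;
}

/* Returns 1 when both forms give the same result for this input. */
static int sketch_dst4_agree(const int src[4])
{
    int a[4], b[4];

    sketch_dst4_matrix(src, a);
    sketch_dst4_fast(src, b);
    for (int k = 0; k < 4; k++)
        assert(a[k] == b[k]);
    return 1;
}
```

Feeding the unit vector {1, 0, 0, 0} into sketch_dst4_fast() returns the first row of H, that is {29, 55, 74, 84}. The factored form gets by with fewer multiplications per output sample than the plain matrix product, because the sums c0, c1 and c2 are shared between the outputs.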

HEVCDSPContext -> transform_add[4]()

HEVCDSPContext -> transform_add[4]() points to the assembly functions that add the residual data. These functions add the residual pixel data to the predicted pixel data, forming the final decoded image data. The 4 elements of the array handle 4x4, 8x8, 16x16 and 32x32 blocks respectively. The C language versions of these functions are:

transform_add4x4_8(): 4x4 blocks;
transform_add8x8_8(): 8x8 blocks;
transform_add16x16_8(): 16x16 blocks;
transform_add32x32_8(): 32x32 blocks;

These four functions are defined as follows.

// Add the residual data of a 4x4 block
static void FUNC(transform_add4x4)(uint8_t *_dst, int16_t *coeffs,
                                   ptrdiff_t stride)
{
    // The last argument is 4
    FUNC(transquant_bypass)(_dst, coeffs, stride, 4);
}
// Add the residual data of an 8x8 block
static void FUNC(transform_add8x8)(uint8_t *_dst, int16_t *coeffs,
                                   ptrdiff_t stride)
{
    // The last argument is 8
    FUNC(transquant_bypass)(_dst, coeffs, stride, 8);
}
// Add the residual data of a 16x16 block
static void FUNC(transform_add16x16)(uint8_t *_dst, int16_t *coeffs,
                                     ptrdiff_t stride)
{
    // The last argument is 16
    FUNC(transquant_bypass)(_dst, coeffs, stride, 16);
}
// Add the residual data of a 32x32 block
static void FUNC(transform_add32x32)(uint8_t *_dst, int16_t *coeffs,
                                     ptrdiff_t stride)
{
    // The last argument is 32
    FUNC(transquant_bypass)(_dst, coeffs, stride, 32);
}

As the source code shows, transform_add4x4_8(), transform_add8x8_8() and the other functions all call the same function, transquant_bypass_8(). They differ only in the value of the last argument, size, that they pass to transquant_bypass_8().
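The pattern of one shared kernel, four thin wrappers and a 4-entry function pointer array can be sketched in isolation as follows. All demo_* names are hypothetical. The indexing rule (entry = log2 of the block size minus 2) is an assumption made for this illustration and should be checked against the caller in the decoder before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*demo_add_fn)(uint8_t *dst, const int16_t *res, ptrdiff_t stride);

/* Shared kernel: 8-bit only, adds the residual and clips to 0..255. */
static void demo_add(uint8_t *dst, const int16_t *res, ptrdiff_t stride, int size)
{
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++) {
            int v = dst[x] + *res++;
            dst[x] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
        }
        dst += stride;
    }
}

/* Thin wrappers that only fix the block size. */
static void demo_add4(uint8_t *d, const int16_t *r, ptrdiff_t s)  { demo_add(d, r, s, 4);  }
static void demo_add8(uint8_t *d, const int16_t *r, ptrdiff_t s)  { demo_add(d, r, s, 8);  }
static void demo_add16(uint8_t *d, const int16_t *r, ptrdiff_t s) { demo_add(d, r, s, 16); }
static void demo_add32(uint8_t *d, const int16_t *r, ptrdiff_t s) { demo_add(d, r, s, 32); }

/* One entry per block size: 4x4, 8x8, 16x16, 32x32. */
static const demo_add_fn demo_add_table[4] = {
    demo_add4, demo_add8, demo_add16, demo_add32,
};

/* Assumed indexing: log2_size 2..5 maps to table entries 0..3. */
static demo_add_fn demo_pick_add(int log2_size)
{
    assert(log2_size >= 2 && log2_size <= 5);
    return demo_add_table[log2_size - 2];
}
```

A caller would then write demo_pick_add(3)(dst, residual, stride) for an 8x8 block. Because each table entry has a fixed size, an optimized version can replace a single entry without touching the others.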

transquant_bypass_8()
transquant_bypass_8() does the actual work of adding the residual pixel data. The definition of this function is as follows.

// Add the residual data, transquant_bypass_8()
static av_always_inline void FUNC(transquant_bypass)(uint8_t *_dst, int16_t *coeffs,
                                                     ptrdiff_t stride, int size)
{
    int x, y;
    pixel *dst = (pixel *)_dst;
    stride /= sizeof(pixel);
    // Process the samples one by one
    for (y = 0; y < size; y++) {
        for (x = 0; x < size; x++) {
            // Add the residual; av_clip_pixel() clips to the valid pixel range. The result is always stored in dst
            dst[x] = av_clip_pixel(dst[x] + *coeffs);
            coeffs++;
        }
        dst += stride;
    }
}

As the source code shows, transquant_bypass_8() adds the residual data coeffs, sample by sample, to the prediction data in dst. The residuals are stored contiguously, while dst moves forward by stride after each row.
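The line "stride /= sizeof(pixel)" only matters when a pixel is wider than one byte. The following sketch shows a hypothetical 10-bit counterpart of the function above. The name sketch_add_residual_10() is made up, and the sketch assumes 10-bit samples stored in uint16_t with a valid range of 0..1023 and a stride given in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 10-bit variant: pixels are uint16_t, valid range 0..1023.
 * Assumption: stride_bytes is the distance between rows in bytes. */
static void sketch_add_residual_10(uint8_t *_dst, const int16_t *res,
                                   ptrdiff_t stride_bytes, int size)
{
    uint16_t *dst    = (uint16_t *)_dst;
    ptrdiff_t stride = stride_bytes / (ptrdiff_t)sizeof(uint16_t);
    const int max    = (1 << 10) - 1;

    assert(size > 0);

    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++) {
            int v = dst[x] + *res++;   /* prediction + residual */
            if (v < 0)
                v = 0;                 /* clip below */
            if (v > max)
                v = max;               /* clip above */
            dst[x] = (uint16_t)v;
        }
        dst += stride;                 /* next row, counted in pixels */
    }
}
```

With a row pitch of 16 bytes, dst advances by 8 pixels per row, not 16. Without the division the second row would be read from the wrong place.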

This completes the analysis of the IDCT-related source code.

LeiXiaoHua
leixiaohua1020@126.com
http://blog.csdn.net/leixiaohua1020
