Description
This question is about the endIndices calculation in BoxParser. The corresponding code is here: https://github.com/androidx/media/blob/release-1.5.0-alpha01/libraries/extractor/src/main/java/androidx/media3/extractor/mp4/BoxParser.java#L723
The potential issue is that several video frames may be dropped in video editing/transcoding use cases.
Let me give you an example.
The test stream is a video with a ctts box. It contains 2034 frames. I set a breakpoint at https://github.com/androidx/media/blob/release-1.5.0-alpha01/libraries/extractor/src/main/java/androidx/media3/extractor/mp4/BoxParser.java#L729 and got the debug info below.
editMediaTime = 143
editDuration = 3049002
so editMediaTime + editDuration = 3049145 (corresponding to line 726)
startIndices[0] = 0 (This is correct)
endIndices[0] = 2032
timestamps =
| index | 0 | 1 | 2 | 3 | 4 | 5 | ...... | 2031 | 2032 | 2033 |
| value | 1509 | 4508 | 3008 | 6008 | 7499 | 9008 | ...... | 3045972 | 3050471 | 3048971 |
Here "endIndices[0] = 2032" because frame 2032 is a P frame, so its timestamp (3050471) is LARGER than that of frame 2033, which is a B frame (3048971). But frame 2033 should actually be kept, since its timestamp is less than editMediaTime + editDuration (3049145).
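To make the example concrete, here is a minimal sketch that reproduces the numbers above using only the last three timestamps from the table. It is not the BoxParser code: the class name EndIndexSketch, both method names, and the strict "less than" comparison are my own assumptions for illustration. The first method is a plain ceil-style binary search, which assumes sorted input. The second is one possible adjustment that walks forward from the search result and extends the exclusive end index to cover any later sample that still falls inside the edit.

```java
final class EndIndexSketch {

  /**
   * Hypothetical helper: returns the index of the first element strictly greater than value.
   * Like any binary search, it is only guaranteed correct when the array is sorted.
   */
  static int ceilExclusive(long[] timestamps, long value) {
    int lo = 0;
    int hi = timestamps.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (timestamps[mid] <= value) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /**
   * Hypothetical adjustment: scans forward from endIndex and extends the exclusive end index
   * past the last sample whose timestamp is below editEndTime. Samples in between (such as the
   * P frame in the example) are included too, because the range is contiguous in decode order.
   */
  static int extendEndIndex(long[] timestamps, int endIndex, long editEndTime) {
    int adjusted = endIndex;
    for (int i = endIndex; i < timestamps.length; i++) {
      if (timestamps[i] < editEndTime) {
        adjusted = i + 1;
      }
    }
    return adjusted;
  }

  public static void main(String[] args) {
    int offset = 2031; // The excerpt below holds frames 2031, 2032 and 2033.
    long[] tail = {3045972L, 3050471L, 3048971L};
    long editEndTime = 143L + 3049002L; // editMediaTime + editDuration = 3049145

    int naive = ceilExclusive(tail, editEndTime);
    int adjusted = extendEndIndex(tail, naive, editEndTime);

    System.out.println("search result:  " + (offset + naive)); // 2032, frame 2033 is excluded
    System.out.println("after forward walk: " + (offset + adjusted)); // 2034, frame 2033 is kept
  }
}
```

The forward walk here scans to the end of the array for simplicity. A real fix would probably bound it by the maximum reordering depth, and would need to decide what to do with frame 2032, whose timestamp lies beyond the end of the edit.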
From the comment in this file (https://github.com/androidx/media/blob/release-1.5.0-alpha01/libraries/extractor/src/main/java/androidx/media3/extractor/mp4/BoxParser.java#L713), I understand the author knows that "the result of this binary search might be slightly incorrect (due to out-of-order timestamps), the loop below that walks forward to find the next sync frame will result in a correct start index. The start index would also be correct if we walk backwards to the previous sync frame". This is OK for the playback use case. But AndroidX Media recently added the transformer module, and for editing and transcoding use cases an accurate END index becomes important.
Do you think this is a potential issue, and could you help improve it? Thanks.