
Author scene change detection
micromysore@gmail.com

2005-07-26, 4:59 pm

hello,

I was looking at the GOP structure of an MPEG-2 video file.
At every scene change, a new GOP is started with an I-frame.

What is the algorithm used to determine the scene change in mpeg2?

-micromysore

Billy Joe

2005-07-26, 10:00 pm

> hello,
>
> I was looking at the GOP structure of a mpeg2 video file.
> At every scene change, a new GOP is started with I-frame.
>
> What is the algorithm used to determine the scene change in
> mpeg2?
>
> -micromysore


Apparently there isn't one. The Hauppauge capture devices, for example,
which convert analogue to MPEG1 or 2, create a GOP every 15 MPEG2 frames
regardless of scene content.

While this appears counterintuitive, it also seems to work very well!

Variable length GOPs seem to be rare, from what limited examination I've
done.

BJ


Jim Leonard

2005-07-26, 10:00 pm

Billy Joe wrote:
> Apparently there isn't one. The Hauppauge capture devices, for example,
> which convert analogue to MPEG1 or 2, create a GOP every 15 MPEG2 frames
> regardless of scene content.
>
> Variable length GOPs seem to be rare, from what limited examination I've
> done.


Don't know what you're looking at, but I believe they're quite common.
Any MPEG-2 encoder worth its salt, hardware or software, does this.
CCE does this, as does the hardware MPEG-2 encoder in my ReplayTV 5000
unit. Nearly half of the DVDs I've examined have this as well. They
may not go BEYOND the typical 15-frame GOP, but they definitely kill
the current one and start a new one at a scene change.

As for an algorithm, it's extremely simple: Compare the current frame
to the previous frame; if there is more than N percent change in
luminance per pixel on average, start a new GOP sequence. In my
experience, N is around 50%. (In other words, if 50% of the picture
material changes between frames, that's a scene change, and start a new
GOP.)
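
A minimal sketch of the frame comparison described above, in C. It takes one possible reading of "N percent change in luminance per pixel on average": the mean absolute luma difference as a percentage of the 8-bit range. The function names and the n_percent parameter are hypothetical, not taken from any encoder, and the right threshold depends on which reading of "N percent" is used.

```c
#include <stddef.h>
#include <stdlib.h>

/* Mean absolute luminance difference between two frames, expressed as a
 * percentage of the full 8-bit range (0..255). Returns 0.0 for empty input.
 * prev and cur point to pixel_count luma samples each. */
double luma_change_percent(const unsigned char *prev,
                           const unsigned char *cur,
                           size_t pixel_count)
{
    unsigned long long sum = 0;
    size_t i;

    if (pixel_count == 0)
        return 0.0;

    for (i = 0; i < pixel_count; i++)
        sum += (unsigned long long)abs((int)cur[i] - (int)prev[i]);

    return 100.0 * (double)sum / (255.0 * (double)pixel_count);
}

/* Returns 1 if the change exceeds n_percent (a hypothetical tuning knob),
 * 0 otherwise. */
int is_scene_change(const unsigned char *prev, const unsigned char *cur,
                    size_t pixel_count, double n_percent)
{
    return luma_change_percent(prev, cur, pixel_count) > n_percent;
}
```

An encoder would call is_scene_change() once per incoming frame, keeping the previous frame's luma plane around for the comparison.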

Billy Joe

2005-07-26, 10:00 pm

> Billy Joe wrote:
>
> Don't know what you're looking at,


Sorry, I'll repeat: Hauppauge capture devices

> but I believe they're
> quite common. Any MPEG-2 encoder worth their salt, hardware
> or software, does this. CCE does this, as does the hardware
> MPEG-2 encoder in my ReplayTV 5000 unit. Nearly half of the
> DVDs I've examined have this as well. They may not go BEYOND
> the typical 15-frame GOP, but they definitely kill the
> current one and start a new one at a scene change.
>
> As for an algorithm, it's extremely simple: Compare the
> current frame to the previous frame; if there is more than N
> percent change in luminance per pixel on average, start a new
> GOP sequence. In my experience, N is around 50%.


Around 50% is an algorithm? ;-0) Perhaps you can point me (or anyone) to
some specific math on the subject?

And maybe the part of the MPEG2 spec that suggests Andy Warhol's "Empire"
could be done in one or two GOPs?

> (In other
> words, if 50% of the picture material changes between frames,
> that's a scene change, and start a new GOP.)



Jim Leonard

2005-07-26, 10:00 pm

Billy Joe wrote:
>
> Sorry, I'll repeat: Hauppauge capture devices


I meant that you should look at more than just one type of source.

>
> Around 50% is an algorithm? ;-0) Perhaps you can point me (or anyone) to
> some specific math on the subject?


google "scene change detection algorithm".

Billy Joe

2005-07-27, 3:59 am

> Billy Joe wrote:
>
> I meant that you should look at more than just one type of
> source.
>
>
> google "scene change detection algorithm".


The search returns 169 hits in English, only one when CCE is added to the
criteria. Sadly, that one is members-only IEEE. The bulk of the non-CCE
results refer to the same Sept 2000 paper, regarding analysis of MPEG2, and
are abstract only. Maybe you can quote a non-copyrighted summary of
the technique they "proposed" as a help to the OP? We (you, I, and the OP)
may be on different wavelengths. I'm referring to capture of video. Most of the
repetitive info I browsed from the recommended search refers to
decompression. And maybe that's what the OP really wants?

I could see a DV (or motion JPEG) to MPEG2 conversion doing scene detection
(to a degree) keeping the GOP size within spec, but analogue to JPEG? Well,
at some price, I guess. Given those who could pay the price, why would
they? Digital analysis has to be cheaper.

Any other references?

BJ


Ico Doornekamp

2005-07-27, 3:59 am

>> As for an algorithm, it's extremely simple: Compare the
>
> Around 50% is an algorithm? ;-0) Perhaps you can point me (or anyone) to
> some specific math on the subject?


I don't know the code, but I recently found this one, developed by the BBC
and recently opensourced :

http://www.bbc.co.uk/opensource/projects/shot_change/

Description: A simple DirectShow video shot change detector filter, suitable
for a wide variety of applications. BBC R&D created the project.

_Ico


--
:wq
^X^Cy^K^X^C^C^C

erik

2005-07-27, 3:59 am

"Billy Joe" <see.id.line@invalid.org> wrote in message
news:XNqdne_ZbMAORnvfRVn-sA@adelphia.com...
>
> Sorry, I'll repeat: Hauppauge capture devices
>
>
> Around 50% is an algorithm? ;-0) Perhaps you can point me (or
> anyone) to some specific math on the subject?
>
> And maybe the part of the MPEG2 spec that suggests Andy Warhol's
> "Empire" could be done in one or two GOPs?
>


DVD-compliant MPEG-2 uses a maximum of 15 (PAL) or 18 (NTSC) frames per
GOP. If this mysterious change of scene happens, a new GOP is started,
even before the 15/18 frame limit is reached. How the scene detection
works: my guess is there's a function that calculates a difference
between 2 frames. If that difference reaches some threshold, a change of
scene is declared.
The algorithm "around 50%" above might actually work quite well.
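
The rule described here (cap the GOP length, but cut it short on a scene change) can be sketched in C as below. This is an illustration under simplifying assumptions, not how any particular encoder does it: the gop_state struct and start_new_gop() are hypothetical names, frames are counted in display order, and B-frame reordering is ignored.

```c
/* Hypothetical per-stream GOP bookkeeping; names are illustrative only. */
typedef struct {
    int max_gop_len;    /* e.g. 15 (PAL) or 18 (NTSC), the limits cited above */
    int frames_in_gop;  /* frames emitted since the current GOP started */
} gop_state;

/* Returns 1 if the next frame should open a new GOP (and so be coded as an
 * I-frame): on the very first frame, when the length cap is reached, or
 * when the caller flags a scene change. Returns 0 otherwise. */
int start_new_gop(gop_state *st, int scene_change)
{
    if (st->frames_in_gop == 0 ||
        st->frames_in_gop >= st->max_gop_len ||
        scene_change) {
        st->frames_in_gop = 1;   /* this frame is the first of the new GOP */
        return 1;
    }
    st->frames_in_gop++;
    return 0;
}
```

With max_gop_len set to 15 and no scene changes flagged, this degenerates to the fixed 15-frame GOPs reported for the Hauppauge devices.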


Jan Panteltje

2005-07-27, 8:59 am

On a sunny day (27 Jul 2005 06:56:24 GMT) it happened Ico Doornekamp
<ico@pruts.nl> wrote in <42e73018$0$11063$e4fe514c@news.xs4all.nl>:

>
>I don't know the code, but I recently found this one, developed by the BBC
>and recently opensourced :
>
>http://www.bbc.co.uk/opensource/projects/shot_change/
>
>Description: A simple DirectShow video shot change detector filter, suitable
>for a wide variety of applications. BBC R&D created the project.
>
>_Ico

In fact the statement 50% change is quite correct.
Many years ago I added motion detection to my webcam software; basically
you subtract luminance and chrominance for the current and previous frame.
http://panteltje.com/mcam/
If there is enough change you have motion (here the level is just a sum value).
I use it to automatically store those frames; it detects persons perfectly.
If there is enough difference between current and previous frame you have a scene change.
You can put that in percent: average scene 2x brighter, 2x darker......,
colors 2x brighter (but perhaps they do not even bother about colors).
Here is the program code (Copyright 1999 Jan Panteltje and released under GPL):

/*
Needs <stdio.h> and <stdlib.h> (for abs). BYTE, __huge, debug_flag and
COMP_ENABLE come from the rest of the program and are not shown here.
*/
int detect_change(
    BYTE __huge *buffer, int in_size, int *change_flag,
    int y_difference, int u_difference, int v_difference, int treshhold)
{
    int i, j;
    int low_byte, high_byte;
    int line_length;
    BYTE __huge *ptr;
    int line_cnt;
    int y_idx = 0, u_idx = 0, v_idx = 0;
    int change_level;
    int diff;
    /*
    static: too large for the stack, and the old_* matrices must keep the
    previous frame between calls (they are all zero on the first call)
    */
    static int y_matrix[288][356];
    static int u_matrix[288][178];
    static int v_matrix[288][178];
    static int old_y_matrix[288][356];
    static int old_u_matrix[288][178];
    static int old_v_matrix[288][178];

    if(debug_flag)
    {
        fprintf(stdout,
            "detect_change(): buffer=%lu in_size=%d\n"
            "y_difference=%d u_difference=%d v_difference=%d treshhold=%d\n",
            (unsigned long)buffer, in_size,
            y_difference, u_difference, v_difference, treshhold);
    }

    /* set for no change in case error return */
    *change_flag = 0;

    /* cannot process compressed buffer */
    if(buffer[COMP_ENABLE] == 1)
    {
        fprintf(stdout,
            "detect_change(): Only works with compression off\n");
        return 1;
    }

    /* fill the new matrices */

    /*
    skip the header, point to first line.
    format is line length low, line length high, YUYV, EOL (0xfd),
    frame ends with 0xff 0xff 0xff 0xff
    */
    ptr = buffer + 64;
    line_cnt = 0;
    while(1)
    {
        /* low byte line length */
        low_byte = *ptr;
        ptr++;

        /* high byte line length */
        high_byte = *ptr;
        ptr++;

        /* get length of this line */
        line_length = low_byte + (256 * high_byte);

        if(debug_flag)
        {
            fprintf(stdout, "detect_change(): line=%d line_length=%d\n",
                line_cnt, line_length);
        }

        /* test for last line (255 in both low and high position) */
        if(line_length == 65535) break;

        /* refuse frames that would overflow the matrices */
        if(line_cnt >= 288 || line_length / 4 > 178)
        {
            fprintf(stdout, "detect_change(): frame too large\n");
            return 1;
        }

        /* index in array */
        y_idx = 0;
        u_idx = 0;
        v_idx = 0;

        /* process the line, 4 bytes (Y U Y V) at a time */
        for(i = 0; i < line_length / 4; i++)
        {
            /* first y, to luminance matrix */
            y_matrix[line_cnt][y_idx] = *ptr;
            ptr++;
            y_idx++;

            /* u (blue vector) 255 is blue, 127 is no color, 0 is yellowish */
            u_matrix[line_cnt][u_idx] = *ptr;
            ptr++;
            u_idx++;

            /* second y, to luminance matrix */
            y_matrix[line_cnt][y_idx] = *ptr;
            ptr++;
            y_idx++;

            /* v (red vector) 255 is red, 127 = no color, 0 is greenish */
            v_matrix[line_cnt][v_idx] = *ptr;
            ptr++;
            v_idx++;
        }/* end for each line */

        /* skip the EOL byte */
        ptr++;

        line_cnt++;
    }/* end for all lines */

    /* set for no change */
    change_level = 0;

    /* compare new to old y matrices */
    for(i = 0; i < line_cnt; i++)
    {
        for(j = 0; j < y_idx; j++)
        {
            diff = y_matrix[i][j] - old_y_matrix[i][j];
            if(abs(diff) > y_difference) change_level++;
        }
    }

    /* compare new to old u matrices */
    for(i = 0; i < line_cnt; i++)
    {
        for(j = 0; j < u_idx; j++)
        {
            diff = u_matrix[i][j] - old_u_matrix[i][j];
            if(abs(diff) > u_difference) change_level++;
        }
    }

    /* compare new to old v matrices */
    for(i = 0; i < line_cnt; i++)
    {
        for(j = 0; j < v_idx; j++)
        {
            diff = v_matrix[i][j] - old_v_matrix[i][j];
            if(abs(diff) > v_difference) change_level++;
        }
    }

    /* update old y matrix */
    for(i = 0; i < line_cnt; i++)
    {
        for(j = 0; j < y_idx; j++)
        {
            old_y_matrix[i][j] = y_matrix[i][j];
        }
    }

    /* update old u matrix */
    for(i = 0; i < line_cnt; i++)
    {
        for(j = 0; j < u_idx; j++)
        {
            old_u_matrix[i][j] = u_matrix[i][j];
        }
    }

    /* update old v matrix */
    for(i = 0; i < line_cnt; i++)
    {
        for(j = 0; j < v_idx; j++)
        {
            old_v_matrix[i][j] = v_matrix[i][j];
        }
    }

    /* set flag if treshhold exceeded */
    if(change_level > treshhold) *change_flag = 1;

    return 1;
}/* end function detect_change */




Ben Rudiak-Gould

2005-07-27, 8:59 am

micromysore@gmail.com wrote:
> I was looking at the GOP structure of a mpeg2 video file.
> At every scene change, a new GOP is started with I-frame.
>
> What is the algorithm used to determine the scene change in mpeg2?


There is no correct way to encode MPEG-2 video. The MPEG-2 standard only
specifies the file format and the decoding algorithm. An encoder is simply
any program that produces a compliant file. Anyone can write a perfect
decoder, but writing a perfect encoder is a hard AI problem.

So you're never going to find /the/ algorithm for detecting scene changes.
All you'll find is various heuristic techniques which seem to give good
results in practice -- i.e. which seem to produce files that look better at
a given bit rate.

-- Ben

ast

2005-07-27, 5:01 pm


"Billy Joe" <see.id.line@invalid.org> wrote in message news: XfydnR0btN4uVHvfRVn-hw@adelphia.com...
| > hello,
| >
| > I was looking at the GOP structure of a mpeg2 video file.
| > At every scene change, a new GOP is started with I-frame.
| >
| > What is the algorithm used to determine the scene change in
| > mpeg2?
| >
| > -micromysore
|
| Apparently there isn't one. The Hauppauge capture devices, for example,
| which convert analogue to MPEG1 or 2, create a GOP every 15 MPEG2 frames
| regardless of scene content.
|
| While this appears counter intuitive, it also seems to work very well!

Of course it is working, but this is not optimal.

|
| Variable length GOPs seem to be rare, from what limited examination I've
| done.
|
| BJ
|
|

Billy Joe

2005-07-28, 4:59 pm

> "Billy Joe" <see.id.line@invalid.org> wrote in message
> news:XNqdne_ZbMAORnvfRVn-sA@adelphia.com...
>
> DVD compliant MPEG2 uses max. 15 (PAL) or 18 (NTSC) frames
> per GOP. If this mysterious change of scene happens, new GOP
> is started, even before 15/18 frames limit is reached. How
> the scene detection works - my guess, there's a function that
> calculates a difference between 2 frames. If value reaches
> some value, change of the scene is declared. The algorithm
> "around 50%" above might actually work quite well.


Well, I'm gonna defer to the folks who do this for a living (sometimes
referred to as pros). I popped a commercial DVD (which is not encrypted)
into the reader and fired up VideoReDo to examine the VOBs. This is an NTSC
disc. Every 15th frame is an I frame. Scene changes clearly happen at B &
P frames. Can scene changes be detected? Sure. Are they? Is there a
STANDARD? Apparently these pros don't think it a worthy endeavor.

Further, I'd go back to the OP with this: If the video is already in MPEG2
format, what would be the value of doing scene change detection? I can
understand that scene change detection on source material might/would
produce somewhat smaller MPEG2 conversions. But why would they be better
quality? Surely SCD on post conversion material would imply re-conversion -
a serious potential for degradation.

And no, I have not done a hex examination of the frames in question to see
what happens in a B or P frame that is actually a MAJOR scene change. Would
it not be, in fact, merely an I frame posing as a B or P frame? I'm sure
everyone else here knows, so I'm happy to hear the facts.

BJ


Billy Joe

2005-07-28, 4:59 pm

> "Billy Joe" <see.id.line@invalid.org> wrote in message
> news: XfydnR0btN4uVHvfRVn-hw@adelphia.com...
>
> Of course it is working, but this is not optimal.
>

optimal for what? Quality or size?

BJ


Billy Joe

2005-07-28, 4:59 pm

> micromysore@gmail.com wrote:
>
> There is no correct way to encode MPEG-2 video. The MPEG-2
> standard only specifies the file format and the decoding
> algorithm. An encoder is simply any program that produces a
> compliant file. Anyone can write a perfect decoder, but
> writing a perfect encoder is a hard AI problem.
> So you're never going to find /the/ algorithm for detecting
> scene changes. All you'll find is various heuristic
> techniques which seem to give good results in practice --
> i.e. which seem to produce files that look better at a given
> bit rate.
> -- Ben


amen!!!

BJ


Jim Leonard

2005-07-28, 4:59 pm

Billy Joe wrote:
> Well, I'm gonna defer to the folks who do this for a living (sometimes
> referred to as pros). I popped a commercial DVD (which is not encrypted)


I'm not trying to be pedantic, but if it wasn't encrypted then I
wouldn't call it "pro". Please look at a Hollywood-produced movie that
has encryption. While the main 24fps movie may or may not have
variable-length GOPs, you'll find that the polished video/30fps special
features do.

I'm not saying that all unencrypted DVDs aren't pro (my own DVD has no
encryption, for example); all I'm saying is that one single DVD is not
a comprehensive test.

Jim Leonard

2005-07-28, 4:59 pm

Billy Joe wrote:
> optimal for what? Quality or size?


Both. If bitrate is fixed, quality. If "quality" (quantization level)
is fixed, size.

Billy Joe

2005-07-28, 4:59 pm

> Billy Joe wrote:
>
> I'm not trying to be pedantic, but if it wasn't encrypted
> then I wouldn't call it "pro". Please look at a
> hollywood-produced movie that has encryption. While the main
> 24fps movie may or may not have variable-length GOPs, you'll
> find that the polished video/30fps special features do.
>
> I'm not saying that all unencrypted DVDs aren't pro (my own
> DVD has no encryption, for example); all I'm saying is that
> one single DVD is not a comprehensive test.


I agree completely, and eagerly await your comprehensive analysis of
encrypted DVDs ;-0)

The one and only disc I checked, grabbed somewhat randomly from the few
unencrypted discs I have available, had both film and TV transfers. Either
type had I frames at 15 frame intervals. The Hauppauge capture devices
which I have (PCI H250, and USB2) both produce 15 frames per GOP regardless
of scene changes.

I'll gladly look at any PRO DVD sources you can cite which have variable
GOPs, all of which begin on scene changes. Yet, even if you did/could cite
one, the fact that we see different implementations in different
professional videos somewhat implies a lack of standard, no?

BJ


Jim Leonard

2005-07-28, 9:59 pm

Billy Joe wrote:
> The one and only disc I checked, grabbed somewhat randomly from the few
> unencrypted discs I have available, had both film and TV transfers. Either
> type had I frames at 15 frame intervals. The Hauppauge capture devices
> which I have (PCI H250, and USB2) both produce 15 frames per GOP regardless
> of scene changes.


My ReplayTV unit, which uses an NEC µPD61051
(http://www.necel.com/digital_av/eng...051_d61052.html)
produces variable bitrate content with variable-length GOPs. Like I
said, you need to look at more than just what you have in front of you.

> I'll gladly look at any PRO DVD sources you can cite which have variable
> GOPs, all of which begin on scene changes. Yet, even if you did/could cite
> one, the fact that we see different implementations in different
> professional videos somewhat implies a lack of standard, no?


I never implied there was a standard, just best practice.

It is at this juncture in the conversation that I have lost the
original point/question you were trying to make/ask. :-) Are you
trying to prove that there is no standard for variable-length GOPs?
Correct, there is no standard on how to handle scene changes and
variable-length GOPs. Are you saying it's not part of the MPEG-2
specification? Wrong, it is indeed part of the specification (ie. it's
not a "trick" or anything, and MPEG-2 files that use them are not "out
of spec"). Are you asking how scene detection is performed and how
that is taken advantage of in MPEG-2 files? See previous discussion.

Are you asking *why* this is done? Because it makes the most sense
from a compression standpoint. I-frames contain entirely new picture
information, so it makes sense to have drastic changes in the video
content synchronized with an I-frame. Otherwise, the P and B frames
will be trying to deal with 50% or more completely changed picture
information, which they were not designed to deal with. The encoder's
bitrate allocation buckets get thrown out of whack and the end result,
ultimately, is fewer bits to allocate to the quantized DCT blocks. In
other words, picture quality suffers
when scene changes are not aligned with I-frame boundaries.

Does this answer everything? If not, let me know.

Tobias Bergmann

2005-08-02, 4:59 pm

Jim Leonard wrote:
> I never implied there was a standard, just best practice.
>
> It is at this juncture in the conversation that I have lost the
> original point/question you were trying to make/ask. :-) Are you
> trying to prove that there is no standard for variable-length GOPs?
> Correct, there is no standard on how to handle scene changes and
> variable-length GOPs. Are you saying it's not part of the MPEG-2
> specification? Wrong, it is indeed part of the specification (ie. it's
> not a "trick" or anything, and MPEG-2 files that use them are not "out
> of spec"). Are you asking how scene detection is performed and how
> that is taken advantage of in MPEG-2 files? See previous discussion.
>
> Are you asking *why* this is done? Because it makes the most sense
> from a compression standpoint. I-frames contain entirely new picture
> information, so it makes sense to have drastic changes in the video
> content synchronized with an I-frame. Otherwise, the P and B frames
> will be trying to deal with 50% or more completely changed picture
> information, which they were not designed to deal with -- the end
> result is that the encoder has its bitrate allocation buckets thrown
> out of whack and the end result, ultimately, is less bits to allocate
> to the quantized DCT blocks. In other words, picture quality suffers
> when scene changes are not aligned with I-frame boundaries.


If the next P frame after the scene change is reduced to intra blocks
only, and the B frames on and after the scene change use only that P as
reference, there is no degradation!
But I doubt that encoders actually do that. It's most likely that the
B's contain skips which results in a frame showing both partly the old
scene and parts of the new scene. Anybody have examples?
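
To illustrate the intra-only P frame idea, here is a rough C sketch that estimates what fraction of 16x16 luma blocks would be cheaper to code as intra than as a prediction from the previous frame. It is only a sketch under loud assumptions: no motion search (the co-located block is the only predictor), a crude intra cost (sum of absolute deviations from the block mean), and frame dimensions that are multiples of 16. The function name is hypothetical and real encoders make this decision differently.

```c
#include <stdlib.h>

#define MB 16  /* macroblock edge in pixels */

/* Fraction (0..1) of 16x16 luma blocks whose crude intra cost is lower than
 * the zero-motion inter cost against the previous frame. prev and cur are
 * width*height luma planes; partial blocks at the edges are ignored. */
double intra_block_fraction(const unsigned char *prev,
                            const unsigned char *cur,
                            int width, int height)
{
    int bx, by, x, y;
    int total = 0, intra = 0;

    for (by = 0; by + MB <= height; by += MB) {
        for (bx = 0; bx + MB <= width; bx += MB) {
            long sum = 0, inter_cost = 0, intra_cost = 0;
            int mean;

            /* inter cost: sum of absolute differences to the old block */
            for (y = 0; y < MB; y++) {
                for (x = 0; x < MB; x++) {
                    int idx = (by + y) * width + bx + x;
                    sum += cur[idx];
                    inter_cost += abs((int)cur[idx] - (int)prev[idx]);
                }
            }
            mean = (int)(sum / (MB * MB));

            /* intra cost: deviation of the new block from its own mean */
            for (y = 0; y < MB; y++) {
                for (x = 0; x < MB; x++) {
                    int idx = (by + y) * width + bx + x;
                    intra_cost += abs((int)cur[idx] - mean);
                }
            }

            total++;
            if (intra_cost < inter_cost)
                intra++;
        }
    }
    return total ? (double)intra / total : 0.0;
}
```

A value near 1.0 on the first P frame after a cut would mean that frame is, in effect, an I-frame carried under a P-frame header.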

For DVD content a GOP of 15 is way too small but most PROs don't care...
they just reuse the equipment targeted at live streams. The DVD offers
enough space/bitrate to cover for this suboptimal setting.

bis besser,
Tobias
--
Tobias Bergmann
Institut für Technische Informatik
Lehrstuhl für Rechnerarchitektur
Uni-Stuttgart e-mail: tobias.bergmann@informatik.uni-stuttgart.de
Pfaffenwaldring 47 Tel: +49-(0)711-7816-407
D-70569 Stuttgart Fax: +49-(0)711-7816-288