http://www.anvil-software.org/
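"We wuz gettin wuh ried" and "We was worried"

Let $X := (x_1, x_2, \ldots, x_N)$ and $Y := (y_1, y_2, \ldots, y_M)$ be two sequences of lengths $N \in \mathbb{N}$ and $M \in \mathbb{N}$, with elements $x_n, y_m \in \mathcal{F}$ for $n \in [1, \ldots, N]$ and $m \in [1, \ldots, M]$, where $\mathcal{F}$ is the feature space. A local cost measure $c : \mathcal{F} \times \mathcal{F} \to \mathbb{R}_{\geq 0}$ compares pairs of elements of $X$ and $Y$ and defines the cost matrix $C \in \mathbb{R}^{N \times M}$ with $C(n, m) := c(x_n, y_m)$.

A warping path $p = (p_1, \ldots, p_L)$ between $X$ and $Y$ is a sequence of index pairs $p_l = (n_l, m_l) \in [1, \ldots, N] \times [1, \ldots, M]$, $l \in [1, \ldots, L]$, that satisfies:
Boundary condition: $p_1 = (1, 1)$ and $p_L = (N, M)$.
Monotonicity: $n_1 \leq n_2 \leq \ldots \leq n_L$ and $m_1 \leq m_2 \leq \ldots \leq m_L$.
Step size: $p_{l+1} - p_l \in \{(1, 0), (0, 1), (1, 1)\}$ for $l \in [1, \ldots, L-1]$.
The path assigns element $x_{n_l}$ of $X$ to element $y_{m_l}$ of $Y$.

The accumulated cost matrix $D \in \mathbb{R}^{N \times M}$ is computed as
\[ D(n, 1) = \sum_{i=1}^{n} c(x_i, y_1), \quad n \in [1, \ldots, N], \]
\[ D(1, m) = \sum_{i=1}^{m} c(x_1, y_i), \quad m \in [1, \ldots, M], \]
\[ D(n, m) = \min\{D(n-1, m), D(n, m-1), D(n-1, m-1)\} + c(x_n, y_m), \quad n \in [2, \ldots, N],\ m \in [2, \ldots, M]. \]
$D(n, m)$ is the cost of the optimal alignment of the first $n$ elements of $X$ with the first $m$ elements of $Y$.

The optimal path $p = (p_1, p_2, \ldots, p_L)$ is recovered in reverse order, starting from $p_L = (N, M)$. Given $p_l = (n, m)$, the case $(n, m) = (1, 1)$ means $l = 1$ and the path is complete; otherwise
\[ p_{l-1} := \begin{cases} (1, m-1), & n = 1 \\ (n-1, 1), & m = 1 \\ \arg\min\{D(n-1, m), D(n, m-1), D(n-1, m-1)\}, & \text{otherwise.} \end{cases} \]
The predecessor $D(n-1, m-1)$ corresponds to a diagonal step that advances in both $X$ and $Y$.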
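The accumulated-cost recursion and the reverse path recovery can be sketched in a few lines. This is a minimal illustration, not the implementation used in this work: the function name `dtw`, the absolute-difference cost and the 0-based indexing are assumptions made for the example.

```python
def dtw(x, y, cost=lambda a, b: abs(a - b)):
    """Return (total cost, warping path) for sequences x and y.

    Indices are 0-based here, so the path runs from (0, 0) to (N-1, M-1).
    The default cost is an assumption chosen for the example.
    """
    n, m = len(x), len(y)
    D = [[0.0] * m for _ in range(n)]
    D[0][0] = cost(x[0], y[0])
    for i in range(1, n):                      # first column: cumulative sums
        D[i][0] = D[i - 1][0] + cost(x[i], y[0])
    for j in range(1, m):                      # first row: cumulative sums
        D[0][j] = D[0][j - 1] + cost(x[0], y[j])
    for i in range(1, n):
        for j in range(1, m):
            D[i][j] = min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1]) + cost(x[i], y[j])

    # Recover the optimal path in reverse, starting from the last cell.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        if i == 0:
            j -= 1
        elif j == 0:
            i -= 1
        else:
            _, i, j = min(
                (D[i - 1][j - 1], i - 1, j - 1),
                (D[i - 1][j], i - 1, j),
                (D[i][j - 1], i, j - 1),
            )
        path.append((i, j))
    path.reverse()
    return D[n - 1][m - 1], path


total, path = dtw([1, 2, 3], [1, 2, 2, 3])
```

In this toy call the second sequence repeats one element, so the path contains one horizontal step and the total cost is zero.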
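The predecessor $D(n-1, m)$ advances only in $X$, and the predecessor $D(n, m-1)$ advances only in $Y$.

For two strings $a$ and $b$ of lengths $|a|$ and $|b|$, the Levenshtein distance is $lev_{a,b}(|a|, |b|)$, where
\[ lev_{a,b}(i, j) = \begin{cases} \max(i, j), & \min(i, j) = 0 \\ \min \begin{cases} lev_{a,b}(i-1, j) + 1 \\ lev_{a,b}(i, j-1) + 1 \\ lev_{a,b}(i-1, j-1) + 1_{(a_i \neq b_j)} \end{cases}, & \text{otherwise.} \end{cases} \]
The indicator $1_{(a_i \neq b_j)}$ equals $0$ when $a_i = b_j$ and $1$ otherwise.

http://www.itl.nist.gov/iad/mig/tools/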
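The Levenshtein recursion can be evaluated bottom-up with a table. The sketch below is illustrative only: the function name `levenshtein` is an assumption, and applying it to word lists (using the two example strings that appear earlier in the text) is a choice made for this example.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of words)."""
    rows, cols = len(a) + 1, len(b) + 1
    lev = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        lev[i][0] = i                  # min(i, j) = 0 -> max(i, j)
    for j in range(cols):
        lev[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = 0 if a[i - 1] == b[j - 1] else 1
            lev[i][j] = min(
                lev[i - 1][j] + 1,                 # deletion
                lev[i][j - 1] + 1,                 # insertion
                lev[i - 1][j - 1] + substitution,  # match or substitution
            )
    return lev[-1][-1]


# Word-level comparison of the two example strings.
hypothesis = "We wuz gettin wuh ried".split()
reference = "We was worried".split()
word_distance = levenshtein(hypothesis, reference)
```

Because the function only relies on indexing and equality, the same code works for character strings and for lists of tokens.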
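Shot transitions are of the following types: hard cuts, fade out, fade in, dissolve and wipes. Let $G(x, y, t)$ denote the frames of a shot and $l$ the length of the transition. The gradual transitions are modeled as
\[ E_{fo} = G(x, y, t) \left(1 - \frac{t}{l_1}\right), \quad t \in [t_1, t_1 + l_1], \]
\[ E_{fi} = G(x, y, t) \left(\frac{t}{l_2}\right), \quad t \in [t_2, t_2 + l_2], \]
\[ E_{d} = G_1(x, y, t) \left(1 - \frac{t}{l_1}\right)_{t \in [t_1, t_1 + l_1]} + G_2(x, y, t) \left(\frac{t}{l_2}\right)_{t \in [t_2, t_2 + l_2]}. \]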
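As a small numeric illustration of the fade and dissolve models, the sketch below blends two frames with the linear weights $1 - t/l$ and $t/l$. It assumes that $t$ is measured from the start of the transition, that both shots use the same transition length, and that a frame is a flat list of intensities; the function names are hypothetical.

```python
def fade_out_weight(t, length):
    """Weight of the outgoing shot, t measured from the transition start."""
    return 1.0 - t / length


def fade_in_weight(t, length):
    """Weight of the incoming shot, t measured from the transition start."""
    return t / length


def dissolve(frame_out, frame_in, t, length):
    """Blend two frames (flat lists of intensities) at transition time t."""
    w_out = fade_out_weight(t, length)
    w_in = fade_in_weight(t, length)
    return [w_out * a + w_in * b for a, b in zip(frame_out, frame_in)]


halfway = dissolve([100, 200], [0, 100], t=5, length=10)
```

At `t = 0` the result equals the outgoing frame and at `t = length` it equals the incoming frame, which is the gradual change a detector has to recognize.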
Figure 3.1: Types of shot transitions: Hard Cut, Fade Out, Fade In, Dissolve, Wipe

3.2 Existing Methods

The methods used to detect the boundaries between consecutive shots rely mainly on visual features, such as color and edges. Visual features are expected to change dramatically at a hard cut, whereas a smooth transition from one shot to the next is expected to produce a gradual change. The various features that have been used in the literature to detect shot changes are analyzed below.
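For frame $i$ with histogram $H_i$ of $G$ bins, the $L_1$ distance between the histograms of consecutive frames is
\[ SD_i = \sum_{j=1}^{G} |H_i(j) - H_{i+1}(j)|. \]
A shot boundary is declared at frame $i$ when $SD_i$ exceeds a threshold $T$. For color frames, the histograms $H_i^{RED}$, $H_i^{GREEN}$ and $H_i^{BLUE}$ give the per-channel differences $SD_i^{RED}$, $SD_i^{GREEN}$ and $SD_i^{BLUE}$, which are averaged:
\[ SD_i = \frac{SD_i^{RED} + SD_i^{GREEN} + SD_i^{BLUE}}{3}. \]

Edge Change Ratio (ECR). For consecutive frames $I$ and $I'$ with edge images $E$ and $E'$, the entering edges $X^{in}$ are the edge pixels of $E'$ that lie farther than a distance $r$ from the edges of $E$, and the exiting edges $X^{out}$ are the edge pixels of $E$ that lie farther than $r$ from the edges of $E'$.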
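The histogram-difference detector can be sketched as follows. This is a hedged illustration: the function names, the representation of a frame as a list of three per-channel histograms, and the toy data are assumptions for the example, not the setup evaluated in this work.

```python
def histogram_difference(h1, h2):
    """L1 distance between two histograms with the same number of bins."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def rgb_difference(frame1, frame2):
    """Average of the per-channel L1 distances; a frame is [H_red, H_green, H_blue]."""
    return sum(histogram_difference(c1, c2) for c1, c2 in zip(frame1, frame2)) / 3.0


def detect_cuts(histograms, threshold):
    """Indices i where the difference between frames i and i+1 exceeds the threshold."""
    return [
        i
        for i in range(len(histograms) - 1)
        if histogram_difference(histograms[i], histograms[i + 1]) > threshold
    ]


# Toy grey-level histograms with two bins: the content changes between frames 1 and 2.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
cuts = detect_cuts(frames, threshold=1.0)
```

A single global threshold is the simplest decision rule; it reacts to hard cuts but tends to miss gradual transitions, whose per-frame differences stay small.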
Let $\sigma$ be the number of edge pixels in $E$ and $\sigma'$ the number of edge pixels in $E'$. The edge change ratio is
\[ ECR = \max\left(\frac{X^{in}}{\sigma'}, \frac{X^{out}}{\sigma}\right) = \max(\rho_{in}, \rho_{out}), \]
where $\rho_{in}$ and $\rho_{out}$ are the fractions of entering and exiting edge pixels. The edge images $E$ and $E'$ are dilated by radius $r$ before the entering and exiting edges are counted.

[Figure: $SD$ and $ECR$ versus frame number, frames 7000 to 10000]
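\[ Precision = \frac{|\{detected\} \cap \{true\}|}{|\{detected\}|}, \quad Recall = \frac{|\{detected\} \cap \{true\}|}{|\{true\}|}, \quad F_1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall} \]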
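A possible way to compute these measures for detected boundaries is sketched below. The greedy one-to-one matching within a frame tolerance is an assumption made for this example (the function name `evaluate` and the sample frame numbers are hypothetical); other matching rules are equally plausible.

```python
def evaluate(detected, truth, tolerance=0):
    """Precision, recall and F1 for boundary frames matched within a tolerance.

    Each ground-truth boundary can be matched by at most one detection.
    """
    unmatched = sorted(truth)
    hits = 0
    for d in sorted(detected):
        match = next((t for t in unmatched if abs(t - d) <= tolerance), None)
        if match is not None:
            unmatched.remove(match)
            hits += 1
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = evaluate([100, 205, 400], [100, 200, 300, 500], tolerance=5)
```

In this toy call two of the three detections fall within the tolerance of a true boundary, so precision is 2/3 and recall is 2/4.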
[Figure: Precision, Recall and $F_1$ of the $L_1$ $SD$ detector versus threshold value $\tau$ (0 to 0.1), panels labeled 5 and 50; $F_1$ at $\tau = 0.05$]

[Figure: Precision, Recall and $F_1$ versus threshold value (0.1 to 0.16), panels labeled 5 and 50]

[Figure: Precision, Recall and $F_1$ as functions of Thres RGB and Thres ECR]

(Thres RGB, Thres ECR) = (0.58, 0.20): $F_1 = 94.63$, Recall $= 94.8$.
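The color similarity $ColSim$ of frames $i$ and $j$ and the corresponding distance $D$ are
\[ ColSim(i, j) = \frac{\sum_{k=1}^{G} \min(H_i(k), H_j(k))}{\sum_{k=1}^{G} H_j(k)}, \qquad D(i, j) = 1 - ColSim(i, j) \in [0, 1]. \]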
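The histogram-intersection similarity and its distance can be written directly from the formula. The sketch below is illustrative; the function names and the example histograms are assumptions.

```python
def col_sim(h_i, h_j):
    """Histogram intersection of h_i and h_j, normalized by the mass of h_j."""
    intersection = sum(min(a, b) for a, b in zip(h_i, h_j))
    return intersection / sum(h_j)


def col_dist(h_i, h_j):
    """Distance D(i, j) = 1 - ColSim(i, j)."""
    return 1.0 - col_sim(h_i, h_j)


similarity = col_sim([0.5, 0.5, 0.0], [0.25, 0.25, 0.5])
distance = col_dist([0.5, 0.5, 0.0], [0.25, 0.25, 0.5])
```

Note that the measure is symmetric only when both histograms have the same total mass, for instance when both are normalized to sum to one.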
[Figure: Mean Precision, Mean Recall and Mean $F_1$ as functions of Thres RGB and Thres ECR]
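The affinity matrix $A$ has entries
\[ a(i, j) = 1 - \frac{1}{\sqrt{2}} \sqrt{\sum_{k=1}^{G} (HSV_i(k) - HSV_j(k))^2}. \]
Let $D$ be the diagonal matrix whose element $D(i, i)$ is the sum of the $i$-th row of $A$, and let
\[ L = I - D^{-1/2} A D^{-1/2}. \]
$K$ eigenvectors $x_1, x_2, \ldots, x_K$ of $L$, selected by their eigenvalues $\lambda$, form the matrix $X = [x_1\ x_2\ \ldots\ x_K]$. The matrix $Y$ is obtained by normalizing each row of $X$ to unit length,
\[ y_{ij} = \frac{x_{ij}}{\sqrt{\sum_{j} x_{ij}^2}}. \]
The rows of $Y$ are clustered into $K$ clusters, and item $i$ is assigned to cluster $j$ if and only if row $i$ of $Y$ was assigned to cluster $j$.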
= 1 (F, KF ) = 1 1 N S(F n, KF ) N n=1 F = {F 1, F 2,..., F N } KF = {KF 1, KF 2,..., KF Nkf } S F n S(F n, KF ) = j (F n, KF j ) (F i, F j ) = 1 (F i, KF i ) i=1
KF () F n = (KF nj, KF nj+1 ) N (F, KF ) = (F n, F n ) n=1 (F n, F n ) = ( /(F n, F n )) = 1 (F i, KF i ) i=1 ϵ = 0.09 λ = 0.005 2
( RGB, ECR ) = (0.58, 0.20) F 1
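$[T_{start}, T_{end}]$

The visual dissimilarity of two shots is defined over their key-frames:
\[ VisDiss(S_i, S_j) = d(S_i, S_j) = \min_{f_l \in KF_i,\ f_m \in KF_j} D(f_l, f_m), \]
where $D$ is the frame distance and $KF_k$ is the set of key-frames of shot $k$. Let $d_t$ denote the temporal distance between two shots.

With a time window $T$, the time-constrained distance between shots and between clusters $C_i$ and $C_j$ is
\[ \hat{d}(S_l, S_k) = \begin{cases} d(S_l, S_k), & d_t(S_l, S_k) \leq T \\ \infty, & \text{otherwise,} \end{cases} \qquad \hat{d}_{max}(C_i, C_j) = \max_{S_l \in C_i,\ S_k \in C_j} \hat{d}(S_l, S_k). \]
Clustering starts with $NumClusters = NumShots$. At each step the pair of clusters $(R, S)$ with the smallest $\hat{d}_{max}(R, S)$ among all pairs $\hat{d}_{max}(A, B)$ is merged and $NumClusters \leftarrow NumClusters - 1$. The procedure stops when $NumClusters = 1$ or when $\hat{d}_{max}(A, B) > \delta$ for every pair. The threshold takes values $\delta \in [0, 1]$, with extreme values $\delta = 0$ and $\delta = 1$; the number of clusters ranges between $NumClusters = 1$ and $NumClusters = NumShots$.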
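Let $X_1 = L_1 L_2 \ldots L_w$ and $X_2 = K_1 K_2 \ldots K_w$ be two sequences of shot labels, with $L_i, K_i \in ClusterLabels$ for $i \in [1, \ldots, w]$. A $(w+1) \times (w+1)$ matrix $N$ is built, where $N(i, j)$ is the score of the best alignment of $X_1(1 \ldots i)$ with $X_2(1 \ldots j)$. There are three cases: $X_1(i)$ is aligned with $X_2(j)$, $X_1(i)$ is aligned with a gap, or $X_2(j)$ is aligned with a gap. Hence
\[ N(i, j) = \max \begin{cases} N(i-1, j-1) + S(X_1(i), X_2(j)) \\ N(i-1, j) - d \\ N(i, j-1) - d \end{cases} \]
where $d$ is the gap penalty and $S$ is a $NumClusters \times NumClusters$ substitution matrix that scores the alignment of clusters $C_i$ and $C_j$.

The visual similarity of two shots is
\[ VisSim(S_i, S_j) = \max_{f_l \in KF_i,\ f_m \in KF_j} ColSim(f_l, f_m). \]
With $m_i$ the representative shot of cluster $C_i$, the cluster similarity matrix is $CSM(i, j) = VisSim(m_i, m_j)$. The pairwise matrix counts how often clusters $C_i$ and $C_j$ label consecutive shots $L_1 L_2$:
\[ PPM(i, j) = \frac{1}{NumShots - 1} \left|\{pairs(L_1 = C_i, L_2 = C_j)\}\right|. \]
The substitution matrix is
\[ S(i, j) = \begin{cases} CSM(i, j) + PPM(i, j), & i = j \\ -\alpha (1 - CSM(i, j)) - \beta (1 - PPM(i, j)), & i \neq j \end{cases} \]
with $\alpha + \beta = 1$.

A traceback matrix $T$ is stored alongside $N$. They are initialized with $N(0, 0) = 0$ and first row and first column $-d, \ldots, -wd$, while $T$ holds "done" in its first cell, "left" along the first row and "up" along the first column. For every cell, $T(i, j)$ records which of the three cases produced $N(i, j)$. The traceback starts from $T(w+1, w+1)$ and yields the aligned pairs $X_1(i)$, $X_2(j)$ of $X_1$ and $X_2$. The final score $F$ combines the substitution scores $S$ of the aligned labels with the gap penalty $d$.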
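The alignment of two label sequences with a score matrix $N$ and a traceback matrix $T$ can be sketched as below. This is a simplified illustration: the function name, the `"diag"` marker, the toy match/mismatch scoring and the 0-based cell indexing are assumptions for the example, and the substitution function stands in for the cluster-based matrix $S$.

```python
def needleman_wunsch(x1, x2, score, gap):
    """Global alignment; returns (score, aligned x1, aligned x2), None marks a gap."""
    n, m = len(x1), len(x2)
    N = [[0.0] * (m + 1) for _ in range(n + 1)]
    T = [["done"] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # first column: only gaps, traceback "up"
        N[i][0], T[i][0] = -gap * i, "up"
    for j in range(1, m + 1):          # first row: only gaps, traceback "left"
        N[0][j], T[0][j] = -gap * j, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (N[i - 1][j - 1] + score(x1[i - 1], x2[j - 1]), "diag"),
                (N[i - 1][j] - gap, "up"),
                (N[i][j - 1] - gap, "left"),
            ]
            N[i][j], T[i][j] = max(options, key=lambda option: option[0])

    # Follow the traceback matrix from the last cell back to the first one.
    a1, a2 = [], []
    i, j = n, m
    while T[i][j] != "done":
        if T[i][j] == "diag":
            a1.append(x1[i - 1]); a2.append(x2[j - 1]); i -= 1; j -= 1
        elif T[i][j] == "up":
            a1.append(x1[i - 1]); a2.append(None); i -= 1
        else:
            a1.append(None); a2.append(x2[j - 1]); j -= 1
    return N[n][m], a1[::-1], a2[::-1]


# Toy substitution score: +1 for equal labels, -1 otherwise (an assumption).
toy_score = lambda a, b: 1 if a == b else -1
best, left, right = needleman_wunsch("ABAB", "ABB", toy_score, gap=1)
```

In the toy call, the label pattern `ABAB` aligns with `ABB` by leaving one `A` unmatched, which is the kind of repetition structure the alignment is meant to expose.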
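The motion content $Mot_z$ of shot $S_z$ is
\[ Mot_z = \frac{1}{b - a} \sum_{f=a}^{b-1} D(f, f+1), \]
where $a$ and $b$ are the first and last frames of the shot and $D$ is the frame distance. The motion similarity of shots $S_i$ and $S_j$ is
\[ MotSim(S_i, S_j) = \frac{2 \min(Mot_i, Mot_j)}{Mot_i + Mot_j}, \]
and the overall shot similarity is
\[ ShotSim(S_i, S_j) = \alpha \cdot VisSim(S_i, S_j) + \beta \cdot MotSim(S_i, S_j), \]
with weights $\alpha$ and $\beta$ such that $\alpha + \beta = 1$.

The shots form a graph $G = (V, E)$: shot $i$ is a node $v_i$, and an edge $e(i, j) \in E$ connects shots $i$ and $j$. The edge weights are collected in the matrix $W$,
\[ W(S_i, S_j) = w(i, j) \cdot ShotSim(S_i, S_j). \]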
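The motion content and the combined shot similarity are simple enough to check numerically. In the sketch below the function names, the handling of two motionless shots and the sample values are assumptions made for the example.

```python
def motion_content(frame_distances):
    """Mean of the distances D(f, f+1) between consecutive frames of one shot."""
    return sum(frame_distances) / len(frame_distances)


def mot_sim(mot_i, mot_j):
    """2 * min / sum; defined as 1.0 when both shots are static (an assumption)."""
    if mot_i + mot_j == 0:
        return 1.0
    return 2.0 * min(mot_i, mot_j) / (mot_i + mot_j)


def shot_sim(vis_sim, motion_sim, alpha=0.5):
    """Weighted combination with beta = 1 - alpha, so that alpha + beta = 1."""
    return alpha * vis_sim + (1.0 - alpha) * motion_sim


m_i = motion_content([0.2, 0.2])      # a calm shot
m_j = motion_content([0.5, 0.7])      # a more dynamic shot
combined = shot_sim(vis_sim=0.9, motion_sim=mot_sim(m_i, m_j), alpha=0.5)
```

`mot_sim` equals 1 for shots with identical motion content and drops toward 0 as the two values diverge, so it stays on the same $[0, 1]$ scale as the visual similarity.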
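The temporal weight $w(i, j)$ is computed from the distance $|m_i - m_j|$ between the middle frames $m_i$ and $m_j$ of the two shots, the decay parameter $d$ and $\sigma$.

The graph $G = (V, E)$ is partitioned into two subgraphs $G' = (V', E')$ and $G'' = (V'', E'')$ such that $V' \cup V'' = V$ and $V' \cap V'' = \emptyset$. The cost of the partition is
\[ cut(V', V'') = \sum_{i \in V',\ j \in V''} W(i, j). \]
For a subgraph $G_n = (V_n, E_n)$, the association with the whole graph is
\[ assoc(V_n, V) = \sum_{i \in V_n,\ j \in V} W(i, j), \]
and the normalized cut is
\[ Ncut(V', V'') = \frac{cut(V', V'')}{assoc(V', V)} + \frac{cut(V', V'')}{assoc(V'', V)}. \]
$Ncut$ is minimized under the temporal constraint that either $i < j$ or $i > j$ holds for all $v_i \in V'$, $v_j \in V''$.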
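Under the temporal constraint, every admissible partition is a single split point in the shot sequence, so the normalized cut can be minimized by exhaustive search over split positions. The sketch below assumes a symmetric weight matrix with positive row sums; the function names and the toy matrix are hypothetical.

```python
def ncut_value(W, split):
    """Ncut for the partition {0..split-1} | {split..n-1} of a weight matrix W."""
    n = len(W)
    left, right = range(split), range(split, n)
    cut = sum(W[i][j] for i in left for j in right)
    assoc_left = sum(W[i][j] for i in left for j in range(n))
    assoc_right = sum(W[i][j] for i in right for j in range(n))
    return cut / assoc_left + cut / assoc_right


def best_split(W):
    """Split position with the smallest Ncut among all contiguous partitions."""
    return min(range(1, len(W)), key=lambda s: ncut_value(W, s))


# Four shots: the first two resemble each other, and so do the last two.
W = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
boundary = best_split(W)
```

Applying the same search recursively to each side, until the best $Ncut$ exceeds a stopping threshold, would produce a multi-scene segmentation; that recursion is left out of the sketch.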
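Let $D_{kf}$ be the set of descriptors extracted from a key-frame. A shot $S_i$ with $n$ key-frames $KF_i = \{kf_{i1}, \ldots, kf_{in}\}$ is described by $D_{S_i} = D_{kf_{i1}} \cup \ldots \cup D_{kf_{in}}$, and the whole video by $D_S = D_{S_1} \cup D_{S_2} \cup \ldots \cup D_{S_N}$. The descriptors in $D_S$ are clustered into $k$ visual words. The visual word histogram $VH_i$ of shot $i$, with $P$ descriptors $D_{S_i} = \{d_1, \ldots, d_P\}$ and clusters $\{C_1, \ldots, C_k\}$, is
\[ VH_i(l) = \frac{|\{d_j \in C_l,\ j = 1, \ldots, P\}|}{P}, \quad l = 1, \ldots, k. \]

The histograms are smoothed over time with a Gaussian kernel $K_\sigma$ of width $\sigma$:
\[ SH_t = \sum_{n=-\infty}^{\infty} VH_n \, K_\sigma(t - n). \]
The difference $V_i$ between consecutive smoothed histograms $SH$ is measured with one of the following distances:
\[ V_i = \sqrt{\sum_{h=1}^{k} (SH_i(h) - SH_{i+1}(h))^2}, \]
\[ V_i = 1 - \frac{\sum_{h=1}^{k} \min(SH_i(h), SH_{i+1}(h))}{\sum_{h=1}^{k} SH_i(h)}, \]
\[ V_i = 0.5 \, \frac{\sum_{h=1}^{k} (SH_i(h) - SH_{i+1}(h))^2}{\sum_{h=1}^{k} (SH_i(h) + SH_{i+1}(h))^2}. \]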
Parameters $T$ and $\delta$: $T = 500$ to $1500$, $\delta = 0.2$ to $0.3$.

[Figure: Mean Recall and Mean $F_1$ as functions of $T$ and $\delta$]

Parameters $\delta$, $d$, $\alpha$, $\beta$ and $S$; $a(i, j) = VisSim(S_i, S_j)$; $d = 1$, $\lambda = 0.005$.

[Figure: Mean Recall and Mean $F_1$ as functions of $a$ and $w$]

$a = 0.1$, $w = 2$: 58, $F_1 = 18$. With $\alpha = \beta = 0.5$: 53, $F_1$ 19, for $d = 20$, $\lambda = 1$.

[Figure: Mean Recall and Mean $F_1$ as functions of $d$ and $\lambda$]

[Figure: Mean Recall and Mean $F_1$ versus Number of Visual Words (0 to 500) for $\sigma = 1$, $\sigma = 8$, $\sigma = 15$, $\sigma = 20$]

$(T, \delta) = (1000, 0.3)$, 250.
{Sc i 1, Sc i, Sc i+1 } Sc i d (Sc i, Sc j ) = d(s l, S k ) S l Sc i,s k Sc j Sc i F 1
δ δ K {U i } K i=1 τ(u i ) U i U m i 1 m 1 U 1 U 1 i K i i + 1 T τ(u i ) + τ(u m) S j U i U m (T, δ ) U i U m U m U i U m m m + 1 U m U i {U 1, U 2,..., U m}
$\delta = 0.3$: 79.91, $F_1 = 33$. $\delta = 0.2$: 83.15, $F_1 = 35.25$.

[Figure: Mean values of $F_1$, Precision and Recall versus $\delta^*$ and versus $\delta$]
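Let $X = \{x_i : i = 1, \ldots, N\}$ be the data and $M = \{M_i : i = 1, \ldots, K\}$ the candidate models. For a model $M$ with $d$ parameters and likelihood $L(X, M)$,
\[ BIC(M) = \log L(X, M) - 0.5 \lambda d \log(N), \]
where $\lambda$ is the penalty weight. Two models $M_i$ and $M_j$ are compared through $\Delta BIC = BIC(M_i) - BIC(M_j)$.

For change detection on $X = \{x_i \in \mathbb{R}^k : i = 1, \ldots, N\}$, two hypotheses are tested:
\[ H_0 : x_1 \ldots x_N \sim f(\theta), \]
\[ H_1 : x_1 \ldots x_i \sim f(\theta_1);\ x_{i+1} \ldots x_N \sim f(\theta_2). \]
For a candidate change point $i$,
\[ \Delta BIC(i) = BIC(M_1) - BIC(M_0) = \log L(X, \theta_1) + \log L(X, \theta_2) - 0.5 \lambda (2d) \log(N) - \left(\log L(X, \theta) - 0.5 \lambda d \log(N)\right) \]
\[ = \log L(X, \theta_1) + \log L(X, \theta_2) - \log L(X, \theta) - 0.5 \lambda d \log(N). \]
The estimated change point is $\hat{t} = \arg\max_i \Delta BIC(i)$. Model $M_1$ is preferred over $M_0$, that is, a change is declared, when $\Delta BIC = BIC(M_1) - BIC(M_0)$ satisfies $\Delta BIC > 0$. The weight $\lambda$ controls the penalty.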
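The change-point test can be illustrated on a one-dimensional signal. The sketch below assumes a univariate Gaussian model for $f(\theta)$ fitted by maximum likelihood, so $d = 2$ (mean and variance); the function names, the variance floor and the toy signal are all assumptions made for this example.

```python
import math


def gaussian_log_likelihood(xs):
    """Maximized log-likelihood of a 1-D Gaussian fitted to xs."""
    n = len(xs)
    mean = sum(xs) / n
    var = max(sum((x - mean) ** 2 for x in xs) / n, 1e-12)  # floor avoids log(0)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def delta_bic(xs, i, lam=1.0, d=2):
    """Delta BIC for a change after the first i samples."""
    n = len(xs)
    return (
        gaussian_log_likelihood(xs[:i])
        + gaussian_log_likelihood(xs[i:])
        - gaussian_log_likelihood(xs)
        - 0.5 * lam * d * math.log(n)
    )


def detect_change(xs, lam=1.0, min_len=2):
    """Return (index, score); index is None when no split has Delta BIC > 0."""
    candidates = range(min_len, len(xs) - min_len + 1)
    best = max(candidates, key=lambda i: delta_bic(xs, i, lam))
    score = delta_bic(xs, best, lam)
    return (best, score) if score > 0 else (None, score)


signal = [0.0, 0.1, -0.1, 0.05, -0.05, 0.0, 5.0, 5.1, 4.9, 5.05, 4.95, 5.0]
change_index, change_score = detect_change(signal)
```

Raising `lam` makes the penalty term larger, so fewer candidate points pass the $\Delta BIC > 0$ test; this is the trade-off explored when $\lambda$ is tuned.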
[Figure: Weight versus Frequency (Hz), 0 to 8000 Hz]
Parameters $w$ and $\lambda$: 87, $F_1$ 16; 82.23, $F_1 = 37.92$ for $(\lambda, w)$.

[Figure: Mean Recall and Mean $F_1$ as functions of $\lambda$ and $w$]

[Figure: Mean Recall and Mean $F_1$ as functions of $\lambda$ and $w$]
$256 \times 3 = 768$; $\lambda$, $F_1$.

[Figure: Mean Recall and Mean $F_1$ as functions of $\lambda$ and the number of PCA components]

$BIC_V$, $BIC_A$.

[Figure: Mean Recall and Mean $F_1$ as functions of $\lambda$ and the number of PCA components]

$BIC_{AV} = BIC_A + BIC_V$; $w = 2$, 10.

[Figure: Mean Recall and Mean $F_1$ as functions of $\lambda$ audio and $\lambda$ image]

512, $k$.
$\sigma$, $F_1$; $k = 100$, $\sigma = 1$.

[Figure: Mean Recall and Mean $F_1$ versus $\sigma$ (0 to 20) for NumWords = 100, 150, 200, 500, 650, 800]

$D_S$, $\sigma$, $k$.

[Figure: Mean Recall and Mean $F_1$ versus $\sigma$ (0 to 15)]

$\sigma = 1$; tolerance from 0 to 15 sec (0 to 375 frames) and up to 60 sec (375 to 1500 frames).

[Figure: Recall and Precision versus Scene Tolerance (sec), 0 to 60]