v.connect 2 v.connect : A Singing Synthesis System Enabling Users to Control Vocal Tones Makoto Ogawa, 1 Syunji Yazaki 1 and Kôki Abe 1 VOCALOID

Σχετικά έγγραφα
ECE 308 SIGNALS AND SYSTEMS FALL 2017 Answers to selected problems on prior years examinations

Study on Re-adhesion control by monitoring excessive angular momentum in electric railway traction

Buried Markov Model Pairwise

GPGPU. Grover. On Large Scale Simulation of Grover s Algorithm by Using GPGPU

ΑΥΤΟΜΑΤΟΠΟΙΗΣΗ ΜΟΝΑΔΑΣ ΘΡΑΥΣΤΗΡΑ ΜΕ ΧΡΗΣΗ P.L.C. AUTOMATION OF A CRUSHER MODULE USING P.L.C.

Re-Pair n. Re-Pair. Re-Pair. Re-Pair. Re-Pair. (Re-Merge) Re-Merge. Sekine [4, 5, 8] (highly repetitive text) [2] Re-Pair. Blocked-Repair-VF [7]

ST5224: Advanced Statistical Theory II

Phys460.nb Solution for the t-dependent Schrodinger s equation How did we find the solution? (not required)

Statistical Inference I Locally most powerful tests

Section 8.3 Trigonometric Equations

Retrieval of Seismic Data Recorded on Open-reel-type Magnetic Tapes (MT) by Using Existing Devices

Modbus basic setup notes for IO-Link AL1xxx Master Block

Probability and Random Processes (Part II)

CHAPTER 25 SOLVING EQUATIONS BY ITERATIVE METHODS

Instruction Execution Times

CMOS Technology for Computer Architects

Medium Data on Big Data

Sampling Basics (1B) Young Won Lim 9/21/13

Main source: "Discrete-time systems and computer control" by Α. ΣΚΟΔΡΑΣ ΨΗΦΙΑΚΟΣ ΕΛΕΓΧΟΣ ΔΙΑΛΕΞΗ 4 ΔΙΑΦΑΝΕΙΑ 1

6.003: Signals and Systems. Modulation

MIDI [8] MIDI. [9] Hsu [1], [2] [10] Salamon [11] [5] Song [6] Sony, Minato, Tokyo , Japan a) b)

HIV HIV HIV HIV AIDS 3 :.1 /-,**1 +332

Second Order RLC Filters

Schedulability Analysis Algorithm for Timing Constraint Workflow Models

(C) 2010 Pearson Education, Inc. All rights reserved.

HOMEWORK 4 = G. In order to plot the stress versus the stretch we define a normalized stretch:

Yoshifumi Moriyama 1,a) Ichiro Iimura 2,b) Tomotsugu Ohno 1,c) Shigeru Nakayama 3,d)

ΕΦΑΡΜΟΓΗ ΕΥΤΕΡΟΒΑΘΜΙΑ ΕΠΕΞΕΡΓΑΣΜΕΝΩΝ ΥΓΡΩΝ ΑΠΟΒΛΗΤΩΝ ΣΕ ΦΥΣΙΚΑ ΣΥΣΤΗΜΑΤΑ ΚΛΙΝΗΣ ΚΑΛΑΜΙΩΝ

Detection and Recognition of Traffic Signal Using Machine Learning

3.4 SUM AND DIFFERENCE FORMULAS. NOTE: cos(α+β) cos α + cos β cos(α-β) cos α -cos β

Fourier transform, STFT 5. Continuous wavelet transform, CWT STFT STFT STFT STFT [1] CWT CWT CWT STFT [2 5] CWT STFT STFT CWT CWT. Griffin [8] CWT CWT

Vol.4-DCC-8 No.8 Vol.4-MUS-5 No.8 4// 3 3 Hanning (T ) 3 Hanning 3T (y(t)w(t)) dt =.5 T y (t)dt. () STRAIGHT F 3 TANDEM-STRAIGHT[] 3 F F 3 [] F []. :

VBA Microsoft Excel. J. Comput. Chem. Jpn., Vol. 5, No. 1, pp (2006)

Homework 8 Model Solution Section

DESIGN OF MACHINERY SOLUTION MANUAL h in h 4 0.

Strain gauge and rosettes

Επεξεπγαζία Ήσος Φυνήρ 4 η Διάλεξη ΦΗΦΙΑΚΟ ΣΟΤΝΣΙΟ

ΚΥΠΡΙΑΚΟΣ ΣΥΝΔΕΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY 21 ος ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ Δεύτερος Γύρος - 30 Μαρτίου 2011

Development of a basic motion analysis system using a sensor KINECT

Study of In-vehicle Sound Field Creation by Simultaneous Equation Method

IPSJ SIG Technical Report Vol.2014-CE-127 No /12/6 CS Activity 1,a) CS Computer Science Activity Activity Actvity Activity Dining Eight-He

Tridiagonal matrices. Gérard MEURANT. October, 2008

Exercises 10. Find a fundamental matrix of the given system of equations. Also find the fundamental matrix Φ(t) satisfying Φ(0) = I. 1.

Mean-Variance Analysis

An Advanced Manipulation for Space Redundant Macro-Micro Manipulator System

: Monte Carlo EM 313, Louis (1982) EM, EM Newton-Raphson, /. EM, 2 Monte Carlo EM Newton-Raphson, Monte Carlo EM, Monte Carlo EM, /. 3, Monte Carlo EM

Additional Results for the Pareto/NBD Model

ΚΥΠΡΙΑΚΗ ΕΤΑΙΡΕΙΑ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ 24/3/2007

MnZn. MnZn Ferrites with Low Loss and High Flux Density for Power Supply Transformer. Abstract:

ΚΥΠΡΙΑΚΗ ΕΤΑΙΡΕΙΑ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ 19/5/2007

ΚΥΠΡΙΑΚΗ ΕΤΑΙΡΕΙΑ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ 6/5/2006

CSJ. Speaker clustering based on non-negative matrix factorization using i-vector-based speaker similarity

Numerical Analysis FMN011

ΗΜΥ 220: ΣΗΜΑΤΑ ΚΑΙ ΣΥΣΤΗΜΑΤΑ Ι Ακαδημαϊκό έτος Εαρινό Εξάμηνο Κατ οίκον εργασία αρ. 2

Reaction of a Platinum Electrode for the Measurement of Redox Potential of Paddy Soil

ME 374, System Dynamics Analysis and Design Homework 9: Solution (June 9, 2008) by Jason Frye

Example Sheet 3 Solutions

2 Composition. Invertible Mappings

University of Illinois at Urbana-Champaign ECE 310: Digital Signal Processing

ANSWERSHEET (TOPIC = DIFFERENTIAL CALCULUS) COLLECTION #2. h 0 h h 0 h h 0 ( ) g k = g 0 + g 1 + g g 2009 =?

*,* + -+ on Bedrock Bath. Hideyuki O, Shoichi O, Takao O, Kumiko Y, Yoshinao K and Tsuneaki G

wave energy Superposition of linear plane progressive waves Marine Hydrodynamics Lecture Oblique Plane Waves:

TMA4115 Matematikk 3

Elements of Information Theory

GPU. CUDA GPU GeForce GTX 580 GPU 2.67GHz Intel Core 2 Duo CPU E7300 CUDA. Parallelizing the Number Partitioning Problem for GPUs

Lecture 21: Scattering and FGR

ΠΑΝΕΠΙΣΤΗΜΙΟ ΠΑΤΡΩΝ ΠΟΛΥΤΕΧΝΙΚΗ ΣΧΟΛΗ ΤΜΗΜΑ ΜΗΧΑΝΙΚΩΝ Η/Υ & ΠΛΗΡΟΦΟΡΙΚΗΣ. του Γεράσιμου Τουλιάτου ΑΜ: 697

CRASH COURSE IN PRECALCULUS

Applying Markov Decision Processes to Role-playing Game

Mock Exam 7. 1 Hong Kong Educational Publishing Company. Section A 1. Reference: HKDSE Math M Q2 (a) (1 + kx) n 1M + 1A = (1) =

derivation of the Laplacian from rectangular to spherical coordinates

LUNGOO R. Control Engineering for Development of a Mechanical Ventilator for ICU Use Spontaneous Breathing Lung Simulator LUNGOO

2.153 Adaptive Control Lecture 7 Adaptive PID Control

Comparison of Evapotranspiration between Indigenous Vegetation and Invading Vegetation in a Bog

Διπλωματική Εργασία του φοιτητή του Τμήματος Ηλεκτρολόγων Μηχανικών και Τεχνολογίας Υπολογιστών της Πολυτεχνικής Σχολής του Πανεπιστημίου Πατρών

Appendix to On the stability of a compressible axisymmetric rotating flow in a pipe. By Z. Rusak & J. H. Lee

Srednicki Chapter 55

«Έντυπο και ψηφιακό βιβλίο στη σύγχρονη εποχή: τάσεις στην παγκόσμια βιομηχανία».

SOLUTIONS TO MATH38181 EXTREME VALUES AND FINANCIAL RISK EXAM

Test Data Management in Practice

Τo ελληνικό τραπεζικό σύστημα σε περιόδους οικονομικής κρίσης και τα προσφερόμενα προϊόντα του στην κοινωνία.

Inverse trigonometric functions & General Solution of Trigonometric Equations

ΠΕΡΙΕΧΟΜΕΝΑ. Κεφάλαιο 1: Κεφάλαιο 2: Κεφάλαιο 3:

Second Order Partial Differential Equations

Lecture 12 Modulation and Sampling

Ordinal Arithmetic: Addition, Multiplication, Exponentiation and Limit

1 h, , CaCl 2. pelamis) 58.1%, (Headspace solid -phase microextraction and gas chromatography -mass spectrometry,hs -SPME - Vol. 15 No.

3: A convolution-pooling layer in PS-CNN 1: Partially Shared Deep Neural Network 2.2 Partially Shared Convolutional Neural Network 2: A hidden layer o

Wiki. Wiki. Analysis of user activity of closed Wiki used by small groups

Πρόβλημα 1: Αναζήτηση Ελάχιστης/Μέγιστης Τιμής

ΑΝΙΧΝΕΥΣΗ ΓΕΓΟΝΟΤΩΝ ΒΗΜΑΤΙΣΜΟΥ ΜΕ ΧΡΗΣΗ ΕΠΙΤΑΧΥΝΣΙΟΜΕΤΡΩΝ ΔΙΠΛΩΜΑΤΙΚΗ ΕΡΓΑΣΙΑ

! " # $ &,-" " (.* & -" " ( /* 0 (1 1* 0 - (* 0 #! - (#* 2 3( 4* 2 (* 2 5!! 3 ( * (7 4* 2 #8 (# * 9 : (* 9

Πανεπιστήμιο Πειραιώς Τμήμα Πληροφορικής Πρόγραμμα Μεταπτυχιακών Σπουδών «Πληροφορική»

Spectrum Representation (5A) Young Won Lim 11/3/16

SCITECH Volume 13, Issue 2 RESEARCH ORGANISATION Published online: March 29, 2018

Pg The perimeter is P = 3x The area of a triangle is. where b is the base, h is the height. In our case b = x, then the area is

Project: 296 File: Title: CMC-E-600 ICD Doc No: Rev 2. Revision Date: 15 September 2010

ITU-R BT ITU-R BT ( ) ITU-T J.61 (

MathCity.org Merging man and maths

Homework 3 Solutions

Transcript:

v.connect 1 1 1 VOCALOID UTAU WORLD Vorbis v.connect 2 1.7 2.2 v.connect : A Singing Synthesis System Enabling Users to Control Vocal Tones Makoto Ogawa, 1 Syunji Yazaki 1 and Kôki Abe 1 Since the release of Hatsune Miku, interets in singing synthesis increase. For example, a singing synthesis system, UTAU, has been developed as a freeware. Most of these systems, however, lack of the function that users can mix vocal tones at any times. Controling tonal changes in singing requires a large amount of time and data for synthesis. We have developed a singing synthesis system, v.connect, which connects corresponding phonemes with a time-stretching function to enable users to control tonal changes in singing by specifying the rate of voice morphing. The system processes voice signals with WORLD, a voice synthesis and analysis system, and uses corpora of various tonal voices consisting of Mel cepstra and excitation signals compressed by Vorbis. We constructed a corpus, Namine Ritsu Connect, using the proposed method. It was found that the size of the corpus was two times larger than that of raw waves, and that synthesis from the corpus was 1.7 to 2.2 times faster than that from raw waves. Degradation caused by compression was not sensed subjectively. 1. VOCALOID 1) VOCALOID Append UTAU 2) 3) 4) VocalListener2 VocalListener2 VocalListener2 v.connect v.connect 2 WORLD 5) Project 1 1 The University of Electro-Communications 1 http://ritsu73.net/ 1 c 2012 Information Processing Society of Japan

2 v.connect 3 4 5 6 2. v.connect Cadencii 6) Cadencii Cadencii 1 v.connect Cadencii Cadencii v.connect 2.1 Cadencii Cadencii VOCALOID2 VOCALOID, VOCALOID2, Sinsy 2 MusicXML, UTAU, AquesTone 3 v.connect GUI Cadencii 1 1 Cadencii Fig. 1 Screen shot of Cadencii. 1 Cadencii http://sourceforge.jp/projects/cadencii/ 2 Sinsy - HMM-based Singing Voice Synthesis System http://www.sinsy.jp/ 3 AquesTone - VSTi http://www.a-quest.com/products/aquestone.html 2 v.connect Fig. 2 Block diagram of v.connect. 2.2 v.connect v.connect 2 v.connect Input VOCALOID2 v.connect Cadencii v.connect v.connect WORLD Output v.connect OggVorbis 7) 2 c 2012 Information Processing Society of Japan

3. v.connect VCV 3.1 WORLD WORLD WORLD Vocoder WORLD DIO 8) STAR 9) PLATINUM 10) PLATINUM x(t) h(t) R(ω) R(ω) = X(ω) (X(ω) = FFT[x(t)w(t)], H(ω) = FFT[h(t)]) (1) H(ω) w(t) WORLD Vocoder 3.2 WORLD WORLD Vorbis 3 3 WORLD 3 Fig. 3 Block diagram of v.connect s analysis. F0 Vorbis F0 WORLD 3.3 A, B t E(t) x(t) m E(t) = (x(t + i )) 2 (2) f s i= m f s m 3 c 2012 Information Processing Society of Japan

A, B E A(t), E B(t) t=la t=0 E A (t) E B (T (t)) d t 2 + T 2 (t) s.t. dt (t) dt > 0 (3) T (t) l A, l B A, B T (l A ) = l B T (t) DP DP DP h DP [i, j] [i + p, j + q], (1 p l A h i, 1 q l B j) d h d ij < p, q >= h p 2 + q 2 t2 E A (t) E B (T ij < p, q > (t)) dt t 2 t 1 t 1 (4) t 1 = hi, t 2 = h(i + p) DP T ij < p, q > (t) d { q (t t1) + hj (t1 < t t2) T ij < p, q > (t) = p (5) hp i = l A, hq i = l B C = (p i, q i ) 0 i n p 0 = q 0 = 0 C E A, E B D D = d (i) i=1 d (i) = d ui v i < p i, q i > i 1 i 1 u i = p j, v i = q j j=1 j=1 D C min T (i) min = T u i v i < p i, q i > T = T (i) min DP i=1 (6) 1 < N < 1 h max(l A, l B ) N 1 p, q N 4. F0 F0 4.1 Cadencii BRI p(t) 4.2 F0 F0 F0 Ω F0 2 H(s) = s 2 + 2ζΩs + Ω 2 H(s) F0 ζ, Ω F0 f 0 (t) n t (n), l (n), f (n), l (n) por, d (n) por, l (n) nnote f0(t) note log f 0(t) = log f (t) + F flu (t) + F por(t) (i) note log f (t) = log f note + F (i) (t) { f (n) (t (n) < t t (n) + l (n) ) log f note (t) = F por (n) (t), F (n) (t) n F flu (t) (7) 4 c 2012 Information Processing Society of Japan

F por (n) (t) = (log f (n+1) log f (t))( 1 (1 cos θ(n) por(t))) 2 d (n) por(log f (n+1) log f (n) )(sin θ pre(t)) (n) F (n) (t) = d(t) sin θ(n) F flu (t) = θ (n) por(t) = (t) 1 100 (sin 12.7πt + sin 7.1πt + 1 sin 4.7πt) 3 θ por(t), (n) θ (n) (t) t t p t (n) + l (n) t p π (t p < t < t (n) + l (n) ) θ pre(t) (n) = θ por(t) (n) + θ por(2(t (n) (n) + l (n) ) t) t θ (n) (t) = s(τ)dτ (t v < t < t (n) + l (n) ) t v t p = t (n) + l (n) l (n) por, t v = t (n) + l (n) l (n) (8) (9) d(t), s(t) Wen-Hsing Lai Mandarin Singing Synthesis 11) 4.3 BRI BRI 2 BRI 4 t = t t (n) 2 A, B t A, t B A B BRI b A, b B (b A > b B ) p A, p B b(t) BRI p bri (t) p bri (t) b B b A b B (b B p bri (t) b A ) 0 (p bri (t) < b B ) b(t) = 1 (p bri (t) < b A ) (10) Fig. 4 4 Block diagram of v.connect s synthesis. A B T AB(t) T 1 AB (t) A, B { t A = b(t)(t + p A ) + (1 b(t))t 1 AB (t + p B ) (11) t B = (1 b(t))(t + p B ) + b(t)t AB (t + p A ) p A, p B A, B t A, t B S A (ω), S B (ω) r A(τ), r B(τ) S(ω) r(τ) log S(ω) = b(t) log S A + (1 b(t)) log S B (12) r(τ) = b(t)r A(τ) + (1 b(t))r B(τ) (13) S(ω, t), r(τ, t) 5 c 2012 Information Processing Society of Japan

Table 1 1 Recording environment and contents of Namine Ritsu Connect. 2 2ms byte Table 2 Comparision of data amount per 2ms (in bytes). VCV 955 Audio-Technica AT-4040 Audio I/F Roland UA-25EX SG 2000 RT 32 OggVorbis 64kbits / 44100 samples B4(493.9Hz) F4(349.2Hz) WORLD v.connect 176.4 - - - 4096 128-8192 200 176.4 12288 328 3 Table 3 Comparision of synthesis time with WORLD and v.connect. CPU WORLD(sec) (sec) 4.4 l pre = min (p A, p B ) n t (n) begin = t(n) l pre t (n 1) + l (n 1) F0 WORLD 5. 1 5.1 1 OggVorbis 44.1kHz 64kbps 44100 64kbits 6 7 150 3 2mora/sec 60 48kHz 24bit 44.1kHz 16bit Celeron 1.73Ghz 89.1 40.4 1 Core2Quad 2.80Ghz 39.6 20.7 1 Core2Quad 2.80Ghz 22.3 10.5 2 Corei7 3.50Ghz 22.9 13.1 1 Corei7 3.50Ghz 11.6 6.6 2 250ms 500ms m = 1024 N = 8 5.2 210MB 430MB 2 WORLD 14.2GB 2ms WORLD v.connect 2 byte WORLD v.connect 2ms 5.3 32 3 WORLD 1.7 2.2 1 http://hal-the-cat.music.coocan.jp/ritsu.html 6 c 2012 Information Processing Society of Japan

5.4 20% 8%BRI 2 BRI 6. v.connect WORLD OggVorbis 2 1.7 2.2 Cadencii kbinani WORLD 1) VOCALOID Vol.2008, No.50, pp.57 60 (2008). 2) http://utau2008.web.fc2.com/ 2011-11-28. 3) STRAIGHT Vol.2006-MUS-64, No.19, pp.59 64 (2006). 4) VocaListener2: Vol.2010-MUS-86, No.3, pp. 1 10 (2010). 5) WORLD Vol.41, No.7, pp.555 560 (2011). 6) kbinani: cadencii jp @ wiki - Cadencii, Cadencii Project (online), available from http://www9.atwiki.jp/boare/pages/18.html (accessed 2011-11-28). 7) XIPH.ORG: Vorbis.com, XIPH.ORG (online), available from http://www.vorbis.com/ (accessed 2011-11-28). 8) SNR F0 D Vol.93-D, No.2, pp.109 117 (2010). 9) (D) Vol.94-D, No.7, pp.1079 1087 (2011). 10) 2011 pp.325 326 (2011). 11) Lai, W.-H.: F0 Control Model for Mandarin Singing Voice Synthesis, Second International Conference on Digital Telecommunications, Vol.ICDT2, p.12 (2007). 7 c 2012 Information Processing Society of Japan