Machine Learning. Lecture 5: Gaussian Discriminant Analysis, Naive Bayes and EM Algorithm. Feng Li.


1 Machine Learning Lecture 5: Gaussian Discriminant Analysis, Naive Bayes and EM Algorithm Feng Li School of Computer Science and Technology Shandong University Fall 2018

2 Probability Theory Review Sample space, events and probability Conditional probability Random variables and probability distributions Joint probability distribution Independence Conditional probability distribution Bayes Theorem / 108

3 Sample Space, Events and Probability A sample space S is the set of all possible outcomes of a (conceptual or physical) random experiment. An event A is a subset of the sample space S. P(A) is the probability that event A happens; it is a function that maps the event A onto the interval [0, 1], and P(A) is also called the probability measure of A. Kolmogorov axioms: Non-negativity: $P(A) \ge 0$ for each event A; $P(S) = 1$; σ-additivity: for disjoint events $\{A_i\}_i$ such that $A_i \cap A_j = \emptyset$ for $i \ne j$, $P(\bigcup_i A_i) = \sum_i P(A_i)$ 3 / 108

4 Sample Space, Events and Probability (Contd.) Some consequences: $P(\emptyset) = 0$; $P(A \cup B) = P(A) + P(B) - P(A \cap B)$; $P(A') = 1 - P(A)$ 4 / 108

5 Conditional Probability Definition of conditional probability: the fraction of worlds in which event A is true given that event B is true, $P(A \mid B) = \frac{P(A, B)}{P(B)}$, $P(A, B) = P(A \mid B) P(B)$. Corollary: the chain rule, $P(A_1, A_2, \ldots, A_n) = \prod_{k=1}^{n} P(A_k \mid A_1, A_2, \ldots, A_{k-1})$. Example: $P(A_4, A_3, A_2, A_1) = P(A_4 \mid A_3, A_2, A_1) P(A_3 \mid A_2, A_1) P(A_2 \mid A_1) P(A_1)$ 5 / 108

6 Conditional Probability (Contd.) A real-valued random variable is a function of the outcome of a randomized experiment, $X: S \to \mathbb{R}$. Examples of discrete random variables (S is discrete): X(s) = True if a randomly drawn person (s) from our class (S) is female; X(s) = the hometown of a randomly drawn person (s) from (S). Example of a continuous random variable (S is continuous): X(s) = r, the heart rate of a randomly drawn person s in our class S 6 / 108

7 Random Variables A real-valued random variable is a function of the outcome of a randomized experiment, $X: S \to \mathbb{R}$. For a continuous random variable X: $P(a < X < b) = P(\{s \in S : a < X(s) < b\})$. For a discrete random variable X: $P(X = x) = P(\{s \in S : X(s) = x\})$ 7 / 108

8 Probability Distribution Probability distribution for discrete random variables: suppose X is a discrete random variable, $X: S \to A$. The probability mass function (PMF) of X gives the probability of X = x: $p_X(x) = P(X = x)$. Since $\sum_{x \in A} P(X = x) = 1$, we have $\sum_{x \in A} p_X(x) = 1$ 8 / 108

9 Probability Distribution (Contd.) Probability distribution for continuous random variables: suppose X is a continuous random variable, $X: S \to A$. The probability density function (PDF) of X is a function $f_X(x)$ such that for $a, b \in A$ with $a \le b$, $P(a \le X \le b) = \int_a^b f_X(x)\,dx$ 9 / 108

10 Joint Probability Distribution Suppose both X and Y are discrete random variables. Joint probability mass function (PMF): $p_{X,Y}(x, y) = P(X = x, Y = y)$. Marginal probability mass functions: $p_X(x) = \sum_y P(X = x, Y = y) = \sum_y P(X = x \mid Y = y) P(Y = y)$ and $p_Y(y) = \sum_x P(X = x, Y = y) = \sum_x P(Y = y \mid X = x) P(X = x)$. Extension to multiple random variables $X_1, X_2, X_3, \ldots, X_n$: $p(x_1, x_2, \ldots, x_n) = P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)$ 10 / 108

11 Joint Probability Distribution (Contd.) Suppose both X and Y are continuous random variables. Joint probability density function (PDF) $f(x, y)$: $P(a_1 \le X \le b_1, a_2 \le Y \le b_2) = \int_{a_1}^{b_1} \int_{a_2}^{b_2} f(x, y)\,dy\,dx$. Marginal probability density functions: $f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$ for $-\infty < x < \infty$, and $f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$ for $-\infty < y < \infty$. Extension to more than two random variables: $P(a_1 \le X_1 \le b_1, \ldots, a_n \le X_n \le b_n) = \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n$ 11 / 108

12 Independent Random Variables Two discrete random variables X and Y are independent if for every pair of x and y, $p_{X,Y}(x, y) = p_X(x)\, p_Y(y)$. Two continuous random variables X and Y are independent if for every pair of x and y, $f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$. If the above equations do not hold for all (x, y), then X and Y are said to be dependent 12 / 108

13 Conditional Probability Distribution Discrete random variables X and Y: joint PMF $p(x, y)$; marginal PMF $p_X(x) = \sum_y p(x, y)$. The conditional probability mass function of Y given X = x is $p_{Y \mid X}(y \mid x) = p_{Y \mid X = x}(y) = \frac{p(x, y)}{p_X(x)}$. The conditional probability of Y = y given X = x is $P(Y = y \mid X = x) = p_{Y \mid X}(y \mid x)$ 13 / 108

14 Conditional Probability Distribution (Contd.) Continuous random variables X and Y: joint PDF $f(x, y)$; marginal PDF $f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$. The conditional probability density function of Y given X = x is $f_{Y \mid X}(y \mid x) = f_{Y \mid X = x}(y) = \frac{f(x, y)}{f_X(x)}$. Probability of $a \le X \le b$ given Y = y: $P(a \le X \le b \mid Y = y) = \int_a^b f_{X \mid Y = y}(x)\,dx$ 14 / 108

15 Conditional Probability Distribution (Contd.) Continuous random variable X and discrete random variable Y. Joint probability distribution: $P(a \le X \le b, Y = y) = P(a \le X \le b \mid Y = y)\, P(Y = y)$, where $P(a \le X \le b \mid Y = y) = \int_a^b f_{X \mid Y = y}(x)\,dx$ and $P(Y = y) = p_Y(y)$ 15 / 108

16 Bayes Theorem Bayes theorem (or Bayes rule) describes the probability of an event, based on prior knowledge of conditions that might be related to the event: $P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)}$. In the Bayesian interpretation, probability measures a degree of belief, and Bayes theorem links the degree of belief in a proposition before and after accounting for evidence. For proposition A and evidence B: P(A), the prior, is the initial degree of belief in A; P(A | B), the posterior, is the degree of belief having accounted for B. Another form: $P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)} = \frac{P(B \mid A) P(A)}{P(B \mid A) P(A) + P(B \mid A') P(A')}$, with A' being the complement of A 16 / 108

17 Bayes Theorem (Contd.) A: you have the flu; B: you just coughed. Assume: P(A) = 0.05, P(B | A) = 0.8, and P(B | A') = 0.2 (A' denotes the complement of A). Question: P(flu | cough) = P(A | B)? $P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)} = \frac{P(B \mid A) P(A)}{P(B \mid A) P(A) + P(B \mid A') P(A')} = \frac{0.8 \times 0.05}{0.8 \times 0.05 + 0.2 \times 0.95} \approx 0.174$ 17 / 108
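As a quick check of the arithmetic on this slide, here is a minimal Python sketch (not part of the original slides) that evaluates the same two-hypothesis Bayes rule:

```python
# Minimal sketch: posterior P(A | B) for the flu/cough example above.
p_A = 0.05            # prior P(A): you have the flu
p_B_given_A = 0.8     # likelihood P(B | A): cough given flu
p_B_given_notA = 0.2  # P(B | A'): cough given no flu

# Total probability of the evidence B
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes rule
p_A_given_B = p_B_given_A * p_A / p_B
print(p_A_given_B)  # ~0.174
```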

18 Bayes Theorem (Contd.) Random variables X and Y, both of which are discrete: $P(X = x \mid Y = y) = \frac{P(Y = y \mid X = x) P(X = x)}{P(Y = y)}$. Conditional PMF of X given Y = y: $p_{X \mid Y = y}(x) = \frac{p_{Y \mid X = x}(y)\, p_X(x)}{p_Y(y)}$ 18 / 108

19 Bayes Theorem (Contd.) Continuous random variable X and discrete random variable Y: $f_{X \mid Y = y}(x) = \frac{P(Y = y \mid X = x)\, f_X(x)}{P(Y = y)} = \frac{p_{Y \mid X = x}(y)\, f_X(x)}{p_Y(y)}$ 19 / 108

20 Bayes Theorem (Contd.) Discrete random variable X and continuous random variable Y: $P(X = x \mid Y = y) = \frac{f_{Y \mid X = x}(y)\, P(X = x)}{f_Y(y)}$. In another form: $p_{X \mid Y = y}(x) = \frac{f_{Y \mid X = x}(y)\, p(x)}{f_Y(y)}$ 20 / 108

21 Bayes Theorem (Contd.) X and Y are both continuous: $f_{X \mid Y = y}(x) = \frac{f_{Y \mid X = x}(y)\, f_X(x)}{f_Y(y)}$ 21 / 108

22 Prediction Based on Bayes Theorem X is a random variable indicating the feature vector; Y is a random variable indicating the label. We perform a trial to obtain a sample x for test; what is $P(Y = y \mid X = x) = p(y \mid x)$? 22 / 108

23 Prediction Based on Bayes Theorem (Contd.) We compute $p(y \mid x)$ based on Bayes Theorem: $p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)}$ for each y. We calculate $p(x \mid y)$ for all x, y and $p(y)$ for all y according to the given training data. Fortunately, we do not have to calculate p(x), because $\arg\max_y p(y \mid x) = \arg\max_y \frac{p(x \mid y)\, p(y)}{p(x)} = \arg\max_y p(x \mid y)\, p(y)$ 23 / 108

24 Gaussian Distribution Gaussian Distribution (Normal Distribution): $p(x; \mu, \sigma) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$, where µ is the mean and σ² is the variance. Gaussian distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables whose distributions are not known. Central limit theorem: the averages of samples of observations of random variables independently drawn from independent distributions converge in distribution to the normal, that is, they become normally distributed when the number of observations is sufficiently large. Physical quantities that are expected to be the sum of many independent processes (such as measurement errors) often have distributions that are nearly normal 24 / 108

25 Multivariate Gaussian Distribution Multivariate normal distribution in n dimensions, $\mathcal{N}(\mu, \Sigma)$: $p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$. Mean vector $\mu \in \mathbb{R}^n$; covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$. Mahalanobis distance: $r^2 = (x - \mu)^T \Sigma^{-1} (x - \mu)$. Σ is symmetric and positive semidefinite: $\Sigma = \Phi \Lambda \Phi^T$, where Φ is an orthonormal matrix whose columns are eigenvectors of Σ and Λ is a diagonal matrix with the diagonal elements being the eigenvalues 25 / 108
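A minimal NumPy sketch (not from the slides) of the density formula on this slide; the function name mvn_pdf and the example values are illustrative assumptions:

```python
# Minimal sketch: evaluating the multivariate Gaussian density p(x; mu, Sigma).
import numpy as np

def mvn_pdf(x, mu, Sigma):
    n = mu.shape[0]
    diff = x - mu
    # Mahalanobis distance r^2 = (x - mu)^T Sigma^{-1} (x - mu)
    r2 = diff @ np.linalg.solve(Sigma, diff)
    norm_const = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * r2) / norm_const

mu = np.zeros(2)
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
print(mvn_pdf(np.array([0.5, -0.5]), mu, Sigma))
```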

26 Multivariate Gaussian Distribution: A 2D Example [Figures] From left to right: Σ = I, Σ = 0.6I, Σ = 2I. From left to right: Σ = I, Σ = [1, 0.5; 0.5, 1], Σ = [1, 0.8; 0.8, 1] 26 / 108

27 Multivariate Gaussian Distribution: A 2D Example [Figures] From left to right: Gaussians with the same covariance but different mean vectors µ 27 / 108

28 Gaussian Discriminant Analysis (Contd.) $Y \sim \mathrm{Bernoulli}(\psi)$: $P(Y = 1) = \psi$, $P(Y = 0) = 1 - \psi$. Probability mass function: $p(y) = \psi^y (1 - \psi)^{1-y}$, $y = 0, 1$ 28 / 108

29 Gaussian Discriminant Analysis (Contd.) $X \mid Y = 0 \sim \mathcal{N}(\mu_0, \Sigma)$. Conditional probability density function of X given Y = 0: $p_{X \mid Y=0}(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_0)^T \Sigma^{-1} (x - \mu_0)\right)$, or $p(x \mid y = 0) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_0)^T \Sigma^{-1} (x - \mu_0)\right)$ 29 / 108

30 Gaussian Discriminant Analysis (Contd.) $X \mid Y = 1 \sim \mathcal{N}(\mu_1, \Sigma)$. Conditional probability density function of X given Y = 1: $p_{X \mid Y=1}(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_1)^T \Sigma^{-1} (x - \mu_1)\right)$, or $p(x \mid y = 1) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_1)^T \Sigma^{-1} (x - \mu_1)\right)$ 30 / 108

31 Gaussian Discriminant Analysis (Contd.) In summary, for y = 0, 1: $p(y) = \psi^y (1 - \psi)^{1-y}$ and $p(x \mid y) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_y)^T \Sigma^{-1} (x - \mu_y)\right)$ 31 / 108

32 Gaussian Discriminant Analysis (Contd.) Given m sample data, the log-likelihood is $\ell(\psi, \mu_0, \mu_1, \Sigma) = \log \prod_{i=1}^m p(x^{(i)}, y^{(i)}; \psi, \mu_0, \mu_1, \Sigma) = \log \prod_{i=1}^m p(x^{(i)} \mid y^{(i)}; \mu_0, \mu_1, \Sigma)\, p(y^{(i)}; \psi) = \sum_{i=1}^m \left[ \log p(x^{(i)} \mid y^{(i)}; \mu_0, \mu_1, \Sigma) + \log p(y^{(i)}; \psi) \right]$ 32 / 108

33 Gaussian Discriminant Analysis (Contd.) The log-likelihood function: $\ell(\psi, \mu_0, \mu_1, \Sigma) = \sum_{i=1}^m \left[ \log p(x^{(i)} \mid y^{(i)}; \mu_0, \mu_1, \Sigma) + \log p(y^{(i)}; \psi) \right]$. For each training data $(x^{(i)}, y^{(i)})$: if $y^{(i)} = 0$, $p(x^{(i)} \mid y^{(i)}; \mu_0, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x^{(i)} - \mu_0)^T \Sigma^{-1} (x^{(i)} - \mu_0)\right)$; if $y^{(i)} = 1$, $p(x^{(i)} \mid y^{(i)}; \mu_1, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x^{(i)} - \mu_1)^T \Sigma^{-1} (x^{(i)} - \mu_1)\right)$ 33 / 108

34 Gaussian Discriminant Analysis (Contd.) The log-likelihood function: $\ell(\psi, \mu_0, \mu_1, \Sigma) = \sum_{i=1}^m \left[ \log p(x^{(i)} \mid y^{(i)}; \mu_0, \mu_1, \Sigma) + \log p(y^{(i)}; \psi) \right]$. For each training data $(x^{(i)}, y^{(i)})$: $p(y^{(i)}; \psi) = \psi^{y^{(i)}} (1 - \psi)^{1 - y^{(i)}}$ 34 / 108

35 Gaussian Discriminant Analysis (Contd.) Maximizing $\ell(\psi, \mu_0, \mu_1, \Sigma)$ by setting the derivatives to zero: $\frac{\partial}{\partial \psi} \ell(\psi, \mu_0, \mu_1, \Sigma) = 0$, $\nabla_{\mu_0} \ell(\psi, \mu_0, \mu_1, \Sigma) = 0$, $\nabla_{\mu_1} \ell(\psi, \mu_0, \mu_1, \Sigma) = 0$, $\nabla_{\Sigma} \ell(\psi, \mu_0, \mu_1, \Sigma) = 0$ 35 / 108

36 Gaussian Discriminant Analysis (Contd.) Solutions: $\psi = \frac{1}{m} \sum_{i=1}^m 1\{y^{(i)} = 1\}$, $\mu_0 = \frac{\sum_{i=1}^m 1\{y^{(i)} = 0\}\, x^{(i)}}{\sum_{i=1}^m 1\{y^{(i)} = 0\}}$, $\mu_1 = \frac{\sum_{i=1}^m 1\{y^{(i)} = 1\}\, x^{(i)}}{\sum_{i=1}^m 1\{y^{(i)} = 1\}}$, $\Sigma = \frac{1}{m} \sum_{i=1}^m (x^{(i)} - \mu_{y^{(i)}})(x^{(i)} - \mu_{y^{(i)}})^T$. Proof (see Problem Set 2) 36 / 108
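A minimal NumPy sketch (not from the slides) of these closed-form estimates; the function name fit_gda and the array layout are illustrative assumptions:

```python
# Minimal sketch: the closed-form GDA estimates above, assuming X is an (m, n)
# array of features and y an (m,) array of 0/1 labels.
import numpy as np

def fit_gda(X, y):
    m, n = X.shape
    psi = np.mean(y == 1)
    mu0 = X[y == 0].mean(axis=0)
    mu1 = X[y == 1].mean(axis=0)
    # Shared covariance: average outer products of residuals from each class mean
    mus = np.where((y == 1)[:, None], mu1, mu0)
    diff = X - mus
    Sigma = diff.T @ diff / m
    return psi, mu0, mu1, Sigma
```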

37 Gaussian Discriminant Analysis (Contd.) Given a test data sample x, we can calculate $p(y = 1 \mid x) = \frac{p(x \mid y = 1)\, p(y = 1)}{p(x)} = \frac{p(x \mid y = 1)\, p(y = 1)}{p(x \mid y = 1)\, p(y = 1) + p(x \mid y = 0)\, p(y = 0)} = \frac{1}{1 + \frac{p(x \mid y = 0)\, p(y = 0)}{p(x \mid y = 1)\, p(y = 1)}}$ 37 / 108

38 $\frac{p(x \mid y = 0)\, p(y = 0)}{p(x \mid y = 1)\, p(y = 1)} = \exp\left(-\frac{1}{2}(x - \mu_0)^T \Sigma^{-1} (x - \mu_0) + \frac{1}{2}(x - \mu_1)^T \Sigma^{-1} (x - \mu_1)\right) \cdot \frac{1 - \psi}{\psi} = \exp\left((\mu_0 - \mu_1)^T \Sigma^{-1} x + \frac{1}{2}\left(\mu_1^T \Sigma^{-1} \mu_1 - \mu_0^T \Sigma^{-1} \mu_0\right)\right) \exp\left(\log\frac{1 - \psi}{\psi}\right) = \exp\left((\mu_0 - \mu_1)^T \Sigma^{-1} x + \frac{1}{2}\left(\mu_1^T \Sigma^{-1} \mu_1 - \mu_0^T \Sigma^{-1} \mu_0\right) + \log\frac{1 - \psi}{\psi}\right)$ 38 / 108

39 Assume $x := \begin{bmatrix} x \\ 1 \end{bmatrix}$ (the feature vector augmented with a constant 1) and $\theta := \begin{bmatrix} \Sigma^{-1}(\mu_0 - \mu_1) \\ \frac{1}{2}\left(\mu_1^T \Sigma^{-1} \mu_1 - \mu_0^T \Sigma^{-1} \mu_0\right) + \log\frac{1 - \psi}{\psi} \end{bmatrix}$. We have $\frac{p(x \mid y = 0)\, p(y = 0)}{p(x \mid y = 1)\, p(y = 1)} = \exp\left((\mu_0 - \mu_1)^T \Sigma^{-1} x + \frac{1}{2}\left(\mu_1^T \Sigma^{-1} \mu_1 - \mu_0^T \Sigma^{-1} \mu_0\right) + \log\frac{1 - \psi}{\psi}\right) = \exp(\theta^T x)$. Then $p(y = 1 \mid x) = \frac{1}{1 + \frac{p(x \mid y = 0)\, p(y = 0)}{p(x \mid y = 1)\, p(y = 1)}} = \frac{1}{1 + \exp(\theta^T x)}$ 39 / 108
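Pairing with the fit_gda sketch above, here is a hypothetical helper (not from the slides) that evaluates this logistic form of the GDA posterior:

```python
# Minimal sketch: GDA posterior p(y=1 | x) via the form 1 / (1 + exp(theta^T x))
# derived above, computed directly from the GDA parameters.
import numpy as np

def gda_posterior(x, psi, mu0, mu1, Sigma):
    Sigma_inv = np.linalg.inv(Sigma)
    # theta^T x = (mu0 - mu1)^T Sigma^{-1} x
    #             + 0.5 * (mu1^T Sigma^{-1} mu1 - mu0^T Sigma^{-1} mu0)
    #             + log((1 - psi) / psi)
    t = (mu0 - mu1) @ Sigma_inv @ x \
        + 0.5 * (mu1 @ Sigma_inv @ mu1 - mu0 @ Sigma_inv @ mu0) \
        + np.log((1 - psi) / psi)
    return 1.0 / (1.0 + np.exp(t))
```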

40 Similarly, we have $p(y = 0 \mid x) = \frac{p(x \mid y = 0)\, p(y = 0)}{p(x)} = \frac{p(x \mid y = 0)\, p(y = 0)}{p(x \mid y = 1)\, p(y = 1) + p(x \mid y = 0)\, p(y = 0)} = \frac{1}{1 + \frac{p(x \mid y = 1)\, p(y = 1)}{p(x \mid y = 0)\, p(y = 0)}} = \frac{1}{1 + \exp\left((\mu_1 - \mu_0)^T \Sigma^{-1} x + \frac{1}{2}\left(\mu_0^T \Sigma^{-1} \mu_0 - \mu_1^T \Sigma^{-1} \mu_1\right) + \log\frac{\psi}{1 - \psi}\right)}$ 40 / 108

41 GDA and Logistic Regression GDA can be reformulated as logistic regression. Which one is better? GDA makes stronger modeling assumptions, and is more data efficient (i.e., requires less training data to learn well) when the modeling assumptions are correct or at least approximately correct. Logistic regression makes weaker assumptions, and is significantly more robust to deviations from the modeling assumptions. When the data is indeed non-Gaussian, then in the limit of large datasets, logistic regression will almost always do better than GDA. In practice, logistic regression is used more often than GDA 41 / 108

42 Gaussian Discriminant Analysis (Contd.) 42 / 108

43 Naive Bayes Training data $\{(x^{(i)}, y^{(i)})\}_{i=1,\ldots,m}$: $x^{(i)}$ is an n-dimensional vector; each feature $x_j^{(i)} \in \{0, 1\}$ ($j = 1, \ldots, n$); $y^{(i)} \in \{1, \ldots, k\}$. The features and labels can be represented by random variables $\{X_j\}_{j=1,\ldots,n}$ and Y, respectively 43 / 108

44 Naive Bayes (Contd.) For $j \ne j'$, Naive Bayes assumes $X_j$ and $X_{j'}$ are conditionally independent given Y: $P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n \mid Y = y) = \prod_{j=1}^n P(X_j = x_j \mid X_1 = x_1, X_2 = x_2, \ldots, X_{j-1} = x_{j-1}, Y = y) = \prod_{j=1}^n P(X_j = x_j \mid Y = y)$ 44 / 108

45 Naive Bayes (Contd.) The key assumption in the NB model: $P(Y = y, X_1 = x_1, \ldots, X_n = x_n) = P(X_1 = x_1, \ldots, X_n = x_n \mid Y = y)\, P(Y = y) = P(Y = y) \prod_{j=1}^n P(X_j = x_j \mid Y = y)$ 45 / 108

46 The key assumption in the NB model: $P(Y = y, X_1 = x_1, \ldots, X_n = x_n) = P(Y = y) \prod_{j=1}^n P(X_j = x_j \mid Y = y) = p(y) \prod_{j=1}^n p_j(x_j \mid y)$. Two sets of parameters (denoted by Ω): the probability mass function of Y, $p(y) = P(Y = y)$, where $y \in \{1, 2, \ldots, k\}$; and the conditional probability mass function of $X_j$ ($j \in \{1, 2, \ldots, n\}$) given Y = y ($y \in \{1, \ldots, k\}$), $p_j(x \mid y) = P(X_j = x \mid Y = y)$, where $x \in \{0, 1\}$ 46 / 108

47 Maximum-Likelihood Estimates for Naive Bayes The log-likelihood function is $\ell(\Omega) = \log \prod_{i=1}^m p(x^{(i)}, y^{(i)}) = \sum_{i=1}^m \log p(x^{(i)}, y^{(i)}) = \sum_{i=1}^m \log \left[ p(y^{(i)}) \prod_{j=1}^n p_j(x_j^{(i)} \mid y^{(i)}) \right] = \sum_{i=1}^m \left[ \log p(y^{(i)}) + \sum_{j=1}^n \log p_j(x_j^{(i)} \mid y^{(i)}) \right]$ 47 / 108

48 MLE for Naive Bayes (Contd.) Define: the number of training data whose label is y, $\mathrm{count}(y) = \sum_{i=1}^m 1(y^{(i)} = y)$, $y = 1, 2, \ldots, k$; and the number of training data with the j-th feature being x and the label being y, $\mathrm{count}_j(x \mid y) = \sum_{i=1}^m 1(y^{(i)} = y \wedge x_j^{(i)} = x)$, $y = 1, 2, \ldots, k$, $x = 0, 1$ 48 / 108

49 $\sum_{i=1}^m \log p(y^{(i)}) = \sum_{i: y^{(i)} = 1} \log p(y = 1) + \cdots + \sum_{i: y^{(i)} = k} \log p(y = k) = \mathrm{count}(y = 1) \log p(y = 1) + \cdots + \mathrm{count}(y = k) \log p(y = k) = \sum_{y=1}^k \mathrm{count}(y) \log p(y)$ 49 / 108

50 $\sum_{i=1}^m \sum_{j=1}^n \log p_j(x_j^{(i)} \mid y^{(i)}) = \sum_{j=1}^n \sum_{i: y^{(i)} = 1} \log p_j(x_j^{(i)} \mid y = 1) + \cdots + \sum_{j=1}^n \sum_{i: y^{(i)} = k} \log p_j(x_j^{(i)} \mid y = k)$ 50 / 108

51 Let $D^{(j)}_{x \mid y} = \{i : x_j^{(i)} = x \wedge y^{(i)} = y\}$. Then, we have $\sum_{i=1}^m \sum_{j=1}^n \log p_j(x_j^{(i)} \mid y^{(i)}) = \sum_{j=1}^n \sum_{i: y^{(i)} = 1} \log p_j(x_j^{(i)} \mid y = 1) + \cdots + \sum_{j=1}^n \sum_{i: y^{(i)} = k} \log p_j(x_j^{(i)} \mid y = k) = \sum_{j=1}^n \left[ \sum_{i \in D^{(j)}_{0 \mid 1}} \log p_j(x = 0 \mid y = 1) + \sum_{i \in D^{(j)}_{1 \mid 1}} \log p_j(x = 1 \mid y = 1) \right] + \cdots + \sum_{j=1}^n \left[ \sum_{i \in D^{(j)}_{0 \mid k}} \log p_j(x = 0 \mid y = k) + \sum_{i \in D^{(j)}_{1 \mid k}} \log p_j(x = 1 \mid y = k) \right]$ 51 / 108

52 With $D^{(j)}_{x \mid y} = \{i : x_j^{(i)} = x \wedge y^{(i)} = y\}$, we have $\sum_{i=1}^m \sum_{j=1}^n \log p_j(x_j^{(i)} \mid y^{(i)}) = \sum_{j=1}^n \left[ \sum_{i \in D^{(j)}_{0 \mid 1}} \log p_j(x = 0 \mid y = 1) + \sum_{i \in D^{(j)}_{1 \mid 1}} \log p_j(x = 1 \mid y = 1) \right] + \cdots + \sum_{j=1}^n \left[ \sum_{i \in D^{(j)}_{0 \mid k}} \log p_j(x = 0 \mid y = k) + \sum_{i \in D^{(j)}_{1 \mid k}} \log p_j(x = 1 \mid y = k) \right] = \sum_{j=1}^n \sum_{y=1}^k \sum_{x \in \{0, 1\}} \mathrm{count}_j(x \mid y) \log p_j(x \mid y)$ 52 / 108

53 MLE for Naive Bayes (Contd.) Revisiting the log-likelihood function: $\ell(\Omega) = \sum_{i=1}^m \left[ \log p(y^{(i)}) + \sum_{j=1}^n \log p_j(x_j^{(i)} \mid y^{(i)}) \right] = \sum_{y=1}^k \mathrm{count}(y) \log p(y) + \sum_{j=1}^n \sum_{y=1}^k \sum_{x \in \{0, 1\}} \mathrm{count}_j(x \mid y) \log p_j(x \mid y)$ 53 / 108

54 MLE for Naive Bayes (Contd.) Assume a training set $\{(x^{(i)}, y^{(i)})\}_{i=1,\ldots,m}$. The maximum-likelihood estimates are then the parameter values p(y) and $p_j(x_j \mid y)$ that maximize $\ell(\Omega) = \sum_{y=1}^k \mathrm{count}(y) \log p(y) + \sum_{j=1}^n \sum_{y=1}^k \sum_{x \in \{0, 1\}} \mathrm{count}_j(x \mid y) \log p_j(x \mid y)$ subject to the following constraints: $p(y) \ge 0$ for $y \in \{1, 2, \ldots, k\}$; $\sum_{y=1}^k p(y) = 1$; for $y \in \{1, \ldots, k\}$, $j \in \{1, \ldots, n\}$, $x \in \{0, 1\}$, $p_j(x \mid y) \ge 0$; for $y \in \{1, \ldots, k\}$ and $j \in \{1, \ldots, n\}$, $\sum_{x \in \{0, 1\}} p_j(x \mid y) = 1$ 54 / 108

55 MLE for Naive Bayes (Contd.) Theorem 1. The maximum-likelihood estimates for the Naive Bayes model are $p(y) = \frac{\mathrm{count}(y)}{m} = \frac{\sum_{i=1}^m 1(y^{(i)} = y)}{m}$ and $p_j(x \mid y) = \frac{\mathrm{count}_j(x \mid y)}{\mathrm{count}(y)} = \frac{\sum_{i=1}^m 1(y^{(i)} = y \wedge x_j^{(i)} = x)}{\sum_{i=1}^m 1(y^{(i)} = y)}$. Proof (see Problem Set 2) 55 / 108

56 MLE for Naive Bayes (Contd.) What if $x \in \{1, 2, \ldots, u\}$ instead of $x \in \{0, 1\}$? Can we get the same results? Check it yourself! 56 / 108

57 Laplace Smoothing There may exist some feature, e.g., $X_j$, such that $X_j = x$ never happens in the training data. Then $p_j(x \mid y) = 0$ and hence $p(y \mid x_1, \ldots, x_n) = 0$. This is unreasonable! How can we resolve this problem? Laplace smoothing: $p(y) = \frac{\sum_{i=1}^m 1(y^{(i)} = y) + 1}{m + k}$ and $p_j(x \mid y) = \frac{\sum_{i=1}^m 1(y^{(i)} = y \wedge x_j^{(i)} = x) + 1}{\sum_{i=1}^m 1(y^{(i)} = y) + 2}$ (add 1 to each count; the denominator grows by the number of possible values, k labels or 2 binary feature values, so the probabilities still sum to 1) 57 / 108

58 Classification by Naive Bayes Given a test sample $x = [x_1, x_2, \ldots, x_n]^T$, we have $P(Y = y \mid X_1 = x_1, \ldots, X_n = x_n) = \frac{P(X_1 = x_1, \ldots, X_n = x_n \mid Y = y)\, P(Y = y)}{P(X_1 = x_1, \ldots, X_n = x_n)} = \frac{P(Y = y) \prod_{j=1}^n P(X_j = x_j \mid Y = y)}{P(X_1 = x_1, \ldots, X_n = x_n)} = \frac{p(y) \prod_{j=1}^n p_j(x_j \mid y)}{p(x_1, \ldots, x_n)}$. Therefore, the output of the Naive Bayes model is $\arg\max_{y \in \{1, \ldots, k\}} p(y) \prod_{j=1}^n p_j(x_j \mid y)$ 58 / 108
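A minimal Python sketch (not from the slides) of the smoothed estimates and the arg-max rule above; the function names fit_nb / predict_nb, the 0-based labels, and the array layout are illustrative assumptions:

```python
# Minimal sketch: Naive Bayes with binary features, Laplace smoothing, and
# classification by arg-max over log p(y) + sum_j log p_j(x_j | y).
# X is an (m, n) 0/1 array, y an (m,) array with labels in {0, ..., k-1}.
import numpy as np

def fit_nb(X, y, k):
    m, n = X.shape
    prior = np.zeros(k)       # p(y)
    cond = np.zeros((k, n))   # p_j(x = 1 | y)
    for c in range(k):
        Xc = X[y == c]
        prior[c] = (len(Xc) + 1) / (m + k)               # smoothed prior
        cond[c] = (Xc.sum(axis=0) + 1) / (len(Xc) + 2)   # smoothed p_j(1 | y=c)
    return prior, cond

def predict_nb(x, prior, cond):
    # Work in log space to avoid underflow of the product over features
    log_post = np.log(prior) + (x * np.log(cond) + (1 - x) * np.log(1 - cond)).sum(axis=1)
    return np.argmax(log_post)
```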

59 Classification by Naive Bayes Example: y = 0, 1, 2, 3. $P(Y = 0 \mid X_1 = x_1, \ldots, X_n = x_n) = \frac{p(y = 0) \prod_{j=1}^n p_j(x_j \mid y = 0)}{p(x_1, \ldots, x_n)}$; $P(Y = 1 \mid X_1 = x_1, \ldots, X_n = x_n) = \frac{p(y = 1) \prod_{j=1}^n p_j(x_j \mid y = 1)}{p(x_1, \ldots, x_n)}$; $P(Y = 2 \mid X_1 = x_1, \ldots, X_n = x_n) = \frac{p(y = 2) \prod_{j=1}^n p_j(x_j \mid y = 2)}{p(x_1, \ldots, x_n)}$; $P(Y = 3 \mid X_1 = x_1, \ldots, X_n = x_n) = \frac{p(y = 3) \prod_{j=1}^n p_j(x_j \mid y = 3)}{p(x_1, \ldots, x_n)}$ 59 / 108

60 Naive Bayes for Multinomial Distribution Assumptions: each training sample involves a different number of features, $x^{(i)} = [x_1^{(i)}, x_2^{(i)}, \ldots, x_{n_i}^{(i)}]^T$; the j-th feature of $x^{(i)}$ takes a finite set of values, $x_j^{(i)} \in \{1, 2, \ldots, v\}$ for $j = 1, \ldots, n$; for each training data, the features are i.i.d., $P(X_j = t \mid Y = y) = p(t \mid y)$ for $j = 1, \ldots, n$, where $p(t \mid y) \ge 0$ is the conditional probability mass function and $\sum_{t=1}^v p(t \mid y) = 1$ 60 / 108

61 Naive Bayes for Multinomial Distribution (Contd.) Assumptions: each training sample involves a different number of features, $x^{(i)} = [x_1^{(i)}, x_2^{(i)}, \ldots, x_{n_i}^{(i)}]^T$; the j-th feature of $x^{(i)}$ takes a finite set of values, $x_j^{(i)} \in \{1, 2, \ldots, v\}$. For each training data (x, y) where x is an n-dimensional vector: $P(Y = y) = p(y)$ and $P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n \mid Y = y) = \prod_{j=1}^n P(X_j = x_j \mid Y = y)$ 61 / 108

62 Naive Bayes for Multinomial Distribution (Contd.) $P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n \mid Y = y) = \prod_{j=1}^n P(X_j = x_j \mid Y = y) = \prod_{t=1}^v \prod_{j: x_j = t} P(X_j = t \mid Y = y) = \prod_{t=1}^v \prod_{j: x_j = t} p(t \mid y) = \prod_{t=1}^v p(t \mid y)^{\mathrm{count}(t)}$, where $\mathrm{count}(t) = \sum_{j=1}^n 1(x_j = t)$ 62 / 108

63 Naive Bayes for Multinomial Distribution (Contd.) For each training data (x, y): $P(Y = y) = p(y)$ and $P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n \mid Y = y) = \prod_{t=1}^v p(t \mid y)^{\mathrm{count}(t)}$. Parameters Ω: $p(y)$, $y = 1, 2, \ldots, k$, and $p(t \mid y)$, $t = 1, 2, \ldots, v$, $y = 1, 2, \ldots, k$ 63 / 108

64 Naive Bayes for Multinomial Distribution (Contd.) Log-likelihood function: $\ell(\Omega) = \log \prod_{i=1}^m p(x^{(i)}, y^{(i)}) = \log \prod_{i=1}^m \sum_{y=1}^k 1(y^{(i)} = y)\, p(x^{(i)} \mid y^{(i)} = y)\, p(y^{(i)} = y) = \sum_{i=1}^m \sum_{y=1}^k 1(y^{(i)} = y) \log\left( p(x^{(i)} \mid y^{(i)} = y)\, p(y^{(i)} = y) \right)$ 64 / 108

65 Naive Bayes for Multinomial Distribution (Contd.) $\ell(\Omega) = \sum_{i=1}^m \sum_{y=1}^k 1(y^{(i)} = y) \log\left( p(x^{(i)} \mid y^{(i)} = y)\, p(y^{(i)} = y) \right) = \sum_{i=1}^m \sum_{y=1}^k 1(y^{(i)} = y) \log\left( p(y^{(i)} = y) \prod_{j=1}^{n_i} p(x_j^{(i)} \mid y^{(i)} = y) \right) = \sum_{i=1}^m \sum_{y=1}^k 1(y^{(i)} = y) \log\left( p(y) \prod_{t=1}^v p(t \mid y)^{\mathrm{count}^{(i)}(t)} \right) = \sum_{i=1}^m \sum_{y=1}^k 1(y^{(i)} = y) \left( \log p(y) + \sum_{t=1}^v \mathrm{count}^{(i)}(t) \log p(t \mid y) \right)$, where $\mathrm{count}^{(i)}(t) = \sum_{j=1}^{n_i} 1(x_j^{(i)} = t)$ 65 / 108

66 Naive Bayes for Multinomial Distribution (Contd.) Problem formulation: $\max_\Omega \ell(\Omega) = \sum_{i=1}^m \sum_{y=1}^k 1(y^{(i)} = y) \left( \log p(y) + \sum_{t=1}^v \mathrm{count}^{(i)}(t) \log p(t \mid y) \right)$ s.t. $p(y) \ge 0$, $y = 1, \ldots, k$; $p(t \mid y) \ge 0$, $t = 1, \ldots, v$, $y = 1, \ldots, k$; $\sum_{y=1}^k p(y) = 1$; $\sum_{t=1}^v p(t \mid y) = 1$, $y = 1, 2, \ldots, k$ 66 / 108

67 Naive Bayes for Multinomial Distribution (Contd.) Solution: $p(t \mid y) = \frac{\sum_{i=1}^m 1(y^{(i)} = y)\, \mathrm{count}^{(i)}(t)}{\sum_{i=1}^m 1(y^{(i)} = y) \sum_{t'=1}^v \mathrm{count}^{(i)}(t')}$ and $p(y) = \frac{\sum_{i=1}^m 1(y^{(i)} = y)}{m}$. Check them by yourselves! 67 / 108

68 Naive Bayes for Multinomial Distribution (Contd.) Laplace smoothing: $\psi(t \mid y) = \frac{\sum_{i=1}^m 1(y^{(i)} = y)\, \mathrm{count}^{(i)}(t) + 1}{\sum_{i=1}^m 1(y^{(i)} = y) \sum_{t'=1}^v \mathrm{count}^{(i)}(t') + v}$ and $\psi(y) = \frac{\sum_{i=1}^m 1(y^{(i)} = y) + 1}{m + k}$ 68 / 108
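A minimal Python sketch (not from the slides) of these smoothed multinomial estimates, under the assumption that each sample is a variable-length list of feature values (e.g. word ids); the function name fit_multinomial_nb and the 0-based indexing are illustrative:

```python
# Minimal sketch: multinomial Naive Bayes estimates with Laplace smoothing.
# docs is a list of m samples, each a list of values in {0, ..., v-1};
# y is an (m,) array of labels in {0, ..., k-1}.
import numpy as np

def fit_multinomial_nb(docs, y, v, k):
    m = len(docs)
    prior = np.zeros(k)       # psi(y)
    cond = np.zeros((k, v))   # psi(t | y)
    for c in range(k):
        idx = [i for i in range(m) if y[i] == c]
        prior[c] = (len(idx) + 1) / (m + k)
        counts = np.zeros(v)
        for i in idx:
            for t in docs[i]:
                counts[t] += 1
        cond[c] = (counts + 1) / (counts.sum() + v)
    return prior, cond
```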

69 Convex Functions A set C is convex if the line segment between any two points in C lies in C, i.e., for $x_1, x_2 \in C$ and θ with $0 \le \theta \le 1$, we have $\theta x_1 + (1 - \theta) x_2 \in C$. A function $f: \mathbb{R}^n \to \mathbb{R}$ is convex if $\mathrm{dom}\, f$ is a convex set and if for all $x, y \in \mathrm{dom}\, f$ and λ with $0 \le \lambda \le 1$, we have $f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$ 69 / 108

70 Convex Functions (Contd.) First-order conditions: suppose f is differentiable (i.e., its gradient ∇f exists at each point in $\mathrm{dom}\, f$, which is open). Then f is convex if and only if $\mathrm{dom}\, f$ is convex and $f(y) \ge f(x) + \nabla f(x)^T (y - x)$ holds for all $x, y \in \mathrm{dom}\, f$ 70 / 108

71 Convex Functions (Contd.) Second-order conditions: assume f is twice differentiable (i.e., its Hessian matrix or second derivative ∇²f exists at each point in $\mathrm{dom}\, f$, which is open). Then f is convex if and only if $\mathrm{dom}\, f$ is convex and its Hessian is positive semidefinite: for all $x \in \mathrm{dom}\, f$, $\nabla^2 f(x) \succeq 0$ 71 / 108

72 Jensen's Inequality Let f be a convex function; then $f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2)$, where $\lambda \in [0, 1]$. Jensen's Inequality: let f(x) be a convex function defined on an interval I. If $x_1, x_2, \ldots, x_N \in I$ and $\lambda_1, \lambda_2, \ldots, \lambda_N \ge 0$ with $\sum_{i=1}^N \lambda_i = 1$, then $f\left(\sum_{i=1}^N \lambda_i x_i\right) \le \sum_{i=1}^N \lambda_i f(x_i)$ 72 / 108

73 The Proof of Jensen's Inequality When N = 1, the result is trivial. When N = 2, $f(\lambda_1 x_1 + \lambda_2 x_2) \le \lambda_1 f(x_1) + \lambda_2 f(x_2)$ due to the convexity of f(x). When $N \ge 3$, the proof is by induction: we assume that Jensen's inequality holds for N = k − 1, i.e., $f\left(\sum_{i=1}^{k-1} \lambda_i x_i\right) \le \sum_{i=1}^{k-1} \lambda_i f(x_i)$, and we then prove that Jensen's inequality still holds for N = k 73 / 108

74 When N = k, $f\left(\sum_{i=1}^k \lambda_i x_i\right) = f\left(\sum_{i=1}^{k-1} \lambda_i x_i + \lambda_k x_k\right) = f\left((1 - \lambda_k) \sum_{i=1}^{k-1} \frac{\lambda_i}{1 - \lambda_k} x_i + \lambda_k x_k\right) \le (1 - \lambda_k)\, f\left(\sum_{i=1}^{k-1} \frac{\lambda_i}{1 - \lambda_k} x_i\right) + \lambda_k f(x_k) \le (1 - \lambda_k) \sum_{i=1}^{k-1} \frac{\lambda_i}{1 - \lambda_k} f(x_i) + \lambda_k f(x_k) = \sum_{i=1}^{k-1} \lambda_i f(x_i) + \lambda_k f(x_k) = \sum_{i=1}^k \lambda_i f(x_i)$ 74 / 108

75 The Probabilistic Form of Jensen's Inequality The inequality can be extended to infinite sums, integrals, and expected values. If $p(x) \ge 0$ on $S \subseteq \mathrm{dom}\, f$ and $\int_S p(x)\,dx = 1$, we have $f\left(\int_S p(x)\, x\, dx\right) \le \int_S p(x) f(x)\,dx$. Assuming X is a random variable and P is a probability distribution on sample space S, we have $f[E(X)] \le E[f(X)]$. Equality holds if X is a constant 75 / 108

76 Jensen's Inequality for Concave Functions Assume f is a concave function. Then $f\left(\sum_{i=1}^N \lambda_i x_i\right) \ge \sum_{i=1}^N \lambda_i f(x_i)$, and the probabilistic form is $f(E[X]) \ge E[f(X)]$. Example: $f(x) = \log x$, so $\log\left(\sum_{i=1}^N \lambda_i x_i\right) \ge \sum_{i=1}^N \lambda_i \log(x_i)$, i.e., $\log(E[X]) \ge E[\log(X)]$ 76 / 108
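A tiny numeric check (not from the slides) of the concave-log case above; the sample size and distribution are arbitrary choices:

```python
# Minimal check of Jensen's inequality for the concave function log:
# log(E[X]) >= E[log(X)].
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.1, 5.0, size=10_000)      # positive samples
print(np.log(x.mean()), np.log(x).mean())   # left-hand value >= right-hand value
```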

77 The Expectation-Maximization (EM) Algorithm A training set $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$ (without labels). The log-likelihood function: $\ell(\theta) = \sum_{i=1}^m \log p(x^{(i)}; \theta) = \sum_{i=1}^m \log \sum_{z^{(i)} \in \Omega} p(x^{(i)}, z^{(i)}; \theta)$, where θ denotes the full set of unknown parameters in the model and $z^{(i)} \in \Omega$ is a so-called latent variable 77 / 108

78 The EM Algorithm (Contd.) Our goal is to maximize the log-likelihood function $\ell(\theta) = \sum_{i=1}^m \log \sum_{z^{(i)} \in \Omega} p(x^{(i)}, z^{(i)}; \theta)$. The basic idea of the EM algorithm: repeatedly construct a lower bound on ℓ (E-step), then optimize that lower bound (M-step) 78 / 108

79 The EM Algorithm (Contd.) A training set $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$ (without labels). The log-likelihood function: $\ell(\theta) = \sum_{i=1}^m \log p(x^{(i)}; \theta) = \sum_{i=1}^m \log \sum_{z^{(i)} \in \Omega} p(x^{(i)}, z^{(i)}; \theta)$, where θ denotes the full set of unknown parameters in the model and the $z^{(i)} \in \Omega$ are latent variables. Our goal is to maximize the above log-likelihood function. The basic idea of the EM algorithm: repeatedly construct a lower bound on ℓ (E-step), and then optimize that lower bound (M-step) 79 / 108

80 For each $i \in \{1, 2, \ldots, m\}$, let $Z_i \in \Omega$ be a random variable indicating the label of the i-th training data. The probability that the label of the i-th training data is z: $P(Z_i = z) = Q_i(z)$. We say $Q_i$ is some distribution over the z's: $\sum_{z \in \Omega} Q_i(z) = 1$, $Q_i(z) \ge 0$. We have $\ell(\theta) = \sum_{i=1}^m \log \sum_{z^{(i)} \in \Omega} p(x^{(i)}, z^{(i)}; \theta) = \sum_{i=1}^m \log \sum_{z^{(i)} \in \Omega} Q_i(z^{(i)}) \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$ 80 / 108

81 Assume $\phi: \Omega \to \mathbb{R}$ with $\phi(z^{(i)}) = \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$. Then $E[\phi(Z_i)] = \sum_{z^{(i)} \in \Omega} P(Z_i = z^{(i)})\, \phi(z^{(i)}) = \sum_{z^{(i)} \in \Omega} Q_i(z^{(i)})\, \phi(z^{(i)})$. Revisiting the log-likelihood function: $\ell(\theta) = \sum_{i=1}^m \log \sum_{z^{(i)} \in \Omega} Q_i(z^{(i)}) \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} = \sum_{i=1}^m \log E[\phi(Z_i)]$ 81 / 108

82 Since log(·) is a concave function, according to Jensen's inequality we have $\log(E[\phi(Z_i)]) \ge E[\log(\phi(Z_i))]$. Then the log-likelihood function $\ell(\theta) = \sum_{i=1}^m \log E[\phi(Z_i)] \ge \sum_{i=1}^m E[\log(\phi(Z_i))] = \sum_{i=1}^m \sum_{z^{(i)} \in \Omega} Q_i(z^{(i)}) \log \phi(z^{(i)}) = \sum_{i=1}^m \sum_{z^{(i)} \in \Omega} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$ 82 / 108

83 For any set of distributions $Q_i$, $\ell(\theta)$ has a lower bound: $\ell(\theta) \ge \sum_{i=1}^m \sum_{z^{(i)} \in \Omega} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$. To tighten the lower bound (i.e., let the equality hold): the equality in Jensen's inequality holds if $\frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} = c$, where c is a constant 83 / 108

84 Tighten the lower bound (i.e., let the equality hold): the equality in Jensen's inequality holds if $\frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} = c$, where c is a constant. Since $\sum_{z^{(i)} \in \Omega} Q_i(z^{(i)}) = 1$, we have $\sum_{z^{(i)} \in \Omega} p(x^{(i)}, z^{(i)}; \theta) = c \sum_{z^{(i)} \in \Omega} Q_i(z^{(i)}) = c$ 84 / 108

85 Tighten the lower bound (i.e., let the equality hold): we have $p(x^{(i)}, z^{(i)}; \theta) / Q_i(z^{(i)}) = c$, $\sum_{z^{(i)} \in \Omega} Q_i(z^{(i)}) = 1$, and $\sum_{z^{(i)} \in \Omega} p(x^{(i)}, z^{(i)}; \theta) = c$. Therefore, $Q_i(z^{(i)}) = \frac{p(x^{(i)}, z^{(i)}; \theta)}{c} = \frac{p(x^{(i)}, z^{(i)}; \theta)}{\sum_{z^{(i)} \in \Omega} p(x^{(i)}, z^{(i)}; \theta)} = \frac{p(x^{(i)}, z^{(i)}; \theta)}{p(x^{(i)}; \theta)} = p(z^{(i)} \mid x^{(i)}; \theta)$ 85 / 108

86 The EM Algorithm (Contd.) Repeat the following steps until convergence. (E-step) For each i, set $Q_i(z^{(i)}) := p(z^{(i)} \mid x^{(i)}; \theta)$. (M-step) Set $\theta := \arg\max_\theta \sum_i \sum_{z^{(i)} \in \Omega} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$ 86 / 108

87 Convergence At the end of the t-th iteration, $\ell(\theta^{[t]}) = \sum_i \sum_{z^{(i)}} Q_i^{[t]}(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta^{[t]})}{Q_i^{[t]}(z^{(i)})}$. $\theta^{[t+1]}$ is then obtained by maximizing the right-hand side of the above equation: $\ell(\theta^{[t+1]}) \ge \sum_i \sum_{z^{(i)} \in \Omega} Q_i^{[t]}(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta^{[t+1]})}{Q_i^{[t]}(z^{(i)})} \ge \sum_i \sum_{z^{(i)} \in \Omega} Q_i^{[t]}(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta^{[t]})}{Q_i^{[t]}(z^{(i)})} = \ell(\theta^{[t]})$. EM causes the likelihood to converge monotonically 87 / 108

88 Reviewing Mixtures of Gaussians A training set $\{x^{(1)}, \ldots, x^{(m)}\}$. Mixture of Gaussians model: $p(x, z) = p(x \mid z)\, p(z)$; $Z \in \{1, \ldots, k\}$, $Z \sim \mathrm{Multinomial}(\{\phi_j\}_{j=1,\ldots,k})$ with $\phi_j = P(Z = j)$ ($\phi_j \ge 0$ and $\sum_{j=1}^k \phi_j = 1$); $x \mid z = j \sim \mathcal{N}(\mu_j, \Sigma_j)$. The z's are so-called latent random variables, since they are hidden/unobserved 88 / 108

89 Reviewing Mixtures of Gaussians (Contd.) The log-likelihood function: $\ell(\phi, \mu, \Sigma) = \sum_{i=1}^m \log p(x^{(i)}; \phi, \mu, \Sigma) = \sum_{i=1}^m \log \sum_{z^{(i)}=1}^k p(x^{(i)} \mid z^{(i)}; \mu, \Sigma)\, p(z^{(i)}; \phi)$ 89 / 108

90 Applying EM Algorithm to Mixtures of Gaussians Repeat the following steps until convergence. (E-step) For each i, j, set $w_j^{(i)} = p(z^{(i)} = j \mid x^{(i)}; \phi, \mu, \Sigma) = \frac{p(x^{(i)} \mid z^{(i)} = j; \mu, \Sigma)\, p(z^{(i)} = j; \phi)}{\sum_{l=1}^k p(x^{(i)} \mid z^{(i)} = l; \mu, \Sigma)\, p(z^{(i)} = l; \phi)}$. (M-step) Update the parameters: $\phi_j = \frac{1}{m} \sum_{i=1}^m w_j^{(i)}$, $\mu_j = \frac{\sum_{i=1}^m w_j^{(i)} x^{(i)}}{\sum_{i=1}^m w_j^{(i)}}$, $\Sigma_j = \frac{\sum_{i=1}^m w_j^{(i)} (x^{(i)} - \mu_j)(x^{(i)} - \mu_j)^T}{\sum_{i=1}^m w_j^{(i)}}$ 90 / 108
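A minimal Python sketch (not from the slides) of the E- and M-step updates on this slide; the initialization strategy, the small diagonal regularizer, and the use of scipy.stats.multivariate_normal are illustrative assumptions:

```python
# Minimal sketch: EM for a mixture of Gaussians, following the updates above.
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iters=100):
    m, n = X.shape
    rng = np.random.default_rng(0)
    phi = np.full(k, 1.0 / k)
    mu = X[rng.choice(m, k, replace=False)]                      # init means from data
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(n) for _ in range(k)])
    for _ in range(n_iters):
        # E-step: responsibilities w[i, j] = p(z = j | x^(i))
        w = np.column_stack([
            phi[j] * multivariate_normal.pdf(X, mu[j], Sigma[j]) for j in range(k)
        ])
        w /= w.sum(axis=1, keepdims=True)
        # M-step: re-estimate phi, mu, Sigma from the responsibilities
        Nj = w.sum(axis=0)
        phi = Nj / m
        mu = (w.T @ X) / Nj[:, None]
        for j in range(k):
            diff = X - mu[j]
            Sigma[j] = (w[:, j, None] * diff).T @ diff / Nj[j] + 1e-6 * np.eye(n)
    return phi, mu, Sigma
```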

91 Applying EM Algorithm to MG (Contd.) (E-step) For each i, j, set $w_j^{(i)} = Q_i(z^{(i)} = j) = p(z^{(i)} = j \mid x^{(i)}; \phi, \mu, \Sigma) = \frac{p(x^{(i)} \mid z^{(i)} = j; \mu, \Sigma)\, p(z^{(i)} = j; \phi)}{\sum_{l=1}^k p(x^{(i)} \mid z^{(i)} = l; \mu, \Sigma)\, p(z^{(i)} = l; \phi)}$, where $p(x^{(i)} \mid z^{(i)} = j; \mu_j, \Sigma_j) = \frac{1}{(2\pi)^{n/2} |\Sigma_j|^{1/2}} \exp\left(-\frac{1}{2}(x^{(i)} - \mu_j)^T \Sigma_j^{-1} (x^{(i)} - \mu_j)\right)$ and $p(z^{(i)} = j; \phi) = \phi_j$ 91 / 108

92 Applying EM Algorithm to MG (Contd.) (M-step) Maximizing $\sum_{i=1}^m \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \phi, \mu, \Sigma)}{Q_i(z^{(i)})} = \sum_{i=1}^m \sum_{j=1}^k Q_i(z^{(i)} = j) \log \frac{p(x^{(i)} \mid z^{(i)} = j; \mu, \Sigma)\, p(z^{(i)} = j; \phi)}{Q_i(z^{(i)} = j)} = \sum_{i=1}^m \sum_{j=1}^k \omega_j^{(i)} \log \frac{\frac{1}{(2\pi)^{n/2} |\Sigma_j|^{1/2}} \exp\left(-\frac{1}{2}(x^{(i)} - \mu_j)^T \Sigma_j^{-1} (x^{(i)} - \mu_j)\right) \phi_j}{\omega_j^{(i)}} = \sum_{i=1}^m \sum_{j=1}^k \omega_j^{(i)} \left[ -\log\left((2\pi)^{\frac{n}{2}} |\Sigma_j|^{\frac{1}{2}}\right) - \frac{1}{2}(x^{(i)} - \mu_j)^T \Sigma_j^{-1} (x^{(i)} - \mu_j) + \log \phi_j - \log \omega_j^{(i)} \right]$ 92 / 108

93 Applying EM Algorithm to MG: Calculating µ $\nabla_{\mu_l} \sum_{i=1}^m \sum_{j=1}^k \omega_j^{(i)} \log \left[ \frac{1}{(2\pi)^{n/2} |\Sigma_j|^{1/2}} \exp\left(-\frac{1}{2}(x^{(i)} - \mu_j)^T \Sigma_j^{-1} (x^{(i)} - \mu_j)\right) \phi_j \right] = -\nabla_{\mu_l} \sum_{i=1}^m \sum_{j=1}^k \omega_j^{(i)} \frac{1}{2}(x^{(i)} - \mu_j)^T \Sigma_j^{-1} (x^{(i)} - \mu_j) = \frac{1}{2} \sum_{i=1}^m \omega_l^{(i)} \nabla_{\mu_l} \left( 2\mu_l^T \Sigma_l^{-1} x^{(i)} - \mu_l^T \Sigma_l^{-1} \mu_l \right) = \sum_{i=1}^m \omega_l^{(i)} \left( \Sigma_l^{-1} x^{(i)} - \Sigma_l^{-1} \mu_l \right)$. Letting $\sum_{i=1}^m \omega_l^{(i)} \left( \Sigma_l^{-1} x^{(i)} - \Sigma_l^{-1} \mu_l \right) = 0$, we have $\mu_l = \frac{\sum_{i=1}^m \omega_l^{(i)} x^{(i)}}{\sum_{i=1}^m \omega_l^{(i)}}$ 93 / 108

94 Applying EM Algorithm to MG: Calculating φ Our problem becomes $\max \sum_{i=1}^m \sum_{j=1}^k \omega_j^{(i)} \log \phi_j$ s.t. $\sum_{j=1}^k \phi_j = 1$. Using the theory of Lagrange multipliers, $\phi_j = \frac{1}{m} \sum_{i=1}^m \omega_j^{(i)}$ 94 / 108

95 Applying EM Algorithm to MG: Calculating Σ Maximizing $\sum_{i=1}^m \sum_{j=1}^k \omega_j^{(i)} \left[ -\log\left((2\pi)^{\frac{n}{2}} |\Sigma_j|^{\frac{1}{2}}\right) - \frac{1}{2}(x^{(i)} - \mu_j)^T \Sigma_j^{-1} (x^{(i)} - \mu_j) \right]$ is equivalent to minimizing $\frac{n \log 2\pi}{2} \sum_{i=1}^m \sum_{j=1}^k \omega_j^{(i)} + \frac{1}{2} \sum_{i=1}^m \sum_{j=1}^k \omega_j^{(i)} \log |\Sigma_j| + \frac{1}{2} \sum_{i=1}^m \sum_{j=1}^k \omega_j^{(i)} (x^{(i)} - \mu_j)^T \Sigma_j^{-1} (x^{(i)} - \mu_j)$ 95 / 108

96 Minimizing $\frac{1}{2} \sum_{i=1}^m \sum_{j=1}^k \omega_j^{(i)} \log |\Sigma_j| + \frac{1}{2} \sum_{i=1}^m \sum_{j=1}^k \omega_j^{(i)} (x^{(i)} - \mu_j)^T \Sigma_j^{-1} (x^{(i)} - \mu_j)$: setting the derivative with respect to each $\Sigma_j$ to zero, we have $\nabla_{\Sigma_j} \left[ \sum_{i=1}^m \omega_j^{(i)} \log |\Sigma_j| + \sum_{i=1}^m \omega_j^{(i)} (x^{(i)} - \mu_j)^T \Sigma_j^{-1} (x^{(i)} - \mu_j) \right] = 0$ for $j \in \{1, \ldots, k\}$ 96 / 108

97 By applying $\nabla_X \mathrm{tr}(A X^{-1} B) = -(X^{-1} B A X^{-1})^T$ and $\nabla_A |A| = |A| (A^{-1})^T$, we get the solution $\Sigma_j = \frac{\sum_{i=1}^m \omega_j^{(i)} (x^{(i)} - \mu_j)(x^{(i)} - \mu_j)^T}{\sum_{i=1}^m \omega_j^{(i)}}$, where $j = 1, \ldots, k$. Check the derivations by yourself! 97 / 108

98 Naive Bayes with Missing Labels For any x, we have $p(x) = \sum_{y=1}^k p(x, y) = \sum_{y=1}^k p(y) \prod_{j=1}^n p_j(x_j \mid y)$. The log-likelihood function is then defined as $\ell(\theta) = \sum_{i=1}^m \log p(x^{(i)}) = \sum_{i=1}^m \log \sum_{y=1}^k p(y) \prod_{j=1}^n p_j(x_j^{(i)} \mid y)$. Maximize ℓ(θ) subject to the following constraints: $p(y) \ge 0$ for $y \in \{1, \ldots, k\}$, and $\sum_{y=1}^k p(y) = 1$; for $y \in \{1, \ldots, k\}$, $j \in \{1, \ldots, n\}$, $x_j \in \{0, 1\}$, $p_j(x_j \mid y) \ge 0$; for $y \in \{1, \ldots, k\}$ and $j \in \{1, \ldots, n\}$, $\sum_{x_j \in \{0, 1\}} p_j(x_j \mid y) = 1$ 98 / 108

99 Naive Bayes with Missing Labels (Contd.) When labels are given: $\ell(\theta) = \sum_{i=1}^m \log \left( p(y^{(i)}) \prod_{j=1}^n p_j(x_j^{(i)} \mid y^{(i)}) \right)$. When labels are missing: $\ell(\theta) = \sum_{i=1}^m \log \sum_{y=1}^k p(y) \prod_{j=1}^n p_j(x_j^{(i)} \mid y)$ 99 / 108

100 Applying EM Algorithm to Naive Bayes Repeat the following steps until convergence. (E-step) For each $i = 1, \ldots, m$ and $y = 1, \ldots, k$, set $\delta(y \mid i) = p(y \mid x^{(i)}) = \frac{p(y) \prod_{j=1}^n p_j(x_j^{(i)} \mid y)}{\sum_{y'=1}^k p(y') \prod_{j=1}^n p_j(x_j^{(i)} \mid y')}$. (M-step) Update the parameters: $p(y) = \frac{1}{m} \sum_{i=1}^m \delta(y \mid i)$, $p_j(x \mid y) = \frac{\sum_{i: x_j^{(i)} = x} \delta(y \mid i)}{\sum_{i=1}^m \delta(y \mid i)}$ 100 / 108
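A minimal Python sketch (not from the slides) of the EM updates on this slide for fully unlabeled data; the random initialization, the added Laplace smoothing in the M-step, and the 0-based class indexing are illustrative assumptions:

```python
# Minimal sketch: EM for Naive Bayes with missing labels.
# X is an (m, n) 0/1 array; k is the number of classes.
import numpy as np

def em_naive_bayes(X, k, n_iters=50, seed=0):
    m, n = X.shape
    rng = np.random.default_rng(seed)
    p_y = np.full(k, 1.0 / k)
    p_x1 = rng.uniform(0.3, 0.7, size=(k, n))   # p_j(x=1 | y), random init
    for _ in range(n_iters):
        # E-step: delta[i, y] = p(y | x^(i)) under the current parameters
        log_lik = X @ np.log(p_x1).T + (1 - X) @ np.log(1 - p_x1).T + np.log(p_y)
        delta = np.exp(log_lik - log_lik.max(axis=1, keepdims=True))
        delta /= delta.sum(axis=1, keepdims=True)
        # M-step: re-estimate p(y) and p_j(x=1 | y) from the soft counts
        # (Laplace-smoothed to keep the probabilities away from 0 and 1)
        Ny = delta.sum(axis=0)
        p_y = Ny / m
        p_x1 = (delta.T @ X + 1) / (Ny[:, None] + 2)
    return p_y, p_x1
```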

101 Applying EM Algorithm to NB: E-Step $\delta(y \mid i) = p(y \mid x^{(i)}) = \frac{p(x^{(i)} \mid y)\, p(y)}{p(x^{(i)})} = \frac{p(y) \prod_{j=1}^n p_j(x_j^{(i)} \mid y)}{\sum_{y'=1}^k p(x^{(i)}, y')} = \frac{p(y) \prod_{j=1}^n p_j(x_j^{(i)} \mid y)}{\sum_{y'=1}^k p(x^{(i)} \mid y')\, p(y')} = \frac{p(y) \prod_{j=1}^n p_j(x_j^{(i)} \mid y)}{\sum_{y'=1}^k p(y') \prod_{j=1}^n p_j(x_j^{(i)} \mid y')}$ 101 / 108

102 Applying EM Algorithm to NB: M-Step $\sum_{i=1}^m \sum_{y=1}^k \delta(y \mid i) \log \frac{p(x^{(i)}, z^{(i)} = y)}{\delta(y \mid i)} = \sum_{i=1}^m \sum_{y=1}^k \delta(y \mid i) \log \frac{p(x^{(i)} \mid z^{(i)} = y)\, p(z^{(i)} = y)}{\delta(y \mid i)} = \sum_{i=1}^m \sum_{y=1}^k \delta(y \mid i) \log \frac{p(y) \prod_{j=1}^n p_j(x_j^{(i)} \mid y)}{\delta(y \mid i)} = \sum_{i=1}^m \sum_{y=1}^k \delta(y \mid i) \left[ \log p(y) + \sum_{j=1}^n \log p_j(x_j^{(i)} \mid y) - \log \delta(y \mid i) \right]$ 102 / 108

103 Applying EM Algorithm to NB: M-Step (Contd.) $\sum_{i=1}^m \sum_{y=1}^k \delta(y \mid i) \log \frac{p(x^{(i)}, z^{(i)} = y)}{\delta(y \mid i)} = \sum_{i=1}^m \sum_{y=1}^k \delta(y \mid i) \left[ \log p(y) + \sum_{j=1}^n \log p_j(x_j^{(i)} \mid y) - \log \delta(y \mid i) \right] = \sum_{i=1}^m \sum_{y=1}^k \delta(y \mid i) \log p(y) + \sum_{i=1}^m \sum_{y=1}^k \sum_{j=1}^n \delta(y \mid i) \log p_j(x_j^{(i)} \mid y) - \sum_{i=1}^m \sum_{y=1}^k \delta(y \mid i) \log \delta(y \mid i)$ 103 / 108

104 Applying EM Algorithm to NB: M-Step (Contd.) $\sum_{i=1}^m \sum_{y=1}^k \sum_{j=1}^n \delta(y \mid i) \log p_j(x_j^{(i)} \mid y) = \sum_{y=1}^k \sum_{j=1}^n \sum_{i: x_j^{(i)} = 0} \delta(y \mid i) \log p_j(x = 0 \mid y) + \sum_{y=1}^k \sum_{j=1}^n \sum_{i: x_j^{(i)} = 1} \delta(y \mid i) \log p_j(x = 1 \mid y) = \sum_{y=1}^k \sum_{j=1}^n \sum_{x \in \{0, 1\}} \sum_{i: x_j^{(i)} = x} \delta(y \mid i) \log p_j(x \mid y)$ 104 / 108

105 Applying EM Algorithm to NB: M-Step (Contd.) $\max \sum_{i=1}^m \sum_{y=1}^k \delta(y \mid i) \log p(y) + \sum_{y=1}^k \sum_{j=1}^n \sum_{x \in \{0, 1\}} \sum_{i: x_j^{(i)} = x} \delta(y \mid i) \log p_j(x \mid y)$ s.t. $\sum_{y=1}^k p(y) = 1$; $\sum_{x \in \{0, 1\}} p_j(x \mid y) = 1$, $y = 1, \ldots, k$, $j = 1, \ldots, n$; $p(y) \ge 0$, $y = 1, \ldots, k$; $p_j(x \mid y) \ge 0$, $j = 1, \ldots, n$, $x \in \{0, 1\}$, $y = 1, \ldots, k$ 105 / 108

106 Applying EM Algorithm to NB: M-Step (Contd.) Problem I: $\max \sum_{i=1}^m \sum_{y=1}^k \delta(y \mid i) \log p(y)$ s.t. $\sum_{y=1}^k p(y) = 1$; $p(y) \ge 0$, $y = 1, \ldots, k$. Solution (by Lagrange multiplier): $p(y) = \frac{\sum_{i=1}^m \delta(y \mid i)}{\sum_{y'=1}^k \sum_{i=1}^m \delta(y' \mid i)} = \frac{1}{m} \sum_{i=1}^m \delta(y \mid i)$ 106 / 108

107 Applying EM Algorithm to NB: M-Step (Contd.) Problem II: $\max \sum_{y=1}^k \sum_{j=1}^n \sum_{x \in \{0, 1\}} \sum_{i: x_j^{(i)} = x} \delta(y \mid i) \log p_j(x \mid y)$ s.t. $\sum_{x \in \{0, 1\}} p_j(x \mid y) = 1$, $y = 1, \ldots, k$, $j = 1, \ldots, n$; $p_j(x \mid y) \ge 0$, $j = 1, \ldots, n$, $x \in \{0, 1\}$, $y = 1, \ldots, k$. Solution (by Lagrange multiplier): $p_j(x \mid y) = \frac{\sum_{i: x_j^{(i)} = x} \delta(y \mid i)}{\sum_{x' \in \{0, 1\}} \sum_{i: x_j^{(i)} = x'} \delta(y \mid i)} = \frac{\sum_{i: x_j^{(i)} = x} \delta(y \mid i)}{\sum_{i=1}^m \delta(y \mid i)}$ 107 / 108

108 Thanks! Q & A 108 / 108


Bayesian modeling of inseparable space-time variation in disease risk

Bayesian modeling of inseparable space-time variation in disease risk Bayesian modeling of inseparable space-time variation in disease risk Leonhard Knorr-Held Laina Mercer Department of Statistics UW May, 013 Motivation Ohio Lung Cancer Example Lung Cancer Mortality Rates

Διαβάστε περισσότερα

Matrices and vectors. Matrix and vector. a 11 a 12 a 1n a 21 a 22 a 2n A = b 1 b 2. b m. R m n, b = = ( a ij. a m1 a m2 a mn. def

Matrices and vectors. Matrix and vector. a 11 a 12 a 1n a 21 a 22 a 2n A = b 1 b 2. b m. R m n, b = = ( a ij. a m1 a m2 a mn. def Matrices and vectors Matrix and vector a 11 a 12 a 1n a 21 a 22 a 2n A = a m1 a m2 a mn def = ( a ij ) R m n, b = b 1 b 2 b m Rm Matrix and vectors in linear equations: example E 1 : x 1 + x 2 + 3x 4 =

Διαβάστε περισσότερα

: Monte Carlo EM 313, Louis (1982) EM, EM Newton-Raphson, /. EM, 2 Monte Carlo EM Newton-Raphson, Monte Carlo EM, Monte Carlo EM, /. 3, Monte Carlo EM

: Monte Carlo EM 313, Louis (1982) EM, EM Newton-Raphson, /. EM, 2 Monte Carlo EM Newton-Raphson, Monte Carlo EM, Monte Carlo EM, /. 3, Monte Carlo EM 2008 6 Chinese Journal of Applied Probability and Statistics Vol.24 No.3 Jun. 2008 Monte Carlo EM 1,2 ( 1,, 200241; 2,, 310018) EM, E,,. Monte Carlo EM, EM E Monte Carlo,. EM, Monte Carlo EM,,,,. Newton-Raphson.

Διαβάστε περισσότερα

A Two-Sided Laplace Inversion Algorithm with Computable Error Bounds and Its Applications in Financial Engineering

A Two-Sided Laplace Inversion Algorithm with Computable Error Bounds and Its Applications in Financial Engineering Electronic Companion A Two-Sie Laplace Inversion Algorithm with Computable Error Bouns an Its Applications in Financial Engineering Ning Cai, S. G. Kou, Zongjian Liu HKUST an Columbia University Appenix

Διαβάστε περισσότερα

Example of the Baum-Welch Algorithm

Example of the Baum-Welch Algorithm Example of the Baum-Welch Algorithm Larry Moss Q520, Spring 2008 1 Our corpus c We start with a very simple corpus. We take the set Y of unanalyzed words to be {ABBA, BAB}, and c to be given by c(abba)

Διαβάστε περισσότερα

Math221: HW# 1 solutions

Math221: HW# 1 solutions Math: HW# solutions Andy Royston October, 5 7.5.7, 3 rd Ed. We have a n = b n = a = fxdx = xdx =, x cos nxdx = x sin nx n sin nxdx n = cos nx n = n n, x sin nxdx = x cos nx n + cos nxdx n cos n = + sin

Διαβάστε περισσότερα

Continuous Distribution Arising from the Three Gap Theorem

Continuous Distribution Arising from the Three Gap Theorem Continuous Distribution Arising from the Three Gap Theorem Geremías Polanco Encarnación Elementary Analytic and Algorithmic Number Theory Athens, Georgia June 8, 2015 Hampshire college, MA 1 / 24 Happy

Διαβάστε περισσότερα

Coefficient Inequalities for a New Subclass of K-uniformly Convex Functions

Coefficient Inequalities for a New Subclass of K-uniformly Convex Functions International Journal of Computational Science and Mathematics. ISSN 0974-89 Volume, Number (00), pp. 67--75 International Research Publication House http://www.irphouse.com Coefficient Inequalities for

Διαβάστε περισσότερα

Theorem 8 Let φ be the most powerful size α test of H

Theorem 8 Let φ be the most powerful size α test of H Testing composite hypotheses Θ = Θ 0 Θ c 0 H 0 : θ Θ 0 H 1 : θ Θ c 0 Definition 16 A test φ is a uniformly most powerful (UMP) level α test for H 0 vs. H 1 if φ has level α and for any other level α test

Διαβάστε περισσότερα

( ) 2 and compare to M.

( ) 2 and compare to M. Problems and Solutions for Section 4.2 4.9 through 4.33) 4.9 Calculate the square root of the matrix 3!0 M!0 8 Hint: Let M / 2 a!b ; calculate M / 2!b c ) 2 and compare to M. Solution: Given: 3!0 M!0 8

Διαβάστε περισσότερα

Quadratic Expressions

Quadratic Expressions Quadratic Expressions. The standard form of a quadratic equation is ax + bx + c = 0 where a, b, c R and a 0. The roots of ax + bx + c = 0 are b ± b a 4ac. 3. For the equation ax +bx+c = 0, sum of the roots

Διαβάστε περισσότερα

4.6 Autoregressive Moving Average Model ARMA(1,1)

4.6 Autoregressive Moving Average Model ARMA(1,1) 84 CHAPTER 4. STATIONARY TS MODELS 4.6 Autoregressive Moving Average Model ARMA(,) This section is an introduction to a wide class of models ARMA(p,q) which we will consider in more detail later in this

Διαβάστε περισσότερα

Nowhere-zero flows Let be a digraph, Abelian group. A Γ-circulation in is a mapping : such that, where, and : tail in X, head in

Nowhere-zero flows Let be a digraph, Abelian group. A Γ-circulation in is a mapping : such that, where, and : tail in X, head in Nowhere-zero flows Let be a digraph, Abelian group. A Γ-circulation in is a mapping : such that, where, and : tail in X, head in : tail in X, head in A nowhere-zero Γ-flow is a Γ-circulation such that

Διαβάστε περισσότερα

b. Use the parametrization from (a) to compute the area of S a as S a ds. Be sure to substitute for ds!

b. Use the parametrization from (a) to compute the area of S a as S a ds. Be sure to substitute for ds! MTH U341 urface Integrals, tokes theorem, the divergence theorem To be turned in Wed., Dec. 1. 1. Let be the sphere of radius a, x 2 + y 2 + z 2 a 2. a. Use spherical coordinates (with ρ a) to parametrize.

Διαβάστε περισσότερα

Queensland University of Technology Transport Data Analysis and Modeling Methodologies

Queensland University of Technology Transport Data Analysis and Modeling Methodologies Queensland University of Technology Transport Data Analysis and Modeling Methodologies Lab Session #7 Example 5.2 (with 3SLS Extensions) Seemingly Unrelated Regression Estimation and 3SLS A survey of 206

Διαβάστε περισσότερα

ΕΛΛΗΝΙΚΗ ΔΗΜΟΚΡΑΤΙΑ ΠΑΝΕΠΙΣΤΗΜΙΟ ΚΡΗΤΗΣ. Ψηφιακή Οικονομία. Διάλεξη 7η: Consumer Behavior Mαρίνα Μπιτσάκη Τμήμα Επιστήμης Υπολογιστών

ΕΛΛΗΝΙΚΗ ΔΗΜΟΚΡΑΤΙΑ ΠΑΝΕΠΙΣΤΗΜΙΟ ΚΡΗΤΗΣ. Ψηφιακή Οικονομία. Διάλεξη 7η: Consumer Behavior Mαρίνα Μπιτσάκη Τμήμα Επιστήμης Υπολογιστών ΕΛΛΗΝΙΚΗ ΔΗΜΟΚΡΑΤΙΑ ΠΑΝΕΠΙΣΤΗΜΙΟ ΚΡΗΤΗΣ Ψηφιακή Οικονομία Διάλεξη 7η: Consumer Behavior Mαρίνα Μπιτσάκη Τμήμα Επιστήμης Υπολογιστών Τέλος Ενότητας Χρηματοδότηση Το παρόν εκπαιδευτικό υλικό έχει αναπτυχθεί

Διαβάστε περισσότερα

ANSWERSHEET (TOPIC = DIFFERENTIAL CALCULUS) COLLECTION #2. h 0 h h 0 h h 0 ( ) g k = g 0 + g 1 + g g 2009 =?

ANSWERSHEET (TOPIC = DIFFERENTIAL CALCULUS) COLLECTION #2. h 0 h h 0 h h 0 ( ) g k = g 0 + g 1 + g g 2009 =? Teko Classes IITJEE/AIEEE Maths by SUHAAG SIR, Bhopal, Ph (0755) 3 00 000 www.tekoclasses.com ANSWERSHEET (TOPIC DIFFERENTIAL CALCULUS) COLLECTION # Question Type A.Single Correct Type Q. (A) Sol least

Διαβάστε περισσότερα