ECE598: Information-theoretic methods in high-dimensional statistics    Spring 2016

Lecture 7: Information bound
Lecturer: Yihong Wu    Scribe: Shiyu Liang, Feb 6, 2016 [Ed. Mar 9]

Recall the $\chi^2$-divergence and the Hammersley-Chapman-Robbins (HCR) bound from last class. Suppose that $P, Q$ are two probability distributions defined on $\mathbb{R}$ and that $X \in \mathcal{X}$ is a random variable. The $\chi^2$-divergence admits the variational representation
\[
\chi^2(P\|Q) = \sup_{g:\,\mathcal{X}\to\mathbb{R}} \frac{(E_P[g(X)] - E_Q[g(X)])^2}{\mathrm{var}_Q[g(X)]}.
\]
Furthermore, choosing an affine function $g$ yields
\[
\chi^2(P\|Q) \ge \frac{(E_P[X] - E_Q[X])^2}{\mathrm{var}_Q[X]},
\]
which gives the HCR bound.

7.1 HCR Lower Bound

We now continue with the HCR lower bound from the last class and illustrate it with an example on estimation.

Example 7.1 (Estimation). Let $\theta \in \mathbb{R}$ be an unknown, deterministic parameter, and let $X$ be a random variable, interpreted as a measurement or data, drawn from $P_\theta$. Suppose $\hat\theta$ is an estimate of $\theta$ based on $X$. The relationship can be shown as $\theta \to X \to \hat\theta$. The estimation loss is defined as $\ell(\theta, \hat\theta) = (\theta - \hat\theta)^2$. Let $P = P_\theta$ and $Q = P_{\theta'}$; then the risk is lower bounded by
\[
R_{\theta'}(\hat\theta) \ge \mathrm{var}_{\theta'}(\hat\theta) \ge \frac{(E_\theta \hat\theta - E_{\theta'} \hat\theta)^2}{\chi^2(P_\theta \| P_{\theta'})}.
\]
Suppose $\hat\theta$ is an unbiased estimate of $\theta$; then
\[
R_{\theta'}(\hat\theta) \ge \sup_{\theta \neq \theta'} \frac{(\theta - \theta')^2}{\chi^2(P_\theta \| P_{\theta'})} \ge \lim_{\theta \to \theta'} \frac{(\theta - \theta')^2}{\chi^2(P_\theta \| P_{\theta'})},
\]
and this limit will be identified as $1/I(\theta')$ via the Fisher information introduced in the next section.
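To make the HCR bound concrete, here is a small numerical sketch (not part of the original notes) for the Gaussian location family $P_\theta = N(\theta, 1)$, where the divergence has the closed form $\chi^2(N(a,1)\|N(b,1)) = e^{(a-b)^2} - 1$:

```python
import numpy as np

# HCR bound for the Gaussian location family P_theta = N(theta, 1), where
# chi^2(N(a,1) || N(b,1)) = exp((a-b)^2) - 1 in closed form. The HCR lower
# bound on the variance of an unbiased estimator is
#   sup_{theta != theta'} (theta - theta')^2 / chi^2(P_theta || P_theta').

def chi2_gauss(delta):
    """chi^2-divergence between N(theta' + delta, 1) and N(theta', 1)."""
    return np.expm1(delta ** 2)

deltas = np.linspace(1e-3, 2.0, 2000)        # delta = theta - theta'; sign is immaterial
hcr = np.max(deltas ** 2 / chi2_gauss(deltas))

print(f"HCR bound on var(theta_hat): {hcr:.6f}")
# The ratio increases to 1 = 1/I(theta') as delta -> 0, and theta_hat = X is
# unbiased with variance exactly 1, so the bound is tight in this family.
```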

7.2 Fisher Information

The Fisher information measures the amount of information that an observable random variable $X$ carries about an unknown, deterministic parameter $\theta$ upon which the distribution of the observation $X$ depends. Assume the probability density function of $X$ given the value of $\theta$ is $p_\theta$. The Fisher information is defined as follows.

Definition 7.1 (Fisher Information). The Fisher information of the parametric family of densities $\{p_\theta : \theta \in \Theta\}$ (with respect to $\mu$) at $\theta$ is
\[
I(\theta) \triangleq E_\theta\left[\left(\frac{\partial}{\partial\theta}\log p_\theta(X)\right)^2\right] = E_\theta\left[\left(\frac{\dot p_\theta(X)}{p_\theta(X)}\right)^2\right]. \tag{7.1}
\]

Theorem 7.1 (Fisher Information). If $p_\theta$ is twice differentiable with respect to $\theta$, the Fisher information can be written as
\[
I(\theta) = -E_\theta\left[\frac{\partial^2}{\partial\theta^2}\log p_\theta(X)\right].
\]

Proof. Since
\[
\frac{\partial^2}{\partial\theta^2}\log p_\theta = \frac{\ddot p_\theta}{p_\theta} - \left(\frac{\dot p_\theta}{p_\theta}\right)^2
\]
and
\[
E_\theta\left[\frac{\ddot p_\theta(X)}{p_\theta(X)}\right] = \int \ddot p_\theta \,\mu(dx) = \frac{\partial^2}{\partial\theta^2}\int p_\theta \,\mu(dx) = 0,
\]
we have
\[
I(\theta) = E_\theta\left[\left(\frac{\partial}{\partial\theta}\log p_\theta(X)\right)^2\right] = -E_\theta\left[\frac{\partial^2}{\partial\theta^2}\log p_\theta(X)\right].
\]

Theorem 7.2 (Fisher Information: multiple samples). Suppose the random samples $X_1, \ldots, X_n$ are drawn independently and identically from a distribution $p_\theta$. The Fisher information $I_n(\theta)$ provided by $X_1, \ldots, X_n$ is $I_n(\theta) = n I(\theta)$, where $I(\theta)$ is the Fisher information provided by a single sample.

Proof. We first denote the joint pdf of $X_1, \ldots, X_n$ by $p_\theta(x_1, \ldots, x_n) = \prod_{i=1}^n p_\theta(x_i)$. Then the Fisher information $I_n(\theta)$ provided by $X_1, \ldots, X_n$ is
\[
I_n(\theta) = E\left[\left(\frac{\partial \ell(X_1,\ldots,X_n)}{\partial\theta}\right)^2\right] = \int\!\cdots\!\int \left(\frac{\partial \ell(x_1,\ldots,x_n)}{\partial\theta}\right)^2 p_\theta(x_1,\ldots,x_n)\, dx_1 \cdots dx_n,
\]
where $\ell = \log p_\theta$ is the joint log-likelihood; this is an $n$-dimensional integral. Thus, by Theorem 7.1, the Fisher information provided by $X_1, \ldots, X_n$ can be calculated as
\[
I_n(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2}\log p_\theta(X_1,\ldots,X_n)\right] = -E\left[\sum_{i=1}^n \frac{\partial^2}{\partial\theta^2}\log p_\theta(X_i)\right] = \sum_{i=1}^n -E\left[\frac{\partial^2}{\partial\theta^2}\log p_\theta(X_i)\right] = nI(\theta).
\]
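As a quick sanity check of Definition 7.1 and Theorem 7.1 (a sketch, not from the notes), both expressions can be estimated by Monte Carlo for a Poisson($\theta$) family, where the score is $X/\theta - 1$ and $I(\theta) = 1/\theta$ exactly; the parameter value $\theta = 2.5$ is an arbitrary illustrative choice:

```python
import numpy as np

# For X ~ Poisson(theta): log p_theta(x) = x log(theta) - theta - log(x!),
# so  d/dtheta log p = x/theta - 1   and   d^2/dtheta^2 log p = -x/theta^2.
# Hence I(theta) = 1/theta, and the two forms in Theorem 7.1 should agree.

rng = np.random.default_rng(0)
theta, n_mc = 2.5, 200_000
x = rng.poisson(theta, size=n_mc)

I_score = np.mean((x / theta - 1.0) ** 2)   # E[(d log p / dtheta)^2]
I_curv = np.mean(x / theta**2)              # -E[d^2 log p / dtheta^2]

print(f"score form    : {I_score:.4f}")     # both ~ 1/theta = 0.4
print(f"curvature form: {I_curv:.4f}")
# For n i.i.d. samples the joint score is a sum of per-sample scores, which
# is the mechanism behind I_n(theta) = n I(theta) in Theorem 7.2.
```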

7.3 Variations of the HCR/CR Lower Bound

This section contains the following three versions of the HCR/CR lower bound:

- Multiple Samples Version
- Multivariate Version
- Functional Version

7.3.1 Multiple Samples Version

Suppose $\theta$ is some unknown, deterministic parameter and $X_1, \ldots, X_n$ are $n$ random variables independently and identically drawn from the distribution $P_\theta$. The estimate $\hat\theta$ is computed from $X_1, \ldots, X_n$. The relationship is shown as follows: $\theta \to (X_1, \ldots, X_n) \to \hat\theta$. Then the risk is lower bounded by
\[
R_{\theta'}(\hat\theta) \ge \mathrm{var}_{\theta'}(\hat\theta) \ge \frac{(E_\theta \hat\theta - E_{\theta'} \hat\theta)^2}{\chi^2(P_\theta^n \| P_{\theta'}^n)}.
\]
For the HCR lower bound with an unbiased $\hat\theta$, using the tensorization identity $\chi^2(P^n \| Q^n) = (1 + \chi^2(P\|Q))^n - 1$ (verified numerically in the sketch below),
\[
R_{\theta'}(\hat\theta) \ge \sup_{\theta \neq \theta'} \frac{(\theta - \theta')^2}{(1 + \chi^2(P_\theta \| P_{\theta'}))^n - 1}.
\]
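The tensorization identity above follows because $1 + \chi^2$ is multiplicative over products; here is a numerical sketch (not from the notes) checking it on Bernoulli distributions, with $P = \mathrm{Bern}(0.7)$, $Q = \mathrm{Bern}(0.5)$, and $n = 4$ as arbitrary illustrative choices:

```python
import numpy as np
from itertools import product

# Check chi^2(P^n || Q^n) = (1 + chi^2(P || Q))^n - 1 for a small finite
# alphabet, by building the n-fold product distributions explicitly.

def chi2(p, q):
    """chi^2-divergence between two finite distributions (numpy arrays)."""
    return np.sum(p**2 / q) - 1.0

p = np.array([0.3, 0.7])   # P = Bern(0.7)
q = np.array([0.5, 0.5])   # Q = Bern(0.5)
n = 4

# Product distributions over {0,1}^n.
pn = np.array([np.prod([p[b] for b in bits]) for bits in product((0, 1), repeat=n)])
qn = np.array([np.prod([q[b] for b in bits]) for bits in product((0, 1), repeat=n)])

print(f"chi2(P^n||Q^n)       = {chi2(pn, qn):.6f}")
print(f"(1+chi2(P||Q))^n - 1 = {(1 + chi2(p, q))**n - 1:.6f}")   # identical
```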

We next show the multivariate counterpart of the bound $\chi^2(P\|Q) \ge (E_P X - E_Q X)^2/\mathrm{var}_Q X$. Suppose $P, Q$ are two distributions defined on $\mathbb{R}^p$; then
\[
\chi^2(P\|Q) = \sup_{g:\,\mathbb{R}^p \to \mathbb{R}} \frac{(E_P[g(X)] - E_Q[g(X)])^2}{\mathrm{var}_Q[g(X)]}.
\]
Further, if $g(x) = \langle a, x\rangle + b$, then
\[
\chi^2(P\|Q) \ge \frac{\big(E_P[\langle a, X\rangle + b] - E_Q[\langle a, X\rangle + b]\big)^2}{\mathrm{var}_Q(\langle a, X\rangle + b)}.
\]
If we further assume $E_Q X = 0$, then we have
\[
\chi^2(P\|Q) \ge \frac{\langle a, E_P X\rangle^2}{a^T E_Q[XX^T] a}.
\]
Therefore, optimizing over $a$, we finally have
\[
\chi^2(P\|Q) \ge (E_P X - E_Q X)^T \,\mathrm{cov}_Q(X)^{-1}\, (E_P X - E_Q X).
\]

7.3.2 Multivariate Version

Let the loss function be $\ell(\theta, \hat\theta) = \|\theta - \hat\theta\|^2$ and let $\hat\theta$ be an unbiased estimate of $\theta$, i.e., $E_\theta \hat\theta = \theta$. Then
\[
(\theta - \theta')^T \mathrm{cov}_{\theta'}(\hat\theta)^{-1} (\theta - \theta') \le \chi^2(P_\theta \| P_{\theta'}) = (\theta - \theta')^T I(\theta')(\theta - \theta') + o(\|\theta - \theta'\|^2),
\]
where the equality follows from a Taylor expansion and the Fisher information matrix is given by
\[
I(\theta) = \int \frac{(\nabla_\theta p_\theta)(\nabla_\theta p_\theta)^T}{p_\theta}.
\]
If we take $\theta = \theta' + \epsilon u$ and let $\epsilon \to 0$, then we have
\[
u^T \mathrm{cov}_{\theta'}(\hat\theta)^{-1} u \le u^T I(\theta') u,
\]
which is equivalent to
\[
\mathrm{cov}_\theta(\hat\theta) \succeq I(\theta)^{-1},
\]
and further indicates
\[
R_\theta(\hat\theta) = \mathrm{tr}(\mathrm{cov}_\theta(\hat\theta)) = \sum_{i=1}^p E_\theta(\hat\theta_i - \theta_i)^2 \ge \mathrm{tr}(I^{-1}(\theta)) \ge \sum_{i=1}^p \frac{1}{I_i(\theta)},
\]
where $I_i(\theta) \triangleq (I(\theta))_{ii}$ and the last step uses $(I^{-1})_{ii} \ge 1/I_{ii}$ for positive definite $I$. Note that if we applied the one-dimensional CRLB to each coordinate we would get only the rightmost bound, which is weaker (see the numerical sketch at the end of this section). In addition, the Fisher information matrix can be written as
\[
I(\theta) = E\left[(\nabla_\theta \log p_\theta)(\nabla_\theta \log p_\theta)^T\right] = \mathrm{cov}(\nabla_\theta \log p_\theta) = -E\left[\frac{\partial^2 \log p_\theta}{\partial\theta_i \partial\theta_j}\right]_{i,j}.
\]

7.3.3 Functional Version

Assume that $\theta$ is an unknown parameter, that the random variable $X$ comes from the distribution $P_\theta$, and that $\hat T(X)$ is an estimate of $T(\theta)$, where $T: \Theta \to \mathbb{R}$. The relationship is shown as follows: $\theta \to X \to \hat T$. If we further assume that $\hat T(X)$ is an unbiased estimate of $T(\theta)$, then
\[
\mathrm{var}_\theta(\hat T) \ge \frac{(T'(\theta))^2}{I(\theta)}.
\]
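The remark in Section 7.3.2 that the coordinate-wise CRLB is weaker reduces to the matrix fact $(I^{-1})_{ii} \ge 1/I_{ii}$. The following sketch (not from the notes) compares the two bounds for a generic positive definite Fisher matrix, built here from a random $3 \times 3$ factor purely for illustration:

```python
import numpy as np

# Compare the multivariate bound tr(I^{-1}) with the coordinate-wise bound
# sum_i 1/I_ii. For any positive definite I we have (I^{-1})_ii >= 1/I_ii,
# so the multivariate bound is never smaller.

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
I_mat = A @ A.T + 0.1 * np.eye(3)   # a generic positive definite Fisher matrix

trace_inv = np.trace(np.linalg.inv(I_mat))
per_coord = np.sum(1.0 / np.diag(I_mat))

print(f"tr(I^-1)   = {trace_inv:.4f}")
print(f"sum 1/I_ii = {per_coord:.4f}")   # always <= tr(I^-1)
```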

7.4 Bayesian Cramér-Rao Lower Bound

This lecture introduces two methods of proving the Bayesian Cramér-Rao lower bound:

- Method 1: $\chi^2 \to$ Bayesian HCR $\to$ Bayesian CR
- Method 2: the classical method

The notation used in this section is as follows:

- $\Theta = \mathbb{R}$;
- $\ell(\theta, \hat\theta) = (\theta - \hat\theta)^2$;
- $\pi$ is a nice prior on $\mathbb{R}$.

The relationship can be described as follows: $\theta \sim \pi$, $\theta \to X \to \hat\theta$.

Theorem 7.3 (Bayesian Cramér-Rao Lower Bound). Assuming suitable regularity conditions,
\[
R^*_\pi \triangleq \inf_{\hat\theta} E_\pi E_\theta\, \ell(\theta, \hat\theta) \ge \frac{1}{E_\pi[I(\theta)] + I(\pi)},
\]
where $R^*_\pi$ is the Bayes risk and $I(\pi) \triangleq \int \frac{(\pi')^2}{\pi}$.

Let
\[
Q:\ \theta \sim \pi,\quad Q_{X|\theta} = P_\theta,\quad X \to \hat\theta; \qquad
P:\ \theta \sim T_\delta\pi,\quad P_{X|\theta} = P_{\theta - \delta},\quad X \to \hat\theta,
\]
where $T_\delta$ denotes the shift $T_\delta\pi(\cdot) = \pi(\cdot - \delta)$. Then
\[
\chi^2(P_{\theta,X} \| Q_{\theta,X}) \ge \chi^2(P_{\theta,\hat\theta} \| Q_{\theta,\hat\theta}) \ge \chi^2(P_{\theta - \hat\theta} \| Q_{\theta - \hat\theta}) \ge \frac{\big(E_P(\theta - \hat\theta) - E_Q(\theta - \hat\theta)\big)^2}{\mathrm{var}_Q(\theta - \hat\theta)} = \frac{\delta^2}{\mathrm{var}_Q(\theta - \hat\theta)},
\]
where the first two inequalities are the data processing inequality and the last is the HCR bound. Indeed, since $Q_\theta = \pi$, $Q_{X|\theta} = P_\theta$, $P_\theta = T_\delta\pi$, and $P_{X|\theta} = P_{\theta - \delta}$, we have $P_X = Q_X$, which further indicates $P_{\hat\theta} = Q_{\hat\theta}$; hence the mean of $\hat\theta$ under $P$ equals that under $Q$, while the mean of $\theta$ shifts by $\delta$. For the Bayesian HCR lower bound, since $\mathrm{var}_Q(\theta - \hat\theta) \le E_Q(\theta - \hat\theta)^2$,
\[
R^*_\pi \ge \sup_{\delta \neq 0} \frac{\delta^2}{\chi^2(P_{\theta,X} \| Q_{\theta,X})} \ge \lim_{\delta \to 0} \frac{\delta^2}{\chi^2(P_{\theta,X} \| Q_{\theta,X})} = \frac{1}{I(\pi) + E_\pi[I(\theta)]}. \tag{7.2}
\]
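Before proving (7.2), here is a sanity check of Theorem 7.3 (a sketch, not from the notes): for the prior $\pi = N(0, s^2)$ and observation $X \mid \theta \sim N(\theta, 1)$, we have $E_\pi[I(\theta)] = 1$ and $I(\pi) = 1/s^2$, so the bound reads $1/(1 + 1/s^2) = s^2/(1+s^2)$, which the posterior mean attains exactly; $s^2 = 4$ is an arbitrary illustrative choice:

```python
import numpy as np

# Gaussian check of the Bayesian CRLB: prior N(0, s2), X | theta ~ N(theta, 1).
# Here E_pi[I(theta)] = 1 and I(pi) = 1/s2, so the bound is s2 / (1 + s2),
# which the posterior mean (s2 / (1 + s2)) * X achieves with equality.

rng = np.random.default_rng(2)
s2, n_mc = 4.0, 500_000

theta = rng.normal(0.0, np.sqrt(s2), n_mc)    # theta ~ pi
x = theta + rng.standard_normal(n_mc)         # X | theta ~ N(theta, 1)

theta_hat = (s2 / (1.0 + s2)) * x             # Bayes estimator (posterior mean)
bayes_risk = np.mean((theta_hat - theta) ** 2)
bound = 1.0 / (1.0 + 1.0 / s2)

print(f"Monte Carlo Bayes risk: {bayes_risk:.4f}")
print(f"Bayesian CRLB         : {bound:.4f}")   # both ~ s2/(1+s2) = 0.8
```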

We give a short proof of (7.2) here.

Proof. Write $P_{\theta,X} - Q_{\theta,X} = P_\theta(P_{X|\theta} - Q_{X|\theta}) + (P_\theta - Q_\theta)Q_{X|\theta}$. Since the cross term integrates to zero (as $\int (P_{X|\theta} - Q_{X|\theta}) = 0$),
\[
\chi^2(P_{\theta,X} \| Q_{\theta,X}) = \iint \frac{(P_{\theta,X} - Q_{\theta,X})^2}{Q_\theta\, Q_{X|\theta}} = \chi^2(P_\theta \| Q_\theta) + E_{\theta \sim Q_\theta}\left[\left(\frac{P_\theta}{Q_\theta}\right)^2 \chi^2(P_{X|\theta} \| Q_{X|\theta})\right].
\]
Then applying
\[
\chi^2(P_\theta \| Q_\theta) = \chi^2(T_\delta\pi \| \pi) = \delta^2[I(\pi) + o(1)] \quad \text{by Taylor expansion},
\]
\[
\chi^2(P_{X|\theta} \| Q_{X|\theta}) = \chi^2(P_{\theta-\delta} \| P_\theta) = [I(\theta) + o(1)]\,\delta^2 \quad \text{by Taylor expansion},
\]
together with $P_\theta/Q_\theta \to 1$ as $\delta \to 0$, we obtain (7.2).

7.5 Information Bound

In this section, we introduce the local version of the minimax lower bound. The local minimax risk is defined in a quadratic form:
\[
\inf_{\hat\theta} \sup_{|\theta - \theta_0| \le \epsilon} E_\theta(\hat\theta - \theta)^2.
\]
Further, for any prior $\pi$ supported on $[\theta_0 - \epsilon, \theta_0 + \epsilon]$, we have
\[
\inf_{\hat\theta} \sup_{|\theta - \theta_0| \le \epsilon} E_\theta(\hat\theta - \theta)^2 \ge R^*_\pi \ge \frac{1}{n E_\pi[I(\theta)] + I(\pi)}.
\]
If $I(\cdot)$ is continuous, then $E_\pi[I(\theta)] = I(\theta_0) + o(1)$ as $\epsilon \to 0$.

Assume the random variable $Z$ comes from the distribution $\pi$, $Z \sim \pi$, and write $I(Z) = I(\pi)$. For constants $\alpha$ and $\beta \neq 0$, we have $I(Z + \alpha) = I(Z)$ and $I(\beta Z) = I(Z)/\beta^2$. The minimal Fisher information over priors supported on $[-1, 1]$ is
\[
\min_{\pi \text{ on } [-1,1]} I(\pi) = \pi^2,
\]
attained by a density of the cosine-squared form. Consequently, if the prior has the form $\pi(x) = \frac{1}{\epsilon}\cos^2\!\left(\frac{\pi(x - \theta_0)}{2\epsilon}\right)$ on $[\theta_0 - \epsilon, \theta_0 + \epsilon]$, then $I(\pi) = \pi^2/\epsilon^2$ (verified numerically in the sketch at the end of these notes). Then we have
\[
\inf_{\hat\theta} \sup_{|\theta - \theta_0| \le \epsilon} E(\hat\theta - \theta)^2 \ge R^*_\pi \ge \frac{1}{n E_\pi[I(\theta)] + \pi^2/\epsilon^2}.
\]
Now if we pick $\epsilon = n^{-1/4}$, we have
\[
R^* = \inf_{\hat\theta} \sup_{|\theta - \theta_0| \le n^{-1/4}} E(\hat\theta - \theta)^2 \ge \frac{1}{n I(\theta_0) + o(n)} = \frac{1 + o(1)}{n I(\theta_0)}.
\]
Optimizing over $\theta_0$,
\[
R^* \ge \frac{1 + o(1)}{n \inf_{\theta_0 \in \Theta} I(\theta_0)}.
\]
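As a numerical check (a sketch, not from the notes) that the truncated-cosine prior used above indeed has $I(\pi) = \pi^2/\epsilon^2$, one can integrate $(\pi')^2/\pi$ on a grid; $\epsilon = 0.5$ and $\theta_0 = 0$ are arbitrary illustrative choices:

```python
import numpy as np

# Fisher information of pi(x) = (1/eps) cos^2(pi (x - theta0) / (2 eps))
# on [theta0 - eps, theta0 + eps]; the claim is I(pi) = pi^2 / eps^2.

eps, theta0 = 0.5, 0.0
k = np.pi / (2 * eps)
u = np.linspace(-eps, eps, 200_001)[1:-1]   # open interval: pi vanishes at the edges

pi_u = (1 / eps) * np.cos(k * u) ** 2
dpi_u = -(k / eps) * np.sin(2 * k * u)      # d/dx of (1/eps) cos^2(k x)

I_pi = np.sum(dpi_u**2 / pi_u) * (u[1] - u[0])   # Riemann sum of (pi')^2 / pi
print(f"numerical I(pi) = {I_pi:.4f}")           # ~ pi^2 / eps^2 = 39.4784
```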