Mathematics, 12.11.2019 00:31 124319

For a given day $i$, we let $y_i = 1$ if the ground-level ozone concentration near some city (Houston, in our data) is at a dangerously high level. This is called an "ozone day". We let $y_i = 0$ if the ozone concentration is low enough to be considered safe. We want to predict $y_i$ from more easily measured "features" describing atmospheric pollutant levels and meteorological conditions (temperature, humidity, wind speed, etc.). There are a total of $M = 72$ of these features collected each day, which we denote by $x_i = \{x_{ij} \mid j = 1, \ldots, M\}$. Each feature $x_{ij} \in \mathbb{R}$ is a real number, and we will thus use a Gaussian distribution to model these continuous random variables. We will build a "naive Bayes" classifier, which predicts observation $i$ to be an ozone day if $p(y_i = 1 \mid x_i) > p(y_i = 0 \mid x_i)$, and a non-ozone day otherwise.
Using Bayes' rule, this classifier is equivalent to one that chooses $y_i = 1$ if and only if

$$\frac{p_Y(1)\, f_{X|Y}(x_i \mid 1)}{f_X(x_i)} > \frac{p_Y(0)\, f_{X|Y}(x_i \mid 0)}{f_X(x_i)} \iff \ln p_Y(1) + \ln f_{X|Y}(x_i \mid 1) > \ln p_Y(0) + \ln f_{X|Y}(x_i \mid 0). \tag{1}$$

In this equation, $p_Y(y_i)$ is the probability mass function that defines the prior probability of ozone and non-ozone days. The conditional probability density function $f_{X|Y}(x_i \mid y_i)$ describes the distribution of the $M = 72$ environmental features, which we assume depends on the type of day. We make two simplifying assumptions about these densities: the features $x_{ij}$ are conditionally independent given $y_i$, and their distributions are Gaussian. Thus:

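$$f_{X|Y}(x_i \mid 1) = \prod_{j=1}^{M} \frac{1}{\sqrt{2\pi\sigma_{1j}^2}} \exp\left\{-\frac{(x_{ij} - \mu_{1j})^2}{2\sigma_{1j}^2}\right\} \tag{2}$$

$$f_{X|Y}(x_i \mid 0) = \prod_{j=1}^{M} \frac{1}{\sqrt{2\pi\sigma_{0j}^2}} \exp\left\{-\frac{(x_{ij} - \mu_{0j})^2}{2\sigma_{0j}^2}\right\} \tag{3}$$

Given $y_i = 1$, $x_{ij}$ is Gaussian with mean $\mu_{1j}$ and variance $\sigma_{1j}^2$. Given $y_i = 0$, $x_{ij}$ is Gaussian with mean $\mu_{0j}$ and variance $\sigma_{0j}^2$. There are a total of $2M$ mean parameters and $2M$ variance parameters, since every feature $x_{ij}$ has a distinct distribution for each of the two classes.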

a) Derive equations for $\ln f_{X|Y}(x_i \mid 1)$ and $\ln f_{X|Y}(x_i \mid 0)$, the (natural) logarithms of the conditional probability density functions in equations (2, 3). For numerical robustness, simplify your answer so that it does not involve the exponential function.
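As a hedged sketch of what part (a) is asking for: taking the logarithm of a product of $M$ independent Gaussian densities turns the product into a sum and cancels the exponential. Here $c \in \{0, 1\}$ is a class index introduced for compactness, and $\mu_{cj}$, $\sigma_{cj}^2$ denote the mean and variance of feature $j$ in class $c$.

```latex
\begin{align*}
\ln f_{X|Y}(x_i \mid c)
  &= \ln \prod_{j=1}^{M} \frac{1}{\sqrt{2\pi\sigma_{cj}^2}}
     \exp\left\{-\frac{(x_{ij} - \mu_{cj})^2}{2\sigma_{cj}^2}\right\} \\
  &= \sum_{j=1}^{M} \left[ -\frac{1}{2}\ln\left(2\pi\sigma_{cj}^2\right)
     - \frac{(x_{ij} - \mu_{cj})^2}{2\sigma_{cj}^2} \right],
  \qquad c \in \{0, 1\}.
\end{align*}
```

Setting $c = 1$ and $c = 0$ gives the two requested expressions; neither involves the exponential function.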

Because ozone days are relatively rare, a classifier that always predicts $y_i = 0$ would be correct over 95% of the time, but would obviously not be practically useful for reducing ozone hazard. To evaluate our classifiers, we will thus separately compute the numbers of false alarms (predictions of ozone days when in reality $y_i = 0$) and missed detections (predictions of non-ozone days when in reality $y_i = 1$). We are willing to allow some false alarms as long as there are very few missed detections. For all parts below, assume that the mean parameters $\mu_{1j}$, $\mu_{0j}$ are set to match the mean of the empirical distribution of the training data. The demo code computes these means.
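The three evaluation numbers described above can be computed along these lines. This is a minimal sketch, not the course's demo code: the function name `evaluate` and the assumption that labels arrive as 0/1 arrays are illustrative choices.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return (accuracy, false_alarms, missed_detections) for 0/1 labels."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = np.asarray(y_pred).astype(int)
    accuracy = float(np.mean(y_true == y_pred))
    # False alarm: predicted ozone day (1) when the day was actually safe (0).
    false_alarms = int(np.sum((y_pred == 1) & (y_true == 0)))
    # Missed detection: predicted safe (0) when it was actually an ozone day (1).
    missed_detections = int(np.sum((y_pred == 0) & (y_true == 1)))
    return accuracy, false_alarms, missed_detections
```

Reporting the two error counts separately is what exposes a trivial always-zero classifier: its accuracy can be high, but every ozone day shows up as a missed detection.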

b) Start by assuming the classes are equally probable ($p_Y(1) = p_Y(0) = 1/2$), and have unit variance ($\sigma_{1j}^2 = \sigma_{0j}^2 = 1$). Write code to compute the log conditional densities from part (a). Then, using equation (1), classify each test example. Report your classification accuracy, and the numbers of false alarms and missed detections. Hint: your classifier should have fewer than 10 missed detections.
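One possible shape for the part (b) code is sketched below. It assumes the training and test features are NumPy arrays with one row per day and one column per feature; the function names and the toy arrays are made up for illustration and are not the ozone data or the demo code.

```python
import numpy as np

def class_means(X_train, y_train):
    """Per-feature empirical means; row 0 is class 0, row 1 is class 1."""
    return np.stack([X_train[y_train == c].mean(axis=0) for c in (0, 1)])

def log_density_unit_variance(X, mu):
    """ln f(x_i | class) for every row of X when all variances equal 1."""
    M = X.shape[1]
    return -0.5 * M * np.log(2.0 * np.pi) - 0.5 * np.sum((X - mu) ** 2, axis=1)

def classify_equal_priors(X_train, y_train, X_test):
    """Predict 1 where the class-1 log density exceeds the class-0 one."""
    mu = class_means(X_train, y_train)
    ll0 = log_density_unit_variance(X_test, mu[0])
    ll1 = log_density_unit_variance(X_test, mu[1])
    # With p_Y(1) = p_Y(0) = 1/2, the ln(1/2) terms cancel in the comparison.
    return (ll1 > ll0).astype(int)

# Made-up toy data with 2 features (hypothetical, not the ozone data set).
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 4.0], [4.0, 5.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.0, 0.5], [4.0, 4.5], [1.0, 1.0]])
predictions = classify_equal_priors(X_train, y_train, X_test)
```

With unit variances and equal priors, the rule reduces to picking the class whose mean vector is closer to the test point in squared Euclidean distance.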

c) Rather than assuming features have variance one, set the variance parameters $\sigma_{1j}^2$, $\sigma_{0j}^2$ equal to the variance of the empirical distribution of the training data. Classify each test example using equation (1) with these variance estimates. Report your classification accuracy, and the numbers of false alarms and missed detections.
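For part (c), the per-class variances could be estimated and plugged into the log density roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, features are in a NumPy array with one column per feature, and every feature varies within each class (a constant feature would give a zero variance, which this sketch does not handle). NumPy's `var` divides by the sample count by default (`ddof=0`), which matches the variance of the empirical distribution.

```python
import numpy as np

def fit_gaussian_params(X_train, y_train):
    """Per-class, per-feature empirical means and variances (row c = class c)."""
    mu = np.stack([X_train[y_train == c].mean(axis=0) for c in (0, 1)])
    var = np.stack([X_train[y_train == c].var(axis=0) for c in (0, 1)])
    return mu, var

def gaussian_log_density(X, mu, var):
    """ln f(x_i | class) for each row of X, summing over independent features."""
    per_feature = -0.5 * np.log(2.0 * np.pi * var) - (X - mu) ** 2 / (2.0 * var)
    return np.sum(per_feature, axis=1)

# Made-up one-feature toy data (hypothetical, not the ozone data set).
X_train = np.array([[0.0], [2.0], [10.0], [14.0]])
y_train = np.array([0, 0, 1, 1])
mu, var = fit_gaussian_params(X_train, y_train)
```

A test point is then assigned to class 1 when `gaussian_log_density(X, mu[1], var[1])` plus the log prior exceeds the corresponding class-0 quantity.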

d) Rather than assuming the classes are equally probable, estimate $p_Y(1)$ as the fraction of training examples that are ozone days. Classify each test example using equation (1) with this informative class prior, and the variances from part (c). Report your classification accuracy, and the numbers of false alarms and missed detections.
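The prior in part (d) only shifts each side of the comparison by a constant. A minimal sketch, assuming the per-class log densities have already been computed and are passed in as arrays (the function names and the numbers below are illustrative, not results from the ozone data):

```python
import numpy as np

def estimate_prior(y_train):
    """Fraction of training examples labelled as ozone days (y = 1)."""
    return float(np.mean(np.asarray(y_train) == 1))

def classify_with_prior(ll1, ll0, prior1):
    """Predict 1 where ln p_Y(1) + ln f(x|1) > ln p_Y(0) + ln f(x|0)."""
    score1 = np.log(prior1) + np.asarray(ll1, dtype=float)
    score0 = np.log(1.0 - prior1) + np.asarray(ll0, dtype=float)
    return (score1 > score0).astype(int)

# Made-up labels and log densities for two hypothetical test days.
prior1 = estimate_prior([0, 0, 0, 1])
predictions = classify_with_prior([-1.0, -1.0], [-1.5, -3.0], prior1)
```

In this toy case both days would be labelled 1 under equal priors, since the class-1 log density is larger for each. With a prior of 0.25 for class 1, the first day flips to 0, which illustrates why a small prior on ozone days tends to trade false alarms for missed detections.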
