Homepage for the workshop. They have great scholarship opportunities for grad students to cover registration and travel.

## Notes from the time I went (July 2015)

### SISG 9 - Population Genetics

Taught by Bruce Weir and Jerome Goudet. Homepage for the R scripts from the workshop.

• R basics (very quick).
• It did help me learn about data.frame, and lists, they make much more sense now.
• Allele Frequency and Hardy-Weinberg
• I want to know if there's a continuous function that describes the maximum value of the binomial distribution from 0-1. Seems like there should be.
• The EM algorithm for two loci isn't guaranteed to converge; sometimes it gets stuck flipping back and forth between two intermediate values. Seems easy to fix, but I'd be curious to know if there is some distribution of genotypes that is guaranteed to break it.
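
A minimal sketch of the two-locus EM iteration, assuming two biallelic loci where only the double heterozygotes have ambiguous phase. The function name, the argument layout (haplotype counts from unambiguous genotypes plus a count of double heterozygotes), and the equal-frequency starting point are my own assumptions, not the workshop's R scripts.

```python
def em_two_locus(n_AB, n_Ab, n_aB, n_ab, n_dh, iters=100):
    """EM for haplotype frequencies at two biallelic loci.

    n_AB, n_Ab, n_aB, n_ab: haplotype counts from unambiguous genotypes.
    n_dh: number of double heterozygotes (AB/ab or Ab/aB, phase unknown).
    """
    total = n_AB + n_Ab + n_aB + n_ab + 2 * n_dh
    # Start from equal frequencies (an arbitrary, hypothetical choice).
    p = {"AB": 0.25, "Ab": 0.25, "aB": 0.25, "ab": 0.25}
    for _ in range(iters):
        # E-step: expected share of double heterozygotes that are AB/ab.
        cis = p["AB"] * p["ab"]
        trans = p["Ab"] * p["aB"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies from expected haplotype counts.
        p = {
            "AB": (n_AB + w * n_dh) / total,
            "ab": (n_ab + w * n_dh) / total,
            "Ab": (n_Ab + (1 - w) * n_dh) / total,
            "aB": (n_aB + (1 - w) * n_dh) / total,
        }
    return p


freqs = em_two_locus(10, 2, 3, 10, 5)
```

Trying different starting frequencies and printing `w` at each iteration would be one way to look for the flip-flopping behaviour.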

### SISG 18 - MCMC for Genetics

Taught by Eric Anderson and Matthew Stephens

Background: Reversible jump MCMC (Green 1995)

• Monday AM
• Probability as representation of uncertainty vs long-run frequency.
• Mean (expectation) of beta distribution is alpha/(alpha+beta)
• Jeffreys Prior - a=b=0.5
• Marginal distribution of y integrating out over theta
• "Propagating uncertainty" - Take uncertainty into account down the line.
• Monday AM II
• Monte Carlo Method - "In search of a definition..." - Approximate expectation based on sample mean of simulated random variables.
• "Simple sample mean..."
• Wright-Fisher Model
• Sampling with replacement between generations
• Markov Chains
• Transition probability matrices. Do they have to be symmetric?
• Limiting distribution (ergodic Markov chain): regardless of where you start, as t->inf the probability of being in any given state converges to the same value.
• Time averaging over the chain converges to the limiting distribution.
• "known only up to scale" - shape but not normalizing constant?
• Reversible jump mcmc? Bridge sampling? Importance sampling?
• Ergodicity
• No transient states - No states you can't reach in a finite number of steps.
• Irreducible - any state is reachable from any other state in a finite number of steps
• Aperiodic - Can't get stuck in a loop
• Stationary distribution of Markov chain
• General balance equation: πP = π, where P is a transition probability matrix and π is the stationary distribution.
• A time-reversible Markov chain satisfies detailed balance, π_i P_ij = π_j P_ji, which in turn implies general balance.
• Metropolis-Hastings Algorithm
• Take state i, propose state j, accept the proposed move with probability min{1, Hastings ratio}
• Hastings ratio: f(j)/f(i) x q(i|j)/q(j|i)
• Ratio of target densities x ratio of proposal densities
• Symmetric proposal transition matrices will cancel the right half of the equation.
• If f(j) is larger than f(i), the acceptance probability increases.
• Monday PM
• easyMCMC in R
• Sticky chains: big proposal SD = too few accepted changes; very small SD = too many accepted changes (each one a tiny step).
• In complex problems, acceptance rate should be ~1%
• Higher dimension problems should ~= lower acceptance rate (should propose more dramatic moves, since explored space is more complex)
• Multi-modal target (need mcmc sd wide enough to traverse all modes)
• Multidimensional MCMC
• component-wise mcmc/gibbs sampling
• Genotype freq. and inbreeding
• Simple component-wise M-H sampling
• propose-sample/reject each parameter individually
• Gibbs sampling (Full conditional distribution)
• Latent variables - missing data models. What data would you need in order to make it really easy to solve the problem?
• Distribution conditional on fixed state of all other parameters
• Gibbs sampling is a special case of component-wise M-H sampling: each proposal is drawn from the full conditional given all other parameters, so it is always accepted.
• Wrap-up
• MCMC almost always proposes small changes to subsets of the variables
• Detailed balance, irreducible chain, latent variables
• Tuesday AM
• structure admixture model: hybrid zones, gene flow, population structure, subpopulations
• Falush 2003 - non-independence between loci, allele freqs in pops incorporating inbreeding
• Falush 2007 - dominant markers and null alleles
• Beaumont 2001 (scottish wildcats)
• structure prior pop info model: multilocus genotypes, sampling locations, known symmetrical migration rate, migration limited to most recent n generations.
• More parameters
• Oh fuck that's what the Q-matrix is, derp.
• NewHybrids? (Anderson 2002) - does not require known locations, allows more than one migrant ancestor, but only 2 sources, non-symmetrical migration, dependence within loci is modeled
• BayesAss?+ - Specialized models, detect recent immigrants, estimate separate migration rates, multiple locales/subpops.
• Multilocus, requires distinct sampling locales, assumes no LD, subpops are known, infrequent migration
• mcmc in structure -
• Expected values can be approximated with sample means.
• dirichlet is the multivariate generalization of the beta dist
• wat conjugate prior??
• dirichlet vector with k components that sum to 1
• Rrunstruct - Usage for the R structure wrapper code
• Tuesday PM
• Latent variables could make gibbs sampling easier?
• Haplotyping (Phase)
• Clark's method (search population for common haplotypes)
• Id unambiguous individuals, construct known haplotypes, disambiguate unknown haplotypes from combinations of known haplotypes.
• Results may depend on order of observation, frequency is ignored, only matches exact haplotypes, doesn't measure uncertainty
• Bayesian method
• iterate through multiple times
• Use haplotype freq information
• account for uncertainty
• Haplotypes will look similar to ones you've seen before.
• Incorporating recombination is trickier.
• "Pseudo-Gibbs sampler"
• Stronger modelling assumptions tend to underestimate uncertainty.
• Wednesday AM
• Bayesian Model Choice
• Posterior odds = Prior odds x Bayes Factor
• Posterior odds: ratio of posterior probabilities of model given data
• Prior odds: ratio of probability of the models
• Bayes factor: likelihood of model 1 over model 2 (data given the model)
• Bayes factor ("marginal likelihood"), isolate the model from the data, and see how prior assumptions on the model will change the results.
• Bayes factor does not rely on prior odds, which is why people use it. Interpreted in light of prior odds. Interpretation depends on context, and on prior odds.
• If you collect enough data the posterior odds will converge toward infinity with probability 1 in favor of the true model.
• Sensitivity analysis: Bayes factors can be peculiarly sensitive to the priors in ways you can't expect, so testing different priors could be informative.
• Model choice: Don't use flat priors on things that are only present in one model
• Wednesday AM2
• How to reduce variance of sampling: reduce variance of function sampled or increase the number of samples taken.
• minimal relevance sampling: Choice of density of sampling will influence the variance of your monte carlo estimate.
• That's pretty much what importance sampling is all about: multiplying things by 1, dressed up in a "tricky fashion".
• Importance sampling: How to sample wisely for your monte carlo estimates.
• "Poor mixing is the evil cousin of reducibility"
• Metropolis-coupled MCMC (heated chains)
• Chains with exponent modifiers 0 < x < 1
• Simulated annealing
• Chains with exponent modifiers x > 1
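
To make the Metropolis-Hastings notes from Monday concrete, here is a rough sketch of a random-walk sampler in Python (not the easyMCMC R code from the workshop). The function names, the normal proposal, and the standard normal target are all assumptions for illustration. Because the proposal is symmetric, q(i|j)/q(j|i) = 1 and the Hastings ratio reduces to f(j)/f(i); the target only needs to be known up to scale.

```python
import math
import random


def metropolis(log_f, x0, sd, n_steps, seed=0):
    """Random-walk Metropolis sampler for a 1-D target known up to scale."""
    rng = random.Random(seed)
    x = x0
    samples = []
    accepted = 0
    for _ in range(n_steps):
        # Symmetric normal proposal centred on the current state.
        y = rng.gauss(x, sd)
        # Log of the Hastings ratio f(j)/f(i); the proposal terms cancel.
        log_ratio = log_f(y) - log_f(x)
        # Accept with probability min{1, Hastings ratio}.
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = y
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


def log_std_normal(x):
    # Unnormalized log density of a standard normal (hypothetical target).
    return -0.5 * x * x


samples, acc_rate = metropolis(log_std_normal, x0=0.0, sd=1.0, n_steps=20000)
```

Rerunning with a much larger or much smaller `sd` shows the sticky-chain trade-off: the acceptance rate drops with a big proposal SD and approaches 1 with a tiny one.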

### SISG 22 - Bayesian Stats

Peter Hoff & Jon Wakefield

### Wednesday PM1

• Probability as belief or information, quantifying uncertainty.
• Information / uncertainty, are they 1-to-1? There is a relationship between information and probability.
• "There's good induction, and there's bad induction."
• "We'll talk about, at the end, what to do if you don't have any beliefs."
• Y-axis of beta distribution is a probability density, a dimensionless quantity.
• Posterior expectation is the weighted average of the data mean and the prior mean. Lots of data makes posterior expectation closer to data mean, less data makes posterior closer to prior mean. o_O
• Probability of rare events/Predictive models
• Prior distribution: Idealistic vs realistic, capture the gross features about the priors...
• ML tends to overfit to the data when you have a large parameter space relative to the sample size...

### Wednesday PM2

• "You've got your data, it's not random anymore.... You've run your experiment, your data certainly aren't random."
• Partition is a collection of sets.
• Axioms
• Total probability - Sum of all events in a partition = 1
• Marginal probability - Sum of probability of event E intersected with all possible events in the partition.
• Likelihood ratio x prior odds = posterior odds
• Standard distributions
• Discrete random variable
• Pr(Y=y) = p(y) probability mass function, must be 0 <= p(y) <= 1 & sum of all p(y) = 1
• Probability densities
• Always >=0, total area under the curve = 1
• Binary distribution
• Y={1,0}
• Pr(Y=y|θ) = p(y|θ) = θ^y (1-θ)^(1-y)
• Binomial distribution
• Likelihood inference
• Poisson distribution
• Why use gamma vs beta?
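
The marginal probability rule and the odds form of Bayes' rule from the axioms above, written out. The symbols are my own labels: H_k for the events in the partition and E for the observed event.

```latex
\begin{align}
\Pr(E) &= \sum_{k} \Pr(E \cap H_k) = \sum_{k} \Pr(E \mid H_k)\,\Pr(H_k) \\
\frac{\Pr(H_1 \mid E)}{\Pr(H_2 \mid E)} &=
  \frac{\Pr(E \mid H_1)}{\Pr(E \mid H_2)} \times \frac{\Pr(H_1)}{\Pr(H_2)}
\end{align}
```

The second line is "posterior odds = likelihood ratio x prior odds"; the marginal Pr(E) cancels when the two posteriors are divided.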

### Thursday AM1

• Posterior is proportional to the likelihood times the prior (colloquial).
• Estimation, hypothesis testing, prediction.
• Beta distribution as a prior:
• Mean of beta is a/(a+b)
• Beta is flexible, to a degree
• Uniform prior is tricky: even though it's "uninformative", it's not uniform on all scales.
• Conjugate: posterior is the same form as the prior.
• Theta^(y+a-1) * (1-Theta)^(N-y+b-1)
• Posterior mean: (y+a)/(N+a+b)
• Weighted estimator of sample mean and prior mean
• As sample size increases sample mean weight increases
• With fixed sample size increasing a+b (beta parameters) increases weight on prior.
• Nonsymmetric vs asymmetric?
• Averaging out over all the possible values theta could take weighted by the posterior (Averaging out the uncertainty).
• Hypothesis testing:
• Ratio of probabilities of data given model 1 vs model 2 or null.
• "Bayesian modelling can be very intoxicating."
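
A small sketch of the conjugate beta-binomial update described above, assuming y successes in N trials and a Beta(a, b) prior. The function name is hypothetical. It returns the posterior mean both directly, (y+a)/(N+a+b), and as the weighted average of the sample mean and the prior mean, so the two forms can be compared.

```python
def beta_binomial_posterior(y, n, a, b):
    """Posterior Beta parameters and mean for y successes in n trials (n > 0)."""
    post_a = a + y
    post_b = b + n - y
    post_mean = post_a / (post_a + post_b)
    # Same mean as a weighted average: weight n/(n+a+b) on the sample mean,
    # the remaining (a+b)/(n+a+b) on the prior mean a/(a+b).
    w = n / (n + a + b)
    weighted = w * (y / n) + (1 - w) * (a / (a + b))
    return post_a, post_b, post_mean, weighted


post_a, post_b, post_mean, weighted = beta_binomial_posterior(7, 10, 1, 1)
```

Increasing `n` pushes the weight `w` toward 1 (sample mean dominates); increasing `a + b` at fixed `n` pushes it toward 0 (prior dominates).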

### Thursday AM2

• Dirichlet is the multivariate generalization of the beta
• Large values of input parameters increase influence of the prior.
• Bayes factors for HWE
• Bayes factor - ratio of probabilities of data under model 1 and model 2.
• Bayes factor * prior odds < CII/CI = R
• CI - cost of type I error, CII - cost of type II error
• Monte Carlo approximation (backtracking to yesterday)
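
A quick sketch of the Monte Carlo approximation idea: expected values are approximated by sample means of simulated draws. The function name, the Beta(a, b) posterior used here, and the 0.5 threshold are just illustrative assumptions.

```python
import random


def mc_beta_summary(a, b, n_draws=50000, seed=1):
    """Monte Carlo estimates of E[theta] and Pr(theta > 0.5) for Beta(a, b)."""
    rng = random.Random(seed)
    draws = [rng.betavariate(a, b) for _ in range(n_draws)]
    # Sample mean approximates the expectation.
    mean = sum(draws) / n_draws
    # Proportion of draws above 0.5 approximates the tail probability.
    prob_gt_half = sum(d > 0.5 for d in draws) / n_draws
    return mean, prob_gt_half


mc_mean, mc_prob = mc_beta_summary(8, 4)
```

For a Beta(8, 4) the exact mean is 8/12, which gives something to compare the estimate against.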

### Thursday PM1

• Linear regression models
• y = a0 + a1*X_a + e (e=random variation between individuals)
• linear in the parameters
• I = covariance of outcomes. In the simple case you assume there is no covariance, so I is the identity matrix, but if there is covariance between the data or the animals or experiments then it can be incorporated with this variable.
• Ordinary least squares estimation
• SSR(Beta) sum of squared residuals
• Find values for Beta that minimize the sum of squared residuals; under normal errors this maximizes the likelihood.
• R - solve(), lm() "linear model"
• Bayesian regression
• You can make probability statements
• probability some beta value >0 given
• OLS overfits when # of predictors is large
• can do model selection and averaging
• Beta is a vector
• If prior variance is very small, then posterior mean will concentrate around your prior mean.
• If you have a lot of data, as sample size grows mean will concentrate on OLS estimate.
• Similar logic to Bayesian inference
• How to select prior for a vector of parameters
• g-prior
• Uncertainty in my prior is the same as uncertainty from n/g observations.
• Posterior calculations are relatively similar
• Posterior mean estimate is OLS estimate shrunken a little bit toward zero
• "It's bad form not to have a picture at the end."
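
A sketch of the OLS estimate and the g-prior shrinkage noted above, in plain Python rather than R's `lm()`. It assumes a single predictor and a g-prior centred on zero, in which case the posterior mean of the coefficients is the OLS estimate multiplied by g/(g+1). Function names are hypothetical.

```python
def ols_simple(x, y):
    """OLS intercept and slope for one predictor, minimizing the SSR."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def g_prior_posterior_mean(beta_ols, g):
    """Posterior mean under a zero-mean g-prior: OLS shrunk toward zero."""
    return [g / (g + 1) * b for b in beta_ols]


beta_hat = ols_simple([0, 1, 2, 3], [1, 3, 5, 7])
beta_post = g_prior_posterior_mean(beta_hat, g=4)
```

A large g means a weak prior and almost no shrinkage; a small g pulls the estimate strongly toward zero.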

### Thursday PM2

• Changing significance level as a function of n
• a/(1-b) * prior odds of H0/H1
• a = alpha level, significance
• b = power
• R is the ratio of costs
• Significance level should decrease as n increases.
• False discovery rate B/K
• Bayesian False Discoveries
• Posterior odds = Bayes factor * prior odds < R
• Depends on the sample size, but not on the number of tests
• Don't use Bonferroni, use false discovery rate, but better still use Bayesian methods.
• Bayesian approaches in GWAS (Stephens & Balding 2009)
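
The notes above don't say which false discovery rate procedure was meant. As an assumption, here is a sketch of the Benjamini-Hochberg step-up procedure, a common way to control the FDR, for comparison against a Bonferroni cutoff. The function name and the p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return indices of tests rejected at false discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k_max = rank
    # Reject every test ranked at or below k_max.
    return sorted(order[:k_max])


# Made-up p-values for ten tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(pvals, q=0.05)
bonferroni = [i for i, p in enumerate(pvals) if p <= 0.05 / len(pvals)]
```

With these values Bonferroni keeps fewer discoveries than the FDR procedure, which is the usual pattern when many tests are run.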

### Friday AM1

• Linear regression
• Model selection
• Iteratively throw away the least significant regressor until you reach some minimum threshold. It can be good... it can be better than doing nothing.
• In regression, the t-statistic is the regression coefficient divided by its standard error. Basically any coefficient whose t-statistic is greater than 2 in absolute value will resemble significance (> 2 standard errors from zero).
• Bayesian model selection
• How to identify the predictors that do have an effect and eliminate the predictors that don't have an effect.
• Posterior odds = prior odds * Bayes factor
• BF - how well the data are fit by the model
• Balance goodness of fit of the observed data with our prior belief that most of the coefficients are probably zero.
• A model will be penalized if it's too complex (if there are too many terms turned on in the regression model), or if it doesn't fit well (if the sum of squared residuals (SSR) is too big).
• Gibbs sampling
• Sample from full conditional distributions p(x|y,z), p(y|x,z), etc...
• Distribution of sequences will approximate the true distribution p(x,y,z)

### Friday AM2

• GLM/GLMM/INLA
• Random variable is the exposure status
• epsilon-sub(i,j) - Latent variable that induces overdispersion
• GLM
• Response follows an exponential family
• mean model is linear in covariates
• Link function relates mean to covariates
• INLA - integrated nested Laplace approximation
• GLMM
• Extends GLM to include random effects (wobble/measurement error)
• Mixed is a mixture of fixed and random effects
• Fixed effects + random effects
• Approximate Bayes inference
• Approximate bayes factor from 2 estimates
• Bayes factors are not independent
• "Shouldn't say mindless, it's judgmental."
• It looks like a complicated formula, but it's intuitive, and it's just a formula.

#### quotes

• "Out of all the tomorrows we might experience...."
• "Uncertainty is, intrinsically, personal."
• "Random draws to mapped calls..."
• "It's very hard to get rational behavior out of a committee."
• "The problem is: How flat matters."