Version 6 (modified by iovercast, 9 years ago)
Homepage for the workshop. They have great scholarship opportunities for grad students to cover registration and travel.
Notes from the time I went (July 2015)
SISG 9 - Population Genetics
Taught by Bruce Weir and Jerome Goudet. Homepage for the R scripts from the workshop.
- R basics (very quick).
- It did help me understand data.frame and lists; they make much more sense now.
- Allele Frequency and Hardy-Weinberg
- I want to know if there's a continuous function that describes the maximum value of the binomial distribution from 0-1. Seems like there should be.
- The EM algorithm for two loci isn't guaranteed to converge; sometimes it gets stuck flipping back and forth between two intermediate values. Seems easy to fix, but I'd be curious to know if there's some distribution of genotypes that is guaranteed to break it.
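For reference, here is a minimal sketch of the usual EM scheme for two-locus haplotype frequencies, assuming two biallelic loci (alleles A/a and B/b) and a 3x3 table of unphased genotype counts. Only double heterozygotes (AaBb) have ambiguous phase, so the E-step splits them between AB/ab and Ab/aB in proportion to the current haplotype frequencies, and the M-step recounts haplotypes. The function name, the equal-frequency starting point, and the `max_iter` cap (a crude guard against the non-convergence noted above) are my own choices, not code from the workshop.

```python
def em_two_locus(n, tol=1e-10, max_iter=1000):
    """Estimate haplotype frequencies [AB, Ab, aB, ab] from unphased genotypes.

    n[i][j] = number of individuals with i copies of allele A (0-2)
    and j copies of allele B (0-2). Only n[1][1] (AaBb) has unknown phase.
    """
    total = 2 * sum(sum(row) for row in n)  # number of chromosomes
    # Haplotype counts that do not depend on phase
    fixed_AB = 2 * n[2][2] + n[2][1] + n[1][2]
    fixed_Ab = 2 * n[2][0] + n[2][1] + n[1][0]
    fixed_aB = 2 * n[0][2] + n[1][2] + n[0][1]
    fixed_ab = 2 * n[0][0] + n[0][1] + n[1][0]
    dh = n[1][1]  # double heterozygotes
    p = [0.25, 0.25, 0.25, 0.25]  # start from equal haplotype frequencies
    for _ in range(max_iter):
        pAB, pAb, paB, pab = p
        # E-step: expected number of double hets with phase AB/ab
        cis, trans = pAB * pab, pAb * paB
        x = dh * cis / (cis + trans) if cis + trans > 0 else dh / 2
        # M-step: frequencies from expected haplotype counts
        new_p = [(fixed_AB + x) / total,
                 (fixed_Ab + dh - x) / total,
                 (fixed_aB + dh - x) / total,
                 (fixed_ab + x) / total]
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p  # hit the iteration cap without meeting the tolerance

print(em_two_locus([[5, 3, 1], [2, 6, 4], [1, 2, 8]]))
```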
SISG 18 - MCMC for Genetics
Taught by Eric Anderson and Matthew Stephens
- Monday AM
- Probability as a representation of uncertainty vs. long-run frequency.
- The mean (expectation) of a beta distribution is alpha/(alpha+beta)
- Jeffreys prior for a binomial proportion: Beta(0.5, 0.5), i.e. a = b = 0.5
- Marginal distribution of y, integrating out theta: p(y) = ∫ p(y|theta) p(theta) dtheta
- "Propagating uncertainty" - take uncertainty into account down the line (in later steps of the analysis).
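A small numeric check of the beta facts above, assuming a binomial likelihood (y successes in n trials) with a Beta(a, b) prior. This is the conjugate case: the posterior is Beta(a + y, b + n - y), and the marginal of y has a closed form (the beta-binomial), so no numerical integration is needed. The helper names are mine.

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    # log of the Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_mean(a, b):
    # Mean of a Beta(a, b) distribution
    return a / (a + b)

def marginal_y(y, n, a, b):
    # p(y) = integral of Binomial(y | n, theta) * Beta(theta | a, b) over theta
    return comb(n, y) * exp(log_beta(a + y, b + n - y) - log_beta(a, b))

# Jeffreys prior Beta(0.5, 0.5), then observe y = 7 successes in n = 10 trials
a, b, n, y = 0.5, 0.5, 10, 7
post_a, post_b = a + y, b + n - y  # conjugate update
print(beta_mean(post_a, post_b))   # posterior mean = 7.5 / 11
print(sum(marginal_y(k, n, a, b) for k in range(n + 1)))  # marginal sums to ~1
```

With a uniform Beta(1, 1) prior instead, `marginal_y` gives 1/(n + 1) for every y, which is a handy sanity check.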
- Monday AM II
- Monte Carlo Method - "In search of a definition..." - Approximate an expectation by the sample mean of simulated random variables.
- "Simple sample mean..."
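A minimal sketch of the Monte Carlo idea: estimate E[g(X)] by averaging g over simulated draws of X. Here X ~ Uniform(0, 1) and g(x) = x^2, chosen only because the exact answer (1/3) is easy to check.

```python
import random

def monte_carlo_mean(g, sampler, n, rng):
    # Approximate E[g(X)] by the sample mean of g over n simulated draws of X
    return sum(g(sampler(rng)) for _ in range(n)) / n

rng = random.Random(0)
# E[X^2] for X ~ Uniform(0, 1) is exactly 1/3
estimate = monte_carlo_mean(lambda x: x * x, lambda r: r.random(), 100_000, rng)
print(estimate)
```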
- Wright-Fisher Model
- Sampling with replacement between generations
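A rough simulation sketch of the Wright-Fisher model, assuming a single biallelic locus with 2N gene copies per generation (`two_n`). Sampling with replacement means the next generation's allele count is Binomial(2N, p); it's drawn here as a sum of Bernoulli trials to stay within the standard library. The function name and parameters are illustrative.

```python
import random

def wright_fisher(p0, two_n, generations, rng):
    """Return the allele-frequency trajectory under Wright-Fisher drift."""
    count = round(p0 * two_n)
    trajectory = [count / two_n]
    for _ in range(generations):
        p = count / two_n
        # Each of the two_n offspring copies picks a parent copy with replacement
        count = sum(rng.random() < p for _ in range(two_n))
        trajectory.append(count / two_n)
    return trajectory

traj = wright_fisher(0.5, 100, 200, random.Random(1))
print(traj[-1])
```

Once the allele is fixed (p = 1) or lost (p = 0), the simulation stays there, which is what drift does in this model.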
- Markov Chains
- Transition probability matrices. Do they have to be symmetric?
- Limiting distribution (ergodic Markov chain): regardless of where you start, as t -> inf the probability of being in each state converges to the same value.
- Time averages along the chain converge to the limiting distribution (the fraction of time spent in each state approaches its limiting probability).
- "Known only up to scale" - we know the shape of the target density but not its normalizing constant.
- Reversible jump MCMC? Bridge sampling? Importance sampling?
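To see the limiting distribution and time averaging in action, here is a sketch that simulates a small three-state chain and records the fraction of time spent in each state. The transition matrix is an arbitrary example of mine; note that it is not symmetric (P[0][1] = 0.5 but P[1][0] = 0.25), since each row only has to sum to 1. Its limiting distribution is (0.25, 0.5, 0.25), and the visit fractions approach it from either starting state.

```python
import random

# Arbitrary 3-state transition matrix: row i gives P(next = j | current = i)
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]

def visit_fractions(P, start, n_steps, rng):
    """Simulate the chain and return the fraction of steps spent in each state."""
    states = range(len(P))
    state = start
    visits = [0] * len(P)
    for _ in range(n_steps):
        # Draw the next state using the current state's row as weights
        state = rng.choices(states, weights=P[state])[0]
        visits[state] += 1
    return [v / n_steps for v in visits]

rng = random.Random(0)
print(visit_fractions(P, 0, 100_000, rng))
print(visit_fractions(P, 2, 100_000, rng))
```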
- Ergodicity
- No transient states - no state that the chain can leave and, with positive probability, never return to.
- Irreducible - any state is reachable from any other state in a finite number of steps
- Aperiodic - can't get stuck in a cycle (returns to a state aren't restricted to multiples of some period > 1)
- Stationary distribution of Markov chain
- General balance equation: πP = π, where P is a transition probability matrix and π is the stationary distribution.
- Detailed balance (π_i P_ij = π_j P_ji for all i, j) is sufficient for general balance; Markov chains that satisfy it are time-reversible.
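A sketch for checking these balance conditions numerically: power iteration (repeatedly applying π ← πP) finds the stationary distribution, and a second function tests detailed balance. Both matrices are illustrative choices of mine. The first (a birth-death chain) satisfies detailed balance; the second mostly cycles 0 → 1 → 2 → 0, so its uniform stationary distribution satisfies general balance without detailed balance.

```python
def stationary(P, n_iter=2000):
    """Approximate the stationary distribution by iterating pi <- pi P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def detailed_balance(P, pi, tol=1e-9):
    """True if pi_i P_ij == pi_j P_ji for every pair of states."""
    n = len(P)
    return all(abs(pi[i] * P[i][j] - pi[j] * P[j][i]) < tol
               for i in range(n) for j in range(n))

birth_death = [[0.5, 0.5, 0.0],
               [0.25, 0.5, 0.25],
               [0.0, 0.5, 0.5]]
cyclic = [[0.1, 0.9, 0.0],
          [0.0, 0.1, 0.9],
          [0.9, 0.0, 0.1]]

for P in (birth_death, cyclic):
    pi = stationary(P)
    print(pi, detailed_balance(P, pi))
```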
- Metropolis-Hastings Algorithm
- Take state i, propose state j, accept the proposed move with probability min{1, Hastings ratio}
- Hastings ratio: f(j)/f(i) x q(i|j)/q(j|i)
- Ratio of target densities x ratio of proposal densities
- A symmetric proposal, q(i|j) = q(j|i), cancels the second ratio, leaving f(j)/f(i).
- The larger f(j) is relative to f(i), the higher the acceptance probability; with a symmetric proposal, a move to a state with f(j) ≥ f(i) is always accepted.
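A minimal random-walk Metropolis-Hastings sketch, assuming a one-dimensional target known only up to scale: here the unnormalized Beta(3, 5) density x^2 (1 - x)^4, picked because its true mean (3/8) is easy to check. The normal proposal is symmetric, so the q terms cancel and only f(j)/f(i) is needed; the ratio is computed on the log scale to avoid underflow. Names and tuning values are illustrative, not taken from easyMCMC.

```python
import math
import random

def log_target(x):
    """Unnormalized log density of Beta(3, 5): x^2 (1 - x)^4 on (0, 1)."""
    if not 0.0 < x < 1.0:
        return -math.inf
    return 2 * math.log(x) + 4 * math.log(1 - x)

def metropolis_hastings(log_f, x0, n_iter, proposal_sd, rng):
    """Random-walk M-H; returns the chain and the fraction of accepted moves."""
    x, lf_x = x0, log_f(x0)
    samples, accepted = [], 0
    for _ in range(n_iter):
        # Symmetric normal proposal: q(i|j) = q(j|i), so the ratio is f(j)/f(i)
        y = rng.gauss(x, proposal_sd)
        lf_y = log_f(y)
        # Accept with probability min{1, f(y)/f(x)}; exp(-inf) = 0 rejects y outside (0, 1)
        if rng.random() < math.exp(min(0.0, lf_y - lf_x)):
            x, lf_x = y, lf_y
            accepted += 1
        samples.append(x)
    return samples, accepted / n_iter

rng = random.Random(42)
samples, acc_rate = metropolis_hastings(log_target, 0.5, 50_000, 0.2, rng)
kept = samples[5_000:]  # discard burn-in
print(sum(kept) / len(kept), acc_rate)  # sample mean should be close to 3/8
```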
- Monday PM
- easyMCMC in R
- Sticky chains: a big proposal SD gives too few accepted changes; a very small SD gives too many accepted (but tiny) changes.
- In complex problems, acceptance rate should be ~1%
- Higher-dimensional problems generally mean a lower acceptance rate (propose more dramatic moves, since the space to explore is more complex)
- Multimodal target: the MCMC proposal SD must be wide enough to move between all the modes
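A quick way to see the proposal-SD trade-off, assuming a standard normal target and a random-walk Metropolis sampler (both my choices for illustration): the acceptance rate falls as the proposal SD grows. A tiny SD (nearly everything accepted, but the chain barely moves) and a huge SD (almost nothing accepted) both mix slowly.

```python
import math
import random

def acceptance_rate(proposal_sd, n_iter, rng):
    """Random-walk Metropolis on N(0, 1); log f(x) = -x^2 / 2 up to a constant."""
    x, accepted = 0.0, 0
    for _ in range(n_iter):
        y = x + rng.gauss(0.0, proposal_sd)
        log_ratio = (x * x - y * y) / 2.0  # log f(y) - log f(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = y
            accepted += 1
    return accepted / n_iter

rng = random.Random(3)
for sd in (0.05, 1.0, 20.0):
    print(sd, acceptance_rate(sd, 20_000, rng))
```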
- Multidimensional MCMC
- Component-wise MCMC/Gibbs sampling
- Genotype freq. and inbreeding
- Simple component-wise M-H sampling
- propose and accept/reject each parameter individually
- Gibbs sampling (Full conditional distribution)
- Latent variables - missing data models. What data would you need in order to make it really easy to solve the problem?
- Distribution conditional on fixed state of all other parameters
- Gibbs sampling is a special case of component-wise M-H sampling: each parameter is proposed from its full conditional distribution given all other parameters, so every proposal is accepted.
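A minimal Gibbs sampler sketch for the genotype-frequency-and-inbreeding example, assuming one biallelic locus, Beta(1, 1) priors on the allele frequency p and the inbreeding coefficient f, and a model in which an individual's two alleles are IBD with probability f. The latent variable is each homozygote's IBD status; given it, both p and f have beta full conditionals, so every update is a direct draw and nothing is rejected. This is my own setup for illustration, not the workshop's code, and the genotype counts are made up.

```python
import random

def binomial(n, prob, rng):
    """Binomial draw as a sum of Bernoulli trials (stdlib-only)."""
    return sum(rng.random() < prob for _ in range(n))

def gibbs_inbreeding(n_AA, n_Aa, n_aa, n_iter, rng, p=0.5, f=0.5):
    """Gibbs sampler for allele frequency p and inbreeding coefficient f.

    Latent variable: whether each individual's two alleles are IBD
    (probability f). Priors: Beta(1, 1) on both p and f.
    """
    N = n_AA + n_Aa + n_aa
    draws = []
    for _ in range(n_iter):
        # 1. IBD status of homozygotes given p and f (heterozygotes can't be IBD)
        w_AA = f * p / (f * p + (1 - f) * p ** 2)
        w_aa = f * (1 - p) / (f * (1 - p) + (1 - f) * (1 - p) ** 2)
        z_AA = binomial(n_AA, w_AA, rng)
        z_aa = binomial(n_aa, w_aa, rng)
        z = z_AA + z_aa
        # 2. f given the latent IBD count: Beta(1 + z, 1 + N - z)
        f = rng.betavariate(1 + z, 1 + N - z)
        # 3. p given the latent IBD status: IBD individuals carry one independent allele, others two
        count_A = z_AA + 2 * (n_AA - z_AA) + n_Aa
        count_a = z_aa + 2 * (n_aa - z_aa) + n_Aa
        p = rng.betavariate(1 + count_A, 1 + count_a)
        draws.append((p, f))
    return draws

# Hypothetical genotype counts with a heterozygote deficit
rng = random.Random(7)
draws = gibbs_inbreeding(50, 20, 30, 6000, rng)[1000:]  # drop burn-in
mean_p = sum(p for p, _ in draws) / len(draws)
mean_f = sum(f for _, f in draws) / len(draws)
print(mean_p, mean_f)
```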
- Wrap-up
- MCMC almost always proposes small changes to subsets of the variables
- Detailed balance, irreducible chain, latent variables
- easyMCMC in R
- Tuesday AM
- structure admixture model: hybrid zones, gene flow, population structure, subpopulations
- Falush 2003 - non-independence between loci, allele freqs in pops incorporating inbreeding
- Falush 2007 - dominant markers and null alleles
- Beaumont 2001 (Scottish wildcats)
- structure prior pop info model: multilocus genotypes, sampling locations, known symmetrical migration rate, migration limited to most recent n generations.
- More parameters
- Oh fuck that's what the Q-matrix is, derp.
- NewHybrids (Anderson 2002) - does not require known locations, allows more than one migrant ancestor, but only 2 sources, non-symmetrical migration, dependence within loci is modeled
- BayesAss+ - Specialized models, detect recent immigrants, estimate separate migration rates, multiple locales/subpops.
- Multilocus, requires distinct sampling locales, assumes no LD, subpops are known, infrequent migration
- MCMC in structure
- Expected values can be approximated with sample means.
- Dirichlet is the multivariate generalization of the beta distribution
- wat conjugate prior??
- A Dirichlet draw is a vector with k non-negative components that sum to 1
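A sketch of drawing from a Dirichlet distribution with only the standard library: draw independent Gamma(alpha_k, 1) variates and divide by their sum. It also shows the conjugacy the note asks about: a Dirichlet(alpha) prior on allele frequencies combined with multinomial allele counts gives a Dirichlet(alpha + counts) posterior. The counts and helper names are made up for illustration.

```python
import random

def rdirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) vector by normalizing independent gamma draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]

def dirichlet_posterior(alpha, counts):
    """Conjugate update: Dirichlet(alpha) prior + multinomial counts."""
    return [a + c for a, c in zip(alpha, counts)]

# Hypothetical locus with 3 alleles, flat Dirichlet(1, 1, 1) prior, observed allele counts
prior = [1, 1, 1]
counts = [12, 5, 3]
post = dirichlet_posterior(prior, counts)  # [13, 6, 4]
rng = random.Random(0)
print(post, rdirichlet(post, rng))
```

With k = 2 the draw's first component is Beta(alpha_1, alpha_2), matching the note that the Dirichlet generalizes the beta.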
quotes
- "Out of all the tomorrows we might experience...."
- "Uncertainty is, intrinsically, personal."