<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Projects | Ian Arriaga-MacKenzie</title><link>https://www.iansam.com/project/</link><atom:link href="https://www.iansam.com/project/index.xml" rel="self" type="application/rss+xml"/><description>Projects</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Sat, 01 May 2021 00:00:00 +0000</lastBuildDate><image><url>https://www.iansam.com/media/icon_hu5f5a0667974a3c00a8db71daae0e12dc_20626_512x512_fill_lanczos_center_3.png</url><title>Projects</title><link>https://www.iansam.com/project/</link></image><item><title>Binomial Distribution of Allele Data</title><link>https://www.iansam.com/project/binomial-distribution-of-allele-data/</link><pubDate>Sat, 01 May 2021 00:00:00 +0000</pubDate><guid>https://www.iansam.com/project/binomial-distribution-of-allele-data/</guid><description>&lt;p>Summary population allele data can be modeled with a binomial distribution whose allele frequency is a mixture of the allele frequencies of homogeneous reference populations. Here we show an example using the gnomAD v2.1 African/African-American data set, with homogeneous African and European reference panels from 1000 Genomes. We adopt the following model:&lt;/p>
$$
P \left( n \vert N, \Theta \right) = \mbox{Binom} \left( n \bigg\vert N, \sum_{k=1}^{K} \pi_k \theta_k \right)
$$
$$
\ell( \Theta ) = \ln \mathcal{L} (\Theta \vert X) = \sum_{i=1}^S \ln \left[ \mbox{Binom} \left( n_i \bigg\vert N_i, \sum_{k=1}^{K} \pi_k \theta_{k,i} \right) \right]
$$
&lt;p>where&lt;/p>
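&lt;p>&lt;em>S&lt;/em> is the number of SNPs, indexed by &lt;em>i&lt;/em>&lt;/p>
&lt;p>&lt;em>K&lt;/em> is the number of ancestries (reference populations), indexed by &lt;em>k&lt;/em>&lt;/p>
&lt;p>$\pi_k$ is the proportion of ancestry &lt;em>k&lt;/em>, with $\sum_{k=1}^{K} \pi_k = 1$&lt;/p>
&lt;p>$n_i$ is the allele count (AC) for SNP &lt;em>i&lt;/em>&lt;/p>
&lt;p>$N_i$ is the allele number (AN) for SNP &lt;em>i&lt;/em>&lt;/p>
&lt;p>$\theta_{k,i}$ is the allele frequency of SNP &lt;em>i&lt;/em> in reference population &lt;em>k&lt;/em> (written $\theta_k$ when a single SNP is considered)&lt;/p>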
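&lt;p>As a minimal sketch of how $\ell(\Theta)$ can be evaluated, the Python below sums binomial log-probabilities with SciPy. The data are simulated and are not the gnomAD or 1000 Genomes panels. The simulation assumes two hypothetical reference populations with uniformly drawn allele frequencies and a fixed allele number per SNP. The function name &lt;code>log_likelihood&lt;/code> is our own.&lt;/p>

```python
import numpy as np
from scipy.stats import binom

def log_likelihood(pi, n, N, theta):
    """Summed log-likelihood of ancestry proportions pi.

    pi    : (K,) ancestry proportions, summing to 1
    n     : (S,) allele counts (AC) in the admixed sample
    N     : (S,) allele numbers (AN) in the admixed sample
    theta : (S, K) reference allele frequencies, one column per ancestry
    """
    p = theta @ pi  # mixed allele frequency sum_k pi_k * theta_{k,i}, one value per SNP
    return binom.logpmf(n, N, p).sum()

# Hypothetical simulated data (not gnomAD / 1000 Genomes)
rng = np.random.default_rng(0)
S = 2000
theta = rng.uniform(0.05, 0.95, size=(S, 2))  # columns: simulated AFR, EUR frequencies
N = np.full(S, 10000)
true_pi = np.array([0.8, 0.2])
n = rng.binomial(N, theta @ true_pi)

print(log_likelihood(true_pi, n, N, theta))              # at the simulated proportions
print(log_likelihood(np.array([0.2, 0.8]), n, N, theta))  # at a badly wrong guess
```

&lt;p>With this many SNPs, the log-likelihood at the simulated proportions should be far higher than at the wrong guess.&lt;/p>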
&lt;p>The image above shows the log-likelihood, which is maximized at the following ancestry proportions:&lt;/p>
&lt;p>&lt;strong>AFR&lt;/strong>: 0.8277273
&lt;strong>EUR&lt;/strong>: 0.1722727&lt;/p>
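&lt;p>The sketch below shows one way to locate such a maximum on simulated two-ancestry data. The true $\pi_{AFR} = 0.83$ is chosen only to resemble the estimate above. It first runs a grid search over $\pi_{AFR}$ with $\pi_{EUR} = 1 - \pi_{AFR}$. It then cross-checks the result with SciPy's SLSQP routine, an SQP method applied to the negative log-likelihood. The seed, panel sizes, and grid resolution are hypothetical, and this is not the analysis that produced the reported values.&lt;/p>

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import binom

def log_likelihood(pi_afr, n, N, theta):
    """Log-likelihood of pi_AFR, with pi_EUR = 1 - pi_AFR.

    theta[:, 0] and theta[:, 1] hold the AFR and EUR reference allele frequencies.
    """
    pi = np.array([pi_afr, 1.0 - pi_afr])
    p = np.clip(theta @ pi, 1e-12, 1 - 1e-12)  # keep the mixed frequency a valid probability
    return binom.logpmf(n, N, p).sum()

# Hypothetical simulated data with true pi_AFR = 0.83 (not the real gnomAD / 1000 Genomes data)
rng = np.random.default_rng(1)
S = 2000
theta = rng.uniform(0.05, 0.95, size=(S, 2))
N = np.full(S, 10000)
n = rng.binomial(N, theta @ np.array([0.83, 0.17]))

# 1) Grid search over pi_AFR in steps of 0.001
grid = np.linspace(0.0, 1.0, 1001)
ll = np.array([log_likelihood(a, n, N, theta) for a in grid])
pi_grid = grid[np.argmax(ll)]

# 2) SQP via SciPy's SLSQP on the negative log-likelihood (scaled by 1/S,
#    which does not move the optimum), with pi_AFR bounded to [0, 1]
res = minimize(lambda x: -log_likelihood(x[0], n, N, theta) / S,
               x0=[0.5], method="SLSQP", bounds=[(0.0, 1.0)])
pi_sqp = res.x[0]

print(f"grid search: AFR={pi_grid:.3f}, EUR={1 - pi_grid:.3f}")
print(f"SLSQP:       AFR={pi_sqp:.3f}, EUR={1 - pi_sqp:.3f}")
```

&lt;p>The log-likelihood is concave in $\pi_{AFR}$ here, because the binomial log-likelihood is concave in &lt;em>p&lt;/em> and &lt;em>p&lt;/em> is linear in $\pi_{AFR}$. Both approaches should therefore land on the same maximum.&lt;/p>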
&lt;p>These gnomAD estimates are consistent with known admixture in the gnomAD African/African-American sample and agree with other estimation methods (Summix, ADMIXTURE). The log-likelihood can be maximized in several ways, including a grid search and the Expectation-Maximization (EM) algorithm. Alternatively, the negative log-likelihood can be minimized with gradient-based constrained optimizers such as Sequential Quadratic Programming (SQP).&lt;/p></description></item><item><title>EM Algorithm for Mixture Model</title><link>https://www.iansam.com/project/binomial-mixture/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://www.iansam.com/project/binomial-mixture/</guid><description>&lt;p>We can generate a population from a binomial mixture under the following assumptions. We observe a number &lt;em>n&lt;/em> of true cases out of a possible &lt;em>N&lt;/em> total. Each observation comes from distribution &lt;em>k&lt;/em> with probability $\pi_k$, and within that distribution &lt;em>n&lt;/em> is binomially distributed.&lt;/p>
&lt;p>This is easier to visualize with a &amp;lsquo;coins in a pot&amp;rsquo; example. Say we have a pot filled with two types of coins, each type having its own probability of heads. We draw &lt;em>S&lt;/em> coins from the pot, flip each coin &lt;em>N&lt;/em> times, and record the number of heads &lt;em>n&lt;/em>. Under these assumptions, the probability of seeing &lt;em>n&lt;/em> heads for a coin is:&lt;/p>
$$
P \left( n \vert N, \Theta \right) = \pi_1 \mbox{Binom} \left( n \vert N, \theta_1 \right) + \pi_2 \mbox{Binom} \left( n \vert N, \theta_2 \right)
$$
&lt;p>or more generally for &lt;em>K&lt;/em> different types of coins&lt;/p>
$$
P \left( n \vert N, \Theta \right) = \sum_{k=1}^{K} \pi_k \mbox{Binom} \left( n \vert N, \theta_k \right)
$$
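&lt;p>A minimal sketch of the coins-in-a-pot setup follows, assuming K = 2 coin types with illustrative parameter values. The hypothetical helper &lt;code>simulate_pot&lt;/code> draws each coin's hidden type with probabilities $\pi_k$ and flips it &lt;em>N&lt;/em> times. The hypothetical &lt;code>mixture_pmf&lt;/code> evaluates the mixture probability $\sum_k \pi_k \mbox{Binom}(n \vert N, \theta_k)$.&lt;/p>

```python
import numpy as np
from scipy.stats import binom

def mixture_pmf(n, N, pi, theta):
    """P(n | N, Theta) = sum_k pi_k * Binom(n | N, theta_k), vectorized over n."""
    n = np.asarray(n)[..., None]          # add a trailing axis for the K coin types
    return (pi * binom.pmf(n, N, theta)).sum(axis=-1)

def simulate_pot(S, N, pi, theta, rng):
    """Draw S coins from the pot, flip each N times; return heads counts and hidden types."""
    z = rng.choice(len(pi), size=S, p=pi)  # hidden type z_i of each coin
    heads = rng.binomial(N, theta[z])      # heads n_i for each coin
    return heads, z

pi = np.array([0.3, 0.7])       # hypothetical mixture proportions
theta = np.array([0.2, 0.6])    # hypothetical heads probabilities
N = 20
heads, z = simulate_pot(500, N, pi, theta, np.random.default_rng(0))

# The mixture pmf sums to 1 over n = 0..N
print(mixture_pmf(np.arange(N + 1), N, pi, theta).sum())
```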
&lt;p>Here $\theta_k$ is the probability of heads for coin type &lt;em>k&lt;/em>. If we also knew the type $z_i$ of each coin, the complete-data likelihood of our parameters would be:&lt;/p>
$$
\mathcal{L} \left( \Theta \vert X, Z \right) = P \left( X, Z \vert \Theta \right)
$$
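&lt;p>Here $\Theta = \{ \pi_k, \theta_k \}_{k=1}^{K}$ is the set of mixture proportions and heads probabilities. &lt;em>X&lt;/em> is the observed number of heads $n_i$ and flips $N_i$ for each coin, and &lt;em>Z&lt;/em> is the set of unobserved coin types $z_i$. Because &lt;em>Z&lt;/em> is unknown, EM works with the auxiliary function instead. This is the expected complete-data log-likelihood, given the data and the current estimate $\Theta_o$:&lt;/p>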
$$
Q(\Theta, \Theta_o) = E \left[ \ln \mathcal{L} \left( \Theta \vert X, Z \right) \vert X, \Theta_o \right]
$$
&lt;p>Expanding the expectation over each coin's possible type gives:&lt;/p>
$$
E \left[ \ln P \left( X, Z \vert \Theta \right) \vert X, \Theta_o \right] = \sum_{i=1}^{S} \sum_{k=1}^{K} \ln P \left( n_i, z_i = k \vert \Theta \right) \cdot P \left( z_i = k \vert n_i, \Theta_o \right)
$$
&lt;p>where&lt;/p>
$$
P \left( z_i = k \vert n_i , \Theta_o \right) = \dfrac{P \left( z_i = k , n_i \vert \Theta_o \right) }{P \left( n_i \vert \Theta_o \right)} = \dfrac{\pi_{k,o} \mbox{Binom} \left( n_i \vert N_i , \theta_{k,o} \right) }{\sum_{l=1}^{K} \pi_{l,o} \mbox{Binom} \left( n_i \vert N_i , \theta_{l,o} \right) }
$$
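&lt;p>A minimal sketch of the full EM loop follows. It assumes a simulated pot with hypothetical proportions 0.3/0.7, heads probabilities 0.2/0.6, and 20 flips per coin. The E-step computes the responsibilities $P(z_i = k \vert n_i, \Theta_o)$ defined above. The M-step applies the standard closed-form maximizers of &lt;em>Q&lt;/em>. For $\pi_k$ this is the average responsibility; for $\theta_k$ it is the responsibility-weighted ratio of heads to flips. The function name, starting values, and tolerance are our own choices.&lt;/p>

```python
import numpy as np
from scipy.stats import binom

def em_binomial_mixture(n, N, pi, theta, max_iter=500, tol=1e-10):
    """EM for a K-component binomial mixture.

    n, N  : (S,) heads and flips for each coin
    pi    : (K,) initial mixture proportions
    theta : (K,) initial heads probabilities
    """
    n, N = np.asarray(n), np.asarray(N)
    pi, theta = np.asarray(pi, dtype=float), np.asarray(theta, dtype=float)
    for _ in range(max_iter):
        # E-step: responsibilities w[i, k] = P(z_i = k | n_i, Theta_o)
        joint = pi * binom.pmf(n[:, None], N[:, None], theta)   # shape (S, K)
        w = joint / joint.sum(axis=1, keepdims=True)
        # M-step: closed-form updates for pi and theta
        new_pi = w.mean(axis=0)
        new_theta = (w * n[:, None]).sum(axis=0) / (w * N[:, None]).sum(axis=0)
        shift = max(np.abs(new_pi - pi).max(), np.abs(new_theta - theta).max())
        pi, theta = new_pi, new_theta
        if tol > shift:  # stop once the parameters have stopped changing
            break
    return pi, theta

# Hypothetical simulated pot: 30% coins with P(heads) = 0.2, 70% with P(heads) = 0.6
rng = np.random.default_rng(42)
S, flips = 2000, 20
z = rng.choice(2, size=S, p=[0.3, 0.7])
n = rng.binomial(flips, np.array([0.2, 0.6])[z])
N = np.full(S, flips)

pi_hat, theta_hat = em_binomial_mixture(n, N, pi=[0.5, 0.5], theta=[0.3, 0.5])
print(pi_hat, theta_hat)
```

&lt;p>Starting with $\theta_1$ below $\theta_2$ keeps the component labels in the same order as the simulated truth. EM only guarantees convergence to a local maximum, so in practice several starting values are often tried.&lt;/p>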
&lt;p>Given these responsibilities, we update $\pi$ and $\theta$ with the following expressions (the M-step) and repeat both steps until convergence.&lt;/p>
$$
\pi_m = \dfrac{1}{S} \sum_{i=1}^S P \left( z_i = m \vert n_i , \Theta_{o} \right)
$$
$$
\theta_m = \dfrac{\sum_{i=1}^S n_i \cdot P \left( z_i = m \vert n_i , \Theta_{o} \right)}{\sum_{i=1}^S N_i \cdot P \left( z_i = m \vert n_i , \Theta_{o} \right)}
$$</description></item><item><title>Signal Decomposition of EEG Data</title><link>https://www.iansam.com/project/eeg-data/</link><pubDate>Fri, 01 Nov 2019 00:00:00 +0000</pubDate><guid>https://www.iansam.com/project/eeg-data/</guid><description>&lt;p>Above is a 10-second recording from an A1-A2 electroencephalogram (EEG) lead, taken from a sleeping patient. We first motivate the analysis with a simulated example and then apply it to this recording. The Fast Fourier Transform (FFT) is a computationally efficient algorithm for the discrete Fourier transform, which decomposes a signal into its component frequencies. We illustrate this with sinusoidal signals at 1, 5, 10, 20, and 45 Hz, shown below:&lt;/p>
&lt;p>
&lt;figure id="figure-figure-1-each-of-the-5-individual-sinusoidal-signals">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Figure 1 Each of the 5 individual sinusoidal signals" srcset="
/media/sim_signal_indiv_hubc5a290e6efdde7478aa29e63aa23fb2_1800192_24466cf99025690e50cf8465318caf17.webp 400w,
/media/sim_signal_indiv_hubc5a290e6efdde7478aa29e63aa23fb2_1800192_d01654f0c668c7a9a1b799511eb57ba0.webp 760w,
/media/sim_signal_indiv_hubc5a290e6efdde7478aa29e63aa23fb2_1800192_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://www.iansam.com/media/sim_signal_indiv_hubc5a290e6efdde7478aa29e63aa23fb2_1800192_24466cf99025690e50cf8465318caf17.webp"
width="760"
height="456"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 1. Each of the 5 individual sinusoidal signals.
&lt;/figcaption>&lt;/figure>
&lt;/p>
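&lt;p>A minimal NumPy sketch of the simulated components and their FFT follows. It assumes unit-amplitude sine waves sampled at a hypothetical 500 Hz for 10 s, because the original sampling rate and amplitudes are not stated. With 10 s of data the frequency bins are 0.1 Hz apart, so each component falls exactly on a bin.&lt;/p>

```python
import numpy as np

fs = 500                                  # hypothetical sampling rate (Hz)
t = np.arange(10 * fs) / fs               # 10 s of samples, like the EEG excerpt
freqs = [1, 5, 10, 20, 45]                # component frequencies (Hz)
signals = np.array([np.sin(2 * np.pi * f * t) for f in freqs])  # one row per component

# FFT of the summed signal; rfft keeps only the non-negative frequencies of a real signal
x = signals.sum(axis=0)
amplitude = 2 * np.abs(np.fft.rfft(x)) / x.size   # one-sided amplitude spectrum
f_axis = np.fft.rfftfreq(x.size, d=1 / fs)        # 0.1 Hz bin spacing for 10 s of data

# The five largest peaks fall at the component frequencies, each with amplitude near 1
peaks = np.sort(f_axis[np.argsort(amplitude)[-5:]])
print(peaks)
```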
&lt;p>These five signals are summed into a single signal, and Gaussian random noise is added to simulate measurement noise. The combined signal is then passed through a finite impulse response (FIR) band-pass filter (3-30 Hz) to reduce noise and target specific frequencies for signal decomposition. This filter should attenuate the 1 Hz and 45 Hz components while keeping those at 5, 10, and 20 Hz.&lt;/p>
&lt;p>
&lt;figure id="figure-figure-2-combined-figures-with-noise-and-band-pass-filter">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Figure 2 Combined figures with noise and band-pass filter" srcset="
/media/sim_signal_combined_hubc5a290e6efdde7478aa29e63aa23fb2_1800192_d1ba38f54371ac4f424e9e8da45c3187.webp 400w,
/media/sim_signal_combined_hubc5a290e6efdde7478aa29e63aa23fb2_1800192_5f6f57fc8b63961a72aa5b7db1e27f35.webp 760w,
/media/sim_signal_combined_hubc5a290e6efdde7478aa29e63aa23fb2_1800192_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://www.iansam.com/media/sim_signal_combined_hubc5a290e6efdde7478aa29e63aa23fb2_1800192_d1ba38f54371ac4f424e9e8da45c3187.webp"
width="760"
height="456"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 2. Combined signal, with noise and band-pass filter.
&lt;/figcaption>&lt;/figure>
&lt;/p>
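&lt;p>A minimal sketch of the noise and filtering step with SciPy follows. It designs a linear-phase 3-30 Hz FIR band-pass filter with &lt;code>firwin&lt;/code> (Hamming window by default) and applies it with &lt;code>filtfilt&lt;/code>, a forward-backward pass that cancels the filter's phase shift. The sampling rate, noise level, and 501-tap filter length are illustrative assumptions, not values from the original analysis.&lt;/p>

```python
import numpy as np
from scipy.signal import firwin, filtfilt

fs = 500                                  # hypothetical sampling rate (Hz)
t = np.arange(10 * fs) / fs               # 10 s of samples
clean = sum(np.sin(2 * np.pi * f * t) for f in [1, 5, 10, 20, 45])

rng = np.random.default_rng(0)
noisy = clean + rng.normal(scale=1.0, size=t.size)   # additive Gaussian noise

# Linear-phase FIR band-pass (3-30 Hz); numtaps=501 is an illustrative choice
taps = firwin(501, [3, 30], pass_zero=False, fs=fs)
filtered = filtfilt(taps, 1.0, noisy)     # forward-backward pass: zero phase shift

# Compare the spectral amplitude at each component frequency before and after filtering
f_axis = np.fft.rfftfreq(t.size, d=1 / fs)

def amplitude(x, f):
    return 2 * np.abs(np.fft.rfft(x))[np.argmin(np.abs(f_axis - f))] / x.size

for f in [1, 5, 10, 20, 45]:
    print(f, round(amplitude(noisy, f), 3), round(amplitude(filtered, f), 3))
```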
&lt;p>Using multitaper power spectral density (PSD) estimation, we can recover the original frequencies used to create the combined signal, both before and after the band-pass filter is applied.&lt;/p>
&lt;p>
&lt;figure id="figure-figure-3-frequency-estimation-no-fir-filter">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Figure 3 Frequency Estimation no FIR filter" srcset="
/media/sim_psd_hu408e384b049054b93f6d427fea893a62_840192_d47d09b1a6894197f33fee0f2ce4f414.webp 400w,
/media/sim_psd_hu408e384b049054b93f6d427fea893a62_840192_280d08293d35d757033e8127538937fd.webp 760w,
/media/sim_psd_hu408e384b049054b93f6d427fea893a62_840192_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://www.iansam.com/media/sim_psd_hu408e384b049054b93f6d427fea893a62_840192_d47d09b1a6894197f33fee0f2ce4f414.webp"
width="700"
height="400"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 3. Frequency Estimation, no FIR filter.
&lt;/figcaption>&lt;/figure>
&lt;figure id="figure-figure-4-frequency-estimation-3-30-hz-fir-filter">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Figure 4 Frequency Estimation 3 to 30 Hz FIR filter" srcset="
/media/sim_psd_fir_hu408e384b049054b93f6d427fea893a62_840192_0aff785b3ebeea93e804b6e21eb1a0d8.webp 400w,
/media/sim_psd_fir_hu408e384b049054b93f6d427fea893a62_840192_a86846ba0a65c58d259f0fe14cff1e3f.webp 760w,
/media/sim_psd_fir_hu408e384b049054b93f6d427fea893a62_840192_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://www.iansam.com/media/sim_psd_fir_hu408e384b049054b93f6d427fea893a62_840192_0aff785b3ebeea93e804b6e21eb1a0d8.webp"
width="700"
height="400"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 4. Frequency Estimation, 3-30 Hz FIR filter.
&lt;/figcaption>&lt;/figure>
&lt;/p>
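&lt;p>A minimal multitaper PSD sketch follows, built on SciPy's DPSS (Slepian) tapers. Each taper gives its own periodogram (an eigenspectrum), and the estimate is their average. The time-bandwidth product NW = 4 and the 2NW - 1 tapers are common defaults assumed here. The simulated input and the helper name &lt;code>multitaper_psd&lt;/code> are hypothetical, and the figures may have used a different implementation or settings.&lt;/p>

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_psd(x, fs, NW=4.0):
    """One-sided multitaper PSD: average of eigenspectra from K = 2*NW - 1 DPSS tapers."""
    x = np.asarray(x, dtype=float)
    K = int(2 * NW) - 1
    tapers = dpss(x.size, NW, Kmax=K)          # shape (K, len(x)), each with unit energy
    eigenspectra = np.abs(np.fft.rfft(tapers * x, axis=-1)) ** 2
    psd = eigenspectra.mean(axis=0) / fs       # average over tapers, power per Hz
    psd[1:] *= 2                               # fold in the negative frequencies
    if x.size % 2 == 0:
        psd[-1] /= 2                           # the Nyquist bin has no negative twin
    freqs = np.fft.rfftfreq(x.size, d=1 / fs)
    return freqs, psd

# Example on a simulated signal (hypothetical 500 Hz sampling, 10 s, noise sd 0.5)
fs = 500
t = np.arange(10 * fs) / fs
rng = np.random.default_rng(0)
x = sum(np.sin(2 * np.pi * f * t) for f in [1, 5, 10, 20, 45]) + rng.normal(scale=0.5, size=t.size)
freqs, psd = multitaper_psd(x, fs)
```

&lt;p>With NW = 4 and 10 s of data, the half-bandwidth is NW/T = 0.4 Hz. Each sinusoid therefore appears as a peak about 0.8 Hz wide, traded for lower variance than a single periodogram.&lt;/p>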
&lt;p>We also apply multitaper PSD estimation to the EEG sample. Because the patient was asleep, we expect a large concentration of low-frequency brain waves, so we focus on the 0.5-40 Hz range.&lt;/p>
&lt;p>
&lt;figure id="figure-figure-5-psd-estimation-of-eeg-lead">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Figure 5 PSD Estimation of EEG Lead" srcset="
/media/psd_eeg_filt_hu408e384b049054b93f6d427fea893a62_840192_da2e023dea0841535eea2a787a4189cc.webp 400w,
/media/psd_eeg_filt_hu408e384b049054b93f6d427fea893a62_840192_1f0002bab5de6b433d73cecd65cc8424.webp 760w,
/media/psd_eeg_filt_hu408e384b049054b93f6d427fea893a62_840192_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://www.iansam.com/media/psd_eeg_filt_hu408e384b049054b93f6d427fea893a62_840192_da2e023dea0841535eea2a787a4189cc.webp"
width="700"
height="400"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 5. PSD Estimation of EEG Lead.
&lt;/figcaption>&lt;/figure>
&lt;/p>
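&lt;p>Relative band power is one way to put numbers on the band concentrations discussed here: integrate a PSD over each band and divide by the total. The sketch below is illustrative only. It uses SciPy's Welch estimator as a simpler stand-in for the multitaper estimate, and a simulated signal rather than the EEG recording. The Delta (0.5-4 Hz) and Theta (4-8 Hz) edges follow the text, and everything from 8 to 40 Hz is lumped into one hypothetical "higher" band.&lt;/p>

```python
import numpy as np
from scipy.signal import welch

# Band edges (Hz): Delta and Theta as in the text; "higher" lumps 8-40 Hz together
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0), "higher": (8.0, 40.0)}

def relative_band_power(freqs, psd, bands=BANDS, fmin=0.5, fmax=40.0):
    """Fraction of the fmin-fmax power in each half-open band [lo, hi)."""
    df = freqs[1] - freqs[0]

    def power(lo, hi):
        mask = np.logical_and(freqs >= lo, hi > freqs)  # bins from lo up to, not including, hi
        return psd[mask].sum() * df                      # rectangle-rule integral

    total = power(fmin, fmax)
    return {name: power(lo, hi) / total for name, (lo, hi) in bands.items()}

# Hypothetical sleep-like signal: strong 2 Hz, moderate 6 Hz, weak 20 Hz, plus noise
fs = 500
t = np.arange(10 * fs) / fs
rng = np.random.default_rng(0)
x = (3.0 * np.sin(2 * np.pi * 2 * t) + 1.5 * np.sin(2 * np.pi * 6 * t)
     + 0.3 * np.sin(2 * np.pi * 20 * t) + 0.2 * rng.normal(size=t.size))
freqs, psd = welch(x, fs=fs, nperseg=4 * fs)  # 4 s segments give 0.25 Hz resolution
for band, frac in relative_band_power(freqs, psd).items():
    print(f"{band}: {frac:.2f}")
```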
&lt;p>While this result may seem surprising at first, it is consistent with the data and highlights several features. EEG activity is commonly divided into defined frequency bands that are associated with different states of consciousness. As this patient is asleep, we see high concentrations of Delta (0.5-4 Hz) and Theta (4-8 Hz) brain waves, with little power in the higher-frequency bands. This is also visually consistent with the plotted EEG lead, which shows large, slow, roughly sinusoidal waves.&lt;/p></description></item></channel></rss>