Week 5 & 6: Stress and determination
This week I continued in the same vein as the previous script, but now I was working to extend it to handle unphased data, so that I could look at an individual's genotype without having to classify them into bins. This was much easier said than done, and because not much progress was made I am combining this week with the next.
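Without phasing, a heterozygous call such as 0/1 says that an individual carries one reference and one alternate allele, but not which haplotype carries which. One way to handle this is to treat the genotype as an unordered set of alleles. Below is a minimal sketch of that idea, assuming VCF-style GT strings as input. `allele_dosage` is a hypothetical helper name, not part of the script described here.

```python
from collections import Counter
from typing import Optional


def allele_dosage(gt: str) -> Optional[Counter]:
    """Collapse a VCF GT string ('0/1', '1|0', '1/1') into unordered allele counts.

    Phased ('|') and unphased ('/') calls give the same result, because without
    phasing we cannot say which haplotype carries which allele.
    Returns None if any allele is missing ('.').
    """
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return None
    return Counter(int(a) for a in alleles)


# A phased 1|0 and an unphased 0/1 describe the same unordered genotype.
print(allele_dosage("1|0") == allele_dosage("0/1"))
```

Comparing allele counts rather than ordered haplotypes means phased and unphased samples can be scored against the same reference genotypes.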
My main accomplishment this week was learning to use pysam to extract the information I needed from my VCF files, which made the whole program run much more smoothly. I enjoyed this week despite a lot of head-banging, because I learned a lot of new skills.
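For reference, here is a minimal sketch of pulling one sample's genotypes out of a VCF with pysam's `VariantFile`. It assumes a bgzipped, tabix-indexed VCF, which `fetch` needs in order to jump to a region. The function names, file name, sample and coordinates are placeholders, not the ones used in this project. pysam is imported inside the function so that the small formatting helper can be used without it.

```python
from typing import Iterator, Optional, Tuple


def sample_genotypes(vcf_path: str, sample: str, contig: str,
                     start: int, stop: int) -> Iterator[Tuple[str, int, Optional[tuple], bool]]:
    """Yield (contig, 1-based position, GT tuple, phased flag) for one sample.

    start/stop are 0-based, half-open, as pysam's fetch() expects.
    Requires a bgzipped, tabix-indexed VCF.
    """
    import pysam  # lazy import: only needed when actually reading a VCF

    with pysam.VariantFile(vcf_path) as vcf:
        for rec in vcf.fetch(contig, start, stop):
            call = rec.samples[sample]
            # call["GT"] is a tuple of allele indices, with None for missing alleles
            yield rec.contig, rec.pos, call["GT"], call.phased


def format_gt(gt: Optional[tuple], phased: bool) -> str:
    """Render a pysam GT tuple back into VCF text, e.g. (0, 1) -> '0/1'."""
    if not gt:
        return "."
    sep = "|" if phased else "/"
    return sep.join("." if a is None else str(a) for a in gt)
```

Usage, with hypothetical file and region:

```python
for contig, pos, gt, phased in sample_genotypes("sample.vcf.gz", "NA19240", "chr1", 0, 1_000_000):
    print(contig, pos, format_gt(gt, phased))
```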
On Monday I managed to finish writing the script to genotype individuals. Unfortunately I didn't have a large enough test pool to tell whether it was working. The initial signs were good, until I tested NA19240 and it failed. This revealed a likely flaw in the reference table, namely a lack of genotypic diversity. As it stands my script only really works well on white men, and I need more data points before I can extrapolate to the rest of the human population.
I went through the ordeal of downloading sequence data from the NCBI database and unpacking the SRA files, only to find that the data lacked phasing information and so could not be assembled properly using phaser and gordian. I then looked on the official 10x website and tried to get a sequence from there. Whilst that data did have phasing information, it did not have enough depth to cover the NOTCH2NL regions.
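Whether a dataset has enough depth over a region can be checked before attempting a full assembly. Below is a minimal sketch using pysam's `AlignmentFile.count_coverage`, assuming an indexed BAM aligned to the same reference build as the coordinates being queried. The function names, the file and region, and the 10x minimum-depth threshold are placeholders, not values from this project.

```python
from typing import List, Sequence


def per_base_depth(bam_path: str, contig: str, start: int, stop: int) -> List[int]:
    """Return read depth at each position in [start, stop) (0-based, half-open).

    count_coverage() returns four arrays (A, C, G, T counts); summing them gives
    the depth of reads passing its default filters (base quality >= 15).
    Requires an indexed BAM.
    """
    import pysam  # lazy import: only needed when actually reading a BAM

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        a, c, g, t = bam.count_coverage(contig, start, stop)
    return [sum(bases) for bases in zip(a, c, g, t)]


def fraction_covered(depths: Sequence[int], min_depth: int = 10) -> float:
    """Fraction of positions with at least min_depth reads (0.0 for an empty region)."""
    if not depths:
        return 0.0
    return sum(d >= min_depth for d in depths) / len(depths)
```

A low `fraction_covered` value over the target region would flag a dataset as too shallow before any time is spent assembling it.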
Next week I will move on to looking at the nonsynonymous changes associated with a normal phenotype, and try to gain some insight from those whilst I wait for more data.