4200 Paper Outline Assignment
Identification-Identify a “hypothetical protein” in the genome of an organism of interest
Homology-Using BLAST, identify potential well-characterized homologs
Evolution-Build a phylogenetic tree of appropriate sequences
Structure-Analyze predicted secondary and tertiary structure
Expression-Identify potential transcription dynamics
Interactions-Identify potential protein-protein interactions
Pathways-Identify potential involvement of the protein in cellular pathways
Next steps-Design an experiment to test the hypothesis your data suggest
This project will be submitted as a research paper and also presented to the class. The format for the presentation will mirror the paper. The research paper should be 15-20 pages, not including references. The presentation must include screenshots of the actual data and analysis.
This isn’t the only way to arrange the information. However, it is important to include all of these data.
Citations should be included
Introduction
Basic bioinformatics
Why gene annotation needs both humans and computers
Quick summary of what we did
Picked a gene of interest (GLA, Fabry disease) from OMIM (https://www.omim.org/entry/301500?search=stroke&highlight=stroke)
Found the hypothetical protein (the unknown) using KEGG (https://www.genome.jp/dbget-bin/www_bget?tng:GSTEN00033862G001)
Used NCBI Taxonomy to find my 35 sequences
Performed BLASTX on the 35 sequences to find proteins similar to the gene of interest and its homolog
Made a phylogenetic tree of all the proteins identified
Used EST data for protein expression
Used HHpred and SWISS-MODEL for protein structure
Used BioGRID and STRING to check for protein interactions
Gene investigated
How you chose the “unknown” or putative protein
We worked backward: my known gene was used to find the unknown
My known gene is from OMIM (Fabry disease, GLA gene)
The GLA gene was searched in NCBI Protein to find the gene sequence:
>AB019551.1 Homo sapiens GLA mRNA, complete cds
CGGCGGCTGAGAGCTGAAGCTCCCTGGACACTCAAGGCTCTTGTGGTGACAGTCTGACGTAAAGGCGTGC
AGGGAGGCCTAGCTCTGTCTCCTGGACTTAGAGATTTCAGACACAGAAGTCTGTCCATGGCTCCTTGTCA
CATCCGCAAATACCAGGAGAGCGACCGCCAGTGGGTTGTGGGCTTGCTCTCCCGGGGGATGGCCGAGCAT
GCCCCAGCCACCTTCCGGCAATTGCTGAAGCTGCCTCGAACCCTCATACTCTTACTTGGGGGGCCCCTCG
CCCTACTCCTGGTCTCTGGATCCTGGCTTCTAGCCCTCGTGTTCAGCATCAGCCTCTTCCCTGCCCTGTG
GTTCCTTGCCAAAAAACCCTGGACGGAGTATGTGGACATGACATTGTGCACAGACATGTCTGACATTACC
AAATCCTACCTGAGTGAGCGTGGCTCCTGCTTCTGGGTGGCTGAGTCTGAAGAGAAGGTGGTGGGCATGG
TAGGAGCTCTGCCTGTTGATGATCCCACCTTGAGGGAGAAGCGGTTGCAGCTGTTTCATCTCTCTGTGGA
CAGTGAGCACCGTCGTCAGGGGATAGCAAAAGCCCTGGTCAGGACTGTCCTCCAGTTTGCCCGGGACCAG
GGCTACAGTGAAGTTATCCTGGACACCGGCACCATCCAGCTCTCTGCTATGGCCCTCTACCAGAGCATGG
GCTTCAAGAAGACGGGCCAGTCCTTCTTCTGTGTGTGGGCCAGGCTAGTGGCTCTTCATAAAGTTCATTT
CATCTACCACCTCCCTTCTTCTAAGGTAGGGAGTCTGTGATCTCTTTCTGTGTGTATTGGTCAGAATAGA
ATCCATTCAGCTGTAGCAGCAAGCAATCCCCAACCTTTCACTGCAATGACCTTTCAATGCCCG
Translate the nucleotide sequence to protein
>rf 1 AB019551.1 Homo sapiens GLA mRNA, complete cds
RRLRAEAPWTLKALVVTV*RKGVQGGLALSPGLRDFRHRSLSMAPCHIRKYQESDRQWVV
GLLSRGMAEHAPATFRQLLKLPRTLILLLGGPLALLLVSGSWLLALVFSISLFPALWFLA
KKPWTEYVDMTLCTDMSDITKSYLSERGSCFWVAESEEKVVGMVGALPVDDPTLREKRLQ
LFHLSVDSEHRRQGIAKALVRTVLQFARDQGYSEVILDTGTIQLSAMALYQSMGFKKTGQ
SFFCVWARLVALHKVHFIYHLPSSKVGSL*SLSVCIGQNRIHSAVAASNPQPFTAMTFQC
P
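The translation step above can be illustrated with a minimal Python sketch. It assumes the standard genetic code and reading frame 1 (the "rf 1" in the header above); the names `translate` and `CODON_TABLE` are hypothetical and are not part of any tool used in the project.

```python
# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna, frame=1):
    """Translate a nucleotide string in forward reading frame 1, 2, or 3.

    Stop codons are written as '*'; codons with unexpected letters become 'X'.
    """
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2, or 3")
    seq = dna.upper().replace("U", "T")
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(frame - 1, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First 21 nucleotides of the mRNA sequence listed above.
print(translate("CGGCGGCTGAGAGCTGAAGCT"))  # prints RRLRAEA
```

The output matches the first seven residues of the frame 1 translation shown above. A translation tool reports all reading frames; this sketch covers only the three forward frames.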
KEGG was used to find the unknown (https://www.genome.jp/dbget-bin/www_bget?tng:GSTEN00033862G001)
put on kegg and I found on
What is the homolog or “known”
What does the literature say it does (summarize 3 important papers on this and discuss in detail) (https://www.omim.org/entry/301500?search=stroke&highlight=stroke)
Fabry disease: what it is, what the gene does, etc. Discuss in detail, including the crystal structure, protein structure, protein interactions, etc.
Evolution
Describe alignments (MSA alignment from the textbook, already provided in the uploaded file)
What are the strengths and weaknesses of the programs we used
Describe how you chose your 35 sequences (including any problems along the way). I selected my 35 sequences three times. NCBI Taxonomy didn’t work for me: when I ran the multiple sequence alignment, it showed that all my sequences were different. I then used NCBI Protein to look for my unknown protein, searching for N-acetyltransferase 8 (Pan paniscus) and restricting the results to eukaryotes. That didn’t work either, because the sequences still didn’t match in the MSA. Next I used Geneious to delete some of the sequence, which didn’t work, and then had Geneious run the BLAST for me, which also didn’t work. Finally I went back to NCBI Protein, searched for just N-acetyltransferase, loaded the 35 sequences I found into Geneious, cut some sequence out, and obtained a clean set of sequences that I used to make a tree. The original sequences were aligned with Clustal Omega and MUSCLE.
Describe your alignment results (what is conserved and what is variable)
Does this support the idea that your putative functions like the homolog? Describe the alignment of both MSAs (Clustal and MUSCLE)
Clustal MView (file will be sent separately)
T-Coffee (file will be sent separately)
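One way to make "what is conserved and what is variable" concrete is to score each alignment column. The sketch below is only an illustration under simple assumptions: the alignment is already loaded as equal-length strings with "-" for gaps, and conservation is the fraction of sequences sharing the most common residue. The function name `column_conservation` and the toy alignment are hypothetical; this is not the scoring used by Clustal, MUSCLE, or T-Coffee.

```python
from collections import Counter


def column_conservation(aligned):
    """Return one score per alignment column.

    Each score is the fraction of sequences that carry the most common
    residue in that column. Gaps ('-') never count as the common residue,
    so heavily gapped columns score low.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for i in range(length):
        column = [seq[i] for seq in aligned]
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)  # column made only of gaps
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


# Toy alignment for illustration only, not project data.
toy = ["MKV-", "MKI-", "MRVA"]
for position, score in enumerate(column_conservation(toy), start=1):
    print(position, round(score, 2))
```

Columns scoring near 1.0 are candidates for conserved regions, while low-scoring columns are variable or gapped. Running the same function on the Clustal and MUSCLE alignments would give a simple side-by-side comparison.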
Describe the different phylogeny programs we used, strengths and weaknesses. I used Jukes-Cantor on Geneious and also Bayesian and parsimony; discuss the strengths and weaknesses of the three
Describe your trees
Does this support the idea that your putative functions like the homolog? Answer this question in detail, comparing the three trees
Tree from Geneious
Discuss this tree. Extracted sequences were used to make the tree, and the smallest sequence was used as the outgroup. Compare this tree to the Bayesian 28-sequence (no extraction) tree below. Also compare the Bayesian (no extraction) tree to the parsimony (no extraction) tree (4 pages).
Bayesian – http://www.phylogeny.fr/get_result.cgi?task_id=f7bd2aa365280f4af085146e73aedff5&results_in_list=7&raw=1&file=phylo_tree.pdf
Parsimony tree
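When discussing the Jukes-Cantor tree, it may help to show how the model turns observed differences into an evolutionary distance. The sketch below uses the textbook Jukes-Cantor correction, d = -b ln(1 - p/b) with b = (n - 1)/n, where p is the fraction of differing sites and n is the number of character states (4 for nucleotides, 20 for amino acids). The function names are hypothetical, and this is not a description of how Geneious implements the model.

```python
import math


def p_distance(seq_a, seq_b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p, states=4):
    """Jukes-Cantor corrected distance for an observed proportion p.

    states=4 for nucleotides, states=20 for amino acids.
    """
    b = (states - 1) / states
    if p >= b:
        raise ValueError("sequences too divergent for the correction")
    return -b * math.log(1 - p / b)


# Toy nucleotide pair for illustration only: 1 difference in 4 sites.
p = p_distance("ACGT", "ACGA")
print(p, round(jukes_cantor(p), 4))
```

The corrected distance is always at least as large as the raw proportion, because the model accounts for multiple substitutions at the same site. Its equal-rates assumption is one weakness worth contrasting with the Bayesian and parsimony approaches.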
Expression
Describe how expression data is collected
GLA was entered into NCBI UniGene; then EST Profile and the graph were chosen from the GLA Homo sapiens gene profile link. Discuss the graph in detail
Describe the expression data you got for your homolog. Use the link for more details about the gene expression, discuss it in detail, and use research papers to back it up. Please cite https://www.ncbi.nlm.nih.gov/gene?linkname=protein_gene&from_uid=4504009
https://www.ncbi.nlm.nih.gov/UniGene/clust.cgi?UGID=139289&TAXID=9606&SEARCH=
After discussing expression, also discuss induction of the GLA gene https://www.ncbi.nlm.nih.gov/geoprofiles/127641979
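EST profiles express abundance as transcripts per million (TPM): the number of ESTs for the gene in a tissue pool, divided by the total ESTs in that pool, scaled to one million. The sketch below shows that arithmetic only. The function name `est_tpm` and the counts are hypothetical placeholders, not values taken from the GLA profile.

```python
def est_tpm(gene_est_count, total_est_count):
    """Transcripts per million for one gene in one EST pool."""
    if total_est_count <= 0:
        raise ValueError("total EST count must be positive")
    return gene_est_count * 1_000_000 / total_est_count


# Hypothetical counts for illustration only (not real GLA data):
# tissue name -> (ESTs for the gene, total ESTs in the pool)
example_pools = {
    "tissue_A": (5, 200_000),
    "tissue_B": (1, 100_000),
}

for tissue, (gene_count, total_count) in example_pools.items():
    print(tissue, est_tpm(gene_count, total_count))
```

Because pools differ greatly in size, comparing TPM rather than raw EST counts is what makes the bars in the profile graph comparable across tissues.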
Structure
Describe how the structure prediction programs we used work (SWISS-MODEL and HHpred)
Describe your results (regions that match and don’t and what those might do to the function.)
Does this support the idea that your putative functions like the homolog?
Explain the protein structure results from SWISS-MODEL and HHpred, comparing the real structure 3TT2 to the computer prediction: what are the differences?
Do this separately for HHpred and SWISS-MODEL
(MQVVIRKYRPSDKEAACGLFSTGILGHIYPCFCHTMTSPLYIIITMALSAAGFLLGSVLG
ALVLPGIWVGLIYYCCHELFSSFVRGQLQSDMQDISRSYLSRPDDCFWVAEAEVGGTSQI
VGMVAVVGKHSAGKRQGELFRMIISPLFRRMGLGARLTQTVIDFCKDSGFSEVELETSTT
QAAAVALYMKLGFHVALSHRNTHAPYWIIMLSKVVIRKYRPSDKEAVCSLFSTGILEHIY
PCFRNAMTSPLYIIITMALSAAGFLLGSVLGALVLPGIWVGLIYYCCHELYSSFVRGQLQ
SDMQDISRSYLSRPDDCFWVAEAEVGGTSQIVGTAAVLANQSGGVKQGELRRLSISPLFR
RKGLGSRLTQTVTEFCEERGFSELVLQTSASRTAAVNLYKNLGFYVVLVVMQVVIRKYRP
SDKEAACGLFSTGILGHIYPCFCHTMTSPLYIIITMALSAAGFLLGSTLVESPPQFFCNQ
A) from KEGG unknown
https://swissmodel.expasy.org/interactive/JKkCQG/models/
https://swissmodel.expasy.org/templates/3tt2.1
HHpred
https://www.rcsb.org/pdb/protein/D1C2H7?addPDB=3TT2
Interactions
Describe the different databases we used and how they get their data (STRING and BioGRID were used)
Describe the interactions your putative would engage in if it functions like the homolog.
Write about the protein interactions of these two
Protein interaction: explain the protein interactions from STRING (GLA gene) and BioGRID, comparing both… unknown protein sequence
https://string-db.org/cgi/network.pl?taskId=pcTCS3rAlfdC
https://thebiogrid.org/108981/summary/homo-sapiens/gla.html
Summary and experimental
Summarize your data above in a paragraph or two
To determine whether my putative functions like the homolog, the following experiment should be conducted.