BMC Bioinformatics 2021

This site accompanies the publication "Using sound to understand protein sequence data: new sonification algorithms for protein sequences and multiple sequence alignments" by Martin et al. (2021) in BMC Bioinformatics.

The paper presents five algorithms for the sonification of protein sequence and multiple sequence alignment (MSA) data. Algorithms I, II, and III sonify protein sequences. Algorithms IV and V sonify multiple sequence alignments.

Some details of the algorithms are included here; for a full explanation of the sonifications, please see the paper.

Code and documentation are available from GitHub via the link below.

Code and documentation - GitHub

Questionnaire

We used a questionnaire to research the effectiveness of our sonification methods. Participants were set tasks to complete using two of the sonification algorithms (Algorithms I and IV). You are welcome to access the questionnaire via the link below and try the tasks yourself; it should take about 15 minutes to complete. Download the PDF for clickable links to the sound examples. Please note that we are not collecting responses.

Access the questionnaire - GitHub

Protein Examples

Major Prion Protein - Homo sapiens (Human)

P04156 (PRIO_HUMAN) protein - UniProt database information

>sp|P04156|PRIO_HUMAN Major prion protein OS=Homo sapiens OX=9606 GN=PRNP PE=1 SV=1
MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGGWGQP
HGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKPKTNMKHMAGAAAAGA
VVGGLGGYMLGSAMSRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCV
NITIKQHTVTTTTKGENFTETDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSSPPV
ILLISFLIFLIVG

The Major Prion Protein example demonstrates the effectiveness of sonification in identifying amino acid repeats (AARs).
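The repeats in question can also be checked directly on the text of the sequence above. The following is a minimal sketch, not the sonification method from the paper: the function names `repeated_kmers` and `longest_tandem_run` are hypothetical, and the motif `PHGGGWGQ` is simply read off the sequence rather than taken from this page.

```python
import re
from collections import Counter

# Human major prion protein (UniProt P04156), copied from the FASTA record above.
PRION = (
    "MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGGWGQP"
    "HGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKPKTNMKHMAGAAAAGA"
    "VVGGLGGYMLGSAMSRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCV"
    "NITIKQHTVTTTTKGENFTETDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSSPPV"
    "ILLISFLIFLIVG"
)


def repeated_kmers(seq, k, min_count):
    """Count every overlapping k-mer and keep those seen at least min_count times."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


def longest_tandem_run(seq, motif):
    """Return (1-based start, copies) of the longest back-to-back run of motif, or None."""
    best = None
    for match in re.finditer(f"(?:{re.escape(motif)})+", seq):
        copies = (match.end() - match.start()) // len(motif)
        if best is None or copies > best[1]:
            best = (match.start() + 1, copies)
    return best


# Illustrative motif choice (an assumption, read off the sequence by eye).
print(longest_tandem_run(PRION, "PHGGGWGQ"))         # (60, 4)
print(repeated_kmers(PRION, 8, 4).get("PHGGGWGQ"))   # 4
```

The run of four consecutive eight-residue copies starting at residue 60 is the kind of regular pattern that a listener would expect to hear as a repeated figure in the sonified sequence.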

Transmembrane protein 14C - Homo sapiens (Human)

This example applies the protein sequence algorithms (Algorithms I, II, and III) to a transmembrane protein.

Transmembrane protein - Wikipedia page

Q9P0S9 (TM14C_HUMAN) protein - UniProt database information

>sp|Q9P0S9|TM14C_HUMAN Transmembrane protein 14C OS=Homo sapiens OX=9606 GN=TMEM14C PE=1 SV=1
MQDTGSVVPLHWFGFGYAALVASGGIIGYVKAGSVPSLAAGLLFGSLAGLGAYQLSQDPR
NVWVFLATSGTLAGIMGMRFYHSGKFMPAGLIAGASLLMVAKVGVSMFNRPH

Algorithm I

Algorithm II

Algorithm III

Insulin (globular protein)

Globular protein - Wikipedia page

Insulin - Wikipedia page

P01308 (INS_HUMAN) protein - UniProt database information

>sp|P01308|INS_HUMAN Insulin OS=Homo sapiens OX=9606 GN=INS PE=1 SV=1
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAED
LQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN

Algorithm I

Algorithm II

Algorithm III

Histone (Intrinsically Disordered Protein)

Histone H4 - Wikipedia page

Intrinsically disordered proteins - Wikipedia page

P62805 (H4_HUMAN) protein - UniProt database information

IDEAL database: Intrinsically Disordered proteins with Extensive Annotations and Literature

>sp|P62805|H4_HUMAN Histone H4 OS=Homo sapiens OX=9606 GN=H4C1 PE=1 SV=2
MSGRGKGGKGLGKGGAKRHRKVLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLK
VFLENVIRDAVTYTEHAKRKTVTAMDVVYALKRQGRTLYGFGG

Algorithm I

Algorithm II

Algorithm III

Multiple Sequence Alignment Examples

For each example below, a gappy and a compact MSA are provided for comparison. They differ in the gap-opening setting used to build the alignment: gappy MSAs were generated using MUSCLE 3.8.31 (-gapopen -3), and compact MSAs were generated using MUSCLE 3.8.31 (-gapopen 1). Each gappy and compact pair was built from the same unaligned input sequences.
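As a rough illustration of how such a pair of alignments might be produced and compared, the sketch below builds the two MUSCLE command lines and defines a simple gap-fraction measure. The file names, the `muscle` executable name, and the helper functions `muscle_command` and `gap_fraction` are assumptions for illustration; only the `-gapopen` values come from the text above. The commands are constructed but not executed here.

```python
import shlex


def muscle_command(in_fasta, out_fasta, gapopen, exe="muscle"):
    """Build a MUSCLE v3 command line (-in, -out, -gapopen) as an argument list."""
    return [exe, "-in", in_fasta, "-out", out_fasta, "-gapopen", str(gapopen)]


def gap_fraction(aligned_seqs):
    """Fraction of alignment cells that are gap characters ('-')."""
    total = sum(len(s) for s in aligned_seqs)
    gaps = sum(s.count("-") for s in aligned_seqs)
    return gaps / total if total else 0.0


# Hypothetical file names; both runs read the same unaligned input.
gappy = muscle_command("gapdh_unaligned.fasta", "gapdh_gappy.fasta", -3)
compact = muscle_command("gapdh_unaligned.fasta", "gapdh_compact.fasta", 1)

print(shlex.join(gappy))
print(shlex.join(compact))

# To actually run an alignment (requires MUSCLE 3.8.31 on the PATH):
#   import subprocess
#   subprocess.run(gappy, check=True)

# Made-up toy alignment, only to show what gap_fraction measures.
print(gap_fraction(["AC-G", "A--G"]))  # 0.375
```

Applying `gap_fraction` to the aligned sequences from each output file would give one number per alignment, which is a quick way to confirm which of the two is the gappier one.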

GAPDH

Glyceraldehyde 3-phosphate dehydrogenase - Wikipedia page

GAPDH Compact Visualisation - AliView (Imgur image)

GAPDH Gappy Visualisation - AliView (Imgur image)

Algorithm IV

Algorithm V
