ASEB: A Web Server for KAT-specific Acetylation Site Prediction



Introduction

News: A new method is now available for predicting Class I HDAC substrates.

Lysine acetylation is an important post-translational modification of both histone and non-histone proteins. Thousands of acetylated proteins are known. However, few of the lysine acetyltransferases (KATs) responsible for acetylating these proteins have been identified. After analyzing the sequence features of proteins acetylated by different KAT families, we found that KAT-catalyzed acetylation is likely to be substrate-specific, similar to kinase-catalyzed phosphorylation (Li et al. 2011). Based on this observation and using the known acetylated proteins, we developed the Acetylation Set Enrichment-Based (ASEB) method to predict the KAT families responsible for acetylating a given protein. A total of 280 CBP/p300 and 84 GCN5/PCAF family acetylated lysine sites, all experimentally validated, were manually collected. The ASEB method predicts novel KAT-specific acetylated sites based on the different characteristics of these two sets of lysine sites. An introduction to the ASEB method can be found in the Method section.

Search

Users can search known acetylated sites and the responsible KATs for their query proteins with this service. We collected human proteins acetylated by the CBP/p300 and GCN5/PCAF KAT families through keyword searches of the PubMed literature. The papers and their references were examined, and those reporting identified acetylation sites together with KAT information were selected. The acetylated proteins were extracted and mapped to the UniProt database to retrieve their Swiss-Prot accession numbers. Each acetylated site was reviewed carefully to ensure that its position exactly matched the position reported in the literature. Users can look up known acetylated sites on a specific protein by Swiss-Prot accession number, or download the dataset directly.

Predict

This service predicts the acetylation states and responsible KAT families for a protein, given either its Swiss-Prot accession number or its protein sequence. (A) If a Swiss-Prot accession number is provided, the shortest path between the query protein and each KAT is shown. Each shortest path is displayed as a force-directed graph drawn with the JavaScript library D3. Users can view an example showing the path between the query protein SMC1B and the KAT EP300. (B) For each query protein, the web server uses the ASEB method to assign a P-value to each lysine site. A detailed description of this method can be found in the Method section. A tutorial with sample input and output is also available. (C) Users can use the provided template script to run predictions for multiple proteins. The script analyzes query proteins programmatically rather than through manual interaction.
Note: The protein-protein interaction view service is only available for human proteins.

Download

Download the template program in Perl to access the web services and parse the output data.
Download the collected dataset of human proteins acetylated by CBP/p300 and GCN5/PCAF.
Download the ASEB R package from Bioconductor (>= 2.10).

Details

Method

When determining the shortest path between the query protein and a lysine acetyltransferase, the weights of the edges in the protein-protein interaction (PPI) network are assigned as follows. (A) PPI edges obtained from the PINA database are assigned a weight of one. (B) PPI edges obtained from the STRING database are assigned a weight of log(1000-combinedScore(protein1, protein2)), where the combined score between the two proteins is queried from STRING.
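The edge-weighting scheme and shortest-path search described above can be illustrated with a minimal Python sketch. It is not the server's implementation: it assumes the natural logarithm (the base is not stated here), assumes STRING combined scores lie between 0 and 999 so that the weight is defined and non-negative, and uses a textbook Dijkstra search. The function names `edge_weight` and `shortest_path` and the intermediate proteins `P1` and `P2` are hypothetical.

```python
import heapq
import math


def edge_weight(source, combined_score=None):
    """Weight of one PPI edge under the scheme described in the text."""
    if source == "PINA":
        return 1.0
    if source == "STRING":
        # Assumption: natural log, and scores in 0..999 so the log is defined.
        if combined_score is None or not 0 <= combined_score <= 999:
            raise ValueError("STRING combined score must be in 0..999")
        return math.log(1000 - combined_score)
    raise ValueError("unknown PPI source: %r" % (source,))


def shortest_path(edges, start, goal):
    """Dijkstra search over undirected weighted edges (a, b, weight).

    Returns (path, total_weight), or (None, math.inf) if goal is unreachable.
    """
    graph = {}
    for a, b, w in edges:
        graph.setdefault(a, []).append((b, w))
        graph.setdefault(b, []).append((a, w))

    dist = {start: 0.0}
    prev = {}
    visited = set()
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == goal:
            break
        for neighbor, w in graph.get(node, []):
            candidate = d + w
            if candidate < dist.get(neighbor, math.inf):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))

    if goal not in dist:
        return None, math.inf
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[goal]


# Toy network; P1 and P2 are made-up intermediates, not real interaction data.
toy_edges = [
    ("SMC1B", "P1", edge_weight("PINA")),
    ("P1", "EP300", edge_weight("STRING", 900)),
    ("SMC1B", "P2", edge_weight("STRING", 999)),
    ("P2", "EP300", edge_weight("STRING", 990)),
]
path, total = shortest_path(toy_edges, "SMC1B", "EP300")
print(path, total)
```

Under this weighting, a high-confidence STRING edge costs less than a low-confidence one, so the search favors paths through well-supported interactions.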
To predict novel acetylation sites in a KAT-specific way, we use the ASEB method, which employs a strategy similar to that of GSEA (Mootha et al., 2003; Subramanian et al., 2005; Guttman et al., 2009). We focus on finding sites whose sequences are similar to those of the known sites for each KAT family. We treat each acetylated lysine site together with its surrounding amino acids (eight on each side) as an acetylated peptide. The acetylated peptides from the two KAT families form two acetylated peptide sets (CBP/p300 and GCN5/PCAF). Each query peptide is assigned a P-value according to its similarity to the known acetylated peptides. The P-values range from 0.0001 to 1, with a resolution of 0.0001. The smaller the P-value, the stronger the evidence that the peptide is acetylated by the given KAT family.
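As an illustration of the peptide definition above (a lysine plus eight residues on each side, 17 residues in total), the following Python sketch extracts one window per lysine from a protein sequence. It covers only the windowing step, not the ASEB scoring. How the server handles lysines near the protein termini is not described here, so the `-` padding is an assumption, and the function name `lysine_windows` is hypothetical.

```python
def lysine_windows(sequence, flank=8, pad="-"):
    """Return (position, peptide) for every lysine in the sequence.

    position is 1-based; peptide is the lysine with `flank` residues on each
    side. Windows running past either terminus are filled with `pad`
    (an assumption, not a documented behavior of the web server).
    """
    sequence = sequence.upper()
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Residue i of the sequence sits at index i + flank of `padded`,
            # so this slice is centered on the lysine.
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows


# Made-up sequence for illustration only.
for position, peptide in lysine_windows("MSGRGKQGGKARAKAKTRSSRAGLQFPVGRVHRLLRK"):
    print(position, peptide)
```

Each returned peptide has the lysine at its center, which keeps query peptides aligned with the known acetylated peptides when the two are compared.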

Validation

First, we validated the ASEB method by leave-one-out validation and by estimating the background P-value distribution. The detailed validation results can be found on the prediction page.
Second, we tested the ASEB method on an independent data set from other species, on which its performance was similar to that of the leave-one-out validation. The detailed validation results can be found on the prediction page.
Third, we conducted biological experiments using an immunoprecipitation assay combined with Western blotting, which also demonstrated that ASEB can predict KAT-specific acetylation sites (Li et al. 2011).

Feedback

All comments, suggestions, questions, and bug reports are welcome. For inquiries, please e-mail Tingting Li, Ph.D., Peking University Health Science Center, at litt@hsc.pku.edu.cn.

Reference

Citation