Ab initio protein interaction network generator
The present work describes new enhancements to the GenPPI software for determining homology between protein pairs. Homology detection is one of the principal bottlenecks in creating ab initio interaction networks from genomes and the primary step in inferring the conserved neighborhood and phylogenetic profiles for all genomes under analysis. We achieved this improvement with the Random Forest algorithm, applied to biophysical features derived from ten amino acid propensity indexes, which are used to calculate sixty features for each protein of each genome. We crafted a training data set of homologous and non-homologous proteins from nine complete proteomes of critical bacteria to classify as similar the proteins sharing more than 65% amino acid identity; the reported machine learning test results are averages obtained from dozens of validations. This strategy resulted in more comprehensive and accurate protein interaction networks and can be applied to genomes of different organisms. In addition to the significant preprocessing improvement, we implemented parallelism in GenPPI to address critical code bottlenecks. We tested the improved GenPPI on the bacterium Buchnera aphidicola and achieved a 62% overlap with the interactions documented in the STRING database, surpassing the previous GenPPI version.
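The paragraph above does not state how the sixty features are computed or how protein pairs are encoded and labeled. The Python sketch below illustrates one plausible reading under explicit assumptions: it uses the Kyte-Doolittle hydropathy scale as a single example propensity index, summarizes it with six statistics (ten indexes times six statistics would give sixty features, but that split is our assumption), encodes a pair as the absolute differences between the two proteins' feature vectors, and labels the pair homologous when identity over a given alignment exceeds 65%. The names `index_features`, `percent_identity` and `pair_example` are hypothetical and are not part of GenPPI.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy scale, used here as ONE example propensity index.
# Assumption: GenPPI combines ten such indexes; the other nine are not shown.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def index_features(seq, scale, window=5):
    """Six summary statistics of one propensity index along a sequence.

    Hypothetical choice of statistics; gaps and unknown residues are skipped.
    """
    values = [scale[aa] for aa in seq.upper() if aa in scale]
    if not values:
        return [0.0] * 6
    w = min(window, len(values))
    # Sliding-window means capture local propensity peaks and troughs.
    windows = [mean(values[i:i + w]) for i in range(len(values) - w + 1)]
    return [mean(values), pstdev(values), min(values), max(values),
            min(windows), max(windows)]


def percent_identity(aln_a, aln_b):
    """Identity (%) over aligned columns where neither sequence has a gap '-'."""
    if len(aln_a) != len(aln_b):
        raise ValueError("sequences must be aligned to equal length")
    columns = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def pair_example(aln_a, aln_b, scale=KYTE_DOOLITTLE, threshold=65.0):
    """Feature vector and homolog label (1/0) for one aligned protein pair."""
    fa = index_features(aln_a, scale)
    fb = index_features(aln_b, scale)
    features = [abs(x - y) for x, y in zip(fa, fb)]
    label = 1 if percent_identity(aln_a, aln_b) > threshold else 0
    return features, label


if __name__ == "__main__":
    feats, label = pair_example("MKVLAAGIV-", "MKVLSAGIVK")
    print(len(feats), label)  # 6 1  (8 of 9 ungapped columns match)
```

In a full pipeline, the vectors from all ten indexes would be concatenated and, together with their labels, passed to a Random Forest implementation such as scikit-learn's `RandomForestClassifier` (`fit(X, y)`, then `predict`). GenPPI's actual features, pair encoding and identity definition may differ from this sketch.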
GenPPI inspects genomes, each represented as a multi-FASTA protein file, looking for conserved neighborhood, phylogenetic profiles, and gene fusion. It leaves to the end user the decision of how many and which genomes to use when constructing a protein interaction network.
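Two of the signals named above, phylogenetic profiles and conserved neighborhood, can be illustrated with a small Python sketch. It assumes proteins have already been grouped into homolog families (the homology step described in the abstract), links two families when their presence/absence profiles across genomes are identical, and links two families when they are adjacent in at least a minimum number of genomes. Exact profile matching, the filtering of uninformative profiles and the adjacency rule are simplifying assumptions for illustration, not GenPPI's documented criteria; the function names are hypothetical, and gene fusion detection is not shown.

```python
from itertools import combinations


def phylogenetic_profiles(homolog_groups, genomes):
    """Presence/absence tuple of each family across the ordered genomes.

    homolog_groups: {family: set of genome names containing that family}
    """
    return {fam: tuple(int(g in present) for g in genomes)
            for fam, present in homolog_groups.items()}


def profile_links(profiles):
    """Pair families whose profiles are identical (exact match is an assumption)."""
    links = set()
    for a, b in combinations(sorted(profiles), 2):
        p = profiles[a]
        # Skip uninformative profiles: present in a single genome, or in all.
        if p == profiles[b] and 1 < sum(p) < len(p):
            links.add((a, b))
    return links


def neighborhood_links(gene_orders, min_genomes):
    """Pair families found adjacent in at least `min_genomes` genomes.

    gene_orders: {genome: [family, ...] in linear chromosomal order}
    """
    counts = {}
    for order in gene_orders.values():
        seen = set()  # count each adjacent pair at most once per genome
        for x, y in zip(order, order[1:]):
            pair = tuple(sorted((x, y)))
            if x != y and pair not in seen:
                seen.add(pair)
                counts[pair] = counts.get(pair, 0) + 1
    return {pair for pair, n in counts.items() if n >= min_genomes}


if __name__ == "__main__":
    genomes = ["g1", "g2", "g3", "g4"]
    groups = {"A": {"g1", "g2"}, "B": {"g1", "g2"},
              "C": set(genomes), "E": {"g3"}}
    print(profile_links(phylogenetic_profiles(groups, genomes)))  # {('A', 'B')}
    orders = {"g1": ["A", "B", "C"], "g2": ["B", "A", "E"], "g3": ["C", "A", "B"]}
    print(neighborhood_links(orders, min_genomes=2))  # {('A', 'B')}
```

The sketch assumes linear gene order; for a circular chromosome, the last and first genes would also need to be treated as neighbors.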
GenPPI does not have a graphical interface. To download and use it, follow the instructions below:
To get the program's source code, visit the GenPPI repository on GitHub or clone it from the terminal with the following command:

git clone https://github.com/santosardr/genppi.git
Besides the interaction network for each genome, GenPPI outputs several textual reports. Each report is a file named 'report.txt', stored in a directory whose name describes the report. We created this web interface to upload report files and convert them into informative box plots or column charts. For further details, please check our help section.
Please submit a 'report.txt' file created by GenPPI directly from your local disk.
GENPPI: standalone software for creating protein interaction networks from genomes, BMC Bioinformatics 22, 596 (2021)
William Anjos¹, Gabriel Lanes¹, Vasco Azevedo² and Anderson Santos¹.
¹ Computational Biology Laboratory, Faculty of Computing (FACOM), Universidade Federal de Uberlândia, Uberlândia, Minas Gerais, Brazil.
² Department of Genetics, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil.
For more information, visit the GenPPI README page.
For technical or scientific problems, contact Dr. Anderson Santos by email.