GenPPI

Ab initio protein interaction network generator

This work describes enhancements to GenPPI for determining homology between protein pairs, which is one of the principal bottlenecks in creating ab initio interaction networks from genomes and the first step in inferring the conserved neighborhoods and phylogenetic profiles of the genomes under analysis. The improvement was achieved with the Random Forest algorithm working on biophysical features: ten amino acid propensity indexes are used to calculate sixty features for each genome's proteins. We crafted a training data set of homologous and non-homologous proteins from nine complete proteomes of critical bacteria to classify similar proteins with more than 65% amino acid identity via machine learning, an average result obtained from dozens of validations. This strategy resulted in more comprehensive and accurate protein interaction networks and can be applied to genomes of different organisms. Alongside the significant increase in preprocessing, we implemented parallelism in GenPPI to address critical code bottlenecks. Testing the improved GenPPI on the bacterium Buchnera aphidicola, we achieved an overlap of 62% with the interactions documented in the STRING database, surpassing the previous GenPPI version.
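
To make the idea of an overlap with a reference database concrete, the following is a minimal sketch of how such a percentage could be computed from two lists of interactions. It is not GenPPI's evaluation code. The protein identifiers are hypothetical, and the choice of denominator (the number of predicted interactions) is an assumption, since this page does not state how the 62% figure was calculated.

```python
def normalize(edges):
    """Treat each interaction as an unordered pair, so (a, b) equals (b, a)."""
    return {frozenset(pair) for pair in edges if pair[0] != pair[1]}


def overlap_fraction(predicted, reference):
    """Fraction of predicted interactions that also occur in the reference set.

    Assumption: the denominator is the number of predicted interactions.
    """
    predicted_set = normalize(predicted)
    reference_set = normalize(reference)
    if not predicted_set:
        return 0.0
    return len(predicted_set & reference_set) / len(predicted_set)


# Toy data with hypothetical protein identifiers.
predicted = [("p1", "p2"), ("p2", "p3"), ("p4", "p3"), ("p5", "p6")]
reference = [("p2", "p1"), ("p3", "p4"), ("p7", "p8")]

fraction = overlap_fraction(predicted, reference)
print(f"{fraction:.0%} of the predicted interactions are in the reference")
```

Storing each interaction as a frozenset makes the comparison independent of the order in which the two proteins are listed.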

About

GenPPI inspects genomes, each represented as a multi-FASTA protein file, looking for conserved neighborhoods, phylogenetic profiles, and gene fusions. It leaves to the end user the decision of how many and which genomes to use when building a protein interaction network.
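
As an illustration of the phylogenetic profile idea mentioned above, here is a minimal sketch in which each protein family is described by its presence or absence in every genome, and families sharing the same pattern become candidate interaction partners. This is the general textbook technique, not GenPPI's implementation. The family names, genome names, exact-match rule, and the min_genomes threshold are all assumptions made for the example.

```python
from itertools import combinations


def build_profiles(presence, genomes):
    """Map each protein family to a tuple of 0/1 flags, one flag per genome."""
    return {
        family: tuple(int(genome in found_in) for genome in genomes)
        for family, found_in in presence.items()
    }


def matching_profiles(profiles, min_genomes=2):
    """Pair families whose profiles are identical and cover at least min_genomes genomes."""
    pairs = []
    for (fam_a, prof_a), (fam_b, prof_b) in combinations(sorted(profiles.items()), 2):
        if prof_a == prof_b and sum(prof_a) >= min_genomes:
            pairs.append((fam_a, fam_b))
    return pairs


# Hypothetical genomes and protein families.
genomes = ["g1", "g2", "g3", "g4"]
presence = {
    "famA": {"g1", "g2", "g4"},
    "famB": {"g1", "g2", "g4"},
    "famC": {"g3"},
    "famD": {"g1", "g2", "g3", "g4"},
}

profiles = build_profiles(presence, genomes)
candidate_pairs = matching_profiles(profiles)
print(candidate_pairs)
```

In this toy input only famA and famB share a profile, so they form the single candidate pair. The example also suggests why the choice of genomes matters: adding or removing a genome changes every profile.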

Download Options (64-bit)

Operating System

Maximum RAM use allowed

Speed/Memory optimization

This version has substantial improvements over the version released when the paper was published, including an increase from 20 to 60 classification features. With -ml, the final network for our five genomes of Buchnera aphidicola includes six-fold more proteins and ninety-fold more interactions. To use -ml, download the model.7z* files and extract them in the directory where GenPPI runs. The -ml option is not enabled by default in GenPPI. The executables for all operating systems have been updated since February 1, 2024.

model.7z.001

model.7z.002

model.7z.003
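
Because the model is split into three volumes, it can help to confirm that all of them are in the running directory before extracting. The following is a small optional sketch, not part of GenPPI. The function name missing_model_parts is hypothetical, and the temporary folder is used only to demonstrate the check.

```python
from pathlib import Path
import tempfile

# File names as listed on this page.
MODEL_PARTS = ("model.7z.001", "model.7z.002", "model.7z.003")


def missing_model_parts(directory):
    """Return the names of the model volumes that are not present in `directory`."""
    folder = Path(directory)
    return [name for name in MODEL_PARTS if not (folder / name).is_file()]


# Demonstration: a temporary folder holding only two of the three volumes.
with tempfile.TemporaryDirectory() as tmp:
    for name in MODEL_PARTS[:2]:
        (Path(tmp) / name).touch()
    missing = missing_model_parts(tmp)

print(missing)
```

In practice the function would be called on the directory where GenPPI runs, for example missing_model_parts("."), and an empty list means every volume was found.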

First Steps

GenPPI does not have a graphical interface. After downloading, follow the instructions below:

  1. Open a terminal in the download folder;
  2. Make the downloaded file executable with, for instance: chmod 755 genppi;
  3. Type ./genppi for help or follow a step-by-step command-line example.

To get the source code for the program, go to the GenPPI repository on GitHub or clone it from the terminal with the following command: git clone https://github.com/santosardr/genppi.git

Reports

Besides the interaction networks for each genome, GenPPI outputs some textual reports. Each report is a file named 'report.txt' stored in a directory named after the report. We created this web interface so that you can upload report files and convert them into box or column plots. For further details, please check our help section.


Please submit a 'report.txt' file created by GenPPI directly from your local disk.
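
Since every report shares the file name 'report.txt' and is told apart only by its directory, a short script can help locate the files before uploading them. The sketch below is not part of GenPPI. The function name find_reports and the directory names are hypothetical, and the temporary folder stands in for a real GenPPI output directory.

```python
from pathlib import Path
import tempfile


def find_reports(root):
    """Map each report directory name to the path of its report.txt file."""
    return {path.parent.name: path for path in sorted(Path(root).rglob("report.txt"))}


# Demonstration with hypothetical directory names in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("example_report_a", "example_report_b"):
        folder = Path(tmp) / name
        folder.mkdir()
        (folder / "report.txt").write_text("placeholder\n")
    # A file with another name is ignored by the search.
    (Path(tmp) / "notes.txt").write_text("not a report\n")
    report_names = sorted(find_reports(tmp))

print(report_names)
```

On a real run, find_reports would be pointed at the folder where GenPPI wrote its output, and each returned path is a file that can be submitted here.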

Citation

GENPPI: standalone software for creating protein interaction networks from genomes, BMC Bioinformatics 22, 596 (2021)
William Anjos¹, Gabriel Lanes¹, Vasco Azevedo² and Anderson Santos¹

¹ Computational Biology Laboratory, Faculty of Computing (FACOM), Universidade Federal de Uberlândia, Uberlândia, Minas Gerais, Brazil.
² Department of Genetics, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil.

Help

For more information, visit the GenPPI readme page.

For technical and scientific problems, contact Dr. Anderson Santos by email.