ViralEntropR - A Computational Pipeline for Entropy-Informed Detection of Emerging Viral Variants
Implements an entropy-informed pipeline for detecting emerging variants in viral amino acid sequence data, extending prior clustering-based approaches including hemagglutinin clustering methods (Li et al., 2015) <doi:10.1142/9789814667944_0018>. Provides a fully vectorized FASTA preprocessing toolkit covering header parsing, two-pass date and country extraction, ambiguous-residue filtering, and integer encoding under a 25-symbol amino acid alphabet. Computes per-site Shannon entropy across user-defined cumulative, sliding, or disjoint temporal partitions and clusters per-site entropy values using Gaussian mixture models via 'mclust' (Scrucca et al., 2016) <doi:10.32614/RJ-2016-021>. Quantifies temporal distributional shifts between partitions using the Hellinger distance (van der Vaart, 1998) <doi:10.1017/CBO9780511802256>, and detects temporal change points non-parametrically using energy statistics (Matteson and James, 2014) <doi:10.1080/01621459.2013.849605> via 'ecp' or wild binary segmentation (Fryzlewicz, 2014) <doi:10.1214/14-AOS1245> via 'HDcpDetect'. Per-site amino-acid frequency tables and entropy trajectory plots characterize sequence composition and evolutionary dynamics across time. A configurable multi-variant simulation engine generates synthetic sequence time series with known ground truth for benchmarking detection pipelines. A curated dataset of SARS-CoV-2 Variants of Concern and Variants of Interest with associated lineage and surveillance metadata is included, along with a bundled National Center for Biotechnology Information (NCBI) Spike protein sample and vignettes demonstrating the full workflow.
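The two core quantities named in the description, per-site Shannon entropy and the Hellinger distance between temporal partitions, can be illustrated with a minimal Python sketch. This is not the package's R API: the function names, the toy sequences, and the choice to compare per-site residue frequencies (rather than some other distribution the package may use) are assumptions made for illustration only.

```python
import math
from collections import Counter


def site_entropy(column):
    """Shannon entropy (in bits) of the residues observed at one aligned site."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def per_site_entropy(sequences):
    """Entropy at every site of a list of equal-length aligned sequences."""
    return [site_entropy(column) for column in zip(*sequences)]


def site_frequencies(sequences, site):
    """Relative residue frequencies at one site, as a dict."""
    counts = Counter(seq[site] for seq in sequences)
    total = sum(counts.values())
    return {residue: c / total for residue, c in counts.items()}


def hellinger(p, q):
    """Hellinger distance between two discrete distributions given as dicts."""
    keys = set(p) | set(q)
    squared = sum(
        (math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys
    )
    return math.sqrt(squared / 2.0)


# Hypothetical toy partitions: an "early" and a "late" time window.
early = ["MKTA", "MKTA", "MKTA", "MKSA"]
late = ["MKSA", "MKSA", "MRSA", "MKSA"]

early_entropy = per_site_entropy(early)
late_entropy = per_site_entropy(late)

# Distributional shift at site index 2 (T -> S) between the two windows.
shift_site_2 = hellinger(site_frequencies(early, 2), site_frequencies(late, 2))
```

In this toy example, entropy is zero at fully conserved sites and positive where residues vary; the Hellinger distance is bounded between 0 (identical distributions) and 1 (disjoint support), which makes shifts comparable across sites.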
bioinformatics, change-point-detection, clustering, computational-biology, covid-19, fasta, functional-data-analysis, gaussian-mixture-models, genomic-surveillance, gisaid, hellinger-distance, molecular-epidemiology, ncbi, sars-cov-2, shannon-entropy, spike-protein, variant-detection, viral-evolution, viral-genomics
5.26 score
RegrCoeffsExplorer - Efficient Visualization of Regression Coefficients for lm(), glm(), and glmnet() Objects
The visualization tool supports interpreting regression coefficients beyond the traditional per-unit change, an interpretation that suits categorical variables better than continuous ones. It shows the effect of a unit change alongside larger shifts, such as an interquartile change, that reflect the empirical distribution of the data. It also plots how the odds ratio of each predictor changes across its minimum, first quartile, median, third quartile, and maximum values, which helps clarify the relationship between predictors and outcome within the empirical data distribution, particularly in logistic regression.
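The contrast between a per-unit and an interquartile interpretation can be shown with a short Python sketch. It is not the package's R interface: the coefficient value, the predictor data, the function names, and the choice of the minimum as the reference point are all hypothetical. It relies only on the standard identity that, in logistic regression, a change of size delta in a predictor with coefficient beta multiplies the odds by exp(beta * delta).

```python
import math
import statistics


def odds_ratio(beta, delta):
    """Odds ratio implied by a change of size `delta` in a logistic predictor."""
    return math.exp(beta * delta)


def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, and maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical fitted coefficient and observed predictor values.
beta_age = 0.05
ages = [20, 25, 30, 35, 40, 45, 50, 55, 60]

lowest, q1, median, q3, highest = five_number_summary(ages)

# Per-unit interpretation: odds ratio for a one-year increase.
or_unit = odds_ratio(beta_age, 1)

# Distribution-aware interpretation: odds ratio for moving from Q1 to Q3.
or_iqr = odds_ratio(beta_age, q3 - q1)

# Odds ratios at each summary point, relative to the minimum (an assumed baseline).
or_by_point = [odds_ratio(beta_age, x - lowest) for x in (lowest, q1, median, q3, highest)]
```

With these made-up numbers, a coefficient that looks small per unit corresponds to a much larger odds ratio over the interquartile range, which is the kind of difference the description says the plots are meant to expose.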
coefficients-of-linear-regression, confidence-intervals, empirical-data, glm, glmnet, lasso-regression, lm, postselectioninference, regression-analysis, regularized-linear-regression, regularized-logistic-regression, selectiveinference, statistics-for-data-science, visualization
4.48 score 3 stars 4 scripts 229 downloads