Package 'NB.MClust'

Title: Negative Binomial Model-Based Clustering
Description: Model-based clustering of high-dimensional non-negative data that follow a generalized Negative Binomial distribution. All functions in this package apply to either continuous or integer data. Correlation between variables is allowed, while samples are assumed to be independent.
Authors: Qian Li [aut, cre]
Maintainer: Qian Li
License: GPL (>= 2)
Version: 1.1.1
Built: 2024-08-21 03:14:16 UTC
Source: https://github.com/cran/NB.MClust

Help Index


dnb, ldnb Functions

Description

These functions compute the density (dnb) and log-density (ldnb) of the generalized Negative Binomial distribution.

Usage

ldnb(x, theta, mu)

dnb(x, theta, mu)

Arguments

x

A non-negative numeric scalar or vector. Both integer and non-integer values are allowed.

theta

Dispersion parameter (a positive number).

mu

Mean parameter (a positive number).

Value

dnb

Density of the generalized Negative Binomial distribution at x.

ldnb

Log-density of the generalized Negative Binomial distribution at x.
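The manual does not state the density formula. Assuming the package uses the common mean/dispersion (NB2) form, with factorials replaced by gamma functions so that non-integer x is allowed, the log-density returned by ldnb would read as follows (dnb being its exponential):

```latex
\log f(x;\theta,\mu) = \log\Gamma(x+\theta) - \log\Gamma(\theta) - \log\Gamma(x+1)
  + \theta\log\frac{\theta}{\theta+\mu} + x\log\frac{\mu}{\theta+\mu},
  \qquad x \ge 0,\ \theta > 0,\ \mu > 0.
```

For integer x this is the usual Negative Binomial probability mass function, with mean mu and variance mu + mu^2/theta; whether the package uses exactly this form is an assumption.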

Examples

ldnb(x=10.4,theta=3.2,mu=5)
dnb(x=10.4,theta=3.2,mu=5)
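As a hedged cross-check, the sketch below re-implements the (log-)density under an assumed NB2 form (variance mu + mu^2/theta, factorials replaced by gamma functions). The names ldnb_sketch and dnb_sketch are hypothetical and are not part of the package; for integer x the result can be compared with base R's dnbinom() in its size/mu parameterization.

```r
# ASSUMED NB2 log-density, extended to non-integer x via lgamma().
# Hypothetical re-implementation, not the package's own code.
ldnb_sketch <- function(x, theta, mu) {
  lgamma(x + theta) - lgamma(theta) - lgamma(x + 1) +
    theta * log(theta / (theta + mu)) +
    x * log(mu / (theta + mu))
}

# Density is the exponential of the log-density.
dnb_sketch <- function(x, theta, mu) {
  exp(ldnb_sketch(x, theta, mu))
}

# For integer x, compare with base R (size = theta, mu = mu).
ldnb_sketch(3, theta = 3.2, mu = 5)
dnbinom(3, size = 3.2, mu = 5, log = TRUE)

# Non-integer x is accepted, as in the documented example.
dnb_sketch(10.4, theta = 3.2, mu = 5)
```

Computing on the log scale first and exponentiating only at the end avoids overflow in the gamma functions for large counts.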

NB.MClust Function

Description

This function performs model-based clustering of non-negative integer or continuous data that follow a generalized Negative Binomial distribution.

Usage

NB.MClust(Count, K, ini.shift.mu = 0.01, ini.shift.theta = 0.01,
  tau0 = 10, rate = 0.9, bic = TRUE, iteration = 100)

Arguments

Count

Data matrix of non-negative counts, either integer or continuous, with samples in rows and variables in columns. The function clusters the rows of this matrix.

K

Number of clusters (components). Either a single positive integer or a vector of positive integers; in the latter case the number of clusters is selected by BIC.

ini.shift.mu

Initial value of the shift between cluster means, used to start the EM algorithm.

ini.shift.theta

Initial value of the shift between cluster dispersions, used to start the EM algorithm.

tau0

Initial value of the annealing parameter in the EM algorithm. The default and suggested value is 10.

rate

Rate at which the annealing parameter decreases over iterations. The default and suggested value is 0.9.

bic

Whether the Bayesian Information Criterion (BIC) should be computed when K is a single integer. BIC is always computed when K is a vector.

iteration

Maximum number of iterations of the EM algorithm. The default is 100.
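The manual does not spell out how tau0 and rate enter the EM algorithm. One plausible reading, in the spirit of deterministic annealing EM, is a temperature that starts at tau0, is multiplied by rate at each iteration and is floored at 1, with E-step weights proportional to (prior x density)^(1/tau). The sketch below illustrates only that assumed reading; anneal_schedule and annealed_posterior are hypothetical helpers, and the package's actual update rule may differ.

```r
# ASSUMED schedule: tau_t = max(1, tau0 * rate^(t - 1)), t = 1, ..., iteration.
# Hypothetical helper; not the package's documented rule.
anneal_schedule <- function(tau0 = 10, rate = 0.9, iteration = 100) {
  pmax(1, tau0 * rate^(seq_len(iteration) - 1))
}

# Annealed E-step weights (hypothetical helper).
# log_score: n x K matrix of log(prior_k) + log f(x_i | cluster k).
# Dividing by tau > 1 flattens each row (softer assignments);
# tau = 1 gives the ordinary EM posterior.
annealed_posterior <- function(log_score, tau) {
  s <- log_score / tau
  s <- s - apply(s, 1, max)   # row-wise shift for numerical stability
  w <- exp(s)
  w / rowSums(w)
}

tau <- anneal_schedule()
which(tau == 1)[1]   # first iteration at which the floor of 1 is reached

# Toy scores for two samples and two clusters
log_score <- matrix(c(-1, -3,
                      -2, -2), nrow = 2, byrow = TRUE)
annealed_posterior(log_score, tau = 10)  # flatter rows
annealed_posterior(log_score, tau = 1)   # ordinary posterior
```

Starting with flat weights and sharpening them gradually is the usual motivation for annealing: it makes the early iterations less sensitive to the initial values.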

Value

parameters

A list of estimated parameters with the following components:

$prior

Prior probability that a sample belongs to each cluster

$mu

Mean of each cluster

$theta

Dispersion of each cluster

$posterior

Posterior probability that a sample belongs to each cluster

cluster

Estimated cluster assignment

BIC

Value of the Bayesian Information Criterion (BIC).

K

Optimal (estimated) number of clusters, returned when the input K is a vector.
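To show how the returned components relate to one another, here is a minimal E-step sketch that computes a posterior matrix, hard cluster labels, the log-likelihood and a BIC value from a prior vector and K x p matrices of means and dispersions. It assumes the NB2 density, treats variables as conditionally independent within a cluster (a simplification; the package description says correlation between variables is allowed), and uses a parameter count of (K - 1) + 2Kp with BIC = -2 log L + (number of parameters) log n, where smaller is better. All function names are hypothetical, and the package's own BIC may use a different sign or parameter count.

```r
# NB2 log-density with gamma functions (ASSUMED form; hypothetical helper).
ldnb_sketch <- function(x, theta, mu) {
  lgamma(x + theta) - lgamma(theta) - lgamma(x + 1) +
    theta * log(theta / (theta + mu)) + x * log(mu / (theta + mu))
}

# Hypothetical E-step. X: n x p data; prior: length K;
# mu, theta: K x p matrices (row k = cluster k).
# ASSUMES variables are independent within a cluster.
nb_estep_sketch <- function(X, prior, mu, theta) {
  X <- as.matrix(X)
  n <- nrow(X); p <- ncol(X); K <- length(prior)
  # log_score[i, k] = log(prior_k) + sum_j log f(x_ij; theta_kj, mu_kj)
  log_score <- vapply(seq_len(K), function(k) {
    th <- matrix(theta[k, ], n, p, byrow = TRUE)
    m  <- matrix(mu[k, ], n, p, byrow = TRUE)
    log(prior[k]) + rowSums(ldnb_sketch(X, th, m))
  }, numeric(n))
  log_score <- matrix(log_score, nrow = n)   # keep n x K shape when n == 1
  # Row-wise log-sum-exp for numerical stability
  row_max <- apply(log_score, 1, max)
  w <- exp(log_score - row_max)
  posterior <- w / rowSums(w)
  list(posterior = posterior,
       cluster = max.col(posterior, ties.method = "first"),
       loglik = sum(row_max + log(rowSums(w))))
}

# BIC with an ASSUMED parameter count: (K - 1) priors + K * p means
# + K * p dispersions; smaller is better under this convention.
bic_sketch <- function(loglik, K, p, n) {
  -2 * loglik + ((K - 1) + 2 * K * p) * log(n)
}

# Toy usage: two well-separated groups of 20 samples and 5 variables.
set.seed(1)
X <- rbind(matrix(rnbinom(20 * 5, size = 5, mu = 2), nrow = 20),
           matrix(rnbinom(20 * 5, size = 5, mu = 20), nrow = 20))
mu <- rbind(rep(2, 5), rep(20, 5))
theta <- matrix(5, nrow = 2, ncol = 5)
es <- nb_estep_sketch(X, prior = c(0.5, 0.5), mu = mu, theta = theta)
table(es$cluster)
bic_sketch(es$loglik, K = 2, p = ncol(X), n = nrow(X))
```

Here the hard labels are taken as the column with the largest posterior probability, a common way to turn posterior probabilities into a single cluster assignment per sample.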

Examples

# Example:

data("Simulated_Count") # A 50x100 integer data frame.

m1=NB.MClust(Simulated_Count,K=2:5)
cluster=m1$cluster #Estimated cluster assignment
k_hat=m1$K  #Estimated optimal K

Data set for illustration: Simulated_Count

Description

A simulated count data set used to illustrate NB.MClust.

Usage

Simulated_Count

Format

A simulated data frame with 50 rows (samples) and 100 columns (variables). It can be viewed as simulated RNA-Seq integer counts of 100 genes for 50 patients.
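For experimenting with data of the same shape, the sketch below draws a 50 x 100 data frame of Negative Binomial counts in two groups with base R's rnbinom(). The group sizes, gene means, fold change and dispersion are illustrative choices, and simulate_nb_counts is a hypothetical helper; this is not how Simulated_Count was generated.

```r
# Hypothetical generator for data shaped like Simulated_Count:
# n samples (rows) x p genes (columns) of NB counts in K groups.
# All settings below are ILLUSTRATIVE CHOICES, not the original recipe.
simulate_nb_counts <- function(n = 50, p = 100, K = 2, seed = 1) {
  set.seed(seed)
  group <- rep(seq_len(K), length.out = n)        # balanced group labels
  base_mu <- rgamma(p, shape = 2, rate = 0.1)     # gene-level means
  shift <- seq(1, 2, length.out = K)              # group-level fold change
  # One row of p counts per sample; vapply gives p x n, so transpose.
  counts <- t(vapply(group, function(g) {
    rnbinom(p, size = 5, mu = base_mu * shift[g])
  }, numeric(p)))
  list(counts = as.data.frame(counts), group = group)
}

sim <- simulate_nb_counts()
dim(sim$counts)
table(sim$group)
```

The counts component could then be passed to NB.MClust in the same way as Simulated_Count in the example above.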