iNEXT (iNterpolation and EXTrapolation) is an R package modified from the original version supplied in the Supplement of Chao et al. (2014). In the latest version, we have added more user‐friendly features and refined the graphic displays. This document provides a quick introduction demonstrating how to run iNEXT. Detailed information about iNEXT functions is provided in the iNEXT Manual, also available on CRAN. See Chao & Jost (2012), Colwell et al. (2012) and Chao et al. (2014) for methodologies. A short review of the theoretical background and a brief description of methods are included in an application paper by Hsieh, Ma & Chao (2016). An online version of iNEXT (https://chao.shinyapps.io/iNEXT/) is also available for users without an R background.
iNEXT focuses on three measures of Hill numbers of order q: species richness (q = 0), Shannon diversity (q = 1, the exponential of Shannon entropy) and Simpson diversity (q = 2, the inverse of Simpson concentration). For each diversity measure, iNEXT uses the observed sample of abundance or incidence data (called the “reference sample”) to compute diversity estimates and the associated 95% confidence intervals for the following two types of rarefaction and extrapolation (R/E):
- Sample‐size‐based R/E sampling curves: iNEXT computes diversity estimates for rarefied and extrapolated samples up to an appropriate size. This type of sampling curve plots the diversity estimates with respect to sample size.
- Coverage‐based R/E sampling curves: iNEXT computes diversity estimates for rarefied and extrapolated samples with sample completeness (as measured by sample coverage) up to an appropriate coverage. This type of sampling curve plots the diversity estimates with respect to sample coverage.
iNEXT also plots the above two types of sampling curves and a sample completeness curve. The sample completeness curve provides a bridge between these two types of curves.
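As a rough illustration of the three diversity measures listed above, the following base‐R sketch computes the empirical (observed) Hill numbers of order q = 0, 1, 2 from a vector of species abundances. The helper name hill_number and the toy abundances are assumptions made for this example; iNEXT itself reports rarefied and extrapolated estimates rather than these plug‐in values.

```r
# Hypothetical helper (not part of iNEXT): empirical Hill number of order q
# for a vector of species abundances.
hill_number <- function(x, q) {
  p <- x[x > 0] / sum(x)        # relative abundances of the observed species
  if (q == 1) {
    exp(-sum(p * log(p)))       # limit as q -> 1: exponential of Shannon entropy
  } else {
    sum(p^q)^(1 / (1 - q))      # q = 0: richness; q = 2: inverse Simpson concentration
  }
}

even_abund <- c(10, 10, 10, 10)   # made-up, perfectly even assemblage
hill_numbers <- sapply(c(0, 1, 2), hill_number, x = even_abund)
hill_numbers                      # all three orders equal 4 for an even assemblage

uneven_abund <- c(50, 30, 20)     # made-up, uneven assemblage
hill_q2 <- hill_number(uneven_abund, q = 2)   # 1 / (0.25 + 0.09 + 0.04)
```

For a perfectly even assemblage all three orders coincide with the number of species; the more uneven the abundances, the further the q = 1 and q = 2 values fall below species richness.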
SOFTWARE NEEDED TO RUN INEXT IN R
HOW TO RUN INEXT:
The iNEXT package is available on CRAN and can be downloaded with a standard R installation procedure using the following commands. For a first‐time installation, an additional visualization extension package (ggplot2) must be loaded.
## install iNEXT package from CRAN
install.packages("iNEXT")
## install the latest version from github
install.packages('devtools')
library(devtools)
install_github('JohnsonHsieh/iNEXT')
## import packages
library(iNEXT)
library(ggplot2)
Remark: To install the devtools package, you should update R to the latest version. Also, for install_github to work, you should install the httr package.
MAIN FUNCTION: iNEXT()
We first describe the main function iNEXT()
with default arguments:
iNEXT(x, q=0, datatype="abundance", size=NULL, endpoint=NULL, knots=40, se=TRUE, conf=0.95, nboot=50)
The arguments of this function are briefly described below and are explained in more detail through illustrative examples later in the text. This main function computes diversity estimates of order q, the sample coverage estimates and related statistics for K (if knots=K) evenly‐spaced knots (sample sizes) between size 1 and the endpoint, where the endpoint is described below. Each knot represents a particular sample size for which diversity estimates will be calculated. By default, endpoint = double the reference sample size (total sample size for abundance data; total sampling units for incidence data). For example, if endpoint = 10 and knots = 4, diversity estimates will be computed for a sequence of samples with sizes (1, 4, 7, 10).
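The defaults described above can be sketched in base R as follows. The helpers reference_size and knot_sizes and the abundance vector are hypothetical and only mirror the description in the text; the knot placement that iNEXT uses internally may differ in detail.

```r
# Hypothetical helpers (not part of iNEXT) mirroring the defaults described above.

# Reference sample size: total abundance for abundance data, or the number of
# sampling units (the first entry) for incidence-frequency data.
reference_size <- function(x, datatype = "abundance") {
  if (datatype == "abundance") sum(x) else x[1]
}

# K evenly spaced sample sizes between 1 and the endpoint.
knot_sizes <- function(endpoint, knots) {
  unique(round(seq(1, endpoint, length.out = knots)))
}

abund <- c(46, 22, 17, 15, 15, 9, 8, 6, 6, 4)  # made-up abundances
n <- reference_size(abund)                     # 148 individuals
default_endpoint <- 2 * n                      # double the reference sample size
knot_sizes(endpoint = 10, knots = 4)           # 1 4 7 10, matching the example in the text
```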
- x: a matrix, data.frame, lists of species abundances, or lists of incidence frequencies (see data format/information below).
- q: a number or vector specifying the diversity order(s) of Hill numbers.
- datatype: type of input data, "abundance", "incidence_raw" or "incidence_freq".
- size: an integer vector of sample sizes for which diversity estimates will be computed. If NULL, then diversity estimates will be calculated for those sample sizes determined by the specified/default endpoint and knots.
- endpoint: an integer specifying the sample size that is the endpoint for R/E calculation; if NULL, then endpoint = double the reference sample size.
- knots: an integer specifying the number of equally‐spaced knots between size 1 and the endpoint.
- se: a logical variable to calculate the bootstrap standard error and conf confidence interval.
- conf: a positive number < 1 specifying the level of confidence interval.
- nboot: an integer specifying the number of bootstrap replications.
This function returns an "iNEXT"
object which can be further used to make plots using the function ggiNEXT()
to be described below.
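As a minimal sketch of a call that sets several of these arguments explicitly, the following uses the spider data shipped with the package. The object name out_demo and the particular argument values are chosen only for illustration, and the names of the components inside the returned object may vary between package versions.

```r
library(iNEXT)
data(spider)

# se = FALSE skips the bootstrap, so the call returns quickly;
# knots = 10 requests fewer sample sizes than the default of 40.
out_demo <- iNEXT(spider, q = c(0, 1, 2), datatype = "abundance",
                  knots = 10, se = FALSE)

class(out_demo)   # the "iNEXT" class described in the text
names(out_demo)   # list the components stored in the object
```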
GRAPHIC DISPLAYS: FUNCTION ggiNEXT()
The function ggiNEXT()
, which extends ggplot2
to the "iNEXT"
object with default arguments, is described as follows:
ggiNEXT(x, type=1, se=TRUE, facet.var="none", color.var="site", grey=FALSE)
Here x
is an "iNEXT"
object. Three types of curves are allowed:
(1) Sample-size-based R/E curve (type=1): see Figs. 1a and 2a in the main text. This curve plots diversity estimates with confidence intervals (if se=TRUE) as a function of sample size up to double the reference sample size, by default, or a user‐specified endpoint.
(2) Sample completeness curve (type=2) with confidence intervals (if se=TRUE): see Figs. 1b and 2b in the main text. This curve plots the sample coverage with respect to sample size for the same range described in (1).
(3) Coverage-based R/E curve (type=3): see Figs. 1c and 2c in the main text. This curve plots the diversity estimates with confidence intervals (if se=TRUE) as a function of sample coverage up to the maximum coverage obtained from the maximum size described in (1).
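To make the notion of sample coverage used in the completeness and coverage‐based curves more concrete, here is a base‐R sketch of the coverage estimate for an abundance reference sample, based on the counts of singletons and doubletons as in Chao & Jost (2012). The helper sample_coverage and the toy abundances are assumptions for illustration; this is not the package's internal code.

```r
# Hypothetical helper (not part of iNEXT): estimated sample coverage of an
# abundance reference sample, following the estimator in Chao & Jost (2012).
sample_coverage <- function(x) {
  x  <- x[x > 0]
  n  <- sum(x)             # reference sample size
  f1 <- sum(x == 1)        # number of singletons
  f2 <- sum(x == 2)        # number of doubletons
  if (f1 == 0) return(1)   # no singletons: coverage is estimated as complete
  1 - (f1 / n) * ((n - 1) * f1) / ((n - 1) * f1 + 2 * f2)
}

abund <- c(5, 3, 2, 2, 1, 1, 1)   # made-up abundances: n = 15, f1 = 3, f2 = 2
cov_hat <- sample_coverage(abund)
cov_hat                           # 1 - (3/15) * 42/46, about 0.817
```

When there are no doubletons, the expression reduces to 1 - f1/n, so the sketch remains well defined in that case.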
The argument facet.var=("none", "order", "site" or "both") is used to create a separate plot for each value of the specified variable. For example, the following code displays a separate plot (in Figs 1a and 1c) for each value of the diversity order q. The user may also use the argument grey=TRUE to plot black/white figures. The usage of color.var is illustrated in the incidence data example described later. The ggiNEXT() function is a wrapper around the ggplot2 package that creates an R/E curve using a single line of code. The resulting object is of class "ggplot", so it can be manipulated using the ggplot2 tools.
out <- iNEXT(spider, q=c(0, 1, 2), datatype="abundance", endpoint=500)
# Sample‐size‐based R/E curves, separating by "site"
ggiNEXT(out, type=1, facet.var="site")
## Not run:
# Sample‐size‐based R/E curves, separating by "order"
ggiNEXT(out, type=1, facet.var="order")
# display black‐white theme
ggiNEXT(out, type=1, facet.var="order", grey=TRUE)
## End(Not run)
The argument facet.var="site" in the ggiNEXT function creates a separate plot for each site as shown below:
# Sample‐size‐based R/E curves, separating by "site"
ggiNEXT(out, type=1, facet.var="site")

The arguments facet.var="order" and color.var="site" create a separate plot for each diversity order, and within each plot, different colors are used for the two sites.
ggiNEXT(out, type=1, facet.var="order", color.var="site")

The following commands return the sample completeness curve in which different colors are used for the two sites:
ggiNEXT(out, type=2, facet.var="none", color.var="site")

The following commands return the coverage‐based R/E sampling curves in which different colors are used for the two sites (facet.var="site") and for three orders (facet.var="order"):
ggiNEXT(out, type=3, facet.var="site")

ggiNEXT(out, type=3, facet.var="order", color.var="site")

INCIDENCE DATA
For illustration, we use the tropical ant data (in the dataset ant included in the package) at five elevations (50m, 500m, 1070m, 1500m, and 2000m) collected by Longino & Colwell (2011) from Costa Rica. The 5 lists of incidence frequencies are shown below. The first entry of each list must be the total number of sampling units, followed by the species incidence frequencies.
data(ant)
str(ant)
List of 5
$ h50m : num [1:228] 599 330 263 236 222 195 186 183 182 129 ...
$ h500m : num [1:242] 230 133 131 123 78 73 65 60 60 56 ...
$ h1070m: num [1:123] 150 99 96 80 74 68 60 54 46 45 ...
$ h1500m: num [1:57] 200 144 113 79 76 74 73 53 50 43 ...
$ h2000m: num [1:15] 200 80 59 34 23 19 15 13 8 8 ...
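The incidence‐frequency format described above (number of sampling units first, then one incidence frequency per species) can be built from a species‐by‐sampling‐unit presence/absence matrix. The following base‐R sketch uses a made‐up matrix and a hypothetical helper to_incidence_freq; it is only meant to show the layout of the vector.

```r
# Made-up toy data: 3 species (rows) recorded in 4 sampling units (columns).
raw <- matrix(c(1, 0, 1, 1,
                0, 0, 1, 0,
                1, 1, 1, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("sp1", "sp2", "sp3"), paste0("unit", 1:4)))

# Hypothetical helper (not part of iNEXT): first entry is the number of
# sampling units, followed by the number of units in which each species occurs.
to_incidence_freq <- function(m) {
  c(ncol(m), rowSums(m > 0))
}

freq <- unname(to_incidence_freq(raw))
freq   # 4 sampling units, then incidence frequencies 3, 1, 4
```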
The argument color.var = ("none", "order", "site" or "both") is used to display curves in different colors for values of the specified variable. For example, the following code using the argument color.var="site" displays the sampling curves in different colors for the five sites. Note that theme_bw() is a ggplot2 function that changes the display from a grey background to black‐and‐white. The following commands return three types of R/E sampling curves for the ant data.
t <- seq(1, 700, by=10)
out.inc <- iNEXT(ant, q=0, datatype="incidence_freq", size=t)
# Sample‐size‐based R/E curves
ggiNEXT(out.inc, type=1, color.var="site") +
theme_bw(base_size = 18) +
theme(legend.position="none")

# Sample completeness curves
ggiNEXT(out.inc, type=2, color.var="site") +
ylim(c(0.9,1)) +
theme_bw(base_size = 18) +
theme(legend.position="none")
Warning: Removed 15 rows containing missing values (geom_path).

# Coverage‐based R/E curves
ggiNEXT(out.inc, type=3, color.var ="site") +
xlim(c(0.9,1)) +
theme_bw(base_size = 18) +
theme(legend.position="bottom",
legend.title=element_blank())
Warning: Removed 15 rows containing missing values (geom_path).

POINT ESTIMATION FUNCTION: estimateD()
We also supply the function
estimateD(x, datatype="abundance", base="size", level=NULL)
to compute diversity estimates with q = 0, 1, 2 for any particular level of sample size (base="size"
) or any specified level of sample coverage (base="coverage"
) for either abundance data (datatype="abundance"
) or incidence data (datatype="incidence_freq" or "incidence_raw"
). If level=NULL
, this function computes the diversity estimates for the minimum sample size/coverage among all sites.
For example, the following command returns the species diversity at a specified sample coverage of 98.5% for the ant data. For some sites this coverage level is reached by rarefaction, whereas for others it requires extrapolation, as indicated in the method column of the output.
estimateD(ant, datatype="incidence_freq",
base="coverage", level=0.985)
         t        method     SC   q = 0   q = 1  q = 2
h50m   327   rarefaction 0.9850 197.463  78.051 50.461
h500m  343 extrapolation 0.9850 268.753 103.844 64.759
h1070m 159 extrapolation 0.9850 123.617  59.592 41.775
h1500m 126   rarefaction 0.9850  50.482  26.249 18.649
h2000m 105   rarefaction 0.9851  12.917   7.712  5.795
RAW INCIDENCE DATA FUNCTION: incidence_raw
Note that datatype="incidence_raw" is a new feature in iNEXT version 2.0.6. We demonstrate its use here with the plant data from three coastal dune habitats. The data set (plant) included in the package is a list of three matrices; each is a species‐by‐plots matrix. Run the following commands to get the output graphics as shown below.
data(plant)
str(plant)
List of 3
$ Embryo : num [1:46, 1:70] 0 0 0 0 1 0 0 0 0 0 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:46] "Ammophila.arenaria" "Anthemis.maritima" "Asparagus.acutifolius" "Bromus.diandrus" ...
.. ..$ : chr [1:70] "cap72 " "cap81N " "m1 " "m10 " ...
$ Mobile : num [1:46, 1:131] 0 1 0 0 0 0 0 0 0 0 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:46] "Ammophila.arenaria" "Anthemis.maritima" "Asparagus.acutifolius" "Bromus.diandrus" ...
.. ..$ : chr [1:131] "cap0N " "cap1 " "cap111 " "cap12n " ...
$ Transition: num [1:46, 1:71] 0 1 0 0 0 0 0 0 0 0 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:46] "Ammophila.arenaria" "Anthemis.maritima" "Asparagus.acutifolius" "Bromus.diandrus" ...
.. ..$ : chr [1:71] "cap107 " "cap27 " "cap33N " "cap41 " ...
out.raw <- iNEXT(plant, datatype="incidence_raw", endpoint=150)
ggiNEXT(out.raw)

Hacking ggiNEXT()
The ggiNEXT() function is a wrapper around the ggplot2 package that creates an R/E curve using a single line of code. The resulting object is of class "ggplot", so it can be manipulated using the ggplot2 tools. The following are some useful examples for customizing graphs.
remove legend
ggiNEXT(out, type=3, facet.var="site") +
theme(legend.position="none")

change the theme and legend.position
ggiNEXT(out, type=1, facet.var="site") +
theme_bw(base_size = 18) +
theme(legend.position="right")

display black‐white theme
ggiNEXT(out, type=1, facet.var="order", grey=TRUE)

free the scale of axis
ggiNEXT(out, type=1, facet.var="order") +
facet_wrap(~order, scales="free")

change the shape of reference sample size
ggiNEXT(out, type=1, facet.var="site") +
scale_shape_manual(values=c(19,19,19))

General customization
The data visualization package ggplot2 provides scale_ functions to customize how data are mapped to an aesthetic property of a geom_. The following functions help the user customize ggiNEXT output.
- change point shape:
scale_shape_manual
- change line type :
scale_linetype_manual
- change line color:
scale_colour_manual
- change band color:
scale_fill_manual
See the quick reference for style settings.
Example: spider data
To show how to customize ggiNEXT output, we use the abundance-based spider data as an example.
library(iNEXT)
library(ggplot2)
library(gridExtra)
library(grid)
data("spider")
out <- iNEXT(spider, q=0, datatype="abundance")
g <- ggiNEXT(out, type=1, color.var = "site")
g

Change shapes, line types and colors
g1 <- g + scale_shape_manual(values=c(11, 12)) +
scale_linetype_manual(values=c(1,2))
g2 <- g + scale_colour_manual(values=c("red", "blue")) +
scale_fill_manual(values=c("red", "blue"))
# Draw multiple graphical objects on a page
# library(gridExtra)
grid.arrange(g1, g2, ncol=2)

Customize point/line size by hacking
To change the size of the reference sample point or the rarefaction/extrapolation curve, the user needs to modify the ggplot object.
- change point size: the reference sample size point is drawn on the first layer by ggiNEXT. Change the point size as follows:
# point is drawn on the 1st layer, default size is 5
gb3 <- ggplot_build(g)
gb3$data[[1]]$size <- 10
gt3 <- ggplot_gtable(gb3)
# use grid.draw to draw the graphical object
# library(grid)
# grid.draw(gt3)
- change line width (size): the rarefaction/extrapolation curve is drawn on the second layer by ggiNEXT. Change the line width as follows:
# line is drawn on the 2nd layer, default size is 1.5
gb4 <- ggplot_build(g)
gb4$data[[2]]$size <- 3
gt4 <- ggplot_gtable(gb4)
# grid.draw(gt4)
grid.arrange(gt3, gt4, ncol=2)

Customize theme
A ggplot object can be themed by adding a theme. Run help(theme_grey) to see the default themes in ggplot2. Some extra themes are provided by the ggthemes package. Examples follow:
g5 <- g + theme_bw() + theme(legend.position = "bottom")
g6 <- g + theme_classic() + theme(legend.position = "bottom")
grid.arrange(g5, g6, ncol=2)

library(ggthemes)
g7 <- g + theme_hc(bgcolor = "darkunica") +
scale_colour_hc("darkunica")
g8 <- g + theme_economist() + scale_colour_economist()
grid.arrange(g7, g8, ncol=2)

Black-White theme
The following are customized themes for black-white figures. To modify the legend, see Cookbook for R for more details.
g9 <- g + theme_bw(base_size = 18) +
scale_fill_grey(start = 0, end = .4) +
scale_colour_grey(start = .2, end = .2) +
theme(legend.position="bottom",
legend.title=element_blank())
g10 <- g + theme_tufte(base_size = 12) +
scale_fill_grey(start = 0, end = .4) +
scale_colour_grey(start = .2, end = .2) +
theme(legend.position="bottom",
legend.title=element_blank())
grid.arrange(g9, g10, ncol=2)

Draw R/E curves by yourself
In iNEXT, we provide an S3 ggplot2::fortify method for class iNEXT. The function fortify offers a single plotting interface for rarefaction/extrapolation curves. Set the argument type to 1, 2 or 3 to plot the corresponding rarefaction/extrapolation curves.
df <- fortify(out, type=1)
head(df)
   datatype plottype    site       method order  x      y  y.lwr  y.upr
1 abundance        1 Girdled interpolated     0  1  1.000  1.000  1.000
2 abundance        1 Girdled interpolated     0 10  6.479  5.987  6.970
3 abundance        1 Girdled interpolated     0 19  9.450  8.506 10.394
4 abundance        1 Girdled interpolated     0 28 11.514 10.159 12.870
5 abundance        1 Girdled interpolated     0 37 13.127 11.403 14.850
6 abundance        1 Girdled interpolated     0 47 14.622 12.527 16.718
df.point <- df[which(df$method=="observed"),]
df.line <- df[which(df$method!="observed"),]
df.line$method <- factor(df.line$method,
c("interpolated", "extrapolated"),
c("interpolation", "extrapolation"))
ggplot(df, aes(x=x, y=y, colour=site)) +
geom_point(aes(shape=site), size=5, data=df.point) +
geom_line(aes(linetype=method), lwd=1.5, data=df.line) +
geom_ribbon(aes(ymin=y.lwr, ymax=y.upr,
fill=site, colour=NULL), alpha=0.2) +
labs(x="Number of individuals", y="Species diversity") +
theme(legend.position = "bottom",
legend.title=element_blank(),
text=element_text(size=18))

License
The iNEXT package is licensed under the GPLv3. To help refine iNEXT, your comments or feedback would be welcome (please send them to Anne Chao or report an issue on the iNEXT GitHub repo).
References
- Chao, A., Gotelli, N.J., Hsieh, T.C., Sander, E.L., Ma, K.H., Colwell, R.K. & Ellison, A.M. (2014) Rarefaction and extrapolation with Hill numbers: a framework for sampling and estimation in species diversity studies. Ecological Monographs, 84, 45–67.
- Chao, A. & Jost, L. (2012) Coverage‐based rarefaction and extrapolation: standardizing samples by completeness rather than size. Ecology, 93, 2533–2547.
- Colwell, R.K., Chao, A., Gotelli, N.J., Lin, S.‐Y., Mao, C.X., Chazdon, R.L. & Longino, J.T. (2012) Models and estimators linking individual‐based and sample‐based rarefaction, extrapolation and comparison of assemblages. Journal of Plant Ecology, 5, 3–21.
- Ellison, A.M., Barker‐Plotkin, A.A., Foster, D.R. & Orwig, D.A. (2010) Experimentally testing the role of foundation species in forests: the Harvard Forest Hemlock Removal Experiment. Methods in Ecology and Evolution, 1, 168–179.
- Hsieh, T.C., Ma, K.H. & Chao, A. (2016) iNEXT: An R package for interpolation and extrapolation of species diversity (Hill numbers). Under revision, Methods in Ecology and Evolution.
- Longino, J.T. & Colwell, R.K. (2011) Density compensation, species composition, and richness of ants on a neotropical elevational gradient. Ecosphere, 2:art29.