R Reference

About R Reference

The R Language Reference is a comprehensive, searchable cheat sheet for statistical computing and data science with R. It covers 30 entries across six categories: Basics, Vectors, Data Frames, Functions, Visualization, and Statistics. Each entry provides the exact R syntax, a description of what it does, and a complete runnable code example — making it easy to look up the right function or idiom whether you are doing exploratory data analysis, building statistical models, or creating publication-quality visualizations.

This reference is designed for data scientists, statisticians, researchers, and R learners who work with tabular data and statistical models. The Basics section covers R-specific syntax like the <- assignment operator, print/cat/sprintf output, if/else/ifelse conditionals, for/while loops, paste/paste0 string concatenation, and type inspection with class/typeof/str. The Vectors section covers vector creation with c(), sequence generation with seq/rep, positive/negative indexing, vectorized arithmetic, and logical filtering with which/any/all.
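As a rough illustration of the vector idioms listed above, here is a minimal sketch. The vector `v` and its values are made up for the example.

```r
# Illustrative values only.
v <- c(10, 20, 30, 40, 50)

evens    <- seq(2, 10, by = 2)           # 2, 4, 6, 8, 10
repeated <- rep(c("a", "b"), times = 2)  # "a", "b", "a", "b"

first_and_third <- v[c(1, 3)]  # positive indexing keeps positions 1 and 3
without_first   <- v[-1]       # negative indexing drops position 1

doubled <- v * 2               # arithmetic is applied element by element

big_positions <- which(v > 25) # positions where the condition holds
big_values    <- v[v > 25]     # logical filtering keeps matching elements
has_large     <- any(v > 45)   # TRUE if at least one element matches
all_positive  <- all(v > 0)    # TRUE only if every element matches
```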

The Data Frames section covers data.frame() creation, column and row access with $ and [[ ]], filtering with subset() and dplyr's filter(), combining with merge/rbind/cbind, the apply family (apply/sapply/lapply), and the full dplyr pipeline with filter, mutate, group_by, summarise, and arrange. The Visualization section covers base R plot(), ggplot2 with geom_point/geom_smooth/theme_minimal, histogram/barplot/boxplot, and multi-plot layout with par(mfrow). The Statistics section covers summary statistics, t.test, linear regression with lm(), chi-square and ANOVA tests, and random number generation.
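The base R data frame operations mentioned above might look like the following sketch. The `people` and `cities` tables are invented toy data, and only base R functions are used (no dplyr).

```r
# Toy data, invented for illustration.
people <- data.frame(
  name = c("Ana", "Ben", "Cy"),
  age  = c(31, 19, 45),
  stringsAsFactors = FALSE
)
cities <- data.frame(
  name = c("Ana", "Ben", "Cy"),
  city = c("Lima", "Oslo", "Lima"),
  stringsAsFactors = FALSE
)

ages_dollar  <- people$age        # column access with $
ages_bracket <- people[["age"]]   # the same column with [[ ]]

adults <- subset(people, age > 20)            # row filtering
joined <- merge(people, cities, by = "name")  # join on a shared key
more   <- rbind(people, data.frame(name = "Di", age = 28))  # append a row

# Mean age per city, using split() plus sapply().
avg_by_city <- sapply(split(joined$age, joined$city), mean)
```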

Key Features

R basics: <- assignment, print/cat/sprintf, if/else/ifelse, for/while loops, paste/paste0, class/typeof/str type checking
Vector operations: c(), seq/rep, positive and negative indexing, vectorized arithmetic, which/any/all filtering
Data frame manipulation: data.frame(), $ and [[ ]] row/column access, subset/filter, merge/rbind/cbind, apply/sapply/lapply
dplyr pipeline: filter, mutate with ifelse, group_by, summarise, arrange(desc()) chaining with %>% pipe operator
Visualization: base R plot() with type/col/labels, ggplot2 aes/geom_point/geom_smooth, hist/barplot/boxplot, par(mfrow) multi-plot
Statistical testing: summary() descriptive stats, t.test (one-sample, two-sample, paired), lm() linear regression with predict()
Advanced statistics: chisq.test, aov() ANOVA with TukeyHSD, rnorm/runif/sample with set.seed for reproducibility
Functions section: function() definition with defaults, anonymous functions (\(x) shorthand for R 4.1+), do.call, tryCatch error handling
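
The function features in the last item can be sketched as follows. The helper name `scale_by` is hypothetical, and the `\(x)` shorthand assumes R 4.1 or later.

```r
# Hypothetical helper with a default argument value.
scale_by <- function(x, factor = 2) {
  x * factor
}

default_used <- scale_by(1:3)               # factor falls back to 2
custom_used  <- scale_by(1:3, factor = 10)  # default overridden by name

# Anonymous function using the \(x) shorthand (R >= 4.1).
squares <- sapply(1:4, \(x) x^2)

# do.call() calls a function with its arguments supplied as a list.
total     <- do.call(sum, list(1, 2, 3))
from_list <- do.call(scale_by, list(1:3, factor = 10))
```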

Frequently Asked Questions

What is the <- operator in R and why is it used instead of =?

In R, <- is the primary assignment operator and is conventionally preferred over = for variable assignment. Both <- and = work for top-level assignment, but <- is idiomatic R and avoids ambiguity inside function calls, where = binds named arguments. For example, f(x = 1) passes 1 to f as the argument named x and creates no variable, while x <- 1 assigns 1 to a variable named x in the current environment.
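
A small sketch of the difference, assuming a hypothetical one-argument function `f`:

```r
# Hypothetical helper used only for illustration.
f <- function(x) x * 2

x <- 10                 # assignment in the current environment

r1 <- f(x = 1)          # passes 1 as the argument named x
x_after_named <- x      # the outer x is still 10

r2 <- f(x <- 3)         # assigns 3 to x in the calling environment, then passes it
x_after_arrow <- x      # the outer x is now 3
```

Writing `f(x <- 3)` is legal but rarely what you want, which is one reason the two operators are kept visually distinct.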

What is the difference between sapply and lapply in R?

lapply always returns a list, even if every element is a scalar. sapply tries to simplify the result: if every result has length 1, it returns an atomic vector (coercing mixed types to a common type); if every result is a vector of the same length greater than 1, it returns a matrix with one column per input element; otherwise it falls back to a list. Use lapply when you need a list for further processing, and sapply for interactive use when you want simpler output. For guaranteed type-safe output in production code, use vapply, which requires you to declare the type and length of each result and raises an error on a mismatch.
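
The three functions can be compared side by side in a short sketch. The input list `nums` is made up for the example.

```r
nums <- list(a = 1:3, b = 4:6, c = 7:9)

as_list   <- lapply(nums, mean)   # always a list, one element per input
as_vector <- sapply(nums, mean)   # each result has length 1, so a named vector
as_matrix <- sapply(nums, range)  # each result has length 2, so a 2 x 3 matrix

# vapply() checks every result against the declared template.
checked <- vapply(nums, mean, FUN.VALUE = numeric(1))

# A mismatch is an error rather than a silently different shape.
mismatch <- tryCatch(
  vapply(nums, range, FUN.VALUE = numeric(1)),
  error = function(e) "error"
)
```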

How does the dplyr pipe (%>%) work and when should I use it?

The %>% pipe operator (from magrittr, re-exported by dplyr) passes the result of the left-hand expression as the first argument of the right-hand function. This allows you to chain operations left-to-right instead of nesting them inside-out. For example, df %>% filter(age > 20) %>% group_by(city) %>% summarise(avg = mean(age)) is much easier to read than the equivalent nested call. R 4.1+ also provides a native |> pipe that works similarly without requiring any package.
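
To keep the example free of package dependencies, the sketch below uses the native |> pipe (R 4.1 or later) with base functions. With magrittr or dplyr loaded, %>% could be substituted in the same positions. The input values are illustrative.

```r
values <- c(4, 9, 16, 25)

# Nested form: read from the inside out.
nested <- round(mean(sqrt(values)), 1)

# Piped form: read top to bottom; each result becomes the first argument
# of the next call.
piped <- values |>
  sqrt() |>
  mean() |>
  round(1)
```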

How do I perform linear regression in R?

Use the lm() function with a formula: model <- lm(salary ~ age + experience, data = df). The formula syntax y ~ x1 + x2 specifies the response variable (y) and predictors (x1, x2). Call summary(model) to see coefficients, R-squared, F-statistic, and p-values. Use predict(model, newdata = data.frame(age = 35, experience = 10)) to make predictions on new data.
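
The same workflow can be run end to end on the built-in mtcars dataset, used here in place of the hypothetical df with salary, age, and experience columns:

```r
# mtcars ships with R; mpg is the response, wt and hp are the predictors.
model <- lm(mpg ~ wt + hp, data = mtcars)

model_summary <- summary(model)          # coefficients, R-squared, F-statistic
coefs         <- coef(model)             # named vector: (Intercept), wt, hp
r_squared     <- model_summary$r.squared

# Predict for a new observation; column names must match the predictors.
pred <- predict(model, newdata = data.frame(wt = 3, hp = 110))
```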

What is the difference between base R plot() and ggplot2?

Base R plot() uses a sequential "pen-and-paper" model where you build a plot by calling plot() to create the canvas and then adding elements with lines(), points(), legend(), etc. It is quick for exploratory plotting but awkward for complex multi-layer charts. ggplot2 uses a "grammar of graphics" approach where you declare data mappings (aes), geometric layers (geom_point, geom_line), scales, and themes separately and combine them with +. For complex, multi-layer visualizations, ggplot2 usually needs less code, and its defaults are closer to publication quality.
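
The two styles can be contrasted with a sketch that draws the same scatter plot with a fitted line both ways, using the built-in mtcars data. The ggplot2 part runs only if the package is installed, and the base part draws to a null PDF device so that no window or file is created.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Base R: create the canvas, then add elements one call at a time.
pdf(NULL)  # null device: nothing is written to disk
plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "MPG")
abline(fit, col = "red")
legend("topright", legend = "Linear fit", col = "red", lty = 1)
invisible(dev.off())

# ggplot2: declare mappings and layers, then combine them with +.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(mtcars, ggplot2::aes(x = wt, y = mpg)) +
    ggplot2::geom_point() +
    ggplot2::geom_smooth(method = "lm", se = FALSE) +
    ggplot2::theme_minimal()
  # print(p) would render the plot on the active device.
}
```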

How do I handle errors and warnings in R?

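Use tryCatch() to catch errors and warnings without crashing your script. Wrap the expression in a tryCatch() call and provide handler functions for the error and/or warning conditions. Each handler receives a condition object whose message is available as $message or through conditionMessage(). For example: result <- tryCatch({ log(-1) }, warning = function(w) { cat("Warning:", w$message) }, error = function(e) { cat("Error:", e$message) }). When a handler runs, evaluation of the expression stops and tryCatch() returns the handler's value, so here result is NULL (the return value of cat()) rather than NaN. Use withCallingHandlers() if you want to handle the condition and then continue execution rather than exiting the expression; for warnings, call invokeRestart("muffleWarning") in the handler to stop the warning from propagating further.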
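
A minimal sketch of both approaches follows. The wrapper name `safe_log` and the choice to return NA from the handlers are assumptions made for the example.

```r
# Hypothetical wrapper: returns NA instead of propagating a warning or error.
safe_log <- function(x) {
  tryCatch(
    log(x),
    warning = function(w) {
      message("Warning caught: ", conditionMessage(w))
      NA_real_  # the handler's value becomes the result of tryCatch()
    },
    error = function(e) {
      message("Error caught: ", conditionMessage(e))
      NA_real_
    }
  )
}

from_warning <- safe_log(-1)   # log(-1) signals a warning
from_error   <- safe_log("a")  # log("a") signals an error
from_ok      <- safe_log(10)   # no condition, normal result

# withCallingHandlers(): record the warning, then resume the computation.
seen <- character(0)
resumed <- withCallingHandlers(
  log(-1),
  warning = function(w) {
    seen <<- c(seen, conditionMessage(w))
    invokeRestart("muffleWarning")  # suppress the warning and continue
  }
)
```

With tryCatch() the result of log(-1) is discarded once the handler runs, whereas with withCallingHandlers() the expression finishes and `resumed` holds the NaN that log(-1) returned.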

How do I create reproducible random samples in R?

Call set.seed(n) before any random number generation to make results reproducible. Each time you call set.seed(42) and then run the same sequence of calls to rnorm(), runif(), or sample(), you get the same numbers; without resetting the seed, successive calls continue the random stream and return different values. rnorm(n, mean, sd) generates normally distributed values, runif(n, min, max) generates uniform values, and sample(x, size, replace) draws a random sample from vector x. The replace argument controls whether sampling is with replacement (TRUE) or without replacement (FALSE, the default).
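
The behaviour can be checked with a short sketch. The seed value 42 and the sample sizes are arbitrary.

```r
set.seed(42)
first  <- rnorm(3)   # three draws after seeding
second <- rnorm(3)   # continues the stream, so these differ from `first`

set.seed(42)
again <- rnorm(3)    # same seed, same call: identical to `first`

set.seed(42)
draw_a <- sample(1:10, size = 5)   # without replacement: no repeated values
set.seed(42)
draw_b <- sample(1:10, size = 5)   # reproduces draw_a

uniform   <- runif(4, min = 0, max = 1)           # four values in [0, 1]
with_repl <- sample(1:3, size = 10, replace = TRUE)  # size may exceed length(x)
```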

What statistical tests are available in base R?

Base R provides a wide range of statistical tests. For comparing means: t.test() for one-sample, two-sample independent, and paired t-tests. For categorical associations: chisq.test() for chi-square tests on contingency tables. For multiple group comparisons: aov() for one-way and two-way ANOVA, followed by TukeyHSD() for post-hoc pairwise comparisons. For correlations: cor() and cor.test(). For non-parametric tests: wilcox.test() (Wilcoxon rank-sum, also known as Mann-Whitney, and Wilcoxon signed-rank for paired data) and kruskal.test() (Kruskal-Wallis). Linear models use lm() and GLMs use glm().
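
Several of these tests can be tried on datasets that ship with R. The sketch below uses sleep, HairEyeColor, PlantGrowth, and mtcars purely as convenient examples, not as a recommendation of which test suits which data.

```r
# Two-sample (Welch) t-test: extra sleep by drug group.
tt <- t.test(extra ~ group, data = sleep)

# Chi-square test on a hair-by-eye contingency table.
hair_eye <- margin.table(HairEyeColor, c(1, 2))
chi <- chisq.test(hair_eye)

# One-way ANOVA with Tukey post-hoc comparisons.
fit   <- aov(weight ~ group, data = PlantGrowth)
tukey <- TukeyHSD(fit)

# Non-parametric alternative to the one-way ANOVA.
kw <- kruskal.test(weight ~ group, data = PlantGrowth)

# Correlation test between fuel economy and weight.
ct <- cor.test(mtcars$mpg, mtcars$wt)

# Each htest object exposes its p-value as $p.value.
p_values <- c(t = tt$p.value, chisq = chi$p.value,
              kruskal = kw$p.value, cor = ct$p.value)
```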