R is a programming language and software environment specifically designed for statistical computing, data analysis, and graphical representation. It is widely used in academia, research, and industry for data science and machine learning.
- Statistical Computing: Extensive support for statistical techniques like regression, clustering, and hypothesis testing.
- Data Visualization: Built-in and extensible tools for generating charts, graphs, and plots.
- Comprehensive Package Ecosystem: CRAN (Comprehensive R Archive Network) offers thousands of packages for various applications.
- Cross-Platform: Runs on Windows, macOS, and Linux.
- Open Source: Freely available with an active community.
- Basic Types:
  - numeric (e.g., 3.14)
  - integer (e.g., 5L)
  - character (e.g., "R")
  - logical (e.g., TRUE, FALSE)
- Compound Types:
  - vector
  - list
  - matrix
  - data.frame
  - factor
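As a minimal sketch of how these types can be told apart at the console, the base functions class(), typeof(), and the is.*() predicates report what a value is. The sample values below are illustrative only.

```r
# Basic types: class() gives the high-level type, typeof() the storage type
class(3.14)     # "numeric"
typeof(3.14)    # "double"
class(5L)       # "integer"
class("R")      # "character"
class(TRUE)     # "logical"

# Predicates return TRUE or FALSE
is.numeric(5L)    # TRUE: integers also count as numeric
is.character(5L)  # FALSE

# Compound types report their own class
class(list(1, "a"))           # "list"
is.matrix(matrix(1:4, nrow = 2))  # TRUE
class(data.frame(a = 1))      # "data.frame"
class(factor(c("x", "y")))    # "factor"
```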
- Variables are dynamically typed and can be assigned using <-, =, or ->:
x <- 10
y = "R Programming"
20 -> z
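To illustrate what dynamic typing means in practice, the sketch below rebinds one name to values of different types. The variable name val is hypothetical and chosen so it does not clash with the variables above.

```r
# The same name can hold values of different types over time
val <- 10
class(val)   # "numeric"

val <- "ten"
class(val)   # "character"

val <- TRUE
class(val)   # "logical"
```

The <- form is the most common convention in R code. The = form also works at the top level, and -> assigns in the opposite direction.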
- Homogeneous collection of elements:
v <- c(1, 2, 3, 4, 5)
- Heterogeneous collection of elements:
my_list <- list(name = "Alice", age = 25, scores = c(85, 90, 88))
- Two-dimensional data structure with rows and columns:
m <- matrix(1:6, nrow = 2, ncol = 3)
- Tabular data structure where columns can have different types:
df <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))
- Used for categorical data:
grades <- factor(c("A", "B", "A", "C"))
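The structures above can be inspected with a handful of base functions. This is a small sketch that recreates the same objects so it runs on its own; the choice of inspection functions is illustrative, not exhaustive.

```r
v <- c(1, 2, 3, 4, 5)
my_list <- list(name = "Alice", age = 25, scores = c(85, 90, 88))
m <- matrix(1:6, nrow = 2, ncol = 3)
df <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))
grades <- factor(c("A", "B", "A", "C"))

length(v)        # 5
names(my_list)   # "name" "age" "scores"
dim(m)           # 2 3
m[1, 2]          # 3, because matrix() fills column by column
nrow(df)         # 2
levels(grades)   # "A" "B" "C"
table(grades)    # counts per level: A = 2, B = 1, C = 1

# A vector holds one type, so mixing types in c() coerces the elements
class(c(1, "a")) # "character"

# str() prints a compact summary of any object
str(df)
```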
- User-defined functions:
add <- function(a, b) {
  return(a + b)
}
result <- add(3, 5)
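Functions can also declare default argument values and validate their inputs. The sketch below uses a hypothetical helper named rescale, which is not part of base R, to show both ideas.

```r
# Hypothetical helper: map a numeric vector onto the range [lower, upper]
rescale <- function(x, lower = 0, upper = 1) {
  # Fail early on invalid input
  stopifnot(is.numeric(x), upper > lower)
  rng <- range(x)
  # The last evaluated expression is the return value, so return() is optional
  lower + (x - rng[1]) / (rng[2] - rng[1]) * (upper - lower)
}

rescale(c(10, 20, 30))               # 0.0 0.5 1.0
rescale(c(10, 20, 30), upper = 100)  # 0 50 100
```

Arguments with defaults can be omitted or overridden by name, as in the second call. Note that this sketch does not handle a constant input vector, where the range is zero.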
if (x > 0) {
  print("Positive")
} else if (x == 0) {
  print("Zero")
} else {
  print("Negative")
}
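The if statement above tests a single value. For a whole vector, base R provides ifelse(), which applies the test element by element. The sketch below uses a made-up vector named values.

```r
# if () expects a single TRUE/FALSE; ifelse() works on every element
values <- c(-2, 0, 3)

labels <- ifelse(values > 0, "Positive",
                 ifelse(values == 0, "Zero", "Negative"))
labels  # "Negative" "Zero" "Positive"
```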
- For Loop:
for (i in 1:5) { print(i) }
- While Loop:
while (x < 10) { x <- x + 1 }
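Two loop details are worth a short sketch: iterating by index with seq_along(), and controlling a loop with next and break. The vectors here are illustrative.

```r
items <- c("a", "b", "c")

# seq_along() yields 1, 2, 3 and yields nothing for an empty vector
for (i in seq_along(items)) {
  cat(i, items[i], "\n")
}

# 1:length(x) misbehaves on empty input because 1:0 is c(1, 0)
empty <- character(0)
length(1:length(empty))   # 2
length(seq_along(empty))  # 0

# next skips to the following iteration, break leaves the loop
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next  # skip even numbers
  if (i > 7) break       # stop once i exceeds 7
  total <- total + i
}
total  # 16, the sum of 1, 3, 5, and 7
```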
- Vectors:
v[2] # Access second element
- Data Frames:
df$name   # Access "name" column
df[1, ]   # Access first row
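Beyond single positions, R supports indexing by several positions, by negative positions, and by logical conditions. The following sketch recreates the earlier objects so it can run on its own.

```r
v <- c(1, 2, 3, 4, 5)
df <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))
my_list <- list(name = "Alice", age = 25, scores = c(85, 90, 88))

v[c(1, 3)]   # 1 3: select several positions
v[-1]        # 2 3 4 5: drop the first element
v[v > 3]     # 4 5: keep elements matching a condition

df[df$age > 26, ]   # rows where age exceeds 26 (Bob)
df[["age"]]         # 25 30: a column as a plain vector

my_list[["name"]]   # "Alice": the element itself
my_list["name"]     # a list of length 1 containing the element
my_list$scores[2]   # 90
```

The single-bracket and double-bracket forms differ for lists: [ returns a sub-list, while [[ extracts the element.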
- apply: Apply a function to rows or columns of a matrix.
- lapply and sapply: Apply functions to elements of a list or vector.
- Example:
sapply(1:5, function(x) x^2)
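To make the difference between these functions concrete, here is a small sketch on a matrix and a list. vapply() is not mentioned above; it is included as a stricter base R alternative to sapply() that requires the result type to be declared.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)

apply(m, 1, sum)   # 9 12: MARGIN = 1 applies over rows
apply(m, 2, sum)   # 3 7 11: MARGIN = 2 applies over columns

scores <- list(a = 1:3, b = 4:6)

lapply(scores, mean)   # always a list: $a is 2, $b is 5
sapply(scores, mean)   # simplified to a named vector: a = 2, b = 5

# vapply() checks that each result is a single numeric value
vapply(1:3, function(i) i^2, numeric(1))   # 1 4 9
```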
- Summary Statistics:
mean(v)
sd(v)
summary(df)
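A common stumbling block with summary statistics is missing data. The sketch below shows the values these functions return for the earlier vector, and how the na.rm argument handles NA. The vector with_na is made up for illustration.

```r
v <- c(1, 2, 3, 4, 5)

mean(v)     # 3
median(v)   # 3
sd(v)       # about 1.58 (sample standard deviation)

# Any NA makes the result NA unless it is removed explicitly
with_na <- c(1, NA, 3)
mean(with_na)                # NA
mean(with_na, na.rm = TRUE)  # 2
```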
- Regression Analysis:
# df must supply numeric columns named x and y
# (the df defined earlier has only name and age)
model <- lm(y ~ x, data = df)
summary(model)
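For a version that runs end to end, the sketch below fits the same kind of model to simulated data. The data frame sim, the true slope of 2, and the intercept of 1 are assumptions made up for this example.

```r
set.seed(42)  # make the simulated noise reproducible

# Simulated data: y depends linearly on x, plus random noise
sim <- data.frame(x = 1:20)
sim$y <- 2 * sim$x + 1 + rnorm(20, sd = 0.5)

fit <- lm(y ~ x, data = sim)

coef(fit)      # estimates should land near intercept 1 and slope 2
summary(fit)   # standard errors, R-squared, p-values

# Predict the response for a new x value
predict(fit, newdata = data.frame(x = 21))
```

Because the data are generated from a known line, the fitted coefficients can be checked against the values used to simulate them.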
- Basic plotting:
plot(x, y, type = "p", main = "Scatter Plot")  # x and y: numeric vectors of equal length
- Advanced visualization using ggplot2:
library(ggplot2)
ggplot(data = df, aes(x = name, y = age)) + geom_bar(stat = "identity")
- Install a package:
install.packages("ggplot2")
- Load a package:
library(ggplot2)
- Explore available packages on CRAN: https://cran.r-project.org/
- Use meaningful variable names.
- Leverage vectorized operations for performance.
- Use set.seed() for reproducibility in random processes.
- Modularize code with functions for clarity and reusability.
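Two of these practices are easy to demonstrate in a few lines. The sketch below compares a loop with its vectorized equivalent and shows that resetting the seed reproduces the same random draws. The seed value 123 is arbitrary.

```r
# Vectorization: operate on the whole vector instead of looping
x <- 1:5

squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

squares_vec <- x^2   # same result in one expression

all.equal(squares_loop, squares_vec)  # TRUE

# Reproducibility: the same seed gives the same random numbers
set.seed(123)
first_draw <- runif(3)

set.seed(123)
second_draw <- runif(3)

identical(first_draw, second_draw)  # TRUE
```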
- CRAN Documentation: https://cran.r-project.org/manuals.html
- RStudio IDE: Integrated Development Environment for R (https://www.rstudio.com/)
- Tidyverse: A collection of R packages for data science (https://www.tidyverse.org/)
R is a powerful tool for statistical computing and data visualization, widely used by data analysts, statisticians, and data scientists.