Summary of R Documentation

Overview

R is a programming language and software environment specifically designed for statistical computing, data analysis, and graphical representation. It is widely used in academia, research, and industry for data science and machine learning.


Key Features

  • Statistical Computing: Extensive support for statistical techniques like regression, clustering, and hypothesis testing.
  • Data Visualization: Built-in and extensible tools for generating charts, graphs, and plots.
  • Comprehensive Package Ecosystem: CRAN (Comprehensive R Archive Network) offers thousands of packages for various applications.
  • Cross-Platform: Runs on Windows, macOS, and Linux.
  • Open Source: Freely available with an active community.

Core Concepts

Data Types

  • Basic Types:
    • numeric (e.g., 3.14)
    • integer (e.g., 5L)
    • character (e.g., "R")
    • logical (e.g., TRUE, FALSE)
  • Compound Types:
    • vector
    • list
    • matrix
    • data.frame
    • factor
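The basic types above can be inspected at the console. A minimal sketch, using one literal of each type (the names `values`, `classes`, and `storage` are illustrative only):

```r
# One value of each basic type listed above
values <- list(3.14, 5L, "R", TRUE)

# class() reports the high-level type; typeof() reports the storage type
classes <- sapply(values, class)
storage <- sapply(values, typeof)

print(classes)  # "numeric" "integer" "character" "logical"
print(storage)  # "double"  "integer" "character" "logical"
```

Note that a plain number such as 3.14 has class "numeric" but is stored as a "double"; the L suffix is what requests an integer.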

Variables

  • Variables are dynamically typed and can be assigned using <-, =, or -> (<- is the conventional choice):
    x <- 10
    y = "R Programming"
    20 -> z
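"Dynamically typed" means a name is not tied to one type and can be rebound to a value of another type. A small sketch of that behaviour, where `value` is a hypothetical variable name chosen for illustration:

```r
# `value` is a hypothetical name used only for this illustration
value <- 10
class_before <- class(value)   # "numeric"

value <- "ten"                 # same name, now bound to a string
class_after <- class(value)    # "character"

# The right-pointing arrow assigns to the name on its right
20 -> z

print(c(class_before, class_after))
print(z)
```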

Data Structures

Vectors

  • Homogeneous collection of elements:
    v <- c(1, 2, 3, 4, 5)
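Because a vector is homogeneous, arithmetic applies to every element at once, and mixing types in c() coerces everything to a common type. A brief sketch (`doubled` and `mixed` are illustrative names):

```r
v <- c(1, 2, 3, 4, 5)

# Arithmetic is element-wise
doubled <- v * 2
print(doubled)        # 2 4 6 8 10

# Mixing types coerces all elements to one common type (here, character)
mixed <- c(1, "a", TRUE)
print(class(mixed))   # "character"
print(length(v))      # 5
```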

Lists

  • Heterogeneous collection of elements:
    my_list <- list(name = "Alice", age = 25, scores = c(85, 90, 88))
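List elements can be pulled out by name with $ or [[ ]], while single brackets return a smaller list. A minimal sketch based on the list above (`avg_score` and `sub` are illustrative names):

```r
my_list <- list(name = "Alice", age = 25, scores = c(85, 90, 88))

# $ and [[ ]] extract the element itself
print(my_list$name)        # "Alice"
print(my_list[["age"]])    # 25

# Elements keep their own type, so vector functions still work on them
avg_score <- mean(my_list$scores)

# Single brackets return a list containing the selected elements
sub <- my_list["name"]
print(class(sub))          # "list"
```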

Matrices

  • Two-dimensional data structure with rows and columns:
    m <- matrix(1:6, nrow = 2, ncol = 3)
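By default matrix() fills values column by column, which affects where each number lands. A short sketch of the resulting layout and of row/column indexing:

```r
m <- matrix(1:6, nrow = 2, ncol = 3)

# Filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
print(m)

print(dim(m))    # 2 3
print(m[2, 3])   # row 2, column 3: 6
print(m[, 1])    # first column: 1 2

# t() transposes, giving a 3 x 2 matrix
print(dim(t(m))) # 3 2
```

Passing byrow = TRUE to matrix() fills row by row instead.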

Data Frames

  • Tabular data structure where columns can have different types:
    df <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))
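A data frame can be inspected and extended column by column. A minimal sketch; the `senior` column and its threshold of 30 are hypothetical, added only for illustration:

```r
df <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))

# Inspect shape and column types
print(nrow(df))   # 2
str(df)

# Add a derived column (hypothetical name and threshold)
df$senior <- df$age >= 30
print(df)
```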

Factors

  • Used for categorical data:
    grades <- factor(c("A", "B", "A", "C"))
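A factor stores each distinct category once as a level and represents the data as integer codes pointing at those levels. A short sketch using the factor above (`counts` is an illustrative name):

```r
grades <- factor(c("A", "B", "A", "C"))

# Distinct categories, sorted alphabetically by default
print(levels(grades))       # "A" "B" "C"

# Underlying integer codes, one per observation
print(as.integer(grades))   # 1 2 1 3

# Frequency of each category
counts <- table(grades)
print(counts)
```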

Functions

  • User-defined functions:
    add <- function(a, b) {
        return(a + b)
    }
    result <- add(3, 5)
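Functions can also declare default argument values, and the value of the last evaluated expression is returned even without return(). A minimal sketch; `scale_by` and its parameter `k` are hypothetical names, not part of the original summary:

```r
# `scale_by` is a hypothetical helper: multiplies x by k (default 2)
scale_by <- function(x, k = 2) {
  x * k   # the last expression is the return value
}

print(scale_by(3))            # 6
print(scale_by(3, k = 10))    # 30
print(scale_by(c(1, 2, 3)))   # 2 4 6 (works on whole vectors)
```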

Control Structures

Conditionals

if (x > 0) {
    print("Positive")
} else if (x == 0) {
    print("Zero")
} else {
    print("Negative")
}
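An if statement expects a single TRUE or FALSE. To classify every element of a vector with the same logic, base R's ifelse() can be used instead. A minimal sketch, with `values` and `labels` as illustrative names:

```r
values <- c(-2, 0, 3)

# ifelse() tests each element and picks the matching result
labels <- ifelse(values > 0, "Positive",
                 ifelse(values == 0, "Zero", "Negative"))

print(labels)  # "Negative" "Zero" "Positive"
```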

Loops

  • For Loop:
    for (i in 1:5) {
        print(i)
    }
  • While Loop:
    while (x < 10) {
        x <- x + 1
    }
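The loops above can be made self-contained to see what they compute. A small sketch, assuming a starting value of 7 for the while loop (`total`, `steps`, and `counter` are illustrative names):

```r
# For loop: accumulate the sum of 1..5
total <- 0
for (i in 1:5) {
  total <- total + i
}
print(total)    # 15

# While loop: count the iterations needed to reach 10 from 7
counter <- 7
steps <- 0
while (counter < 10) {
  counter <- counter + 1
  steps <- steps + 1
}
print(steps)    # 3
```

For simple reductions like the first loop, a built-in such as sum(1:5) gives the same result without an explicit loop.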

Data Manipulation

Subsetting

  • Vectors:
    v[2]  # Access second element
  • Data Frames:
    df$name  # Access "name" column
    df[1, ]  # Access first row
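Besides positions and names, subsetting also accepts logical conditions and negative indices. A minimal sketch reusing the vector and data frame from earlier sections (`big`, `rest`, and `older` are illustrative names):

```r
v <- c(1, 2, 3, 4, 5)
df <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))

# Logical condition: keep elements greater than 3
big <- v[v > 3]
print(big)     # 4 5

# Negative index: drop the first element
rest <- v[-1]
print(rest)    # 2 3 4 5

# Filter data frame rows by a condition on a column
older <- df[df$age > 26, ]
print(older)   # the row for Bob
```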

Apply Functions

  • apply: Apply a function to rows or columns of a matrix.
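  • lapply and sapply: Apply a function to each element of a list or vector; lapply always returns a list, while sapply simplifies the result to a vector or matrix when possible.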
  • Example:
    sapply(1:5, function(x) x^2)
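The three functions differ mainly in what they iterate over and what they return. A short sketch comparing them on small inputs (`row_sums`, `col_sums`, `as_list`, and `as_vec` are illustrative names):

```r
m <- matrix(1:6, nrow = 2, ncol = 3)

# apply(): second argument 1 means rows, 2 means columns
row_sums <- apply(m, 1, sum)
col_sums <- apply(m, 2, sum)
print(row_sums)   # 9 12
print(col_sums)   # 3 7 11

# lapply() returns a list; sapply() simplifies to a vector here
as_list <- lapply(1:3, function(x) x^2)
as_vec  <- sapply(1:3, function(x) x^2)
print(class(as_list))   # "list"
print(as_vec)           # 1 4 9
```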

Statistical Analysis

  • Summary Statistics:
    mean(v)
    sd(v)
    summary(df)
  • Regression Analysis:
    model <- lm(y ~ x, data = df)  # assumes df has numeric columns y and x
    summary(model)
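For a version that runs as-is, the sketch below uses R's built-in mtcars dataset, which is an assumption of this example and not data from the summary above. It regresses fuel economy (mpg) on weight (wt); `fit` and `slope` are illustrative names.

```r
# Summary statistics on a small vector
v <- c(1, 2, 3, 4, 5)
print(mean(v))   # 3
print(sd(v))     # about 1.58

# Linear regression on the built-in mtcars data (assumed example dataset)
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))            # intercept and slope for wt

# A negative slope means heavier cars tend to have lower mpg
slope <- coef(fit)[["wt"]]
print(slope < 0)
```

Calling summary(fit) would additionally report standard errors, p-values, and R-squared.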

Graphics and Visualization

Base Graphics

  • Basic plotting:
    plot(x, y, type = "p", main = "Scatter Plot")  # x and y: numeric vectors of equal length

ggplot2

  • Advanced visualization using ggplot2:
    library(ggplot2)
    ggplot(data = df, aes(x = name, y = age)) +
        geom_col()  # equivalent to geom_bar(stat = "identity")

Packages

  • Install a package:
    install.packages("ggplot2")
  • Load a package:
    library(ggplot2)
  • Explore available packages on CRAN: https://cran.r-project.org/

Best Practices

  • Use meaningful variable names.
  • Leverage vectorized operations for performance.
  • Use set.seed() for reproducibility in random processes.
  • Modularize code with functions for clarity and reusability.
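Two of these practices are easy to demonstrate. A minimal sketch, where the seed value 42 and all variable names are arbitrary choices for illustration:

```r
# Reproducibility: the same seed yields the same random draws
set.seed(42)
first_draw <- rnorm(3)
set.seed(42)
second_draw <- rnorm(3)
print(identical(first_draw, second_draw))   # TRUE

# Vectorization: one expression replaces an explicit loop
squares_loop <- numeric(5)
for (i in 1:5) {
  squares_loop[i] <- i^2
}
squares_vec <- (1:5)^2
print(all(squares_loop == squares_vec))     # TRUE
```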

Resources

R is a powerful tool for statistical computing and data visualization, making it indispensable for data analysts, statisticians, and data scientists.