GitHub - marinaferent/Basics-of-R-programming-language-in-statistical-analysis: Course materials for Basics of R programming language in statistical analysis v.2021

This is a repository for the course Basics of R programming language in statistical analysis. In 2020 and 2021, I instructed the course free of charge to a selected number of UBB FSEGA graduates in collaboration with Multicultural Business Institute. The materials in this repository belong to the 2021 edition.

Course aims

Differentiating between different R objects (vector, matrix, data frame, list) – generating the objects, properties, operations
Visualizing data using base R graphics – bar charts, pie charts, histograms, scatter plots, line charts
Analyzing data using predefined R functions – statistical measures, correlations, linear regression
Writing an R code that uses control structures – for loops, conditional statements
Writing user defined functions – applications on writing the functions for: statistical measures, correlations, linear regression

Upon completion of this course, the participants are independent R users, being able to understand R code, write functions, and debug. Also, they will be able to perform a statistical analysis in R (visualizing, computing statistical measures, performing cross tab analysis, and linear regressions).

Prerequisites: Basic knowledge of descriptive statistics and introductory econometrics terms (statistical population, sample size, statistical variables, mean, median, mode, correlation, linear regression).

Course structure

8 online meetings of 1h 30min each.

Meetings 1-2: Basic notions

A. R STRUCTURES, PROPERTIES AND OPERATIONS: Vectors and Data Frames | Numerical representation of attributive variables:

defining a vector, operations with vectors, vector properties, computing absolute, relative frequencies
defining a matrix, operations with matrices, matrix properties
reading-writing a .csv file, setting a working directory
generating random variables

B. R GRAPHS | Graphical representation of attributive variables:

bar charts, pie charts, histograms

Additionally proposed exercises cover: lists, data extraction, grouped bar charts.

Meetings 3-6: CONTROL STRUCTURES AND FUNCTIONS | Statistical measures

A.FOR LOOPS | Challenge: Mean values

for loops, mean(), colMeans()
compute the mean value of the values of a vector using functions sum() and length()
compute the mean value for each column in a matrix using for loops and the functions sum() and length(), nrow() etc.

B.CONDITIONAL STATEMENTS | Challenge: Median values

conditional statements, median(), matrixStats::colMedians()
compute the median value of the values of a vector using if statements, and length(), sort(), trunc() etc.
compute the median value for each column in a matrix using for loops, if statements, and length(), sort(), trunc() etc.

C.FUNCTIONS | Challenge: Mode values

function()
write a function that returns the mode of the values of a vector along with the text “the mode is”

Additionally proposed exercises cover: variance, standard deviation, coefficient of variation, quartiles, skewness, kurtosis, automatic interpretation of results, data extraction, data normalization, rolling windows.

Meetings 7-8: INDEPENDENT R CODING | Relations between statistical variables

A. Cross-tab analysis

Participants receive a database and have 45 minutes to:

Perform a cross-tab analysis between two quatitative variables (scatter plot, Pearson’s correlation coefficient)
Automate one interpretation/reporting aspect of their cross-tab analysis - this could include, but is not restricted to:
- Export your graph into a .pdf, .png etc. file.
- Based on Pearson’s correlation coefficiend and a conditional statement of one’s choice print: “Positive/Negative/No correlation”; “High/medium/low correlation”; “low positive correlation”, “low negative correlation” etc.; The correlation is 0.24 => low positive correlation between salary and salbegin” etc.
- Save into a matrix and export into a .csv file the value of Pearson’s correlation coefficient and the interpretation
- Add the Pearson’s correlation coefficient value (and the interpretation) on the scatter plot.
- Create an interpretation function.
- Compute the correlation matrix for all the (the quantitative continuous/) variables in the data set.
- Using a for loop, plot the scatter plots of multiple variables.

B. Linear regression model

Participants receive a database and have 45 minutes to:

Run a linear regression and name it regression.
Check one of the assumptions of the linear model: linearity of the model, no perfect or near multicolinearity, homoskedasticity errors, normality of the residuals.
Perform one of the:
- Plot the dependent variable against an independent variable and add the regression line/against all the quantitative independent variables (for loop) and store all the scatter plots into a single .pdf file
- Store the correlation matrix/the regression results into a .csv file locally.
- Interpret the result of the test of the assumptions of the linear model (conditional statement), R-squared, coefficients, a coefficient - if statistically significant etc.

Additionally proposed exercises

For each meeting, I provide a list of proposed extra-exercises for those interested (rBasics_Meeting no._exercises.pdf). These exercises are of 4 types (hopefully supporting many different learning styles):

Comment
Reproduce
Produce
Debug

To cover the diverse backgrounds and interests of the participants, these exercises included short examples of:

Data extraction (from a .pdf file, Yahoo, Eurostat)
Creating sub-datasets with rolling windows
Data pre-processing (normalization)

Acknowledgements

Datasets

Throughout this course I used two data sets (slightly altered to meet the course’s purposes) from Wooldridge, Jeffrey M. (2013). Introductory econometrics: a modern approach. Mason, Ohio: South-Western Cengage Learning, namely:

campus - Campus crimes.csv (Meetings 3-6)
engin - Wages.csv (Meetings 7-8)

The original data sets are available at:

Course structure and coding style

My teaching style is highly influenced by the ones of my excellent professors and colleagues at UBB FSEGA, in particular by professor Cristian Litan. Also, I am grateful to Anna Keresztes and Marcos Dominguez for their positive influence on my coding style while working together.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
Meeting 1		Meeting 1
Meeting 2		Meeting 2
Meeting 3		Meeting 3
Meeting 4		Meeting 4
Meeting 5		Meeting 5
Meeting 6		Meeting 6
Meeting 7		Meeting 7
Meeting 8		Meeting 8
README.Rmd		README.Rmd
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Course aims

Course structure

Additionally proposed exercises

Acknowledgements

Datasets

Course structure and coding style

About

Releases

Packages

Languages

marinaferent/Basics-of-R-programming-language-in-statistical-analysis

Folders and files

Latest commit

History

Repository files navigation

Course aims

Course structure

Additionally proposed exercises

Acknowledgements

Datasets

Course structure and coding style

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages