This page collects the data science and self-development resources that I’ve benefited from, to help you on your path. I update these resources regularly, so make sure you check back often!
This page contains affiliate links. If you purchase a product through one of them, I will receive a commission (at no additional cost to you). Thank you for your support!
OUR PRODUCTS
Practical Guide to Cluster Analysis in R
Provides a practical guide to cluster analysis, elegant visualization and interpretation. It covers 1) dissimilarity measures; 2) partitioning clustering methods (K-means, K-medoids and CLARA algorithms); 3) hierarchical clustering methods; 4) clustering validation and evaluation strategies; 5) advanced clustering methods, including hierarchical k-means clustering, fuzzy clustering, model-based clustering and density-based clustering.
Practical Guide To Principal Component Methods in R
Provides solid practical guidance for summarizing, visualizing and interpreting the most important information in large multivariate data sets, using principal component methods such as PCA (Principal Component Analysis), CA (Simple Correspondence Analysis), MCA (Multiple Correspondence Analysis) and more.
Machine Learning Essentials: Practical Guide in R
Discovering knowledge from big multivariate data, recorded every day, requires specialized machine learning techniques. This book presents an easy-to-use practical guide in R to compute the most popular machine learning methods for exploring data sets, as well as for building predictive models.
R Graphics Essentials for Great Data Visualization
This book provides more than 200 practical examples to create great graphics for the right data using either the ggplot2 package and its extensions or traditional R graphics.
GGPlot2 Essentials for Great Data Visualization in R
This book presents the essentials of the ggplot2 package to easily create beautiful graphics in R. Key features: 1) Covers the most important graphic functions; 2) Short, self-contained chapters with practical examples.
Network Analysis and Visualization in R
This book provides a quick start guide to network analysis and visualization in R. You’ll learn how to create static and interactive network graphs.
Practical Statistics in R for Comparing Groups: Numerical Variables
This R Statistics book provides a solid step-by-step practical guide to statistical inference for comparing group means using the R software. It is designed to get you doing the statistical tests in R as quickly as possible. The book focuses on implementation and understanding of the methods, without having to struggle through pages of mathematical proofs.
Inter-Rater Reliability Essentials: Practical Guide in R
Covers the most common statistical measures for inter-rater reliability analysis, including Cohen’s kappa, weighted kappa, Light’s kappa, Fleiss’ kappa, intraclass correlation coefficient and agreement chart.
R Packages
- ggpubr: ‘ggplot2’ Based Publication Ready Plots
- ggcorrplot: Visualization of a Correlation Matrix using ggplot2
- rstatix: Pipe-friendly Framework for Basic Statistical Tests in R
- factoextra: Extract and Visualize the Results of Multivariate Data Analyses
- survminer: Drawing Survival Curves using ggplot2
- datarium: Data bank for statistical analyses and visualization
DATA SCIENCE BOOKS
R for Data Science: Import, Tidy, Transform, Visualize, and Model Data
Introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible.
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
By using concrete examples, minimal theory, and two production-ready Python frameworks—scikit-learn and TensorFlow—author Aurélien Géron helps you gain an intuitive understanding of the concepts and tools for building intelligent systems. You’ll learn a range of techniques, starting with simple linear regression and progressing to deep neural networks.
Data Science for Beginners: 4 Books in 1: Python Programming, Data Analysis, Machine Learning.
Created with the beginner in mind, this powerful bundle delves into the fundamentals behind Python and Data Science, from basic code and concepts to complex Neural Networks and data manipulation. Inside, you’ll discover everything you need to know to get started with Python and Data Science, and begin your journey to success!
Practical Statistics for Data Scientists: 50 Essential Concepts
This practical guide explains how to apply various statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what’s important and what’s not.
Hands-On Programming with R: Write Your Own Functions And Simulations
With this book, you’ll learn how to load data, assemble and disassemble data objects, navigate R’s environment system, write your own functions, and use all of R’s programming tools.
An Introduction to Statistical Learning: with Applications in R
This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more. Color graphics and real-world examples are used to illustrate the methods presented.
Deep Learning with R
Deep Learning with R introduces the world of deep learning using the powerful Keras library and its R language interface. The book builds your understanding of deep learning through intuitive explanations and practical examples.
Deep Learning with Python
Deep Learning with Python introduces the field of deep learning using the Python language and the powerful Keras library. Written by Keras creator and Google AI researcher François Chollet, this book builds your understanding through intuitive explanations and practical examples.
ONLINE COURSES
Course: Machine Learning: Master the Fundamentals
This course provides a broad introduction to machine learning, data mining, and statistical pattern recognition. Topics include: (i) Supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks). (ii) Unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning). (iii) Best practices in machine learning (bias/variance theory; innovation process in machine learning and AI). The course will also draw from numerous case studies and applications, so that you’ll also learn how to apply learning algorithms to building smart robots (perception, control), text understanding (web search, anti-spam), computer vision, medical informatics, audio, database mining, and other areas.
Specialization: Data Science
Learn Data Science from Johns Hopkins University on Coursera. #1 Specialization on Coursera. Enroll online today! This Specialization covers the concepts and tools you’ll need throughout the entire data science pipeline, from asking the right kinds of questions to making inferences and publishing results. In the final Capstone Project, you’ll apply the skills learned by building a data product using real-world data. At completion, students will have a portfolio demonstrating their mastery of the material.
Specialization: Python for Everybody
Master Python in 5 Online Courses from University of Michigan. Enroll today! This Specialization builds on the success of the Python for Everybody course and will introduce fundamental programming concepts including data structures, networked application program interfaces, and databases, using the Python programming language.
Courses: Build Skills for a Top Job in any Industry
Explore hundreds of business courses on Coursera today.
Specialization: Master Machine Learning Fundamentals
Master Machine Learning fundamentals in 5 hands-on courses from University of Washington. Enroll today! This Specialization from leading researchers at the University of Washington introduces you to the exciting, high-demand field of Machine Learning. Through a series of practical case studies, you will gain applied experience in major areas of Machine Learning including Prediction, Classification, Clustering, and Information Retrieval. You will learn to analyze large and complex datasets, create systems that adapt and improve over time, and build intelligent applications that can make predictions from data.
Specialization: Statistics with R
In this Specialization, you will learn to analyze and visualize data in R and create reproducible data analysis reports, demonstrate a conceptual understanding of the unified nature of statistical inference, perform frequentist and Bayesian statistical inference and modeling to understand natural phenomena and make data-based decisions, communicate statistical results correctly, effectively, and in context without relying on statistical jargon, critique data-based claims and evaluate data-based decisions, and wrangle and visualize data with R packages for data analysis.
You will produce a portfolio of data analysis projects from the Specialization that demonstrates mastery of statistical data analysis from exploratory analysis to inference to modeling, suitable for applying for statistical analysis or data scientist positions.
Specialization: Software Development in R
Master Software Development in R and earn your Specialization Certificate from Coursera and Johns Hopkins University. This Specialization will give you rigorous training in the R language, including the skills for handling complex data, building R packages, and developing custom data visualizations. You’ll be introduced to indispensable R libraries for data manipulation, like tidyverse, and data visualization and graphics, like ggplot2. You’ll learn modern software development practices to build tools that are highly reusable, modular, and suitable for use in a team-based environment or a community of developers.
Specialization: Genomic Data Science
This Specialization covers the concepts and tools to understand, analyze, and interpret data from next generation sequencing experiments. It teaches the most common tools used in genomic data science including how to use the command line, along with a variety of software implementation tools like Python, R, Bioconductor, and Galaxy.