# Data Science Short Courses

Short Courses are events sponsored by the Data Science Initiative that enable people from across campus to learn about data analytic software.

**08/28/20 Experimental Design – The Key to Reliable & Reproducible Science**

Obtaining reliable and reproducible estimates of a “treatment” effect and drawing conclusions in any scientific context hinges on sound study design. No statistical analysis can salvage a poorly designed study. Before rushing to pick the “ideal” statistical analysis, we must think about the scientific objective and aims and ask: what is the scientific question of interest?

**06/22/20 Data Exploration and Visualization using R and ggplot2**

This course is intended to help people with basic experience using R take their first steps in data analysis. We provide a quick overview of introductory R programming, then focus on using R and ggplot2 for data exploration and visualization. Note: planned topics for each session may change as time constraints dictate.

**05/16/20-05/17/20 Introduction to Data Analysis with R**

This course provides a brief introduction to the fundamentals of the R language and focuses on its use for data analysis, including exploratory data analysis, linear and logistic regression, variable selection, model diagnostics, and prediction.

**03/08/19 Intro to Deep Generative Models**

This workshop introduces commonly used deep neural networks and their application as deep generative models. We will cover the motivating ideas and theory behind various deep generative models, such as variational auto-encoders (VAEs), generative adversarial networks (GANs), and flow-based models. You will learn to implement these models in TensorFlow to generate realistic-looking images.

**03/01/19 A Review of Graph Convolutional Neural Networks**

Many real-world data sets have a non-Euclidean structure. Finding an optimal representation of such data can help reveal their hidden patterns and structures. Two important examples are 2-D manifolds embedded in 3-D space and data with graph-structured relations. Graph convolutional neural networks (GCNs), an emerging and surprisingly successful tool, can be used to capture these relations. We cover the mathematics behind this method, survey the most recent work on GCNs, and walk through the implementation of one or two basic networks of this type.

**02/22/19 Intro to Linux on the HPC**

This course covers how to best exploit the bash shell for both interactive work and batch jobs, moving and simple manipulation of data, as well as very short introductions to programming in bash, Perl, and R. This is not computer science; this is a driver’s license.

**02/15/19 Introduction to Deep Learning**

In this workshop, you’ll learn the basic ideas of neural networks and TensorFlow programming fundamentals by building and training different models. You will also be introduced to more advanced applications of deep learning in computer vision and natural language processing with the Keras high-level API.

**11/09/18 An Introduction to Julia**

This workshop aims to introduce both users of scripting languages and advanced programmers to the Julia ecosystem and explore details about the Julia v1.0 language which can help produce efficient and readable code.

The goal of the workshop is for students to understand where Julia can be applied and be well-equipped to start using Julia in their own research. Students will learn about the current state of Julia development (IDEs, documentation, where to get help), how to write efficient code by understanding some of Julia’s internals via small projects, how to solve problems using advanced Julia features (metaprogramming, multiple dispatch, etc.), and how to work around common issues newcomers face (scoping problems, type conversions, etc.).

**11/02/18 Topics in R**

Prerequisites: 1) familiarity with basic statistical concepts, and 2) intermediate R programming knowledge. For the tutorial, bring a WiFi-capable laptop with R downloaded and installed.

**10/26/18 Intro to Linux on the HPC**

This course covers how to best exploit the bash shell for both interactive work and batch jobs, moving and simple manipulation of data, as well as very short introductions to programming in bash, Perl, and R. This is not computer science; this is a driver’s license.

**Intro to R and Data Visualization in R with ggplot**

Intro to R:

In this session, students become familiar with R: data types, functions, and basic data manipulation, including some exploratory data analysis and how to perform statistical tests.

Data Visualization in R with ggplot:

In this part of the workshop, students will learn the basic commands to create statistical plots, understand the grammar of graphics behind ggplot, and master how to create more sophisticated data visualizations through hands-on exercises on real data sets.

**Predictive Modeling with Python**

Python is a popular language for scientific computing and machine learning. This course introduces general modeling concepts alongside concrete examples based on the scikit-learn library. Example usage of scikit-learn illustrates how to fit and evaluate predictive models. Both regression and classification settings are considered. The course is taught mostly through IPython notebooks.

**Introduction to Linux Short Course**

This course is for researchers who have never used Linux and/or a compute cluster and introduces concepts and best practices for both. This course covers how to best exploit the bash shell for both interactive work and batch jobs, moving and simple manipulation of data, as well as very short introductions to programming in bash, Perl, and R. This is not computer science; this is a driver’s license.

**Analyzing Data/BigData on Linux**

This course covers using foreign data formats on Linux, stream processing, using efficient and appropriate file formats, considerations for simple parallel processing, an introduction to different families of applications, and dealing with Big Data sets.

**Introduction to R**

This course provides an introduction to the fundamentals of the R language.

In this course, students learn how to program in R and how to use R effectively to analyze data. The course covers an introduction to data/object types in R, reading data into R, creating data graphics, accessing and installing R packages, writing R functions, fitting statistical models including regression models, and performing statistical tests such as the t-test and ANOVA. Practical examples are provided during the course.

**Software Carpentry Workshop**

This hands-on workshop, developed by the Software Carpentry Foundation, covers basic concepts and tools, including program design, programming in Python, version control, and task automation in the Unix shell. Software Carpentry’s mission is to help graduate students get more research done in less time and with less pain by teaching them basic lab skills for scientific computing.