Applied Data Science for Cyber Security

Course Overview

In today’s world we are assailed by ever-increasing amounts of data and increasingly sophisticated attacks. In this two-day training course, participants will build on their existing scripting skills, applying data science techniques to analyze data more efficiently and machine learning techniques to keep data and systems more secure.

Through a combination of lecture and exercises, participants will gain a practical understanding of the entire data science process, from data preparation, exploratory data analysis, data visualization, machine learning, and model evaluation to implementation at scale, all with a focus on security-related problems.

Finally, we will introduce participants to cutting-edge Big Data tools, including Apache Spark (PySpark), Apache Drill, and GPU-accelerated parallel computing frameworks, and demonstrate how to apply these tools to extremely large datasets.

Is This for Me?

This course is for you if:

  • You are a security professional with some scripting skills and you want to apply data science techniques to your work to analyze data more efficiently
  • You are a network analyst with some scripting skills and you want to use machine learning techniques to better secure your network
  • You want to incorporate automated data analysis into your work

Key Takeaways

By the end of this course, students will be confident in their ability to apply this knowledge to extract more value from data while shoring up defenses.

  • Understand the concepts behind machine learning and how to apply them to security problems
  • Understand the process of transforming raw data into actionable information, specifically preprocessing raw security data for machine learning and feature engineering
  • Quickly and efficiently gather and prepare data for analysis
  • Explore data using basic statistical techniques
  • Create, apply, and evaluate basic machine learning algorithms to identify potential security threats
  • Understand tools such as Apache Spark (PySpark), Apache Drill, and GPU-accelerated parallel computing frameworks, and apply them to large datasets

Want to Learn More?

Join us for a FREE Info Session on June 7 from 5:30pm-6:30pm to meet the instructors and ask questions!


Prerequisites

Class Materials

  • Laptop with VirtualBox (or VMware) installed, 6GB of RAM and 10GB of storage
  • Instructors will provide a preconfigured virtual machine (VM) containing all the software needed for the class. The VM will also contain:
    • Course slides, notebooks, reference sheets and handouts, documentation
    • Skeleton code examples for in-class exercises

Agenda

Module 1 — Get Data

In this section, you’ll learn how to quickly and efficiently ingest a variety of data types and prepare them for analysis. You’ll also learn the concepts behind vectorized computing.

Introduction: Data Preparation with Pandas

  • What Pandas is and how to use it to quickly manipulate tabular data
  • The Series, DataFrame, and Panel objects
  • The Pandas ecosystem: Scikit-learn, Seaborn, Bokeh

Vectorized Computing in One Dimension: The Series Object

  • Creating a series
  • Describing data
  • Filtering data
  • Other operations on data
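
To give a flavor of the Series topics above, here is a minimal sketch, assuming a small made-up set of per-connection byte counts (the data and variable names are illustrative, not course material). It shows describing, filtering, and element-wise arithmetic without writing an explicit loop.

```python
import pandas as pd

# Hypothetical sample data: bytes transferred per connection.
bytes_sent = pd.Series([512, 2048, 128, 65536, 1024], name="bytes_sent")

# Describing data: count, mean, std, min, quartiles, and max in one call.
summary = bytes_sent.describe()

# Filtering data: a boolean mask selects matching elements without a loop.
large_transfers = bytes_sent[bytes_sent > 1500]

# Other operations: arithmetic is applied to every element at once.
kilobytes = bytes_sent / 1024
```

The comparison `bytes_sent > 1500` itself produces a Series of booleans, which is what makes the one-line filter possible.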

Vectorized Computing in Two Dimensions: The DataFrame

  • Creating a DataFrame
  • Reading logfiles, APIs and other sources
  • Manipulating data in data frames
  • Applying functions to data frames
  • Aggregating data in data frames
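
The DataFrame topics above can be sketched in the same way. The records below are invented for illustration; in practice they might come from a logfile or an API response. The column names and the `is_error` rule are assumptions made for this example only.

```python
import pandas as pd

# Hypothetical web log records (made up for this sketch).
records = [
    {"src_ip": "10.0.0.5", "status": 200, "bytes": 512},
    {"src_ip": "10.0.0.5", "status": 404, "bytes": 128},
    {"src_ip": "10.0.0.9", "status": 200, "bytes": 2048},
    {"src_ip": "10.0.0.9", "status": 500, "bytes": 64},
    {"src_ip": "10.0.0.9", "status": 404, "bytes": 256},
]
df = pd.DataFrame(records)

# Manipulating data: add a derived boolean column.
df["is_error"] = df["status"] >= 400

# Applying a function: extract the last octet of each source address.
df["last_octet"] = df["src_ip"].apply(lambda ip: int(ip.split(".")[-1]))

# Aggregating data: request count, error count, and total bytes per source.
per_source = df.groupby("src_ip").agg(
    requests=("status", "count"),
    errors=("is_error", "sum"),
    total_bytes=("bytes", "sum"),
)
```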

Module 2 — Explore Your Data

You’ll learn the concepts and techniques behind exploratory data analysis as well as practical data visualization techniques.

Statistical Summaries

  • 5-Number summaries
  • Normalizing data
  • Understanding Distributions
  • Correlations
  • Confidence Intervals and P-Values
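
As a rough illustration of the first few summaries listed above, the sketch below computes a 5-number summary, z-score normalization, and a correlation with Pandas. The numbers are arbitrary sample values chosen for the example, not course data.

```python
import pandas as pd

# Arbitrary sample values, e.g. login attempts per hour.
values = pd.Series([2, 4, 4, 5, 7, 9, 11])

# 5-number summary: minimum, first quartile, median, third quartile, maximum.
five_number = values.quantile([0, 0.25, 0.5, 0.75, 1.0])

# Normalizing data: z-scores have mean 0 and (sample) standard deviation 1.
z_scores = (values - values.mean()) / values.std()

# Correlations: a perfectly linear relationship gives a Pearson correlation of 1.
related = values * 2 + 1
correlation = values.corr(related)
```

Note that `quantile` interpolates linearly between data points by default, so other tools may report slightly different quartiles for the same data.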

Concepts of Data Visualization

  • Creating effective visualizations
  • Choosing the correct visualization
  • Using visualization to explore data

Practical Data Visualization

  • Using Matplotlib to create basic charts
  • Overview of advanced charts with Seaborn
  • Creating dashboards with Superset

Module 3 – Learn From It

This module introduces the machine learning process. We will cover model selection, feature engineering, and model evaluation.

Machine Learning Concepts

  • Machine learning process
  • Machine learning problem types
  • Supervised vs Unsupervised machine learning

Unsupervised Machine Learning in Practice

  • Distance measures
  • Nearest Neighbors
  • K-Means
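
To make the idea of a distance measure concrete, here is a minimal nearest-neighbor sketch using only the Python standard library (`math.dist` requires Python 3.8 or later). The host profiles and feature choices are hypothetical, and this is not K-Means; it only shows how Euclidean distance can be used to find the most similar known example.

```python
import math

# Hypothetical 2-D feature vectors: (requests per minute, distinct ports contacted).
known_hosts = {
    "workstation": (12.0, 3.0),
    "web_server": (300.0, 2.0),
    "scanner": (250.0, 180.0),
}

def nearest(point, labeled_points):
    """Return the label whose vector has the smallest Euclidean distance to point."""
    return min(labeled_points, key=lambda label: math.dist(point, labeled_points[label]))

# An unlabeled observation is assigned the label of its closest known profile.
unknown = (240.0, 150.0)
closest = nearest(unknown, known_hosts)
```

Because Euclidean distance sums squared differences across features, a feature with a much larger numeric range can dominate the result, which is one reason normalizing data matters before using distance-based methods.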

Supervised Machine Learning: Classification

  • Feature engineering
  • Modeling with Decision Trees and Support Vector Machines
  • Model evaluation
  • Case Study: Classifier to identify SQL Injection
  • Project: DGA Classifier
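
As a hedged illustration of what feature engineering might look like for domain names, the sketch below turns a domain into a few numeric features using only the standard library. The specific features (length, character entropy, digit ratio) are assumptions chosen for this example and are not necessarily those used in the course project; the function names are likewise hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def domain_features(domain):
    """Illustrative numeric features from the leftmost label of a domain name."""
    label = domain.split(".")[0].lower()
    size = max(len(label), 1)  # avoid division by zero for an empty label
    return {
        "length": len(label),
        "entropy": shannon_entropy(label),
        "digit_ratio": sum(ch.isdigit() for ch in label) / size,
    }

# A dictionary word versus a random-looking label.
benign = domain_features("google.com")
suspicious = domain_features("x7k2q9.example")
```

Feature dictionaries like these could then be collected into a DataFrame and passed to a classifier such as a decision tree.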

Logistics

This course begins on Thursday, June 22 at 8:30am and will meet for two days from 8:30am-5:30pm.

This course will be held at Betamore’s Light Street campus on the Fourth Floor at 1111 Light Street, Baltimore, MD 21230. Parking + Directions can be found at: bit.ly/betamorepark

Questions?

If you would like additional information about the course from the instructors, please don’t hesitate to email education@betamore.com

Reserve Your Spot Now

REFUND POLICY: Please let us know at least 7 days before the scheduled event if you cannot make it by emailing us at registrar@betamore.com. No refunds will be issued within 7 days.
