Applied Data Science for Cyber Security
In today’s world we are assailed by ever-increasing amounts of data and increasingly sophisticated attacks. In this 2-day training course participants will use their existing scripting skills and apply data science techniques to analyze data more efficiently and machine learning techniques to keep data and systems more secure.
Through a combination of lectures and exercises, participants will gain a practical understanding of the entire data science process, from data preparation, exploratory data analysis, and data visualization to machine learning, model evaluation, and implementation at scale, all with a focus on security-related problems.
Finally, we will introduce students to cutting-edge Big Data tools, including Apache Spark (PySpark), Apache Drill, and GPU-accelerated parallel computing frameworks, and demonstrate how to apply these techniques to extremely large datasets.
Is This for Me?
This course is for you if:
- You are a security professional with some scripting skills and you want to apply data science techniques to your work to analyze data more efficiently
- You are a network analyst with some scripting skills and you want to use machine learning techniques to better secure your network
- You want to incorporate automated data analysis into your work
By the end of this course, students will be confident in their ability to apply this knowledge to extract more value from data while shoring up defenses.
- Understand the concepts behind machine learning and how to apply them to security problems
- Understand the process of transforming raw data into actionable information, specifically preprocessing raw security data and engineering features for machine learning
- Quickly and efficiently gather and prepare data for analysis
- Explore data using basic statistical techniques
- Create, apply, and evaluate basic machine learning algorithms to identify potential security threats
- Understand and apply tools such as Apache Spark (PySpark), Apache Drill, and GPU-accelerated parallel computing frameworks to large datasets
Want to Learn More?
Join us for a FREE Info Session on June 7 from 5:30pm-6:30pm to meet the instructors and ask questions!
- Beginner-to-intermediate experience with the Python programming language
- Familiarity with security and networking concepts
- Laptop with VirtualBox (or VMware) installed, 6GB of RAM, and 10GB of storage
- Instructors will provide a preconfigured virtual machine (VM) containing all the software needed for the class. The VM will also contain:
- Course slides, notebooks, reference sheets and handouts, documentation
- Skeleton code examples for in-class exercises
Module 1 — Get Data
In this section, you’ll learn how to quickly and efficiently ingest a variety of data types and prepare them for analysis. You’ll also learn the concepts behind vectorized computing.
Introduction: Data Preparation with Pandas
- What Pandas is and how to use it to quickly manipulate tabular data
- The Series, DataFrame, and Panel objects
- The Pandas ecosystem: Scikit-learn, Seaborn, Bokeh
Vectorized Computing in One Dimension: The Series Object
- Creating a series
- Describing data
- Filtering data
- Other operations on data
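As a rough taste of what vectorized operations on a Series look like, here is a minimal sketch. The port numbers and the watch list are made up for illustration and are not taken from the course materials.

```python
import pandas as pd

# Made-up destination ports from a hypothetical connection log
ports = pd.Series([22, 80, 443, 443, 8080, 22, 443, 3389], name="dst_port")

# Describing data: how often each port appears
port_counts = ports.value_counts()

# Filtering data: a boolean mask is applied to the whole Series at once
privileged = ports[ports < 1024]

# Other operations: vectorized membership test against an illustrative watch list
watch_list = [22, 3389]
flagged = ports.isin(watch_list)

print(port_counts.head(3))
print(privileged.tolist())
print(int(flagged.sum()))
```

No explicit loop is written; each operation works on every element of the Series in a single call.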
Vectorized Computing in Two Dimensions: The DataFrame
- Creating a DataFrame
- Reading data from log files, APIs, and other sources
- Manipulating data in data frames
- Applying functions to data frames
- Aggregating data in data frames
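The DataFrame topics above can be sketched in a few lines. This is only an illustration: the records stand in for parsed log lines, and the column names (`src_ip`, `status`, `bytes`) are assumptions rather than the format used in class.

```python
import pandas as pd

# Made-up records standing in for parsed web server log lines
records = [
    {"src_ip": "10.0.0.5", "status": 200, "bytes": 512},
    {"src_ip": "10.0.0.5", "status": 404, "bytes": 128},
    {"src_ip": "10.0.0.9", "status": 200, "bytes": 2048},
    {"src_ip": "10.0.0.9", "status": 500, "bytes": 64},
    {"src_ip": "10.0.0.9", "status": 404, "bytes": 128},
]
df = pd.DataFrame(records)

# Manipulating data: derive a new column with a vectorized comparison
df["is_error"] = df["status"] >= 400

# Applying a function: label each status code by its class (2xx, 4xx, 5xx)
df["status_class"] = df["status"].apply(lambda code: f"{code // 100}xx")

# Aggregating data: per-source request count, error count, and total bytes
summary = df.groupby("src_ip").agg(
    requests=("status", "size"),
    errors=("is_error", "sum"),
    total_bytes=("bytes", "sum"),
)
print(summary)
```

A per-source summary like this is often a first step toward spotting hosts that generate an unusual share of errors.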
Module 2 — Explore Your Data
You’ll learn the concepts and techniques behind exploratory data analysis as well as practical data visualization techniques.
- Five-number summaries
- Normalizing data
- Understanding distributions
- Confidence intervals and p-values
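To make the first two topics concrete, here is a small sketch using only the Python standard library. The hourly counts are invented, and the z-score cutoff of 2 is an arbitrary choice for illustration, not a recommendation from the course.

```python
import statistics

# Made-up counts of failed logins per hour
failed_logins = [2, 3, 3, 4, 5, 5, 6, 7, 48]

# Five-number summary: minimum, first quartile, median, third quartile, maximum
q1, median, q3 = statistics.quantiles(failed_logins, n=4, method="inclusive")
five_number = (min(failed_logins), q1, median, q3, max(failed_logins))

# Normalizing data: z-scores express each value in standard deviations from the mean
mean = statistics.mean(failed_logins)
stdev = statistics.stdev(failed_logins)
z_scores = [(x - mean) / stdev for x in failed_logins]

# Values more than 2 standard deviations from the mean (illustrative cutoff)
outliers = [x for x, z in zip(failed_logins, z_scores) if abs(z) > 2]

print(five_number)
print(outliers)
```

The summary shows a tight middle range with a maximum far above it, which the z-scores then single out.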
Concepts of Data Visualization
- Creating effective visualizations
- Choosing the correct visualization
- Using visualization to explore data
Practical Data Visualization
- Using Matplotlib to create basic charts
- Overview of advanced charts with Seaborn
- Creating dashboards with Superset
Module 3 – Learn From It
This module introduces the machine learning process, covering model selection, feature engineering, and model evaluation.
Machine Learning Concepts
- Machine learning process
- Machine learning problem types
- Supervised vs. unsupervised machine learning
Unsupervised Machine Learning in Practice
- Distance measures
- Nearest Neighbors
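The two ideas in this list can be illustrated with a short standard-library sketch. The host names and their two features (connections per minute, distinct ports contacted) are hypothetical, and the brute-force search is chosen for readability rather than speed.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.dist(a, b)

def manhattan(a, b):
    """Sum of absolute differences along each dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical hosts: (connections per minute, distinct ports contacted)
hosts = {
    "web-01": (120.0, 3.0),
    "web-02": (110.0, 4.0),
    "db-01": (15.0, 2.0),
    "scanner": (90.0, 250.0),
}

def nearest_neighbor(name, distance=euclidean):
    """Return the other host closest to `name` under the given distance."""
    others = (h for h in hosts if h != name)
    return min(others, key=lambda h: distance(hosts[name], hosts[h]))

print(nearest_neighbor("web-01"))
print(euclidean(hosts["scanner"], hosts[nearest_neighbor("scanner")]))
```

The two web servers sit close together, while the scanner's nearest neighbor is still far away. That gap is the kind of signal distance-based methods rely on. In practice the features would usually be normalized first so that one scale does not dominate the distance.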
Supervised Machine Learning: Classification
- Feature engineering
- Modeling with Decision Trees and Support Vector Machines
- Model evaluation
- Case Study: Classifier to identify SQL Injection
- Project: DGA Classifier
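As a hint of what feature engineering for a DGA (domain generation algorithm) classifier might involve, the sketch below turns a domain name into a few numeric features. The feature set (length, character entropy, digit ratio) and both sample domains are assumptions made for illustration; the features used in the actual class project may differ.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Shannon entropy, in bits per character, of a non-empty string."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def domain_features(domain):
    """Compute simple numeric features from the leftmost label of a domain."""
    label = domain.lower().split(".")[0]
    if not label:
        raise ValueError("domain must start with a non-empty label")
    return {
        "length": len(label),
        "entropy": shannon_entropy(label),
        "digit_ratio": sum(ch.isdigit() for ch in label) / len(label),
    }

# One ordinary-looking name and one made-up random-looking name
for name in ["example.com", "xj4k9qz2vb7w.example"]:
    print(name, domain_features(name))
```

Algorithmically generated names tend to be longer and more random-looking than names chosen by people, so features like these could feed a Decision Tree or Support Vector Machine classifier.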
This course begins on Thursday, June 22 at 8:30am and will meet for two days from 8:30am-5:30pm.
This course will be held at Betamore’s Light Street campus on the Fourth Floor at 1111 Light Street, Baltimore, MD 21230. Parking and directions can be found at: bit.ly/betamorepark
For additional course details from the instructor, please email firstname.lastname@example.org
Reserve Your Spot Now