
Applied Data Science for Cyber Security

In today’s world we are assailed by ever-increasing amounts of data and increasingly sophisticated attacks. In this 2-day training course participants will use their existing scripting skills and apply data science techniques to analyze data more efficiently and machine learning techniques to keep data and systems more secure.

Through a combination of lecture and exercises, participants will gain a practical understanding of the entire data science process, from data preparation, exploratory data analysis, and data visualization through machine learning, model evaluation, and implementation at scale, all with a focus on security-related problems.

Finally, we will introduce students to cutting-edge Big Data tools, including Apache Spark (PySpark), Apache Drill, and GPU-accelerated parallel computing frameworks, and demonstrate how to apply these techniques to extremely large datasets.

Is This for Me?

This course is for you if:

  • You are a security professional with some scripting skills and you want to apply data science techniques to your work to analyze data more efficiently
  • You are a network analyst with some scripting skills and you want to use machine learning techniques to better secure your network
  • You want to incorporate automated data analysis into your work

Key Takeaways

By the end of this course, students will be confident in their ability to apply this knowledge to extract more value from data while shoring up defenses.

  • Understand the concepts behind machine learning and how to apply them to security problems
  • Understand the process of transforming raw data into actionable information, specifically preprocessing raw security data for machine learning and feature engineering
  • Quickly and efficiently gather and prepare data for analysis
  • Explore data using basic statistical techniques
  • Create, apply, and evaluate basic machine learning algorithms to identify potential security threats
  • Understand and apply tools such as Apache Spark (PySpark), Apache Drill, and GPU-accelerated parallel computing frameworks to large datasets

Want to Learn More?

Join us for a FREE Info Session on June 7 from 5:30pm-6:30pm to meet the instructors and ask questions!

Prerequisites

Class Materials

  • Laptop with VirtualBox (or VMware) installed, 6GB of RAM, and 10GB of storage
  • Instructors will provide a preconfigured virtual machine (VM) containing all the software needed for the class. The VM will also contain:
    • Course slides, notebooks, reference sheets and handouts, documentation
    • Skeleton code examples for in-class exercises

Agenda

Module 1 — Get Data

In this section, you’ll learn how to quickly and efficiently ingest a variety of data types and prepare them for analysis. You’ll also learn the concepts behind vectorized computing.

Introduction: Data Preparation with Pandas

  • What Pandas is and how to use it to quickly manipulate tabular data
  • The Series, DataFrame, and Panel objects
  • The Pandas ecosystem: Scikit-learn, Seaborn, Bokeh

Vectorized Computing in One Dimension: The Series Object

  • Creating a series
  • Describing data
  • Filtering data
  • Other operations on data
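
As an illustration of the Series topics above, here is a minimal sketch, assuming pandas is installed. The IP addresses and counts are made-up sample values, not course data, and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical sample: failed-login counts keyed by source IP (illustrative values only)
failed_logins = pd.Series(
    [3, 120, 7, 45, 2],
    index=["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5"],
    name="failed_logins",
)

# Describing data: count, mean, std, min, quartiles, max
summary = failed_logins.describe()

# Filtering data: a boolean mask is applied to the whole Series at once
noisy_hosts = failed_logins[failed_logins > 40]

# Other operations: vectorized arithmetic, no explicit loop
share_of_total = failed_logins / failed_logins.sum()
```

Each operation acts on the whole Series in one expression, which is the core idea of vectorized computing.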

Vectorized Computing in Two Dimensions: The DataFrame

  • Creating a DataFrame
  • Reading logfiles, APIs and other sources
  • Manipulating data in data frames
  • Applying functions to data frames
  • Aggregating data in data frames
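
The DataFrame topics above might look like the following sketch, assuming pandas 0.25 or later. The log rows and column names are invented for illustration; a real exercise would read an actual log file or API response.

```python
import io
import pandas as pd

# Hypothetical web-proxy log in CSV form (made-up rows for illustration)
raw_log = io.StringIO(
    "src_ip,method,status,bytes\n"
    "10.0.0.1,GET,200,512\n"
    "10.0.0.2,POST,500,128\n"
    "10.0.0.1,GET,404,64\n"
    "10.0.0.3,GET,200,2048\n"
    "10.0.0.2,POST,500,256\n"
)

# Reading: parse the text into a DataFrame
df = pd.read_csv(raw_log)

# Manipulating: add a derived boolean column with a vectorized comparison
df["is_error"] = df["status"] >= 400

# Applying a function: map each status code to a coarse class such as "2xx"
df["status_class"] = df["status"].apply(lambda code: f"{code // 100}xx")

# Aggregating: requests, errors, and total bytes per source IP
per_ip = df.groupby("src_ip").agg(
    requests=("status", "size"),
    errors=("is_error", "sum"),
    total_bytes=("bytes", "sum"),
)
```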

Module 2 — Explore Your Data

You’ll learn the concepts and techniques behind exploratory data analysis as well as practical data visualization techniques.

Statistical Summaries

  • 5-Number summaries
  • Normalizing data
  • Understanding Distributions
  • Correlations
  • Confidence Intervals and P-Values
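
A few of the summaries listed above can be sketched with pandas. This is a minimal example under stated assumptions: the measurements are made-up values, and z-score normalization is only one of several possible normalization schemes.

```python
import pandas as pd

# Hypothetical per-connection measurements (illustrative values only)
conns = pd.DataFrame({
    "duration_s": [1.0, 2.0, 3.0, 4.0, 5.0],
    "bytes_sent": [100.0, 220.0, 290.0, 410.0, 500.0],
})

# 5-number summary: minimum, lower quartile, median, upper quartile, maximum
five_num = conns["duration_s"].quantile([0.0, 0.25, 0.5, 0.75, 1.0])

# Normalizing: z-scores give each column mean 0 and standard deviation 1
z_scores = (conns - conns.mean()) / conns.std()

# Correlation: Pearson coefficient between the two columns
corr = conns["duration_s"].corr(conns["bytes_sent"])
```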

Concepts of Data Visualization

  • Creating effective visualizations
  • Choosing the correct visualization
  • Using visualization to explore data

Practical Data Visualization

  • Using Matplotlib to create basic charts
  • Overview of advanced charts with Seaborn
  • Creating dashboards with Superset

Module 3 – Learn From It

This module introduces the machine learning process. We will cover model selection, feature engineering, and model evaluation.

Machine Learning Concepts

  • Machine learning process
  • Machine learning problem types
  • Supervised vs Unsupervised machine learning

Unsupervised Machine Learning in Practice

  • Distance measures
  • Nearest Neighbors
  • K-Means
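
To make the distance and clustering topics above concrete, here is a small sketch assuming NumPy and scikit-learn are installed. The feature values are invented, and the choice of two clusters is an assumption made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-feature points: (requests per minute, distinct ports contacted)
X = np.array([
    [1.0, 1.0], [1.5, 2.0], [1.0, 1.5],        # low-activity hosts
    [50.0, 40.0], [55.0, 42.0], [52.0, 45.0],  # high-activity hosts
])

# Distance measure: Euclidean distance between the first two points
dist = np.linalg.norm(X[0] - X[1])

# K-Means with k=2; random_state fixes the initialization for repeatability
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)
```

The cluster numbers themselves are arbitrary; only the grouping matters. With real data, features on different scales would usually be normalized first so that no single feature dominates the distance.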

Supervised Machine Learning: Classification

  • Feature engineering
  • Modeling with Decision Trees and Support Vector Machines
  • Model evaluation
  • Case Study: Classifier to identify SQL Injection
  • Project: DGA Classifier
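
Feature engineering for a task like the DGA classifier could start from something like the sketch below. The features (length, character entropy, digit ratio) and the function names are assumptions chosen for illustration, not necessarily the ones used in the course project.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def domain_features(domain: str) -> dict:
    """Turn a domain name into a few numeric features (illustrative choice)."""
    label = domain.lower().split(".")[0]  # leftmost label; assumes no subdomain prefix
    return {
        "length": len(label),
        "entropy": shannon_entropy(label),
        "digit_ratio": sum(ch.isdigit() for ch in label) / len(label) if label else 0.0,
    }


features = domain_features("example.com")
```

Numeric features like these can then be passed to a classifier such as a decision tree.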

Logistics

This course begins on Thursday, June 22 at 8:30am and will meet for two days from 8:30am-5:30pm.

This course will be held at Betamore’s Light Street campus on the Fourth Floor at 1111 Light Street, Baltimore, MD 21230. Parking + Directions can be found at: bit.ly/betamorepark

Questions?

If you are looking for additional information on course details from the instructor, please don’t hesitate to email education@betamore.com

Reserve Your Spot Now

  • Thu, Jun 22nd, 8:30am – 5:30pm

    Instructor
    Charles Givre

    Senior Lead Data Scientist - Strategic Innovation Group at Booz Allen Hamilton

    Mr. Charles Givre, CISSP, has worked as a Senior Lead Data Scientist for Booz Allen Hamilton for the last six years, where he works at the intersection of cyber security and data science. For the last few years, Mr. Givre worked on one of Booz Allen's largest analytic programs, where he led data science efforts and worked to expand the role of data science in the program.
    Mr. Givre is passionate about teaching others data science and analytic skills and has taught data science classes all over the world at conferences and universities and for clients. Most recently, Mr. Givre taught a data science class at the BlackHat conference in Las Vegas and at the Center for Research in Applied Cryptography and Cyber Security at Bar Ilan University. He is a sought-after speaker and has delivered presentations at major industry conferences such as Strata-Hadoop World, BlackHat, and the Open Data Science Conference.
    One of Mr. Givre's research interests is increasing the productivity of data science and analytic teams. Toward that end, he has been working extensively to promote the use of Apache Drill in security applications and has contributed to the code base. Mr. Givre teaches online classes for O'Reilly about Drill and Security Data Science and is a coauthor of the forthcoming O'Reilly book about Apache Drill.
    Prior to joining Booz Allen, Mr. Givre worked as a counterterrorism analyst at the Central Intelligence Agency for five years. Mr. Givre holds a Master's degree in Middle Eastern Studies from Brandeis University, as well as a Bachelor of Science in Computer Science and a Bachelor of Music, both from the University of Arizona. He holds various certifications including CISSP, Security+, Network+, Certified Penetration Tester, and CDIA+. He speaks French reasonably well, plays trombone, lives in Baltimore with his family, and, in his nonexistent spare time, is restoring a classic British sports car. Mr. Givre blogs at thedataist.com and tweets @cgivre.

    Dr. Melissa Kilby, Data Scientist + Cyber Defense Researcher at Booz Allen Hamilton 

    Melissa is passionate about high-performance computing and mathematical modeling. She specializes in advancing automated machine learning and deep learning within the domain of cyber security, as well as scaling solutions up to Big Data and performing streaming versus batch analytics. Melissa cofounded GTK Cyber and is an experienced Machine Learning Trainer for cyber security professionals. She has taught courses at BlackHat USA and Booz Allen Hamilton and served as a computer science instructor for doctoral-level biomechanics courses at the University of Georgia. Melissa holds a PhD from the University of Georgia.

    At Booz Allen Hamilton she contributes to a variety of cutting-edge cyber security research projects. Areas of focus range from Network Intrusion Detection in Industrial Control Systems (SCADA) to memory forensics, network defensibility, EEG Brain Authentication, and Cyber Unified Big Data Platforms. Melissa has explored a variety of suitable data transformations/projections and signal processing techniques. Her primary focus has been on applying hybrid unsupervised-supervised Machine Learning and Deep Learning algorithms to generate data-driven insights, automate processes, and develop complete data pipelines. Technologies used include AWS EMR, Hadoop, ElasticSearch, Apache Spark, Python, R, C++, CUDA, TensorFlow, Volatility, VirusTotal, PLCs, TShark, Android, IDAPro, GNURadio, and Kali Linux.

    Prior to joining Booz Allen, she conducted primary research and coordinated experiments in the 3D motion labs at the University of Georgia and the Pennsylvania State University. Her research in sensorimotor neuroscience applied methods from nonlinear dynamical systems, chaos theory, multivariate statistics, machine learning, robotics, and signal processing. She has 6 peer-reviewed publications in high-impact journals and has presented her research at top conferences such as Neuroscience. Melissa extensively used motion capture technologies and performed real-time biofeedback experiments within Virtual Reality.

    During her summer internship at NASA Johnson Space Center, she contributed to ongoing spacesuit engineering efforts such as the development and testing of a next generation of embedded sensor gloves. Melissa's contributions were critical in helping transition evaluation past laboratory testing into Neutral Buoyancy Facility (NBF) testing. NBF tests are more complex because they are conducted during astronaut training 40 feet underwater. Technologies used during her PhD include Matlab, R, C++, C#, CUDA, Virtual Reality, wearable technologies, inertial measurement units, force sensitive resistors, force platforms, and pressure maps.

    Austin Taylor, Senior Security Researcher at IronNet Cybersecurity

    Austin Taylor is a Cybersecurity enthusiast with a passion for Continuous Monitoring and Hunt Capability. He currently serves as a Cyber Warfare Operator for the United States Air Force and works at IronNet Cybersecurity as a Senior Security Researcher.
