CrystalCo/vulnerability_detection

Setup

To download this repository, clone it using git: git clone https://github.com/CrystalCo/vulnerability_detection.git

Requires Python 3.8.10

Step 0

If the files in slicesSource are not compressed, you may skip this step. Otherwise, do the following:

  • Download Git LFS from https://git-lfs.github.com/
  • Install it by running git lfs install.
  • Run git lfs pull to download the large files.
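One way to tell whether Step 0 is still needed is to look for files that are still Git LFS pointers rather than real content. The sketch below relies on the fact that an LFS pointer file begins with the line `version https://git-lfs.github.com/spec/v1`. The folder path `data/slicesSource` and the helper names are assumptions for illustration, not part of this repository.

```python
from pathlib import Path

# First line of every Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` starts with the LFS pointer header."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_lfs_pointers(folder):
    """List files under `folder` that still look like unfetched LFS pointers."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.rglob("*") if p.is_file() and is_lfs_pointer(p)
    )


if __name__ == "__main__":
    # "data/slicesSource" is an assumed location; adjust it to your checkout.
    pending = find_lfs_pointers("data/slicesSource")
    if pending:
        print(f"{len(pending)} file(s) are still LFS pointers; run: git lfs pull")
    else:
        print("No LFS pointer files found.")
```

If the script reports pointer files, running git lfs pull from the repository root should replace them with the actual data.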

Step 1

Create an environment variable named VUL_PATH that holds the path to this project.

Example for Unix/MacOS:

export VUL_PATH=`pwd`

or manually:

export VUL_PATH=/Users/cryst/Documents/vulnerability_detection
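To confirm the variable is visible from Python, a small check like the following can be run before opening the notebooks. This is a minimal sketch: the function name `get_project_root` is hypothetical, and how the notebooks themselves read VUL_PATH is not shown in this README.

```python
import os
from pathlib import Path


def get_project_root(env=None):
    """Return the project root from VUL_PATH, or raise if it is missing or invalid.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    raw = env.get("VUL_PATH")
    if not raw:
        raise RuntimeError("VUL_PATH is not set; see Step 1.")
    root = Path(raw).expanduser()
    if not root.is_dir():
        raise RuntimeError(f"VUL_PATH does not point to a directory: {root}")
    return root
```

Calling `get_project_root()` in a fresh shell after the export should return the repository path; a RuntimeError suggests the variable was not exported in that shell.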

Step 2

Create a virtual environment:

python3 -m virtualenv env

Activate virtual environment:

source env/bin/activate

Install requirements:

pip install -r requirements.txt 

Step 3

Make sure the following folders are inside the root directory:

  • w2vModel/metrics/
  • w2vModel/metrics/bgru
  • w2vModel/metrics/blstm
  • w2vModel/model/
  • model/

Make sure the following folders are inside the data directory:

  • CVE/
  • CWE/
  • DLinputs/
  • DLvectors/
  • slicesSource/
  • token/
  • tokens/
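The folder checks above can be automated. The sketch below lists which of the expected folders are missing and optionally creates them. It assumes the data directory is a folder named `data` directly under the project root, which this README does not state explicitly; the function names are hypothetical.

```python
from pathlib import Path

# Folders expected directly under the project root (from Step 3).
ROOT_FOLDERS = [
    "w2vModel/metrics",
    "w2vModel/metrics/bgru",
    "w2vModel/metrics/blstm",
    "w2vModel/model",
    "model",
]

# Folders expected inside the data directory (from Step 3).
DATA_FOLDERS = [
    "CVE",
    "CWE",
    "DLinputs",
    "DLvectors",
    "slicesSource",
    "token",
    "tokens",
]


def missing_folders(root, data_dir_name="data"):
    """Return the expected folders that do not exist yet.

    `data_dir_name` is an assumption about where the data directory lives.
    """
    root = Path(root)
    expected = [root / name for name in ROOT_FOLDERS]
    expected += [root / data_dir_name / name for name in DATA_FOLDERS]
    return [path for path in expected if not path.is_dir()]


def create_missing(root, data_dir_name="data"):
    """Create every missing folder and return the list of created paths."""
    created = missing_folders(root, data_dir_name)
    for path in created:
        path.mkdir(parents=True, exist_ok=True)
    return created
```

For example, `create_missing(os.environ["VUL_PATH"])` would create any absent folders under the path set in Step 1. Creating slicesSource this way only makes an empty folder; the slice files themselves still come from the repository.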

0_SYSE_source2slice

Original code that converts source code to slices.

SYSE_1_isVulnerable

Contains the original source code for binary vulnerability detection. 2_Application_Codes.ipynb is the main file to run in this folder. It uses BGRU & BLSTM to detect whether a slice of code contains a vulnerability or not.

SYSE_2_vulnerabilityType

Contains the follow-up code that attempts multiclass classification of vulnerabilities across 162 Common Weakness Enumeration (CWE) IDs.

CWE_Data_Preprocessing

CWE_Data_Preprocessing.ipynb is the first step in preprocessing the data. It collects the SARD & CVE test case IDs for all the source slices we have. Then, it scrapes the Internet for the CWE attributes of each SARD & CVE ID. Finally, it outputs 2 files: CWE_DF.csv & CVE_DF.csv. CWE_DF.csv contains all the unique CWE IDs, their details, and counts*. CVE_DF.csv contains all the unique CVE IDs, their descriptions & counts*, and the CWE-ID associated with them if applicable.

*number of times they appear in the source code file.
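As a rough illustration of the counting step only, the sketch below tallies how often each CWE ID appears in a sequence of text lines. It is not the notebook's code: the regular expression, the function name `count_cwe_ids`, and the sample lines are assumptions made for this example, and the real slice file format may differ.

```python
import re
from collections import Counter

# Matches "CWE-119" as well as "CWE119" (assumed spellings), case-insensitively.
CWE_PATTERN = re.compile(r"CWE-?(\d+)", re.IGNORECASE)


def count_cwe_ids(lines):
    """Count occurrences of each CWE ID across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        for number in CWE_PATTERN.findall(line):
            # Normalize every spelling to the canonical "CWE-<number>" form.
            counts[f"CWE-{int(number)}"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical header lines, invented for illustration only.
    sample = [
        "1 example/CWE-119/sample_a.c func 12",
        "2 example/CWE119_sample_b.c func 40",
        "3 example/CWE-20/sample_c.c func 7",
    ]
    for cwe_id, count in sorted(count_cwe_ids(sample).items()):
        print(cwe_id, count)
```

A Counter keeps the tally in one pass, and its items can be written out as the count column of a table such as CWE_DF.csv.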

Grouping_By_Abstraction

Grouping_By_Abstraction.ipynb then collects all the CWE IDs found in the previous step and creates a tree of relationships between these CWEs. CWEs are grouped by similarity, using the pillars defined in the Research Concepts view on the CWE website. A dictionary of SARD & CVE IDs mapped to their respective group ID is created and saved to SARD_CVE_to_groups.csv.
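The grouping idea can be sketched as walking each CWE up its parent chain until a node with no parent (a pillar) is reached. This is an illustration under assumptions, not the notebook's implementation: the function names are hypothetical, each CWE is assumed to have a single parent (real CWE entries can have several), and the small `EXAMPLE_PARENTS` excerpt should be checked against the CWE website before being relied on.

```python
# Illustrative child -> parent excerpt; verify against the CWE Research Concepts view.
EXAMPLE_PARENTS = {
    "CWE-121": "CWE-787",
    "CWE-787": "CWE-119",
    "CWE-119": "CWE-118",
    "CWE-118": "CWE-664",
    "CWE-20": "CWE-707",
}


def find_pillar(cwe_id, parent_of):
    """Follow parent links from `cwe_id` until an ID with no parent is reached."""
    seen = set()
    current = cwe_id
    while current in parent_of:
        if current in seen:
            raise ValueError(f"Cycle detected in parent map at {current}")
        seen.add(current)
        current = parent_of[current]
    return current


def group_by_pillar(cwe_ids, parent_of):
    """Map each pillar ID to the list of given CWE IDs that fall under it."""
    groups = {}
    for cwe_id in cwe_ids:
        groups.setdefault(find_pillar(cwe_id, parent_of), []).append(cwe_id)
    return groups


if __name__ == "__main__":
    print(group_by_pillar(["CWE-121", "CWE-119", "CWE-20"], EXAMPLE_PARENTS))
```

With a mapping like this in hand, each SARD or CVE ID can be assigned the pillar of its CWE, which is the kind of ID-to-group dictionary the notebook saves.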

Grouping_By_CWE

Grouping_By_CWE.ipynb groups CWEs by their unique CWE-ID. A dictionary of the original SARD & CVE IDs mapped to their respective CWE-ID is then saved to SARD_CVE_to_CWE.csv.

2A_Vulnerability_Classification_ML

3A_Vulnerability_Classification_ML.ipynb attempts to classify vulnerability types using ML models.

2B_Vulnerability_Classification_ML_PCA

3A_Vulnerability_Classification_ML_PCA.ipynb attempts to classify vulnerability types using ML models with PCA transformed data.

2C_Vulnerability_Classification_DL

3B_Vulnerability_Classification_DL.ipynb attempts to classify vulnerability types using DL models.

2D_Vulnerability_Classification_DL_PCA

3A_Vulnerability_Classification_DL_PCA.ipynb attempts to classify vulnerability types using DL models with PCA transformed data.
