CUSecLab/SkillScanner

SkillScanner: Detecting Policy-Violating Voice Applications Through Static Analysis at the Development Phase

You can find our paper here. If you find our paper useful, please consider citing it:

@inproceedings{liao2023skillscanner,
  title={SkillScanner: Detecting Policy-Violating Voice Applications Through Static Analysis at the Development Phase},
  author={Liao, Song and Cheng, Long and Cai, Haipeng and Guo, Linke and Hu, Hongxin},
  booktitle={Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security},
  pages={2321--2335},
  year={2023}
}

This repository is organized as follows. The "skillscanner" folder contains the SkillScanner code. The "skills_code" folder contains one example skill for testing. The "user-study" folder contains the data from our user study. The "results" folder contains the results reported in our work, based on our dataset. The "image" folder contains images used for presentation. The "vscode-codeql-starter" folder contains the CodeQL scripts for analyzing skills.

System Overview

[Figure: system overview]

Results

[Figure: results]

Usage

Prerequisites

You can install the required Python libraries with:

pip install -r requirements.txt
python -m spacy download en_core_web_sm

To run the skill taint analysis, you need to download CodeQL. After downloading and unzipping it, rename the folder to "codeql-home" and put it in the root of this repo.

We used CodeQL v2.13.1 and ran all our experiments on Ubuntu 16.04 with Python 3.6. The tool might work with similar versions, but this was not tested.
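Before scanning, it can help to confirm that the "codeql-home" folder is where the tool expects it. The sketch below is not part of SkillScanner; the helper name `find_codeql_binary` is hypothetical, and the assumption that the launcher is a file named `codeql` (or `codeql.exe`) somewhere under "codeql-home" depends on which CodeQL archive you unzipped.

```python
from pathlib import Path
from typing import Optional


def find_codeql_binary(repo_root: str, home_name: str = "codeql-home") -> Optional[Path]:
    """Return the path to a file named 'codeql' (or 'codeql.exe') under
    <repo_root>/<home_name>, or None if the folder or launcher is missing."""
    home = Path(repo_root) / home_name
    if not home.is_dir():
        return None
    # Assumption: the launcher may sit at the top level or deeper,
    # depending on the archive layout, so search recursively.
    for candidate in sorted(home.rglob("codeql*")):
        if candidate.is_file() and candidate.name in ("codeql", "codeql.exe"):
            return candidate
    return None


if __name__ == "__main__":
    binary = find_codeql_binary(".")
    if binary is None:
        print("codeql-home not found or contains no codeql launcher")
    else:
        print(f"Found CodeQL launcher at {binary}")
```

Run it from the root of this repo. It only checks the folder layout and does not verify the CodeQL version.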

If you want to scan or download skill code datasets from GitHub, or analyze skill content or HTML safety, you need to obtain API tokens for GitHub, Google Perspective, and VirusTotal, and put them in the "tokens.txt" file in the "skillscanner" folder.

To scan a skill, go to the "skillscanner" folder and run:

python scan_skills.py ../skills_code 1

"1" means the target folder may contain several skills, and "0" means it contains only one skill. Ensure that all the skill files are in one folder.
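If you call the scanner from another script, the mode flag can be derived from a boolean. This is a minimal sketch, not part of SkillScanner: the helper `build_scan_command` is hypothetical and only assembles the same command line shown above.

```python
import sys
from typing import List


def build_scan_command(target_folder: str, multiple_skills: bool) -> List[str]:
    """Build the argument list for scan_skills.py.

    The last argument is "1" when the target folder may hold several
    skills and "0" when it holds exactly one.
    """
    mode = "1" if multiple_skills else "0"
    return [sys.executable, "scan_skills.py", target_folder, mode]


if __name__ == "__main__":
    # Printing only; to actually run it, pass the list to subprocess.run
    # with cwd set to the "skillscanner" folder.
    print(" ".join(build_scan_command("../skills_code", True)))
```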

If you download a new dataset using "clone_repo.py", it will appear in the "repo" folder. To analyze it, run:

python scan_skills.py repo 1

The results are written to the "skillscanner/results" folder, with one subfolder per skill. Each skill also gets a report that summarizes all of its results. If a skill has already been analyzed and its report generated, it is skipped. Detailed results, such as the individual code inconsistency issues, are saved in "results/skillname/code_inconsistency/issuename".
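To get a quick view of which code inconsistency issues were recorded per skill, you can walk the results layout described above. The sketch below is an illustration under assumptions: the helper `list_code_inconsistency_issues` is hypothetical, and it treats every entry under "code_inconsistency" as one issue without assuming whether that entry is a file or a folder.

```python
from pathlib import Path
from typing import Dict, List


def list_code_inconsistency_issues(results_dir: str) -> Dict[str, List[str]]:
    """Map each skill folder under results_dir to the sorted entry names
    found in its code_inconsistency subfolder (empty list if absent)."""
    issues: Dict[str, List[str]] = {}
    root = Path(results_dir)
    if not root.is_dir():
        return issues
    for skill_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        issue_dir = skill_dir / "code_inconsistency"
        if issue_dir.is_dir():
            issues[skill_dir.name] = sorted(p.name for p in issue_dir.iterdir())
        else:
            issues[skill_dir.name] = []
    return issues


if __name__ == "__main__":
    for skill, names in list_code_inconsistency_issues("results").items():
        print(f"{skill}: {', '.join(names) if names else 'no issues recorded'}")
```

Run it from the "skillscanner" folder so that "results" resolves to "skillscanner/results".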

Dataset

To download all the skill code available on GitHub (the updated dataset), you can run:

python search_github.py
python clone_repo.py

After generating a report for each skill, you can run:

python summarize_results.py

The issues of all skills will then be collected in the "summary" folder.
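The dataset steps can be chained in one script. This is a sketch under assumptions, not part of SkillScanner: `pipeline_commands` and `run_pipeline` are hypothetical helpers, all four scripts are assumed to be run from the "skillscanner" folder, and the GitHub token is assumed to be in "tokens.txt" already.

```python
import subprocess
import sys
from typing import List

# Order follows this README: search, clone, scan the cloned "repo" folder, summarize.
PIPELINE_STEPS: List[List[str]] = [
    ["search_github.py"],
    ["clone_repo.py"],
    ["scan_skills.py", "repo", "1"],
    ["summarize_results.py"],
]


def pipeline_commands(python: str = sys.executable) -> List[List[str]]:
    """Return the full command line for each pipeline step."""
    return [[python, *step] for step in PIPELINE_STEPS]


def run_pipeline(workdir: str = "skillscanner") -> None:
    """Run each step in order; check=True stops at the first failing step."""
    for cmd in pipeline_commands():
        subprocess.run(cmd, cwd=workdir, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in pipeline_commands("python"):
        print(" ".join(cmd))
```

Call `run_pipeline()` from the root of this repo to execute the steps instead of printing them.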

The GitHub commit hash for the artifact evaluation is 2db238fe56b94750a773dd446cf1babaa03b2a52.
