This is the central Semgrep rule repository that hosts the Semgrep rules for the GitLab semgrep analyzer.
We follow the testing methodology laid out in this blog post.
The repository is structured as illustrated below:
.
├── mappings
│   └── analyzer.yml
├── dist
│   └── pack.yml
├── c
│   ├── buffer
│   │   ├── rule-strcpy.yml
│   │   ├── test-strcpy.c
│   │   ├── rule-memcpy.yml
│   │   └── test-memcpy.c
│   └── ...
├── javascript
│   └── ...
├── python
│   ├── assert
│   │   ├── rule-assert.yml
│   │   └── test-assert.py
│   ├── exec
│   │   ├── rule-exec.yml
│   │   ├── test-exec.yml
│   │   ├── rule-something.yml
│   │   └── test-something.yml
│   ├── permission
│   │   ├── rule-chmod.yml
│   │   └── test-chmod.py
│   └── ...
└── ...

The structure above follows the pattern:
<language>/<ruleclass>/{rule-<rulename>.yml, test-<rulename>\..*} where
<language> denotes the target programming language, <ruleclass> is a
descriptive name for the class of issues the rule aims to detect, and
<rulename> is a descriptive name for the actual rule.
Rule files are named rule-<rulename>.yml, and each rule file contains a
single Semgrep rule. A rule can have multiple test cases, all of which are
prefixed with test-.
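As a rough illustration of the naming pattern above, the following sketch splits a repository-relative path into its components. The helper `parse_rule_path` and the `RuleFile` type are hypothetical names introduced here for illustration; they are not part of this repository's tooling.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical helper: mirrors the pattern
# <language>/<ruleclass>/{rule-<rulename>.yml, test-<rulename>\..*}
LAYOUT = re.compile(
    r"^(?P<language>[^/]+)/(?P<ruleclass>[^/]+)/"
    r"(?P<kind>rule|test)-(?P<rulename>[^/]+?)\.(?P<ext>[^/.]+)$"
)


class RuleFile(NamedTuple):
    language: str
    ruleclass: str
    kind: str      # "rule" or "test"
    rulename: str
    ext: str


def parse_rule_path(path: str) -> Optional[RuleFile]:
    """Return the path components, or None if the path does not fit the layout."""
    match = LAYOUT.match(path)
    if match is None:
        return None
    # Rule files must be YAML; test files may carry any extension.
    if match["kind"] == "rule" and match["ext"] != "yml":
        return None
    return RuleFile(**match.groupdict())


print(parse_rule_path("c/buffer/rule-strcpy.yml"))
print(parse_rule_path("python/assert/test-assert.py"))
print(parse_rule_path("mappings/analyzer.yml"))  # None: not a rule or test file
```

Paths under mappings and dist intentionally do not match, since they hold configuration rather than rules.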
The mappings and dist directories contain the rule-pack configuration, which
defines the rules that should be included in rule-packs, and the resulting
assembled rule-packs.
Please see our update process for more details.
Rules contained in this repository have to adhere to the following format:
- Use " for strings, otherwise the YAML literal block |
- No collapsing of array elements
- max line-length/text-width: 100 characters
- indentation: 2 spaces
- every rule has to have a corresponding test-case
- if provided, comments-section at the top of the rule file
- every YAML file starts with ---
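Some of the guidelines above can be checked mechanically. The sketch below is a minimal, standard-library-only approximation and does not reproduce the repository's formatter. The function name `check_rule_format` is hypothetical, and the sketch assumes that the optional comment section may precede the --- marker. The indentation check is a heuristic and may misreport lines inside YAML literal blocks.

```python
from typing import List

MAX_LINE_LENGTH = 100
INDENT_WIDTH = 2


def check_rule_format(text: str) -> List[str]:
    """Return a list of guideline violations found in a rule file's text."""
    problems = []
    lines = text.splitlines()

    # Assumption: leading comment lines and blank lines are allowed before '---'.
    first_content = next(
        (l for l in lines if l.strip() and not l.lstrip().startswith("#")), None
    )
    if first_content is None or first_content.strip() != "---":
        problems.append("document does not start with ---")

    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_LINE_LENGTH:
            problems.append(f"line {number}: longer than {MAX_LINE_LENGTH} characters")
        indent = len(line) - len(line.lstrip(" "))
        if line.strip() and indent % INDENT_WIDTH != 0:
            problems.append(
                f"line {number}: indentation is not a multiple of {INDENT_WIDTH} spaces"
            )
    return problems


example = '# License: example\n---\nrules:\n  - id: "example-rule"\n'
print(check_rule_format(example))  # prints []
```

A check like this only reports problems; it does not rewrite files.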
The script ci/autoformat.rb automatically formats/rewrites all the rule files
so that they adhere to the guidelines listed above. Run ci/autoformat.rb from
within the sast-rules directory after installing the gems psych, yaml and
fileutils with gem install psych yaml fileutils.
The mappings directory in this repository contains YAML configuration files that map native analyzer ids to the corresponding Semgrep rules. These mappings are digested by the testing framework to perform an automated gap analysis; the goal of this analysis is to check whether there is an unexpected deviation between Semgrep (with the rules in this repository) and a given analyzer.
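To make the idea of a gap analysis concrete, here is a minimal sketch of how findings could be compared through a mapping. It does not reproduce the testing framework's implementation. The function `gap_analysis`, the data shapes, and the sample findings are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Finding = Tuple[str, int]  # assumed shape: (file name, line number)


def gap_analysis(
    mappings: Dict[str, List[str]],
    native: Dict[str, Set[Finding]],
    semgrep: Dict[str, Set[Finding]],
) -> Dict[str, Dict[str, Set[Finding]]]:
    """Report, per native id, findings seen by only one of the two tools."""
    report = {}
    for native_id, rules in mappings.items():
        expected = native.get(native_id, set())
        # A native id may be composed of several Semgrep rules.
        actual: Set[Finding] = set()
        for rule in rules:
            actual |= semgrep.get(rule, set())
        missed = expected - actual   # reported by the analyzer only
        extra = actual - expected    # reported by Semgrep only
        if missed or extra:
            report[native_id] = {"missed": missed, "extra": extra}
    return report


# Illustrative data only.
mappings = {"B301": ["python/deserialization/rule-pickle",
                     "python/deserialization/rule-dill"]}
native = {"B301": {("a.py", 3), ("b.py", 7)}}
semgrep = {"python/deserialization/rule-pickle": {("a.py", 3)},
           "python/deserialization/rule-dill": {("c.py", 1)}}
print(gap_analysis(mappings, native, semgrep))
```

An empty report would indicate that, for the mapped ids, Semgrep and the analyzer agree on the test inputs.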
In addition, mappings are used to automatically assemble rule-packs. The
snippet below illustrates an example mapping file for the bandit analyzer.
The native_id section includes some information about the native id mappings.
The actual rule mappings are defined in the mappings section. Each mapping
defines which Semgrep rules in this repository a bandit rule is composed of.
Note that the order in which the rules are listed in the file currently
matters, so new mappings should be appended at the end.
bandit:
  native_id:
    type: "bandit_test_id"
    name: "Bandit Test ID: $ID"
    value: "$ID"
  mappings:
    - id: "B301"
      rules:
        - "python/deserialization/rule-cpickle"
        - "python/deserialization/rule-shelve"
        - "python/deserialization/rule-pickle"
        - "python/deserialization/rule-dill"
    - id: "B101"
      # ...

The rules and test-cases in this repository are partially sourced from the sources listed below:
- https://github.com/returntocorp/semgrep-rules
- https://github.com/PyCQA/bandit
- https://github.com/nodesecurity/eslint-plugin-security
- https://github.com/jsx-eslint/eslint-plugin-react
- https://github.com/david-a-wheeler/flawfinder/blob/master/flawfinder.py
The details are listed in the headers of all the rule and test files, including the licensing information and proper attribution.
If you know about a pattern that isn't present in this repo or refinements that could be applied to the rules in this repository, you can contribute by opening an issue or by submitting an improvement to the rule files/test cases in this repository.
After making changes to rules or mappings, make sure to run ./ci/deploy.sh <semantic version>
and commit your updates to the /dist directory, where <semantic version>
should correspond to the latest published version in CHANGELOG.md.
We apply the following semantic versioning scheme to this repository:
- patch version increment: for updated/patched/added rules.
- minor version increment: backwards-compatible YAML schema changes (e.g., adding/removing optional fields).
- major version increment: non-backwards-compatible YAML schema changes (e.g., adding/removing required fields).
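The versioning scheme above can be sketched as a small function. The name `bump_version` and the change labels are hypothetical. Resetting the lower components to zero on a minor or major bump follows common semantic-versioning practice and is an assumption here, not something this repository states.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next version for a given kind of change.

    change is one of (labels are illustrative):
      "rules"             - updated/patched/added rules
      "compatible-schema" - backwards-compatible YAML schema change
      "breaking-schema"   - non-backwards-compatible YAML schema change
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "rules":
        return f"{major}.{minor}.{patch + 1}"
    if change == "compatible-schema":
        # Assumption: lower components reset to zero.
        return f"{major}.{minor + 1}.0"
    if change == "breaking-schema":
        return f"{major + 1}.0.0"
    raise ValueError(f"unknown change type: {change}")


print(bump_version("1.2.3", "rules"))              # prints 1.2.4
print(bump_version("1.2.3", "compatible-schema"))  # prints 1.3.0
print(bump_version("1.2.3", "breaking-schema"))    # prints 2.0.0
```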
We would like to thank the following authors very much for their valuable contributions.
| Author | MRs/Issues |
|---|---|
| @masakura | !99, !107 |
| @gregory.mcdaniel | #32 |
| @niklas.volcz. | !183 |
- B308: django.utils.safestring.mark_safe. This rule is basically redundant with B703.
- B109: password_config_option_not_marked_secret. Not supported anymore since the plugin was removed.
- B111: execute_with_run_as_root_equals_true. Not supported anymore since the plugin was removed.
- B322: input. Not supported anymore since the plugin was removed.
- B414: import_pycryptodome. Not supported anymore since the plugin was removed.
- B503: ssl_with_bad_defaults. Our Semgrep pattern captures both B503 and B502 because they are very similar; both essentially capture insecure settings that use outdated versions of encryption algorithms.
- B110: try_except_pass. The Semgrep rule checks the whole try except block whereas bandit reports every except case. The Semgrep rule approximates the original rule behaviour by looking at various permutations of except pass cases embedded in a try ... except block.
- B112: try_except_continue. The Semgrep rule checks the whole try except block whereas bandit reports every except case. The Semgrep rule approximates the original rule behaviour by looking at various permutations of except continue cases embedded in a try ... except block.
detect-unsafe-regex: Detects potentially unsafe regular expressions, which may take a very long time to run, blocking the event loop. This problem is solved by applying a set of conditional logic to each character of a target string. This cannot be accomplished in Semgrep.
- G104: Metavariable types not supported for go at the moment
- G307: Deferring a method which returns an error
Java, Scala
| Rule ID | Description | Status | Comment |
|---|---|---|---|
| HARD_CODE_PASSWORD | Hardcoded Password (Scala) | ❌ | The behaviour is not completely on par with find-sec-bugs; we excluded some patterns that are prone to FPs. |
Out of scope patterns w.r.t. https://gitlab.com/gitlab-org/gitlab/-/issues/354762#rules-with-completion-status are all those patterns that are unrelated to Java.
| Rule ID | Description | Status | Comment |
|---|---|---|---|
| PREDICTABLE_RANDOM_SCALA | Predictable pseudorandom number generator (Scala) | ❌ | Scala not supported |
| SCALA_COMMAND_INJECTION | Potential Command Injection (Scala) | ❌ | Scala not supported |
| SCALA_PATH_TRAVERSAL_IN | Potential Path Traversal using Scala API (file read) | ❌ | Scala not supported |
| SCALA_PLAY_SSRF | Scala Play Server-Side Request Forgery (SSRF) | ❌ | Scala not supported |
| SCALA_SENSITIVE_DATA_EXPOSURE | Potential information leakage in Scala Play | ❌ | Scala not supported |
| SCALA_SQL_INJECTION_ANORM | Potential Scala Anorm Injection | ❌ | Scala not supported |
| SCALA_SQL_INJECTION_SLICK | Potential Scala Slick Injection | ❌ | Scala not supported |
| SCALA_XSS_MVC_API | Potential XSS in Scala MVC API engine | ❌ | Scala not supported |
| SCALA_XSS_TWIRL | Potential XSS in Scala Twirl template engine | ❌ | Scala not supported |
| PLAY_UNVALIDATED_REDIRECT | Unvalidated Redirect (Play Framework) | ❌ | Scala not supported |
| ANDROID_BROADCAST | Broadcast (Android) | ❌ | Android not supported |
| ANDROID_EXTERNAL_FILE_ACCESS | External file access (Android) | ❌ | Android not supported |
| ANDROID_GEOLOCATION | WebView with geolocation activated (Android) | ❌ | Android not supported |
| ANDROID_WEB_VIEW_JAVASCRIPT_INTERFACE | WebView with JavaScript interface (Android) | ❌ | Android not supported |
| ANDROID_WEB_VIEW_JAVASCRIPT | WebView with JavaScript enabled (Android) | ❌ | Android not supported |
| ANDROID_WORLD_WRITABLE | World writable file (Android) | ❌ | Android not supported |
| SQL_INJECTION_ANDROID | Potential Android SQL Injection | ❌ | Android not supported |
| GROOVY_SHELL | Potential code injection when using GroovyShell | ❌ | Groovy not supported |
| JSP_INCLUDE | Dynamic JSP inclusion | ❌ | JSP not supported |
| JSP_JSTL_OUT | Escaping of special XML characters is disabled | ❌ | JSP not supported |
| JSP_SPRING_EVAL | Dynamic variable in Spring expression | ❌ | JSP not supported |
| JSP_XSLT | A malicious XSLT could be provided to the JSP tag | ❌ | JSP not supported |
| XSS_JSP_PRINT | Potential XSS in JSP | ❌ | JSP not supported |
| XSS_REQUEST_PARAMETER_TO_JSP_WRITER | XSS: Servlet reflected cross site scripting vulnerability | ❌ | JSP not supported |
| REQUESTDISPATCHER_FILE_DISCLOSURE | RequestDispatcher File Disclosure | ❌ | JSP not supported |
We excluded the patterns below because they are overly verbose; they are triggered by existing entry-points and do not indicate any vulnerability.
| Rule ID | Description | Status | Comment |
|---|---|---|---|
| STRUTS1_ENDPOINT | Found Struts 1 endpoint | 🚫 | the endpoint rules only provide general information about potential security issue which seems noisy -- I think we can skip them |
| STRUTS2_ENDPOINT | Found Struts 2 endpoint | 🚫 | the endpoint rules only provide general information about potential security issue which seems noisy -- I think we can skip them |
| SPRING_ENDPOINT | Found Spring endpoint | 🚫 | We cannot cope with annotations; in addition endpoints should probably not end up in the final security report anyway |
| TAPESTRY_ENDPOINT | Found Tapestry page | 🚫 | We cannot cope with annotations; in addition endpoints should probably not end up in the final security report anyway. |
| JAXRS_ENDPOINT | Found JAX-RS REST endpoint | 🚫 | the endpoint rules only provide general information about potential security issue which seems noisy -- I think we can skip them |
| JAXWS_ENDPOINT | Found JAX-WS SOAP endpoint | 🚫 | the endpoint rules only provide general information about potential security issue which seems noisy -- I think we can skip them |
| HARD_CODE_KEY | Secret detection rule | 🚫 | Secret Detection is taken care of by a dedicated analyzer |
The patterns below could not be migrated, because they required features not supported by Semgrep. See https://gitlab.com/gitlab-org/gitlab/-/issues/357679 for more information.
| Rule ID | Description | Status | Comment |
|---|---|---|---|
| SPRING_CSRF_UNRESTRICTED_REQUEST_MAPPING | Spring CSRF unrestricted RequestMapping | 🚫 | No support for parsing annotations |
| SPRING_UNVALIDATED_REDIRECT | Spring Unvalidated Redirect | 🚫 | No support for annotations |
| WICKET_ENDPOINT | Found Wicket WebPage | 🚫 | the endpoint rules only provide general information about potential security issue which seems noisy -- I think we can skip them |
| UNSAFE_HASH_EQUALS | Unsafe hash equals | 🚫 | this rule is highly prone to FPs -- it checks for unsecure hash functions by looking for keywords (e.g., sha) in variable or parameter names. As we are already covered by secret detection, we can probably omit this particular rule. |
| STATIC_IV | Static IV | 🚫 | https://gitlab.com/gitlab-org/gitlab/-/issues/357679#note_905023485 |
| DESERIALIZATION_GADGET | This class could be used as deserialization gadget | 🚫 | Multiple logical flows involved. Cannot be achieved in Semgrep. |
| ENTITY_LEAK | Unexpected property leak | 🚫 | Annotations of classes are processed to determine the result. This cannot be achieved in Semgrep. |
| ENTITY_MASS_ASSIGNMENT | Mass assignment | 🚫 | Annotations of classes are processed to determine the result. This cannot be achieved in Semgrep. |
| ESAPI_ENCRYPTOR | Use of ESAPI Encryptor | 🚫 | Config files related. We currently support only files with .java extensions. |
| JACKSON_UNSAFE_DESERIALIZATION | Unsafe Jackson deserialization configuration | 🚫 | Reason |
| OBJECT_DESERIALIZATION | Object deserialization is used | 🚫 | This problem is solved by determining Interface supersets and Annotation metadata. This cannot be accomplished in Semgrep |
| REDOS | Regex DOS (ReDOS) | 🚫 | This problem is solved by applying set of conditional logic on each character of a target string. This cannot be accomplished in Semgrep |
| Rule ID | Description | Comment |
|---|---|---|
| SCS0018 | Path Traversal | We adapted the pattern to not cover arguments passed to Main as sources because this often led to FPs for CLI apps. |
We excluded the patterns below because they are overly verbose.
| Rule ID | Description | Status | Comment |
|---|---|---|---|
| SCS0015 | Hardcoded Password | 🚫 | This is better served by Secrets Detection as there are a multitude of ways that hardcoded passwords can be specified. |
The patterns below could not be migrated, because they required features not supported by Semgrep.
| Rule ID | Description | Status | Comment |
|---|---|---|---|
| SCS0021 | Request Validation Disabled (Configuration File) | 🚫 | XML configuration file. |
| SCS0022 | Event Validation Disabled | 🚫 | XML configuration file. |
| SCS0023 | View State Not Encrypted | 🚫 | XML configuration file. |
| SCS0024 | View State MAC Disabled | 🚫 | XML configuration file. |
| SCS0008 | Cookie Without SSL Flag | 🚫 | The SCS rule also detects vulnerabilities in ASP.NET config files which is not supported by Semgrep. We also haven't been able to detect these with SCS within the gapanalysis job as the HttpCookie class requires .NET Framework. |
| SCS0009 | Cookie Without HttpOnly Flag | 🚫 | The SCS rule also detects vulnerabilities in ASP.NET config files which is not supported by Semgrep. We also haven't been able to detect these with SCS within the gapanalysis job as the HttpCookie class requires .NET Framework. |
| SCS0002 | SQL Injection | 🚫 | The SCS rule also detects vulnerabilities in ASP.NET UI code, which Semgrep does not support. |
| SCS0003 | XPath Injection | 🚫 | The SCS rule also detects vulnerabilities in ASP.NET UI code, which Semgrep does not support. |
| SCS0030 | Request validation is enabled only for pages (Configuration File) | 🚫 | This rule relates to changes in the Configuration File (XML) format. Semgrep does not have GA support for HTML/XML format. |
Semgrep rules should be synchronized with upstream scanners regularly; here's the process:
- Pull the newly added rules from the analyzer's upstream source (excluding the rules that could not be translated due to Semgrep limitations; see above).
- Translate the newly identified rules into Semgrep-equivalent rules.
- Map them against the native analyzer's IDs in this repository.
- Generate a new ruleset distribution using the instructions described above.
- Add all untranslatable rules to this file, listed against the downstream analyzer along with the reason.
- Copy the new ruleset distribution into Semgrep/rules to reflect rule changes in the analyzer.
For better tracking purposes, create a dedicated issue on rule synchronization cadence and create a sub-task for each semgrep-translated analyzer. The subtask should contain all the new rules that should be synchronized. Here's an example issue that has followed the mentioned process.