Semgrep rules

This is the central Semgrep rule repository that hosts the Semgrep rules for the GitLab semgrep analyzer.

We follow the testing methodology laid out in this blog post.

The repository is structured as illustrated below:

.
├── mappings
│   └── analyzer.yml
├── dist
│   └── pack.yml
├── c
│   ├── buffer
│   │   ├── rule-strcpy.yml
│   │   ├── test-strcpy.c
│   │   ├── rule-memcpy.yml
│   │   └── test-memcpy.c
│   └── ...
├── javascript
│   └── ...
├── python
│   ├── assert
│   │   ├── rule-assert.yml
│   │   └── test-assert.py
│   ├── exec
│   │   ├── rule-exec.yml
│   │   ├── test-exec.yml
│   │   ├── rule-something.yml
│   │   └── test-something.yml
│   ├── permission
│   │   ├── rule-chmod.yml
│   │   └── test-chmod.py
│   └── ...
└── ...

The structure above follows the pattern <language>/<ruleclass>/{rule-<rulename>.yml, test-<rulename>\..*}, where <language> denotes the target programming language, <ruleclass> is a descriptive name for the class of issues the rule aims to detect, and <rulename> is a descriptive name for the rule itself.

A rule can have multiple test cases, all prefixed with test-. Rule files are prefixed with rule- (rule-<rulename>.yml), and each rule file contains a single Semgrep rule.
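The naming pattern above can be checked mechanically. The following is a minimal sketch, not part of the repository's tooling: the function name rules_without_tests is hypothetical, and it assumes that rule files always sit exactly two directory levels below the repository root.

```python
import re
import tempfile
from pathlib import Path

# rule-<rulename>.yml, as described in the pattern above
RULE_FILE = re.compile(r"^rule-(?P<name>.+)\.yml$")


def rules_without_tests(repo_root):
    """Return rule files (relative paths) that have no test-<rulename>.* sibling."""
    root = Path(repo_root)
    missing = []
    # Rule files live two levels deep: <language>/<ruleclass>/rule-<rulename>.yml
    for rule_file in sorted(root.glob("*/*/rule-*.yml")):
        match = RULE_FILE.match(rule_file.name)
        if match is None:
            continue
        name = match.group("name")
        # Any extension counts, mirroring test-<rulename>\..* in the pattern.
        if not any(rule_file.parent.glob(f"test-{name}.*")):
            missing.append(rule_file.relative_to(root).as_posix())
    return missing


if __name__ == "__main__":
    # Build a throwaway tree so the sketch runs without the real repository.
    with tempfile.TemporaryDirectory() as tmp:
        rule_dir = Path(tmp) / "python" / "exec"
        rule_dir.mkdir(parents=True)
        (rule_dir / "rule-exec.yml").write_text("---\n")
        (rule_dir / "test-exec.py").write_text("# test case\n")
        (rule_dir / "rule-something.yml").write_text("---\n")  # no test on purpose
        print(rules_without_tests(tmp))
```

Run against a checkout, an empty result would mean that every rule file has at least one matching test file.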

The mappings and dist directories hold the rule-pack configuration: mappings defines which rules should be included in the rule-packs, and dist contains the resulting, assembled rule-packs.

Updating rules

Please see our update process for more details.

Formatting guidelines

Rules contained in this repository have to adhere to the following format:

Use " for strings; otherwise use the YAML literal block |
No collapsing of array elements
Max line length/text width: 100 characters
Indentation: 2 spaces
Every rule has to have a corresponding test case
If provided, the comments section goes at the top of the rule file
Every YAML file starts with ---

The script ci/autoformat.rb automatically reformats all rule files so that they adhere to the guidelines listed above. Run ci/autoformat.rb from within the sast-rules directory after installing the required gems with gem install psych yaml fileutils.
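Some of the guidelines above are simple enough to verify by hand. The sketch below is only an illustration and does not reproduce what ci/autoformat.rb does: check_rule_format is a hypothetical helper that covers the --- marker, the 100-character limit, and tab-free indentation. It assumes that a comments section may appear before the --- marker.

```python
def check_rule_format(text, max_width=100):
    """Return a list of guideline violations for the text of one rule file."""
    problems = []
    lines = text.splitlines()

    # Assumption: a comments section may precede the --- marker, so look at
    # the first line that is neither blank nor a comment.
    first = next(
        (line for line in lines if line.strip() and not line.lstrip().startswith("#")),
        None,
    )
    if first is None or first.rstrip() != "---":
        problems.append("missing --- document start marker")

    for number, line in enumerate(lines, start=1):
        if len(line) > max_width:
            problems.append(f"line {number}: longer than {max_width} characters")
        leading = line[: len(line) - len(line.lstrip())]
        if "\t" in leading:
            problems.append(f"line {number}: tab in indentation")
    return problems


if __name__ == "__main__":
    sample = '# header comment\n---\nrules:\n  - id: "demo"\n'
    print(check_rule_format(sample))
```

Quoting style and array layout are not checked here; those need a YAML-aware formatter.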

Mappings

The mappings directory in this repository contains YAML configuration files that map native analyzer IDs to the corresponding Semgrep rules. These mappings are digested by the testing framework to perform an automated gap analysis; the goal of this analysis is to check whether there is an unexpected deviation between Semgrep (with the rules in this repository) and a given analyzer.

In addition, mappings are used to automatically assemble rule-packs. The snippet below shows an example mapping file for the bandit analyzer. The native_id section contains information about the native ID mappings. The actual rule mappings are defined in the mappings section: each mapping defines which Semgrep rules in this repository a bandit rule is composed of. Note that the order in which the rules are listed in the file currently matters, so new mappings should be appended at the end.

bandit:
  native_id:
    type: "bandit_test_id"
    name: "Bandit Test ID: $ID"
    value: "$ID"
  mappings:
  - id: "B301"
    rules:
      - "python/deserialization/rule-cpickle"
      - "python/deserialization/rule-shelve"
      - "python/deserialization/rule-pickle"
      - "python/deserialization/rule-dill"
  - id: "B101"
  # ...
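To make the structure concrete, here is a minimal sketch of how such a mapping could be consumed once it has been parsed. The dictionary mirrors the snippet above; the helper names are hypothetical. Two assumptions are not confirmed by this README: that rule paths resolve to files by appending .yml, and that $ID is replaced with the native ID.

```python
# The bandit mapping from above, as it would look after YAML parsing.
BANDIT_MAPPING = {
    "bandit": {
        "native_id": {
            "type": "bandit_test_id",
            "name": "Bandit Test ID: $ID",
            "value": "$ID",
        },
        "mappings": [
            {
                "id": "B301",
                "rules": [
                    "python/deserialization/rule-cpickle",
                    "python/deserialization/rule-shelve",
                    "python/deserialization/rule-pickle",
                    "python/deserialization/rule-dill",
                ],
            },
        ],
    }
}


def rule_files_for(mapping, analyzer, native_id):
    """List the rule files a native ID is composed of, in mapping order."""
    for entry in mapping[analyzer]["mappings"]:
        if entry["id"] == native_id:
            # Assumption: a rule path maps to a file by appending .yml.
            return [f"{path}.yml" for path in entry.get("rules", [])]
    return []


def native_id_fields(mapping, analyzer, native_id):
    """Fill the native_id template (assumption: $ID stands for the native ID)."""
    template = mapping[analyzer]["native_id"]
    return {key: value.replace("$ID", native_id) for key, value in template.items()}


if __name__ == "__main__":
    print(rule_files_for(BANDIT_MAPPING, "bandit", "B301"))
    print(native_id_fields(BANDIT_MAPPING, "bandit", "B301"))
```

The lookup walks the list in order and preserves the order of the rules, since the text above notes that ordering currently matters.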

Data sources

The rules and test-cases in this repository are partially sourced from the sources listed below:

The details, including licensing information and proper attribution, are listed in the headers of all rule and test files.

Contributing

If you know of a pattern that isn't covered by this repository, or of refinements that could be applied to its rules, you can contribute by opening an issue or by submitting an improvement to the rule files or test cases.

Contribution instructions

After making changes to rules or mappings, make sure to run ./ci/deploy.sh <semantic version> and commit your updates to the /dist directory. <semantic version> should correspond to the latest published version in CHANGELOG.md.
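Finding the version to pass to the deploy script can be scripted. The layout of CHANGELOG.md is not shown in this README, so the sketch below rests on assumptions: versions appear somewhere in the text as X.Y.Z or vX.Y.Z, and the numerically highest one is the latest published version. The helper latest_version is hypothetical, and whether the deploy script expects a v prefix is not specified here.

```python
import re

# Matches 1.2.3 and v1.2.3; pre-release suffixes are deliberately ignored.
SEMVER = re.compile(r"\bv?(\d+)\.(\d+)\.(\d+)\b")


def latest_version(changelog_text):
    """Return the highest X.Y.Z found in the text, or None if there is none."""
    versions = [
        tuple(int(part) for part in match.groups())
        for match in SEMVER.finditer(changelog_text)
    ]
    if not versions:
        return None
    # Tuples compare numerically, so (1, 10, 0) ranks above (1, 9, 3).
    return ".".join(str(part) for part in max(versions))


if __name__ == "__main__":
    sample = "## v1.10.0\n- added a rule\n\n## v1.9.3\n- fixed a rule\n"
    print(latest_version(sample))
```

Comparing integer tuples avoids the string-comparison pitfall where "1.9.3" would sort above "1.10.0".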

Versioning and Changelog

We apply the following semantic versioning scheme to this repository:

patch version increment: for updated/patched/added rules.
minor version increment: backwards-compatible YAML schema changes (e.g., adding/removing optional fields).
major version increment: non-backwards-compatible YAML schema changes (e.g., adding/removing required fields).
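The scheme above can be expressed as a small function. This is a sketch for illustration: the change labels "rule", "schema-compatible", and "schema-breaking" are hypothetical names for the three cases. Resetting the lower components to zero follows common semantic versioning practice and is not stated in this README.

```python
def bump(version, change):
    """Return the next version for a given kind of change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "rule":
        # Updated, patched, or added rules: patch increment.
        return f"{major}.{minor}.{patch + 1}"
    if change == "schema-compatible":
        # Backwards-compatible YAML schema change: minor increment.
        return f"{major}.{minor + 1}.0"
    if change == "schema-breaking":
        # Non-backwards-compatible YAML schema change: major increment.
        return f"{major + 1}.0.0"
    raise ValueError(f"unknown change type: {change}")


if __name__ == "__main__":
    print(bump("1.4.2", "rule"))
    print(bump("1.4.2", "schema-compatible"))
    print(bump("1.4.2", "schema-breaking"))
```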

Credits

We would like to thank the following authors very much for their valuable contributions.

Author	MRs/Issues
@masakura	!99, !107
@gregory.mcdaniel	#32
@niklas.volcz.	!183

Rule deployment

Rules that are not covered at the moment

Bandit

Excluded patterns (1)

B308: django.utils.safestring.mark_safe This rule is basically redundant with B703
B109: password_config_option_not_marked_secret Not supported anymore since the plugin was removed
B111: execute_with_run_as_root_equals_true Not supported anymore since the plugin was removed
B322: input Not supported anymore since the plugin was removed
B414: import_pycryptodome Not supported anymore since the plugin was removed

Adjusted patterns (3)

B503: ssl_with_bad_defaults Our Semgrep pattern captures both B503 and B502 because they are very similar: both detect insecure settings that use outdated versions of encryption algorithms.
B110: try_except_pass The Semgrep rule checks the whole try ... except block, whereas bandit reports every except case. The Semgrep rule approximates the original rule's behaviour by looking at various permutations of except pass cases embedded in a try ... except block.
B112: try_except_continue The Semgrep rule checks the whole try ... except block, whereas bandit reports every except case. The Semgrep rule approximates the original rule's behaviour by looking at various permutations of except continue cases embedded in a try ... except block.
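The difference in granularity for B110 is easiest to see on a small example. The function below is a hypothetical test case, not one taken from this repository. Going by the description above, bandit would be expected to report each except ... pass handler separately, while the Semgrep rule matches on the enclosing try ... except block.

```python
def parse_port(raw):
    """Parse a port number, silently swallowing every failure."""
    port = None
    try:
        port = int(raw)
    except ValueError:
        pass  # first except-pass handler (non-numeric string)
    except TypeError:
        pass  # second except-pass handler (e.g. None)
    return port


if __name__ == "__main__":
    print(parse_port("8080"), parse_port("http"), parse_port(None))
```

A single try block with two swallowing handlers is therefore a useful case for checking how the finding counts of the two tools line up.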

ESLint

Patterns we were unable to migrate (1)

detect-unsafe-regex: Detects potentially unsafe regular expressions, which may take a very long time to run, blocking the event loop. This problem is solved by applying a set of conditional checks to each character of a target string, which cannot be accomplished in Semgrep.

Gosec

Patterns we were unable to migrate (2)

find-sec-bugs

Java, Scala

Adjusted patterns

Rule ID	Description	Status	Comment
`HARD_CODE_PASSWORD`	Hardcoded Password (Scala)	❌	The behaviour is not completely on par with find-sec-bugs; we excluded some patterns that are prone to FPs.

Out of scope patterns (25)

Out of scope patterns w.r.t. https://gitlab.com/gitlab-org/gitlab/-/issues/354762#rules-with-completion-status are all those patterns that are unrelated to Java.

Rule ID	Description	Status	Comment
`PREDICTABLE_RANDOM_SCALA`	Predictable pseudorandom number generator (Scala)	❌	Scala not supported
`SCALA_COMMAND_INJECTION`	Potential Command Injection (Scala)	❌	Scala not supported
`SCALA_PATH_TRAVERSAL_IN`	Potential Path Traversal using Scala API (file read)	❌	Scala not supported
`SCALA_PLAY_SSRF`	Scala Play Server-Side Request Forgery (SSRF)	❌	Scala not supported
`SCALA_SENSITIVE_DATA_EXPOSURE`	Potential information leakage in Scala Play	❌	Scala not supported
`SCALA_SQL_INJECTION_ANORM`	Potential Scala Anorm Injection	❌	Scala not supported
`SCALA_SQL_INJECTION_SLICK`	Potential Scala Slick Injection	❌	Scala not supported
`SCALA_XSS_MVC_API`	Potential XSS in Scala MVC API engine	❌	Scala not supported
`SCALA_XSS_TWIRL`	Potential XSS in Scala Twirl template engine	❌	Scala not supported
`PLAY_UNVALIDATED_REDIRECT`	Unvalidated Redirect (Play Framework)	❌	Scala not supported
`ANDROID_BROADCAST`	Broadcast (Android)	❌	Android not supported
`ANDROID_EXTERNAL_FILE_ACCESS`	External file access (Android)	❌	Android not supported
`ANDROID_GEOLOCATION`	WebView with geolocation activated (Android)	❌	Android not supported
`ANDROID_WEB_VIEW_JAVASCRIPT_INTERFACE`	WebView with JavaScript interface (Android)	❌	Android not supported
`ANDROID_WEB_VIEW_JAVASCRIPT`	WebView with JavaScript enabled (Android)	❌	Android not supported
`ANDROID_WORLD_WRITABLE`	World writable file (Android)	❌	Android not supported
`SQL_INJECTION_ANDROID`	Potential Android SQL Injection	❌	Android not supported
`GROOVY_SHELL`	Potential code injection when using GroovyShell	❌	Groovy not supported
`JSP_INCLUDE`	Dynamic JSP inclusion	❌	JSP not supported
`JSP_JSTL_OUT`	Escaping of special XML characters is disabled	❌	JSP not supported
`JSP_SPRING_EVAL`	Dynamic variable in Spring expression	❌	JSP not supported
`JSP_XSLT`	A malicious XSLT could be provided to the JSP tag	❌	JSP not supported
`XSS_JSP_PRINT`	Potential XSS in JSP	❌	JSP not supported
`XSS_REQUEST_PARAMETER_TO_JSP_WRITER`	XSS: Servlet reflected cross site scripting vulnerability	❌	JSP not supported
`REQUESTDISPATCHER_FILE_DISCLOSURE`	RequestDispatcher File Disclosure	❌	JSP not supported

Excluded patterns (6)

We excluded the patterns below because they are overly verbose; they are triggered by existing entry-points and do not indicate any vulnerability.

Rule ID	Description	Status	Comment
`STRUTS1_ENDPOINT`	Found Struts 1 endpoint	🚫	the endpoint rules only provide general information about potential security issue which seems noisy -- I think we can skip them
`STRUTS2_ENDPOINT`	Found Struts 2 endpoint	🚫	the endpoint rules only provide general information about potential security issue which seems noisy -- I think we can skip them
`SPRING_ENDPOINT`	Found Spring endpoint	🚫	We cannot cope with annotations; in addition endpoints should probably not end up in the final security report anyway
`TAPESTRY_ENDPOINT`	Found Tapestry page	🚫	We cannot cope with annotations; in addition endpoints should probably not end up in the final security report anyway.
`JAXRS_ENDPOINT`	Found JAX-RS REST endpoint	🚫	the endpoint rules only provide general information about potential security issue which seems noisy -- I think we can skip them
`JAXWS_ENDPOINT`	Found JAX-WS SOAP endpoint	🚫	the endpoint rules only provide general information about potential security issue which seems noisy -- I think we can skip them
`HARD_CODE_KEY`	Secret detection rule	🚫	Secret Detection is taken care of by a dedicated analyzer

Patterns we were unable to migrate (12)

The patterns below could not be migrated, because they required features not supported by Semgrep. See https://gitlab.com/gitlab-org/gitlab/-/issues/357679 for more information.

Rule ID	Description	Status	Comment
`SPRING_CSRF_UNRESTRICTED_REQUEST_MAPPING`	Spring CSRF unrestricted RequestMapping	🚫	No support for parsing annotations
`SPRING_UNVALIDATED_REDIRECT`	Spring Unvalidated Redirect	🚫	No support for annotations
`WICKET_ENDPOINT`	Found Wicket WebPage	🚫	the endpoint rules only provide general information about potential security issue which seems noisy -- I think we can skip them
`UNSAFE_HASH_EQUALS`	Unsafe hash equals	🚫	This rule is highly prone to FPs: it checks for insecure hash functions by looking for keywords (e.g., sha) in variable or parameter names. As we are already covered by secret detection, we can probably omit this particular rule.
`STATIC_IV`	Static IV	🚫	https://gitlab.com/gitlab-org/gitlab/-/issues/357679#note_905023485
`DESERIALIZATION_GADGET`	This class could be used as deserialization gadget	🚫	Multiple logical flows involved. Cannot be achieved in Semgrep.
`ENTITY_LEAK`	Unexpected property leak	🚫	Annotations of classes are processed to determine the result. This cannot be achieved in Semgrep.
`ENTITY_MASS_ASSIGNMENT`	Mass assignment	🚫	Annotations of classes are processed to determine the result. This cannot be achieved in Semgrep.
`ESAPI_ENCRYPTOR`	Use of ESAPI Encryptor	🚫	Config files related. We currently support only files with `.java` extensions.
`JACKSON_UNSAFE_DESERIALIZATION`	Unsafe Jackson deserialization configuration	🚫	Reason
`OBJECT_DESERIALIZATION`	Object deserialization is used	🚫	This problem is solved by determining interface supersets and annotation metadata. This cannot be accomplished in Semgrep.
`REDOS`	Regex DOS (ReDOS)	🚫	This problem is solved by applying a set of conditional checks to each character of a target string. This cannot be accomplished in Semgrep.

security-code-scan

Modified patterns (1)

Rule ID	Description	Comment
`SCS0018`	Path Traversal	We adapted the pattern to not cover arguments passed to `Main` as sources because this often led to FPs for CLI apps.

Excluded patterns (1)

We excluded the patterns below because they are overly verbose.

Rule ID	Description	Status	Comment
`SCS0015`	Hardcoded Password	🚫	This is better served by Secrets Detection as there are a multitude of ways that hardcoded passwords can be specified.

Patterns we were unable to migrate (5)

The patterns below could not be migrated, because they required features not supported by Semgrep.

Rule ID	Description	Status	Comment
`SCS0021`	Request Validation Disabled (Configuration File)	🚫	XML configuration file.
`SCS0022`	Event Validation Disabled	🚫	XML configuration file.
`SCS0023`	View State Not Encrypted	🚫	XML configuration file.
`SCS0024`	View State MAC Disabled	🚫	XML configuration file.
`SCS0008`	Cookie Without SSL Flag	🚫	The SCS rule also detects vulnerabilities in ASP.NET config files which is not supported by Semgrep. We also haven't been able to detect these with SCS within the `gapanalysis` job as the `HttpCookie` class requires .NET Framework.
`SCS0009`	Cookie Without HttpOnly Flag	🚫	The SCS rule also detects vulnerabilities in ASP.NET config files which is not supported by Semgrep. We also haven't been able to detect these with SCS within the `gapanalysis` job as the `HttpCookie` class requires .NET Framework.
`SCS0002`	SQL Injection	🚫	The SCS rule also detects vulnerabilities in ASP.NET UI code, which Semgrep does not support.
`SCS0003`	XPath Injection	🚫	The SCS rule also detects vulnerabilities in ASP.NET UI code, which Semgrep does not support.
`SCS0030`	Request validation is enabled only for pages (Configuration File)	🚫	This rule relates to changes in the configuration file (XML) format. Semgrep does not have GA support for the HTML/XML format.

Rule synchronization from Upstream scanners

Semgrep rules should be kept in-sync with upstream scanners regularly; here's the process:

Pull the newly added rules from the analyzer's upstream source (excluding the rules that could not be translated due to Semgrep limitations; see above).
Translate the newly identified rules into Semgrep-equivalent rules.
Map them to the native analyzer's IDs in this repository.
Generate a new ruleset distribution using the instructions described above.
Add all untranslatable rules to this file, along with the reason, under the corresponding downstream analyzer.
Copy over the new ruleset distribution into Semgrep/rules to reflect rule changes in the analyzer.
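The first step of this process amounts to a set difference: upstream rule IDs minus those already mapped and those documented as untranslatable. The sketch below only illustrates that bookkeeping. The function pending_rules is hypothetical, and B901 is a made-up ID standing in for a newly added upstream rule.

```python
def pending_rules(upstream_ids, mapped_ids, untranslatable_ids):
    """Return upstream rule IDs that still need a Semgrep translation."""
    return sorted(set(upstream_ids) - set(mapped_ids) - set(untranslatable_ids))


if __name__ == "__main__":
    upstream = ["B101", "B301", "B308", "B901"]  # B901 is hypothetical
    mapped = ["B101", "B301"]                    # IDs present in the mappings file
    untranslatable = ["B308"]                    # IDs listed in this file as excluded
    print(pending_rules(upstream, mapped, untranslatable))
```

Each ID returned would then either be translated and mapped, or added to this file with a reason.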

For better tracking purposes, create a dedicated issue on rule synchronization cadence and create a sub-task for each semgrep-translated analyzer. The subtask should contain all the new rules that should be synchronized. Here's an example issue that has followed the mentioned process.
