Tooling & Automation / 2026-06-01 / ~12 min · 2,371 words

Before you write a rule, read the code

Most SAST implementations fail not because the tool is wrong, but because the rules weren't written for the codebase they're running against. Generic rulesets generate noise. Noise gets ignored. Ignored findings don't get fixed. Here's how to build rules that match how your organization actually writes code.

The problem with generic rules

Opengrep is a community-maintained fork of Semgrep's Community Edition — created in 2025 after Semgrep Inc. relicensed parts of the CE engine and removed taint analysis, inter-procedural scanning, and a handful of other capabilities from the free tier. Opengrep restores all of it, stays backward compatible with Semgrep's rule format, and ships under LGPL-2.1. If you were running Semgrep CE in your pipelines before the relicensing, your existing rules work. Your CI configuration works. The migration is a binary swap.

None of that matters if your rules are bad. And most rules in production are bad — not syntactically, not semantically, but contextually. They were written for a generic Python web application or a generic Java service. Your codebase is neither of those things. It has specific frameworks, specific data flow conventions, specific sanitization patterns that the generic ruleset doesn't know about. So it fires on things that aren't vulnerabilities, and misses things that are.

Coverage ≠ progress. A scan generating 400 findings per week, of which engineering trusts maybe 40, is not a security program. It's a ticket queue that erodes trust until, given enough time, the function itself is gone.

§ 01
Before you write a single rule

The first mistake people make is opening a text editor. Before you write anything, you need to understand three things about the codebase you're about to instrument: how does user-controlled data enter the application, what are the organization's established sanitization patterns, and where does the code actually do dangerous things with data.

These aren't questions you answer by reading documentation. You answer them by reading code. Spend two hours with the actual codebase before touching a rule file. Look at request handling. Look at how models validate input. Look at what the ORM layer looks like versus where raw SQL might appear. Look at how the application handles file uploads, external API calls, and serialization.

For Mozilla's Kuma — the codebase behind MDN Web Docs, used here as a reference — a few things become apparent quickly. The application uses Django's ORM almost exclusively for database access. Raw cursor.execute() calls are rare and mostly in migrations. The interesting attack surface is in the API layer, in custom management commands that process external content, and in places where the application calls out to external services.

why kuma
Real production Django app. Publicly available. Large enough to be representative. Not a toy — Mozilla ships it.
the audit-first rule
Two hours of code reading before touching a rule file will save you two weeks of false positive triage. This is not optional.
attack surface
Generic Django rulesets miss content ingestion paths entirely. Those are your interesting taint sources.

§ 02
Reading the codebase for taint sources

Taint analysis tracks data flow from a source (where untrusted data enters) through the application to a sink (where it could cause harm). The quality of your taint rules depends entirely on how accurately you've identified both ends.

In a Django application, the canonical sources are request parameters. But in your specific application they might be more nuanced. Look for custom middleware that transforms request data before it hits views, management commands that read from external files or APIs, serializer validated_data fields where sanitization quality varies, and model fields populated from external sources that bypass the request cycle entirely.

In Kuma, there's a meaningful amount of content ingestion from external sources — the application processes wiki content, handles document revisions, and interacts with external tooling. These ingestion paths are taint sources that a generic Django ruleset won't capture. A rule that only looks at request.GET and request.POST is going to miss an entire class of injection surface.

Walk the codebase with this question in mind: where does data come from that I didn't write? That's your source inventory. Write it down before you write a single rule.

source inventory

The list of taint sources in your application is more valuable than any rule you'll write. Build it first. Revisit it whenever the application grows a new integration.
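A first pass at that inventory can be partly mechanical. The sketch below uses Python's ast module to list every place a module reads from a bare `request` object. The attribute set and the sample module are assumptions for illustration, and it deliberately covers only the request-shaped sources; ingestion paths in management commands or middleware (and accesses such as `self.request.GET`) still need to be found by reading.

```python
import ast
from collections import Counter

# Attributes of a Django `request` treated as untrusted input. This set is an
# assumption for illustration; extend it from your own audit.
REQUEST_ATTRS = {"GET", "POST", "FILES", "COOKIES", "META", "body", "data"}


def find_request_sources(source, filename="<string>"):
    """Return sorted (filename, line, expression) tuples for request.<ATTR> reads."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == "request"
            and node.attr in REQUEST_ATTRS
        ):
            hits.append((filename, node.lineno, f"request.{node.attr}"))
    return sorted(hits)


def summarize(hits):
    """Count hits per expression, e.g. how many request.GET reads exist."""
    return Counter(expression for _, _, expression in hits)


# Hypothetical view module used only to exercise the functions above.
SAMPLE = '''
def search(request):
    term = request.GET.get("q")
    page = request.GET["page"]
    return term, page

def upload(request):
    return request.FILES["doc"]
'''

hits = find_request_sources(SAMPLE, "views.py")
print(summarize(hits))
```

Run over real files, the per-expression counts give a rough map of where request data enters, which is a starting point for the written inventory rather than a replacement for it.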

§ 03
Writing the first rule

Start with something concrete and high-signal. Raw SQL execution in a Django application that primarily uses the ORM is rare enough to be worth flagging every time. The rule below handles the baseline case, but notice the sanitizers list. A generic SQL injection rule often has none, which means it fires on cursor.execute("SELECT * FROM t WHERE id = %d" % int(request.GET.get("id"))), a pattern that is not injectable because int() either returns an integer or raises. Including int(...) as a sanitizer cuts that class of false positives immediately.

key insight
Sanitizers are where generic rules fail. The list of what your codebase uses to sanitize is more valuable than the sink patterns.
yaml rules/django/sql-injection-raw.yaml
rules:
  - id: django-raw-sql-injection
    mode: taint
    message: "User-controlled data flows into raw SQL. Parameterize or use the ORM."
    severity: ERROR
    languages: [python]
    metadata:
      cwe: "CWE-89: SQL Injection"
      confidence: HIGH
    # ANY of these is an untrusted source (OR), so list them as separate
    # items (list-level OR) or via pattern-either. Separate items shown here.
    pattern-sources:
      - pattern: request.GET.get(...)
      - pattern: request.POST.get(...)
      - pattern: request.GET[...]
      - pattern: request.POST[...]
      - pattern: request.data.get(...)
      - pattern: request.data[...]
    # Sink = the query argument of EITHER raw-exec call. OR the call shapes,
    # then focus the taint check on the $QUERY metavariable.
    pattern-sinks:
      - patterns:
          - pattern-either:
              - pattern: cursor.execute($QUERY, ...)
              - pattern: connection.execute($QUERY, ...)
          - focus-metavariable: $QUERY
    pattern-sanitizers:
      - pattern: int(...)
      - pattern: str(int(...))

Test this against actual findings before moving on. Run it against the codebase, look at everything it fires on, and ask whether each finding represents a real risk. If it's firing on things that aren't vulnerabilities, the sanitizers list needs work — not the sink patterns. That's almost always where the tuning happens.
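One way to make that check repeatable is a small fixture file kept next to the rule. The sketch below is such a fixture under stated assumptions: RecordingCursor and fake_request are hypothetical stand-ins so the file runs without Django or a database, and the "ruleid:" / "ok:" comments follow Semgrep's rule-testing annotation convention. The annotations describe what the rule above is expected to do; they have not been verified against opengrep here.

```python
from types import SimpleNamespace


class RecordingCursor:
    """Hypothetical stand-in for a DB cursor; records calls instead of running SQL."""

    def __init__(self):
        self.calls = []

    def execute(self, query, params=None):
        self.calls.append((query, params))


def fake_request(**params):
    """Hypothetical stand-in for a Django request exposing only request.GET."""
    return SimpleNamespace(GET=dict(params))


def lookup_by_name(request, cursor):
    # Tainted value is interpolated into the query string: should be flagged.
    name = request.GET.get("name")
    query = "SELECT id FROM docs WHERE name = '%s'" % name
    # ruleid: django-raw-sql-injection
    cursor.execute(query)


def lookup_by_id(request, cursor):
    # int() returns an integer or raises, so the sanitizer should clear taint.
    doc_id = int(request.GET.get("id"))
    query = "SELECT id FROM docs WHERE id = %d" % doc_id
    # ok: django-raw-sql-injection
    cursor.execute(query)


def lookup_parameterized(request, cursor):
    # Tainted value reaches only the params argument, not the focused $QUERY.
    name = request.GET.get("name")
    # ok: django-raw-sql-injection
    cursor.execute("SELECT id FROM docs WHERE name = %s", [name])
```

Each false positive you tune away can become another "ok:" case in this file, so the sanitizer list and the evidence for it stay together.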

§ 04
The cross-function problem

The rule above only catches taint flow within a single function. In real applications, data flows through multiple function calls before reaching a sink. A view extracts a request parameter, passes it to a service function, which passes it to a repository function, which calls the database. Standard taint analysis without cross-function tracking misses this entirely.

Semgrep CE does not provide this in the free tier. Opengrep does, via the --taint-intrafile flag, which builds function signatures and uses topological ordering to propagate taint across function boundaries within a file.

Cross-function analysis changes what you find. The false positive rate also changes — not because the analysis is less accurate, but because it surfaces findings that require more context to evaluate. A taint path that crosses three function boundaries might be safe because one of those functions always sanitizes its input, or it might be dangerous because that sanitization is conditional. You have to read the finding more carefully.

For Kuma, enabling cross-function analysis surfaces data flow paths through the content ingestion layer that the single-function rule misses entirely. Management commands that process external content, pass it through transformation functions, and eventually write it to the database show up as findings. Some of those are real. That's the point.
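To make the shape concrete, here is a minimal sketch of a view, service, and repository chain in a single file. The function and table names are hypothetical and not taken from Kuma; an in-memory sqlite3 database stands in for the real one so the example runs on its own. The source (request.GET.get) and the sink (cursor.execute) sit in different functions, so a rule that only tracks taint within one function has nothing to connect.

```python
import sqlite3
from types import SimpleNamespace


def fetch_revisions(cursor, slug):
    # Sink: the query string is built from `slug`, which is just a parameter
    # here. Looking at this function alone, nothing marks it as untrusted.
    cursor.execute("SELECT body FROM revisions WHERE slug = '%s'" % slug)
    return [row[0] for row in cursor.fetchall()]


def revision_service(cursor, slug):
    # Pass-through layer: no sanitization happens on the way down.
    return fetch_revisions(cursor, slug)


def revision_view(request, cursor):
    # Source: user-controlled data enters here, two calls away from the sink.
    return revision_service(cursor, request.GET.get("slug"))


def demo_cursor():
    """Build a throwaway in-memory database with one public and one private row."""
    cursor = sqlite3.connect(":memory:").cursor()
    cursor.execute("CREATE TABLE revisions (slug TEXT, body TEXT)")
    cursor.executemany(
        "INSERT INTO revisions VALUES (?, ?)",
        [("home", "public"), ("draft", "secret")],
    )
    return cursor


cursor = demo_cursor()
normal = revision_view(SimpleNamespace(GET={"slug": "home"}), cursor)
injected = revision_view(SimpleNamespace(GET={"slug": "x' OR '1'='1"}), cursor)
```

The second call returns rows the caller never asked for by slug, which is the finding cross-function analysis exists to surface.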

performance tradeoff
Cross-function analysis is significantly slower. For large monorepos this matters. The right approach is typically to run standard analysis on every commit and cross-function analysis on pull requests targeting main: fast feedback in feature branches, thorough analysis before merge.

§ 05
Tuning against real findings
reference
False positive taxonomy

Cause                     Fix
missing sanitizer         add pattern-sanitizers entry
framework mitigates       add pattern-not exclusion
test code flagged         --exclude in CI (or paths: exclude in the rule)
intentional unsafe        nosemgrep + documentation
overly broad metavar      tighten metavariable constraint
wrong language version    version constraint in metadata

The first run of a new ruleset against a real codebase is going to produce noise. That's expected. What you do with that noise determines whether the ruleset becomes useful or gets ignored.

For each false positive, ask two questions: why did this fire, and is there a pattern here? A single false positive might be a quirk of the codebase. Three false positives with the same shape are a pattern, and that pattern belongs in the sanitizers list or in a rule exception.
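The "three with the same shape" test is easy to apply if triage notes are kept in a consistent form. A minimal sketch, assuming each false positive is recorded as a dict with a rule_id and a short hand-assigned shape label (both field names are hypothetical):

```python
from collections import Counter


def recurring_shapes(false_positives, min_count=3):
    """Return {(rule_id, shape): count} for shapes seen at least min_count times."""
    counts = Counter((fp["rule_id"], fp["shape"]) for fp in false_positives)
    return {key: n for key, n in counts.items() if n >= min_count}


# Hypothetical triage notes from a first run.
notes = [
    {"rule_id": "django-raw-sql-injection", "shape": "uuid-cast"},
    {"rule_id": "django-raw-sql-injection", "shape": "uuid-cast"},
    {"rule_id": "django-raw-sql-injection", "shape": "uuid-cast"},
    {"rule_id": "django-raw-sql-injection", "shape": "one-off-migration"},
]
patterns = recurring_shapes(notes)
```

Anything this returns is a candidate for the sanitizers list or a rule exception; anything it leaves out is, for now, a quirk.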

The most common false positive patterns in Django codebases:

Type-cast sanitization: int(), float(), uuid.UUID() on request parameters. These eliminate injection surface and belong in your sanitizers list.

Framework-level sanitization: Django's escape(), mark_safe() used correctly, form validation through cleaned_data. If your codebase uses these consistently in certain layers, model that in your rules rather than flagging every downstream usage.

ORM query construction: Q() objects, filter() with keyword arguments, annotate() calls. These are parameterized by the framework. Generic rules sometimes flag them anyway. Add explicit exceptions.

Test fixtures and migration scripts: you probably don't want SAST firing on test data setup or migration scripts. Exclude tests/ and migrations/ directories in your CI configuration, not in the rule itself. Keep rule logic focused on the application code.

Document every tuning decision. Why you added a sanitizer, why you added an exception, what the false positive pattern was. Six months from now, when someone asks why the rule doesn't fire on a particular pattern, you want an answer.

triage discipline

When engineering starts suppressing findings without reading them, the ruleset is already dead. The goal isn't zero false positives. It's a signal-to-noise ratio high enough that engineers trust what comes through. That threshold is lower than most AppSec teams assume.

§ 06
The deployment pipeline

A ruleset that runs locally but not in CI is a suggestion. The integration into the pipeline is what makes it a control.

The basic CI integration for opengrep in GitHub Actions:

yaml .github/workflows/sast.yml
# .github/workflows/sast.yml
#
# Path handling: the container working dir is set to /src (`-w /src`) and the
# scan target is `.`, so SARIF URIs come out repo-relative (app/views.py).
# GitHub code scanning maps findings by REPO-RELATIVE path. Scanning the
# absolute container path /src records URIs like /src/app/views.py, which
# don't line up with your files and won't annotate correctly in the Security tab.
# Checked with semgrep 1.165.0:
#     scan /abs/path         -> uri "/home/.../app/views.py"   (won't map)
#     scan . (from workdir)  -> uri "app/views.py"             (maps)
#
# Verify on your side: the opengrep docker image coordinates/tag and the
# --taint-intrafile flag. The flag is documented by opengrep; the shared flags,
# --exclude, and SARIF output were confirmed by running semgrep, not the
# opengrep image itself.
 
name: SAST
 
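on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

# upload-sarif needs write access to code scanning results.
permissions:
  contents: read
  security-events: write
 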
jobs:
  opengrep:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
 
      - name: Run opengrep
        run: |
          docker run --rm \
            -v "${GITHUB_WORKSPACE}:/src" \
            -w /src \
            ghcr.io/opengrep/opengrep:latest \
            scan \
            --config rules/ \
            --exclude tests/ \
            --exclude migrations/ \
            --sarif \
            --output results.sarif \
            .
 
      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results.sarif
          # Distinct category so this upload doesn't collide with the deep scan.
          category: opengrep
 
  # Deep cross-function (intrafile) taint analysis. Slower, so PR-to-main only.
  opengrep-deep:
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request' && github.base_ref == 'main'
    steps:
      - uses: actions/checkout@v4
 
      - name: Run opengrep (cross-function)
        run: |
          docker run --rm \
            -v "${GITHUB_WORKSPACE}:/src" \
            -w /src \
            ghcr.io/opengrep/opengrep:latest \
            scan \
            --config rules/ \
            --taint-intrafile \
            --exclude tests/ \
            --exclude migrations/ \
            --sarif \
            --output results-deep.sarif \
            .
 
      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results-deep.sarif
          # Distinct category so this upload doesn't collide with the fast scan.
          category: opengrep-deep

A few decisions embedded in this configuration worth calling out explicitly:

SARIF output: uploading findings as SARIF integrates them into GitHub's security tab. Engineers see findings in the same interface they're already using for code review. You don't need a separate dashboard until the volume requires it.

Exclude paths in CI, not in rules: tests/ and migrations/ are excluded at the CI level. This keeps the rules themselves clean and means you can re-run rules against those paths when you specifically want to.

Pull request triggers: running on both push and PR means engineers see findings before merge, not after. This is the feedback loop that matters. A finding caught in PR review is a conversation. A finding caught after merge is a ticket.
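Before trusting the upload, it can help to look at the SARIF file directly. The sketch below counts results per rule and lists any location URIs that are absolute, since those are the ones that would fail to map to repository files. It relies only on the standard SARIF 2.1.0 layout (runs, results, ruleId, physicalLocation.artifactLocation.uri); the sample document is hypothetical and trimmed to the fields used.

```python
import json
from collections import Counter


def summarize_sarif(sarif):
    """Return (results per ruleId, sorted absolute URIs) for a parsed SARIF document."""
    per_rule = Counter()
    absolute_uris = set()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            per_rule[result.get("ruleId", "<unknown>")] += 1
            for location in result.get("locations", []):
                artifact = location.get("physicalLocation", {}).get("artifactLocation", {})
                uri = artifact.get("uri", "")
                # Absolute paths and file: URIs won't line up with repo-relative files.
                if uri.startswith("/") or uri.startswith("file:"):
                    absolute_uris.add(uri)
    return per_rule, sorted(absolute_uris)


def summarize_sarif_file(path):
    """Convenience wrapper: load a SARIF file from disk and summarize it."""
    with open(path, encoding="utf-8") as handle:
        return summarize_sarif(json.load(handle))


def _result(rule_id, uri):
    # Helper for building the hypothetical sample below.
    return {
        "ruleId": rule_id,
        "locations": [{"physicalLocation": {"artifactLocation": {"uri": uri}}}],
    }


# Hypothetical SARIF content: one relative URI (maps) and one absolute (won't).
SAMPLE_SARIF = {
    "version": "2.1.0",
    "runs": [
        {
            "results": [
                _result("django-raw-sql-injection", "app/views.py"),
                _result("django-raw-sql-injection", "/src/app/models.py"),
            ]
        }
    ],
}

per_rule, absolute_uris = summarize_sarif(SAMPLE_SARIF)
```

A non-empty absolute_uris list is a sign the scan was pointed at an absolute path rather than at `.` from the working directory.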

The cross-function scan runs as the separate opengrep-deep job in the workflow above, triggered only on pull requests targeting main.

Two jobs, two feedback loops. Engineers get fast results on every commit. They get thorough results when it matters.

blocking vs. informing

Don't block PRs on SAST findings when you're rolling out a new ruleset. Surface findings as comments and in the security tab first. Block only after you've established that the false positive rate is low enough that engineers trust the signal. Blocking on noise is how you get rules that everyone suppresses.

§ 07
The maintenance problem

Rulesets go stale. The codebase adds a new framework integration. The team adopts a new sanitization pattern. A third-party library changes its API. Rules that were accurate six months ago start generating noise — or worse, start missing real findings because the patterns they were written against no longer exist.

Build maintenance into the process, not as a separate initiative. Three practices that keep rulesets from rotting:

Review suppression comments quarterly. Every # nosemgrep or opengrep equivalent in the codebase is a data point. If suppressions are clustered around a specific rule, that rule needs rewriting. If they've been in place for more than six months without review, the pattern they're suppressing has either been fixed or normalized into the codebase — and you need to know which.

Tie rule updates to architecture changes. When engineering adopts a new ORM, adds a new API framework, or changes how requests are validated, that's a triggering event for ruleset review. Put it in the engineering team's RFC or ADR process as a checklist item. AppSec reviews the ruleset impact before the change ships, not six weeks after.

Track signal-to-noise over time. If engineering is marking findings as false positives at a rate above roughly 20%, the ruleset needs attention. Below that threshold, the noise is tolerable. Above it, trust starts to erode. Most teams don't track this and only find out the ruleset is broken when engineers stop triaging findings entirely.
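Two of these practices reduce to small scripts. The sketch below counts nosemgrep suppression comments per rule across a source tree and flags rules whose false positive rate exceeds the rough 20% line. The triage record format, a (rule_id, verdict) pair with the verdict string "false_positive", is an assumption; adapt it to whatever your tracker exports.

```python
import re
from collections import Counter, defaultdict
from pathlib import Path

# Matches "nosemgrep" and "nosemgrep: rule-a, rule-b" suppression comments.
NOSEMGREP = re.compile(r"nosemgrep(?::\s*([\w.\-]+(?:\s*,\s*[\w.\-]+)*))?")
FP_THRESHOLD = 0.20  # the rough 20% line from the text


def count_suppressions(root):
    """Count suppression comments per rule id under root (Python files only)."""
    counts = Counter()
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            match = NOSEMGREP.search(line)
            if match is None:
                continue
            if match.group(1) is None:
                # Bare "nosemgrep" suppresses every rule on that line.
                counts["<blanket>"] += 1
                continue
            for rule_id in re.split(r"\s*,\s*", match.group(1)):
                counts[rule_id] += 1
    return counts


def rules_needing_attention(triage, threshold=FP_THRESHOLD):
    """Return sorted rule ids whose false positive rate is above threshold."""
    totals = defaultdict(int)
    false_positives = defaultdict(int)
    for rule_id, verdict in triage:
        totals[rule_id] += 1
        if verdict == "false_positive":
            false_positives[rule_id] += 1
    return sorted(r for r in totals if false_positives[r] / totals[r] > threshold)
```

Running count_suppressions at each quarterly review and comparing against the previous numbers shows where suppressions are clustering; rules_needing_attention gives the shortlist to rewrite first.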

The ruleset you shipped on day one is not the ruleset you should be running in six months. Plan for that.

the maintenance contract
When you deploy a ruleset, you're making a commitment to the engineers who will triage its output. That commitment is: the findings we surface will be worth your time. Rulesets that generate noise without maintenance break that commitment, and broken commitments are why AppSec gets excluded from engineering conversations.

§ 08
What you end up with

The output of this process, if you follow it, is a ruleset that engineering will actually use. Not because you mandated it. Because when a finding comes through, they look at it, and it's usually real.

That's the bar. Not zero false positives. Not comprehensive coverage of every possible vulnerability class. A signal-to-noise ratio high enough that findings get read instead of dismissed.

Start narrow. One rule, well-tuned, deployed and trusted, is more valuable than fifty rules generating noise that no one looks at. Add rules when you have the bandwidth to tune them properly. Expand coverage as trust expands.

The alternative (deploying the default ruleset, watching the ticket count climb, and waiting for engineering to start suppressing findings wholesale) is where most SAST implementations end up. You've seen it. The scanner is running. No one is looking at it. That's not a program.