Working with monorepos
Large monorepos are a reality for many organizations. Because a monorepo can contain anywhere from tens to hundreds of packages, scanning every package in it can take a significant amount of time. Time requirements vary by development team and pipeline, but in general, development teams need fast test cycles to stay productive, while security teams need full visibility across the monorepo. Without performance engineering or an asynchronous scanning strategy, these two needs can conflict. This documentation outlines performance engineering and scanning strategies for large monorepos.
Tip: If you use a monorepo with Bazel as your primary build system, see the Bazel documentation.
Asynchronous scanning strategies
When scanning a large monorepo, a common approach taken by security teams is to run an asynchronous cron job outside of the CI/CD environment. This is often the path of least friction, but it is limiting: because the scan runs outside the pipeline, critical issues generally cannot be blocked inline. This strategy is mentioned for completeness, but it is NOT recommended beyond an initial step to gain visibility into a large monorepo.
Performance enhancements for inline scanning strategies
The following performance enhancements may be used with Endor Labs to enable the scanning of large monorepos:
Scoping scans based on changed files
Many CI/CD systems provide path filters. For example, in GitHub Actions, the dorny/paths-filter action is a readily accessible way to define filters by path. This is generally the most effective way to handle monorepo deployments, but it requires the largest up-front investment of engineering time. That investment is repaid by no longer having to scan everything on every change.
Using the paths that changed, you can scope scans to the files that were actually modified. For example, to scan only the packages housed under the ui/ directory when that path has been modified, run:
endorctl scan --include=ui/
With a path-filtering approach, each team working in the monorepo is responsible for the packages it maintains. Typically, each team is associated with one or more predefined directory paths.
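The mapping from changed paths to scan scopes can be scripted. The following is a minimal Bash sketch, not an Endor Labs tool: `changed_scopes` and `WATCHED_DIRS` are hypothetical names, and it assumes each team's packages live under a fixed top-level directory. It reads changed file paths (for example, from `git diff --name-only`) on standard input and prints each watched directory that contains at least one change, so a CI job can run one scoped scan per directory.

```shell
#!/usr/bin/env bash
# Sketch: map changed files to the monorepo directories that need a scan.
# WATCHED_DIRS is a hypothetical, team-maintained list of scan scopes;
# each entry ends in "/" so "ui/" does not match "ui-legacy/".
set -euo pipefail

WATCHED_DIRS=(ui/ backend/)

# Reads changed file paths on stdin (one per line) and prints each
# watched directory containing at least one of them, exactly once,
# in WATCHED_DIRS order.
changed_scopes() {
  local file="" dir
  local -A hit=()
  # The "|| [[ -n ... ]]" also handles a last line without a newline.
  while IFS= read -r file || [[ -n "$file" ]]; do
    for dir in "${WATCHED_DIRS[@]}"; do
      # Quoted "$dir" is matched literally; the trailing * is a glob.
      if [[ "$file" == "$dir"* ]]; then
        hit["$dir"]=1
      fi
    done
  done
  for dir in "${WATCHED_DIRS[@]}"; do
    if [[ -n "${hit[$dir]:-}" ]]; then
      printf '%s\n' "$dir"
    fi
  done
}

# Example use in CI (assumes endorctl is installed and authenticated,
# and that origin/main is the base branch):
#   git diff --name-only origin/main...HEAD | changed_scopes |
#     while IFS= read -r dir; do endorctl scan --include="$dir"; done
```

Keeping the scope list explicit, rather than deriving it from every top-level directory, mirrors the idea that each team owns a predefined set of paths.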
Parallelizing scans for many packages
When scanning a large monorepo, organizations can choose to scan the whole monorepo regularly, split by the packages or directories they want to cover, and create separate jobs that scan each directory simultaneously.
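The same fan-out can also be run from a single shell, for example on a self-hosted runner. The sketch below is an illustration under assumptions, not an Endor Labs feature: `scan_in_parallel` is a hypothetical helper that starts one `endorctl scan --include=<dir>` per directory as a background job, waits for all of them, and returns non-zero if any scan failed. The `ENDORCTL` variable is a hypothetical override (defaulting to `endorctl`) that allows a dry run such as `ENDORCTL=echo`.

```shell
#!/usr/bin/env bash
# Sketch: run one scoped Endor Labs scan per directory concurrently.
# Assumes endorctl is on PATH and already authenticated. ENDORCTL is a
# hypothetical override (e.g. ENDORCTL=echo for a dry run).
set -euo pipefail

ENDORCTL="${ENDORCTL:-endorctl}"

# Usage: scan_in_parallel DIR...
# Returns 0 only if every scan exits 0.
scan_in_parallel() {
  (( $# > 0 )) || return 0
  local dir pid status=0
  local -a pids=()
  for dir in "$@"; do
    # Each scan is a background job scoped to one directory.
    "$ENDORCTL" scan --include="$dir" &
    pids+=("$!")
  done
  # "wait PID" returns that job's exit status; record any failure but
  # keep waiting so no scan is left running.
  for pid in "${pids[@]}"; do
    if ! wait "$pid"; then
      status=1
    fi
  done
  return "$status"
}

# Example:
#   scan_in_parallel ui/ backend/
```

Waiting on each PID individually, instead of a bare `wait`, is what lets the helper report a failure from any one scan.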
Parallelizing with scoped scans
Running scoped scans with multiple include patterns in parallel is a common performance optimization for monorepos.
The following GitHub Actions workflow excerpt can be used as a reference (the workflow trigger, runner, checkout, and endorctl installation and authentication steps are omitted). In this example, the ui/ and backend/ directories are scanned simultaneously, and Endor Labs aggregates the results.
This approach improves overall scan time for monorepos in which each directory can be scanned independently.
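name: Parallel Actions
jobs:
  ui-scan:
    steps:
      - name: UI Endor Labs Scan
        run: endorctl scan --include=ui/
  backend-scan:
    steps:
      - name: Backend Endor Labs Scan
        run: endorctl scan --include=backend/
Each scan runs in its own job because GitHub Actions runs the steps within a single job sequentially, while separate jobs run in parallel by default.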
To include or exclude packages based on their directory, pass the directory path to the scan. For example, to include only the packages under a directory:
endorctl scan --include="directory/path/"
See scoping scans for more approaches to scoping scans.
Parallelizing across languages
For teams that work in smaller monorepos, it is often most practical to parallelize scans by language and then optimize performance for individual languages as needed.
name: Parallel Actions
jobs:
  java-scan:
    steps:
      - name: Java Endor Labs Scan
        run: endorctl scan --languages=java
Add one job per language to scan the languages in parallel.
To scan only the packages in a project that are written in Java, use the command:
endorctl scan --languages=java
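Deciding which per-language jobs a repository needs can also be scripted. The following Bash sketch is a heuristic, not part of endorctl: `detect_languages` and `has_manifest` are hypothetical helpers that look for common manifest files and print one language name per line. Only `java` appears in this section's examples; `javascript`, `go`, and `python` are assumed `--languages` values, so check them against the endorctl reference before relying on this, and extend the manifest list for your repository.

```shell
#!/usr/bin/env bash
# Sketch: guess which languages a monorepo contains from manifest files,
# so one `endorctl scan --languages=<lang>` job can run per language.
# The manifest-to-language mapping is a heuristic; language names other
# than java are assumed endorctl values.
set -euo pipefail

# Succeeds if a file named $2 exists anywhere under $1. node_modules
# directories are pruned; -print -quit stops at the first match.
has_manifest() {
  local root="$1" name="$2"
  [[ -n "$(find "$root" -type d -name node_modules -prune -o -name "$name" -print -quit)" ]]
}

# Usage: detect_languages [ROOT]  (defaults to the current directory)
detect_languages() {
  local root="${1:-.}"
  if has_manifest "$root" pom.xml || has_manifest "$root" build.gradle; then
    echo java
  fi
  if has_manifest "$root" package.json; then
    echo javascript
  fi
  if has_manifest "$root" go.mod; then
    echo go
  fi
  if has_manifest "$root" requirements.txt || has_manifest "$root" pyproject.toml; then
    echo python
  fi
}

# Example (assumes endorctl is installed and authenticated):
#   for lang in $(detect_languages); do
#     endorctl scan --languages="$lang" &
#   done
#   wait
```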