Published in the international journal of Empirical Software Engineering (EMSE)
Security vulnerabilities in third-party dependencies are a growing concern, not only for developers of the affected software but also because of the risks they pose to an entire software ecosystem (e.g., the Heartbleed vulnerability). Recent studies show that developers are slow to respond to the threat of a vulnerability, sometimes taking four to eleven months to act. To ensure quick adoption and propagation of a release that contains the fix (fixing release), we conduct an empirical investigation to identify lags that may occur between the vulnerable release and its fixing release (package-side fixing release). Through a preliminary study of 231 package-side fixing releases of npm projects on GitHub, we observe that a fixing release is rarely released on its own, with up to 85.72% of the bundled commits being unrelated to a fix. We then compare the package-side fixing release with changes on the client side (client-side fixing release). Through an empirical study of the adoption and propagation tendencies of 1,290 package-side fixing releases that propagate through a network of 1,553,325 releases of npm packages, we find that stale clients require additional migration effort, even if the package-side fixing release was quick (i.e., package patch landing). Furthermore, we show how factors such as the branch that the package-side fixing release lands on and the severity of the vulnerability influence its propagation. In addition to identifying and characterizing these lags, this paper lays the groundwork for future research on how to mitigate lags in an ecosystem.
This section describes the structure of the dataset used in our study. The dataset is in JSON format. Each file is associated with one vulnerability.
Vulnerability reports in this study are extracted from snyk.io. Each file contains one JSON object.
The following table shows the structure of a vulnerability report.
Key | Description |
---|---|
snyk_id | Vulnerability report id provided by snyk |
link | Link to vulnerability report page |
pub_date_long | Published date (ISO format) |
description | Raw html description |
lib_name | Affected library name |
severity | Level of severity |
vul_type | Type of vulnerability |
aff_ver | Affected version range, written with the operators <, <=, >, >=, \|\| or the keyword ALL |
credit | Vulnerability Reporter |
cwe | CWE number |
disc_date | Disclosed date |
pub_date | Published date |
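
As a minimal sketch of how one report file could be read, the snippet below checks that the keys from the table above are present, parses `pub_date_long`, and splits `aff_ver` on `||`. The function name `parse_report` is hypothetical, and the code assumes that `pub_date_long` is an ISO 8601 timestamp and that `||` separates alternative ranges; neither detail is confirmed beyond the table.

```python
import json
from datetime import datetime

# Keys listed in the vulnerability report table above.
REPORT_KEYS = {
    "snyk_id", "link", "pub_date_long", "description", "lib_name", "severity",
    "vul_type", "aff_ver", "credit", "cwe", "disc_date", "pub_date",
}


def parse_report(text):
    """Parse one vulnerability report (one JSON object per file)."""
    report = json.loads(text)
    missing = REPORT_KEYS - report.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    # Assumption: pub_date_long is an ISO 8601 timestamp. A trailing "Z" is
    # rewritten because older Python versions do not accept it.
    stamp = report["pub_date_long"].replace("Z", "+00:00")
    report["pub_date_long"] = datetime.fromisoformat(stamp)
    # Assumption: "||" separates alternative affected ranges.
    report["aff_ranges"] = [part.strip() for part in report["aff_ver"].split("||")]
    return report
```

To use it on a file, read the file contents as text and pass them to `parse_report`; the added `aff_ranges` key is a convenience of this sketch, not part of the dataset.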
The PQ dataset holds information about fixing releases and is used in both PQ1 and PQ2. Each file contains one JSON object that represents one fixing release.
The following table shows the structure of the PQ dataset.
Key | Description |
---|---|
id | Vulnerability report id |
aff_ver | Affected version range |
lib_name | Affected library name |
release_type | Type of the fix released by libraries |
num_fix_commit | Number of commits related to the fix |
num_release_commit | Number of commits in the release |
num_fix_lines | Number of lines of code related to the fix |
references | Links in references section |
compare_link | Link to GitHub for comparing changes between vulnerable and fixed version |
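
The two commit counts in the table above can be combined to estimate how much of a fixing release is unrelated to the fix. The following is a sketch only: the function name is hypothetical, and it assumes that `num_release_commit` includes the fix commits counted in `num_fix_commit` and that both values are integers or integer strings.

```python
def unrelated_commit_share(release):
    """Return the fraction of commits in a fixing release unrelated to the fix.

    Assumption: num_release_commit counts all commits in the release,
    including those counted in num_fix_commit.
    """
    total = int(release["num_release_commit"])
    fix = int(release["num_fix_commit"])
    if total <= 0:
        return None  # no commits recorded, share is undefined
    return (total - fix) / total


# Hypothetical record, not taken from the dataset.
example = {"num_fix_commit": 1, "num_release_commit": 4}
print(unrelated_commit_share(example))
```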
The RQ1 dataset records the fixing release updates of packages and the client-side fixing release updates of their clients. Each file contains one JSON array whose elements are clients that directly depend on a vulnerable package.
The following table shows the structure of the RQ1 dataset.
Key | Description |
---|---|
client_name | Client name |
vul_lib_ver | Vulnerable version of library adopted by client |
fix_lib_ver | Fixed version of library adopted by client |
adoption | Fix adoption of client |
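
Because each RQ1 file is a JSON array of clients, a per-file summary of the `adoption` field takes only a few lines. This is a sketch under assumptions: the function name is hypothetical, and the `adoption` values in the sample are placeholders, since the table above does not list the values that actually occur.

```python
import json
from collections import Counter


def adoption_counts(text):
    """Count clients per adoption value in one RQ1 file (a JSON array)."""
    clients = json.loads(text)
    return Counter(client["adoption"] for client in clients)


# Hypothetical file contents; the adoption labels are placeholders.
sample = json.dumps([
    {"client_name": "a", "vul_lib_ver": "1.0.0", "fix_lib_ver": "1.0.1", "adoption": "label-1"},
    {"client_name": "b", "vul_lib_ver": "1.0.0", "fix_lib_ver": "2.0.0", "adoption": "label-2"},
    {"client_name": "c", "vul_lib_ver": "0.9.0", "fix_lib_ver": "1.0.1", "adoption": "label-1"},
])
print(adoption_counts(sample))
```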
The RQ2 dataset records lags in propagation. Each file contains one JSON array whose elements are downstream clients that depend on a vulnerable package.
The following table shows the structure of the RQ2 dataset.
Key | Description |
---|---|
name | Client name |
dependency | Direct dependency name of the client |
downstream_propagation | Proximity between vulnerable library and downstream client |
lags_day | Lags of updates (days) |
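
One way to inspect the RQ2 data is to group `lags_day` by `downstream_propagation` and take the median per group. The sketch below uses a hypothetical function name and assumes that `lags_day` is numeric (or a numeric string) and that `downstream_propagation` is a string or number; the sample values are illustrative only.

```python
import json
from collections import defaultdict
from statistics import median


def median_lag_by_proximity(text):
    """Return the median lag in days for each downstream_propagation value."""
    groups = defaultdict(list)
    for client in json.loads(text):
        groups[client["downstream_propagation"]].append(float(client["lags_day"]))
    return {proximity: median(lags) for proximity, lags in groups.items()}


# Hypothetical file contents, not taken from the dataset.
sample_rq2 = json.dumps([
    {"name": "a", "dependency": "lib", "downstream_propagation": 1, "lags_day": 10},
    {"name": "b", "dependency": "lib", "downstream_propagation": 1, "lags_day": 30},
    {"name": "c", "dependency": "a", "downstream_propagation": 2, "lags_day": 100},
])
print(median_lag_by_proximity(sample_rq2))
```

The median is used here only because lag distributions tend to be skewed; it is a choice made for this sketch, not necessarily the statistic used in the study.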