emse_2023

data/glued_time_sha_origin_all.csv: contains the working data set used to address the study's research questions. The file has one entry for each churn changed by each commit.

data/All.csv: contains the working data set used to address the research questions in the original paper, before its extension. The file has one entry for each churn changed by each commit.

The file columns can be read as follows:

ID: line id
PROJECT: project name
COMMIT: commit sha
FILE: file name
FROM: change starting line
TO: change ending line
BUGINTRO: whether the change induced a fix
FUNCCHANGE: whether the change affected a functional construct
OVERLAP: whether the fix-inducing change and the functional change occur on the same line
FUNCADD: whether the change added a new functional construct
BUGANDFUNCADD: whether the change added a new functional construct and induced a fix

The remaining columns have a name in the form BugIntro.Construct.ChangeType where:
BugIntro is True or False depending on whether the given type of change induced a fix
Construct is the name of the construct being changed (lambda, listcomp, setcomp, dictcomp, map, reduce, filter)
ChangeType is either "add" (the construct has been added) or "changed" (the construct has been changed)

Such columns count the number of changes of each kind included in the churn.
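
The naming scheme for the count columns can be split mechanically into its three parts. The following is a minimal sketch, not part of the replication package: the helper `parse_count_column` and the type `CountColumn` are hypothetical names, and the exact spelling of the column names in the CSV files (for example `True.lambda.add`) is an assumption based on the description above.

```python
from typing import NamedTuple, Optional

# Construct and change-type names as listed in this README.
CONSTRUCTS = {"lambda", "listcomp", "setcomp", "dictcomp", "map", "reduce", "filter"}
CHANGE_TYPES = {"add", "changed"}


class CountColumn(NamedTuple):
    """Hypothetical container for the three parts of a count column name."""
    bug_intro: bool
    construct: str
    change_type: str


def parse_count_column(name: str) -> Optional[CountColumn]:
    """Split a name such as 'True.lambda.add'; return None for other columns."""
    parts = name.split(".")
    if len(parts) != 3:
        return None
    bug_intro, construct, change_type = parts
    # Assumption: the BugIntro part is spelled exactly 'True' or 'False'.
    if bug_intro not in ("True", "False"):
        return None
    if construct not in CONSTRUCTS or change_type not in CHANGE_TYPES:
        return None
    return CountColumn(bug_intro == "True", construct, change_type)


parsed = parse_count_column("True.lambda.add")
```

Columns such as PROJECT or BUGINTRO do not match the pattern, so the helper returns `None` for them. If the real headers use a different spelling, the two checks above are the place to adjust.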

data/ToSample.csv: contains the list of fixing changes affecting lines where a functional construct directly induced a fix. This is the population from which the sample for the manual analysis was drawn. The file contains the following columns:

PROJECT: project name
COMMIT: fix commit sha
FILE: file affected by the fix of the functional construct
LINES: list of affected lines (separated by semicolons)
MESSAGE: commit message
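
The semicolon-separated LINES column needs one extra parsing step after the CSV is read. The sketch below shows one way to do it with the standard library. It runs on an in-memory sample whose values are made up for illustration, and it assumes that the file is comma-delimited with a header row and that each LINES entry is a plain integer line number. Neither assumption is confirmed by this README.

```python
import csv
import io
from typing import List


def parse_lines_field(field: str) -> List[int]:
    """Turn a value such as '10;11;42' into [10, 11, 42]."""
    # Assumption: every non-empty token is an integer line number.
    return [int(token) for token in field.split(";") if token.strip()]


# Hypothetical sample row; the real data lives in data/ToSample.csv.
SAMPLE = """PROJECT,COMMIT,FILE,LINES,MESSAGE
demo,abc123,pkg/mod.py,10;11;42,Fix off-by-one in filter
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
affected = parse_lines_field(rows[0]["LINES"])
```

To read the actual file, replace `io.StringIO(SAMPLE)` with an open file handle, for example `open("data/ToSample.csv", newline="")`.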

code/FuncBugs.R: script used to perform the statistical analyses

code/statsmodels.R: script used to perform the statistical analyses and survival models

ManualValidation.csv: redacted manual validation spreadsheet. It contains the following columns:

PROJECT, COMMIT, FILE, LINES, MESSAGE: as in data/ToSample.csv
Val1: classification by Validator 1
Val2: classification by Validator 2
Resolution: classification resolution after discussion
Classif: change classification, where appropriate
Final note, Note 1-Note 4: notes added during the validation rounds and during the final discussion
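
Readers who want to quantify how often the two validators agreed before the discussion could compute Cohen's kappa over the Val1 and Val2 columns. The sketch below is not taken from the replication package, and this README does not say whether the study reports this statistic. The function `cohen_kappa` is a hypothetical helper and the labels in the example are invented.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two raters who labelled the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Invented labels standing in for the Val1 and Val2 columns.
val1 = ["bug", "bug", "no", "no"]
val2 = ["bug", "no", "no", "no"]
kappa = cohen_kappa(val1, val2)
```

Here the raters agree on 3 of 4 items (observed 0.75) while chance agreement is 0.5, so kappa is 0.5.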

data/detailed: this directory contains raw data (results of bug fix identification, SZZ, and functional change analysis). See its README for details.