commonquail/commit-size-distribution

Commit patch size distribution

Calculate and plot cumulative distribution functions of the added, removed, and total changed lines of all non-merge commits in a Git repository.
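As a rough illustration of the calculation described above, the sketch below collects per-commit line counts and turns them into an empirical CDF. It is not the code in commit-size-distribution.py: the helper names (`parse_numstat`, `read_commit_sizes`, `empirical_cdf`) and the use of `git log --no-merges --numstat` are assumptions made for this example.

```python
import subprocess
from collections import namedtuple

# Hypothetical record type for this sketch: one entry per commit.
CommitSize = namedtuple("CommitSize", "added removed total")


def parse_numstat(log_text):
    """Parse the output of `git log --numstat --format='commit %H'`."""
    sizes = []
    current = None  # [added, removed] for the commit being read
    for line in log_text.splitlines():
        if line.startswith("commit "):
            if current is not None:
                sizes.append(current)
            current = [0, 0]
            continue
        parts = line.split("\t")
        if current is None or len(parts) != 3:
            continue  # blank separator lines and anything unexpected
        added, removed, _path = parts
        if added == "-" or removed == "-":
            continue  # numstat reports "-" for binary files
        current[0] += int(added)
        current[1] += int(removed)
    if current is not None:
        sizes.append(current)
    return [CommitSize(a, r, a + r) for a, r in sizes]


def read_commit_sizes(repo_path):
    """Run git in repo_path and return sizes of all non-merge commits."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--no-merges", "--numstat",
         "--format=commit %H"],
        check=True, capture_output=True, encoding="utf-8", errors="replace",
    )
    return parse_numstat(result.stdout)


def empirical_cdf(values):
    """Return (value, fraction of values <= value) pairs in sorted order."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]
```

Calling `empirical_cdf([s.total for s in read_commit_sizes("../git/")])` would then give the points for the total-changed-lines curve; the added and removed curves follow the same pattern.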

Run commit-size-distribution.py with the path to a Git repository, specifying either the --preview flag, a location to save the plot as a PNG, or both.

Example

The sample directory contains a plot of Git's own commit size distribution, generated with:

./commit-size-distribution.py --max-size 2000 --mark-hours ../git/ sample/git.png

CDF of Git's commit patch sizes

The --mark-hours option draws vertical lines at increments of 400 lines to indicate how many hours a commit of that size would take to review. In a famous study of code review of a multi-million-line Java application at Cisco Systems, SmartBear found that

  • the number of defects discovered approaches zero at 400 lines of inspection;
  • reviewing more than 400 lines of code per hour greatly reduces the number of defects found; and
  • mental fatigue sets in after about 60 minutes of review.

The optimal rate of inspection will vary with language and context, but 400 lines per hour is a nice round number.
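The arithmetic behind the hour markers can be sketched as follows. This is an illustration under the 400-lines-per-hour figure quoted above, not the script's actual code; `LINES_PER_HOUR`, `review_hours`, and `hour_marks` are hypothetical names.

```python
LINES_PER_HOUR = 400  # review rate assumed from the study quoted above


def review_hours(size, lines_per_hour=LINES_PER_HOUR):
    """Estimate how many hours a patch of `size` changed lines takes to review."""
    return size / lines_per_hour


def hour_marks(max_size, lines_per_hour=LINES_PER_HOUR):
    """Return (x position in lines, hours) for each marker up to max_size."""
    return [
        (lines, lines // lines_per_hour)
        for lines in range(lines_per_hour, max_size + 1, lines_per_hour)
    ]
```

With the `--max-size 2000` value from the example command, `hour_marks(2000)` yields five markers, at 400, 800, 1200, 1600, and 2000 lines, labelled 1 through 5 hours.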

The --max-size option discards any data point above the supplied value from the respective graph. This is useful for signalling that commits above a certain size, such as style changes across an entire code base, cannot realistically be reviewed thoroughly.
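One possible reading of "the respective graph" is that the cutoff applies to each of the added, removed, and total series independently. The sketch below assumes that reading; `discard_above` and `filter_series` are hypothetical helpers, and whether the script recomputes the CDF over the remaining points is not stated here.

```python
def discard_above(values, max_size):
    """Keep only the data points that do not exceed max_size."""
    return [v for v in values if v <= max_size]


def filter_series(series, max_size):
    """Apply the cutoff to each named series independently."""
    return {
        name: discard_above(values, max_size)
        for name, values in series.items()
    }


# Two made-up commits: a small one and a large one.
series = {"added": [10, 1500], "removed": [5, 1000], "total": [15, 2500]}
filtered = filter_series(series, 2000)
```

Under this assumption the large commit stays in the added and removed graphs (1500 and 1000 lines) but drops out of the total graph, where its 2500 changed lines exceed the limit.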

Installation

$ python3 -m venv venv
$ . venv/bin/activate
$ pip install -r requirements.txt
