My main research interest and the topic of my doctoral research is improving code reviews through computer support. I'm also interested in ways to improve software development with a focus on small and medium software companies with agile development processes.
In my Ph.D. thesis, I combine the results from my earlier articles into a coherent whole and augment it with further results, for example a systematization of code reading techniques, an efficient algorithm to determine an optimal code order for reviewing, a description of the architecture of the CoRT code review tool, and, perhaps most importantly, a catalog of essential requirements for cognitive support code review tools.
Data extracted from software repositories is used intensively in Software Engineering research, for example, to predict defects in source code. In our research in this area, with data from open source projects as well as an industrial partner, we noticed several shortcomings of conventional data mining approaches for classification problems: (1) Domain experts' acceptance is of critical importance, and domain experts can provide valuable input, but it is hard to use this feedback. (2) The evaluation of the model is not a simple matter of calculating AUC or accuracy. Instead, there are multiple objectives of varying importance, but their importance cannot be easily quantified. Furthermore, the performance of the model cannot be evaluated on a per-instance level in our case, because it shares aspects with the set cover problem. To overcome these problems, we take a holistic approach and develop a rule mining system that simplifies iterative feedback from domain experts and can easily incorporate the domain-specific evaluation needs. A central part of the system is a novel multi-objective anytime rule mining algorithm. The algorithm is based on the GRASP-PR meta-heuristic but extends it with ideas from several other approaches. We successfully applied the system in the industrial context. In the current article, we focus on the description of the algorithm and the concepts of the system. We provide an implementation of the system for reuse.
Change-based code review is used widely in industrial software development. Thus, research on tools that help the reviewer to achieve better review performance can have a high impact. We analyze one possibility to provide cognitive support for the reviewer: Determining the importance of change parts for review, specifically determining which parts of the code change can be left out from the review without harm. To determine the importance of change parts, we extract data from software repositories and build prediction models for review remarks based on this data. The approach is discussed in detail. To gather the input data, we propose a novel algorithm to trace review remarks to their triggers. We apply our approach in a medium-sized software company. In this company, we can avoid the review of 25% of the change parts and of 23% of the changed Java source code lines, while missing only about 1% of the review remarks. Still, we also observe severe limitations of the tried approach: Much of the savings are due to simple syntactic rules, noise in the data hampers the search for better prediction models, and some developers in the case company oppose the taken approach. Besides the main results on the mining and prediction of triggers for review remarks, we contribute experiences with a novel, multi-objective and interactive rule mining approach. The anonymized dataset from the company is made available, as are the implementations for the devised algorithms.
Change-based code review is a software quality assurance technique that is widely used in practice. Therefore, better understanding what influences performance in code reviews and finding ways to improve it can have a large impact. In this study, we examine the association of working memory capacity and cognitive load with code review performance and we test the predictions of a recent theory regarding improved code review efficiency with certain code change part orders. We perform a confirmatory experiment with 50 participants, mostly professional software developers. The participants performed code reviews on one small and two larger code changes from an open source software system to which we had seeded additional defects. We measured their efficiency and effectiveness in defect detection, their working memory capacity, and several potential confounding factors. We find that there is a moderate association between working memory capacity and the effectiveness of finding delocalized defects, influenced by other factors, whereas the association with other defect types is almost non-existing. We also confirm that the effectiveness of reviews is significantly larger for small code changes. We cannot conclude reliably whether the order of presenting the code change parts influences the efficiency of code review.
Our recent survey on the use of code review in practice provided further evidence that culture is an important factor when teams do not do code reviews at all. At the 56th CREST Open Workshop, I held a talk on this subject, and the great folks at UCL created a video of the talk.
Code review has been known to be an effective quality assurance technique for decades. In the last years, industrial code review practices were observed to converge towards “change-based/modern code review”, but with a lot of variation in the details of the processes. Recent research also proposed hypotheses on factors that influence the choice of process. However, all current research in this area is based on small and largely non-random samples of cases. Therefore, we set out to assess the current state of the practice and to test some of these hypotheses with a survey among commercial software development teams. We received responses from 240 teams. They support many of the stated hypotheses, e.g., that change-based code review is the dominating style of code review in the industry, and that teams doing change-based code review have a lower risk that review use fades away. However, other hypotheses could not be confirmed, mainly that the balance of effects a team tries to reach with code reviews acts as a mediator in determining the details of the review process. Apart from these findings, we contribute the survey data set as a foundation for future research.
Change-based code review, e.g., in the form of pull requests, is the dominant style of code review in practice. An important option to improve review’s efficiency is cognitive support for the reviewer. Nevertheless, review tools present the change parts under review sorted in alphabetical order of file path, thus leaving the effort of understanding the construction, connections, and logic of the changes on the reviewer. This leads to the question: How should a code review tool order the parts of a code change to best support the reviewer? We answer this question with a middle-range theory, which we generated inductively in a mixed methods study, based on interviews, an online survey, and existing findings from related areas. Our results indicate that an optimal order is mainly an optimal grouping of the change parts by relatedness. We present our findings as a collection of principles and formalize them as a partial order relation among review orders.
Tool support for change-based code review is gaining widespread acceptance in the industry. This indicates that the current generation of tools is well-aligned to current code review practices. Nevertheless, we believe that further improvements in code review tooling can lead to increased review efficiency and effectiveness. In this paper, we combine results from a qualitative study and results from the literature to substantiate this claim. We derive promising improvement areas and provide an overview of existing research in these areas. A common attribute of these improvements is that they trade flexibility for reviewer support. As flexibility is one of the main characteristics of the current generation of code review tools in Hedberg's classification of review tool generations, we regard these coming tools as part of a new generation of code review tools.
Code review in the industry today is different to code review twenty years ago. The process has become more lightweight, reviews are performed frequently and change-based and the use of specialized tools is increasing. An accurate view of the current state of the industrial practice is an indispensable foundation for improving it. Most recent descriptions of review practices come from a limited population of large high-tech companies. Therefore, we used interviews with software engineering professionals from a broad sample of 19 companies to gain insight into their code review practices. We augmented our findings with data for 11 companies found through a semi-systematic literature review. There are many commonalities in the code review processes of these companies, but also a lot of variation in the details. A simple process taxonomy cannot describe these variations adequately. Therefore, we present a faceted classification scheme that is grounded in our observations.
Code review is known to be an efficient quality assurance technique. Many software companies today use it, usually with a process similar to the patch review process in open source software development. However, there is still a large fraction of companies performing almost no code reviews at all. And the companies that do code reviews have a lot of variation in the details of their processes. For researchers trying to improve the use of code reviews in industry, it is important to know the reasons for these process variations. We have performed a grounded theory study to clarify process variations and their rationales. The study is based on interviews with software development professionals from 19 companies. These interviews provided insights into the reasons and influencing factors behind the adoption or non-adoption of code reviews as a whole as well as for different process variations. We have condensed these findings into seven hypotheses and a classification of the influencing factors. Our results show the importance of cultural and social issues for review adoption. They trace many process variations to differences in development context and in desired review effects.
Code review in practice is often performed change-based, i.e. using the code changes belonging to a task to determine which code to review. In previous studies, it was found that two variations of this process are used in industry: Pre commit review (review-then-commit) and post commit review (commit-then-review). The choice for one of these variants has implications not only for practitioners deciding on a code review process to use, but also for the development of review tools and for experimentation with review processes. In some situations, a specific variant is clearly preferable due to the nature of the development process or team. In other situations, there are conflicting opinions, and both variants have proponents arguing for their method of choice. So we asked: Are there practically relevant performance differences between pre commit and post commit reviews, and how are these differences influenced by contextual factors? To assess this question, we designed a parametric discrete event simulation model of certain agile development processes. We validated this model with practitioner's feedback and in part also with empirical data from industry. Our analysis of the simulation results indicates that the best choice does depend on the context, but also that there are many situations with no practically relevant difference between both choices. We identified the main influencing factors and underlying effects and condensed our findings into heuristic rules.
The need to increase the efficiency of software development creates a demand for unobtrusive in-process means of software quality assurance that align well with agile processes. Many version control system clients provide facilities for running scripts or other executables before a change is committed to the repository. These ”pre-commit hooks” can be client-side or server-side, both having distinct advantages. Client-side pre-commit hooks are well suited to implement context-sensitive checklists and automatic quality checks, but their potential has not been fully realized in practice and corresponding research is missing. To narrow this gap, we did an industrial case study on this use of client-side pre-commit hooks. The case study used the tool ”TortoiseChecklist” and was focused on increasing continuous integration build stability. The results were positive, with a notable increase in stability and no significant disadvantages noticed.