Upstream comparison tool: Output relative % difference vs upstream
Before this CL, the upstream comparison tool only produced an
*absolute* measure of the difference between a file and its
upstream version, namely the edit distance (minimum number of
lines to add/remove/replace to transform the content of one
file into the other's).
After this CL, the tool also notes the *relative* difference,
using the formula:
edit_distance / max(N, M)
where N and M are the number of lines in the two files.
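A minimal sketch of both measures, assuming each file is compared as a list of lines and that every add, remove or replace costs 1. The names `line_edit_distance` and `relative_distance` are hypothetical, and this is not the tool's actual code. It is the textbook O(N*M) dynamic program mentioned further below, followed by the relative formula.

```python
from typing import Sequence


def line_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of lines to add, remove or replace to turn a into b."""
    # At the start of iteration i, prev[j] is the distance between the
    # first i-1 lines of a and the first j lines of b.
    prev = list(range(len(b) + 1))  # i = 0: add j lines
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)  # j = 0: remove i lines
        for j in range(1, len(b) + 1):
            same = a[i - 1] == b[j - 1]
            cur[j] = min(
                prev[j] + 1,                       # remove line a[i-1]
                cur[j - 1] + 1,                    # add line b[j-1]
                prev[j - 1] + (0 if same else 1),  # keep or replace
            )
        prev = cur
    return prev[-1]


def relative_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """edit_distance / max(N, M), as a fraction in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # assumption: two empty files count as identical
    return line_edit_distance(a, b) / longest


old = ["a", "b", "c"]
new = ["a", "x", "c", "d"]  # one line replaced, one line added
print(f"{relative_distance(old, new):.0%}")  # prints "50%"
```

In practice the inputs would come from something like `Path.read_text().splitlines()`; how the real tool reads and normalizes lines is not covered here.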
- A relative distance of 0% means the files are identical.
- A relative distance of 100% can mean that one file is empty
and the other is not, that all lines are different, or a
mixture of the two (e.g. one file being twice as long as the
other but with no lines in common).
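To see why the mixed case still gives exactly 100%, take file A with N lines and file B with M lines, assuming N >= M (swap the roles otherwise) and no line in common, and write d for the edit distance. No line of A can be kept, so each of its N lines must be replaced or removed; replacing M lines and removing the other N - M achieves that bound:

```latex
\begin{aligned}
d &\ge N && \text{(every line of A is replaced or removed)} \\
d &\le \underbrace{M}_{\text{replace}} + \underbrace{(N - M)}_{\text{remove}} = N \\
\Rightarrow\; d &= N = \max(N, M), \qquad \frac{d}{\max(N, M)} = 100\%
\end{aligned}
```

With M = 0 (B empty) this gives d = N; with N = 2M (one file twice as long) it gives d = 2M, and 2M / 2M = 100%.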
The following more complex approaches were also implemented:
- Use the number of lines matched (from the calculation
of the edit distance) in the relative distance formula.
- Compute the edit distance in time O((N+M)*D*log(D)) rather
than O(N*M) by restricting the search to at most D
insertions/deletions and performing a one-sided binary
search for D, where D is the edit distance.
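For illustration only, here is a sketch of the banded idea under stated assumptions; it is not the code that was tried for this CL, and `banded_edit_distance` and `edit_distance_by_doubling` are hypothetical names. An alignment of cost c never drifts more than c cells off the diagonal, so a DP restricted to the band |i - j| <= D returns the exact distance whenever that distance is at most D. This sketch therefore only does the doubling phase of a one-sided binary search: it stops at the first D whose banded result is <= D, because that result is already exact.

```python
from typing import Sequence


def banded_edit_distance(a: Sequence[str], b: Sequence[str], bound: int) -> int:
    """Line edit distance restricted to cells with |i - j| <= bound.

    Exact if the true distance is <= bound; otherwise returns a value > bound.
    Runs in O((N + M) * bound) time, storing only the band of each row.
    """
    n, m = len(a), len(b)
    if abs(n - m) > bound:
        return bound + 1  # the final cell (n, m) lies outside the band
    width = 2 * bound + 1
    inf = float("inf")
    # Band index k in row i corresponds to column j = i - bound + k.
    prev = [inf] * width
    for j in range(min(m, bound) + 1):
        prev[j + bound] = j  # row 0: add j lines
    for i in range(1, n + 1):
        cur = [inf] * width
        for k in range(width):
            j = i - bound + k
            if j < 0 or j > m:
                continue
            if j == 0:
                cur[k] = i  # remove the first i lines of a
                continue
            best = inf
            if k + 1 < width:
                best = prev[k + 1] + 1  # remove a[i-1], from cell (i-1, j)
            if k > 0:
                best = min(best, cur[k - 1] + 1)  # add b[j-1], from cell (i, j-1)
            # keep or replace, from cell (i-1, j-1)
            best = min(best, prev[k] + (a[i - 1] != b[j - 1]))
            cur[k] = best
        prev = cur
    return int(prev[m - n + bound])


def edit_distance_by_doubling(a: Sequence[str], b: Sequence[str]) -> int:
    """Try bounds 1, 2, 4, ... until the banded result fits within the bound."""
    bound = 1
    while True:
        d = banded_edit_distance(a, b, bound)
        if d <= bound:
            return d  # exact: the optimal alignment fits inside this band
        bound *= 2
```

Each probe touches O((N + M) * D) cells instead of all N * M, which only pays off when D is small relative to the file sizes.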
However, those approaches:
- involved more complex code
- were slower or at least not faster in practice
- yielded nearly the same output as this simpler approximation
Therefore, this CL contains only the simpler code based on the
formula above.
The tool takes about 18 min 20 s to compare all ojluni .java files
against three upstreams.
Test: oj_upstream_comparison.py --upstream_root /tmp/openjdk
Change-Id: I6fda8b3f389f45affd920107718d0e9fe2973e9f