experimental/diff

  Source   Edit

This module implements an algorithm to compute the diff between two sequences of lines.

Example:

import experimental/diff
assert diffInt(
  [0, 1, 2, 3, 4, 5, 6, 7, 8],
  [-1, 1, 2, 3, 4, 5, 666, 7, 42]) ==
  @[Item(startA: 0, startB: 0, deletedA: 1, insertedB: 1),
    Item(startA: 6, startB: 6, deletedA: 1, insertedB: 1),
    Item(startA: 8, startB: 8, deletedA: 1, insertedB: 1)]

Example:

import experimental/diff
# 2 samples of text (from "The Call of Cthulhu" by Lovecraft)
let txt0 = """
abc
def ghi
jkl2"""
let txt1 = """
bacx
abc
def ghi
jkl"""
assert diffText(txt0, txt1) ==
  @[Item(startA: 0, startB: 0, deletedA: 0, insertedB: 1),
    Item(startA: 2, startB: 3, deletedA: 1, insertedB: 1)]

Types

Item = object
  startA*: int               ## Start Line number in Data A.
  startB*: int               ## Start Line number in Data B.
  deletedA*: int             ## Number of changes in Data A.
  insertedB*: int            ## Number of changes in Data B.
  
An Item in the list of differences.   Source   Edit
SeqEdit = object
  kind*: SeqEditKind         ## Sequence edit operation kind
  sourcePos*: int            ## Position in the original sequence
  targetPos*: int            ## Position in the target sequence
  
Sequence edit operation.   Source   Edit
SeqEditKind = enum
  sekNone,                  ## Empty edit operation
  sekKeep,                  ## Keep original element unchanged
  sekInsert,                ## Insert new element into target sequence
  sekReplace,               ## Replace source element with the target
  sekDelete,                ## Delete element from the source sequence
  sekTranspose               ## Transpose two elements
Kind of the sequence edit operation   Source   Edit
ShiftedDiff = object
  oldShifted*: seq[tuple[kind: SeqEditKind, item: int]]
  newShifted*: seq[tuple[kind: SeqEditKind, item: int]]
Intermediate version of the diffed sequence   Source   Edit

Procs

proc diffInt(arrayA, arrayB: openArray[int]): seq[Item] {....raises: [], tags: [].}

Find the difference in 2 arrays of integers.

arrayA A-version of the numbers (usually the old one)

arrayB B-version of the numbers (usually the new one)

Returns a sequence of Items that describe the differences.

  Source   Edit
proc diffText(text1, text2: string; sideBySide: bool;
              showLineNumbers: bool = false): string {....raises: [KeyError],
    tags: [].}
Format diff of two text blocks via newline split and default formatDiffed implementation   Source   Edit
proc diffText(textA, textB: string): seq[Item] {....raises: [KeyError], tags: [].}

Find the difference in 2 text documents, comparing by textlines.

The algorithm itself is comparing 2 arrays of numbers so when comparing 2 text documents each line is converted into a (hash) number. This hash-value is computed by storing all textlines into a common hashtable so i can find duplicates in there, and generating a new number each time a new textline is inserted.

textA A-version of the text (usually the old one)

textB B-version of the text (usually the new one)

Returns a seq of Items that describe the differences.

  Source   Edit
proc formatDiffed(shifted: ShiftedDiff; oldSeq, newSeq: seq[string];
                  sideBySide: bool; showLineNumbers: bool = false): string {.
    ...raises: [], tags: [].}

Plaintext format diff edit script for printing. Provides pretty barebones implementation of the formatting - no coloring, diff formatting is not configurable.

Generated diff formatting does not contain trailing newline

  • sideBySide: stack modified lines on top of each other (unified diff) or side-by-side (split diff)
  • oldSeq, newSeq: original diffed sequences with items converted to strings. It is assumed that each item is placed on one line and diffed between each other.
  • showLineNumbers: Show original line (sequence) numbers in the generated string printout.
  Source   Edit
proc levenshteinDistance[T](str1, str2: openArray[T]): tuple[distance: int,
    operations: seq[SeqEdit]]

Compute edit distance between two item sequences, return list of edit operations necessary to transform str1 into str2

Adapted from https://phiresky.github.io/levenshtein-demo/

  Source   Edit
proc myersDiff[T](aSeq, bSeq: openArray[T]): seq[SeqEdit]
Diff overload without explicit comparator proc - use default == for two items.   Source   Edit
proc myersDiff[T](aSeq, bSeq: openArray[T]; itemCmp: proc (x, y: T): bool): seq[
    SeqEdit]

Generate series of sequence edit operations necessary to trasnform aSeq into bSeq. For item equality comparison use itemCmp

https://gist.github.com/adamnew123456/37923cf53f51d6b9af32a539cdfa7cc4

  Source   Edit
proc shiftDiffed[T](diff: seq[SeqEdit]; oldSeq, newSeq: openArray[T]): ShiftedDiff
Align diff operations against each other, for further formatting.   Source   Edit

Iterators

iterator zipToMax[T](lhs, rhs: seq[T]; fill: T = default(T)): tuple[lhs, rhs: T,
    rhsDefault, lhsDefault: bool, idx: int]
Iterate each argument to the end, filling in missing values with fill argument. This is an opposite of the std built-in zip which iterates up until min(lhs.len, rhs.len).   Source   Edit