Delta
Delta is a structure which describe a sequence of value from both Datasets accompaninied by an error description.
Structure
case class Delta(nEqual: Long, nNotEqual: Long, describe: Both[Describe], error: Describe)
nEqual & nNotEqual
type: Long
They are two counter which tell you how many rows have exactly the same values by incrementing “nEqual” and how many have at least one difference by incrementing “nNotEqual”.
That different from “countRowEqual” and “countRowNotEqual” in Inner since difference or not is usually corresponding to one or a set of columns not every column.
describe
type: Both[Describe]
Two Describes for values in each Datasets since both are different.
error
type: Describe
A Describe for the difference between the left and the right Datasets. Generally “right_value - left_value”, for special case such as string or array, we use levenshtein to calculate the distance for the moment.
DeltaByRow
case class DeltaByRow(count: Long, byColumn: Map[String, Delta])
When there is a difference between two rows, we use a DeltaByRow instead of a DescribeByRow. It allows us to get a Describe for both Datasets since they are different but also a describe for our error.
DeltaByRow are only available inside Inner analysis and that is totally logic. In outer you can’t compare rows since they don’t have any equivalent in the other side.