Parka

Parka is a Scala library for checking the data quality of Datasets.

It implements DeltaQA for Datasets: it compares two Datasets with each other and reports the differences in a Parka Analysis, an object that contains the comparison’s data.

Table of contents

Installation

Stable version

A stable version isn’t available yet.

Latest version

To get the latest version of this library, you can still download it from Bintray: https://bintray.com/univalence/univalence-jvm/parka

Here is an example using version 0.3+79-4936e981, which works with Scala 2.11.x:

resolvers += "parka" at "http://dl.bintray.com/univalence/univalence-jvm"
libraryDependencies += "io.univalence" %% "parka" % "0.3+79-4936e981"

Usage

The entry point of Parka is the Parka Analysis object. It holds the result of comparing two Datasets, which is the information you need to assess data quality.

To get a Parka Analysis, first import Parka and then generate the analysis from two Datasets:

import io.univalence.parka.{Parka, ParkaAnalysis}

val pa: ParkaAnalysis = Parka(df1, df2)("key")

First pass the two Datasets to compare, and then the column(s) that serve as keys. You can then print the result to the console or export it as JSON.

Here is an example:

import io.univalence.parka.Printer

println(Printer.printParkaResult(pa.result))
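
The steps above can be combined into one small program. The following is only a sketch: it assumes Spark SQL is on the classpath and runs a local Spark session, and the object name `ParkaQuickstart`, the helper `sampleData`, the column names, and the sample rows are all made up for illustration. The only Parka calls it uses are the ones shown above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import io.univalence.parka.{Parka, Printer}

// Hypothetical example object; not part of Parka itself.
object ParkaQuickstart {

  // Builds two small, made-up DataFrames that share a "key" column.
  def sampleData(spark: SparkSession): (DataFrame, DataFrame) = {
    import spark.implicits._
    val left  = Seq(("a", 1), ("b", 2), ("c", 3)).toDF("key", "value")
    val right = Seq(("a", 1), ("b", 20), ("d", 4)).toDF("key", "value")
    (left, right)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark session is good enough for a quick check.
    val spark = SparkSession
      .builder()
      .master("local[*]")
      .appName("parka-quickstart")
      .getOrCreate()

    val (df1, df2) = sampleData(spark)

    // Compare the two Datasets, using "key" as the key column.
    val pa = Parka(df1, df2)("key")

    // Print the comparison result to the console.
    println(Printer.printParkaResult(pa.result))

    spark.stop()
  }
}
```

In this sample data, key `b` has a different value on each side, while `c` and `d` each exist in only one Dataset, so the sketch exercises both kinds of difference between the two inputs.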

Support

If you have a problem or a question, don’t hesitate to open a new issue.

Authors

Made with :heart: by Univalence’s team.

License

Parka is licensed under the Apache License, Version 2.0 (the “License”); you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Dependencies