Parka
Parka is a Scala library for checking the data quality of Datasets.
It implements DeltaQA for Datasets: it compares two Datasets with each other and reports the differences in a Parka Analysis, an object that contains the comparison's data.
Installation
Stable version
A stable version isn’t available yet.
Latest version
If you want the latest version of this library, you can download it from Bintray: https://bintray.com/univalence/univalence-jvm/parka
Here is an example using version 0.3+79-4936e981, which works with Scala 2.11.x:
resolvers += "parka" at "http://dl.bintray.com/univalence/univalence-jvm"
libraryDependencies += "io.univalence" %% "parka" % "0.3+79-4936e981"
Usage
The entry point of Parka is the Parka Analysis object. It holds detailed information about the comparison between two Datasets, which is what you need to assess data quality.
To get a Parka Analysis, first import Parka and then generate the analysis from two Datasets, as shown below:
import io.univalence.parka.{Parka, ParkaAnalysis}
// df1 and df2 are the two Datasets to compare; "key" is the key column
val pa: ParkaAnalysis = Parka(df1, df2)("key")
First pass the two Datasets to compare, then the column(s) that serve as keys. You can then print the result to the console or export it as JSON.
Here is an example that prints the result to the console:
import io.univalence.parka.Printer
println(Printer.printParkaResult(pa.result))
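Putting these steps together, the following is a minimal sketch of a complete run rather than an official example. It assumes Spark 2.x is on the classpath and that `ParkaAnalysis` lives in the `io.univalence.parka` package. The object name `ParkaExample`, the helper `analyse`, the sample rows, and the column names are hypothetical and only serve the illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import io.univalence.parka.{Parka, ParkaAnalysis, Printer}

// Hypothetical example object; not part of Parka.
object ParkaExample {

  // Builds two small sample Datasets and compares them on the "key" column.
  def analyse(spark: SparkSession): ParkaAnalysis = {
    import spark.implicits._

    // Two made-up versions of the same dataset:
    // "b" has a different value, "c" exists only on the left, "d" only on the right.
    val df1: DataFrame = Seq(("a", 1), ("b", 2), ("c", 3)).toDF("key", "value")
    val df2: DataFrame = Seq(("a", 1), ("b", 5), ("d", 4)).toDF("key", "value")

    Parka(df1, df2)("key")
  }

  def main(args: Array[String]): Unit = {
    // Local session for the example (assumption: Spark 2.x with Scala 2.11).
    val spark = SparkSession
      .builder()
      .master("local[*]")
      .appName("parka-example")
      .getOrCreate()

    try {
      val pa = analyse(spark)
      println(Printer.printParkaResult(pa.result))
    } finally {
      spark.stop()
    }
  }
}
```

Keeping the comparison in its own method makes it easy to swap the sample rows for Datasets loaded from real sources.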
Support
If you have a problem or a question, don't hesitate to open a new issue.
Authors
Made with :heart: by Univalence’s team.
License
Parka is licensed under the Apache License, Version 2.0 (the “License”); you may not use this software except in compliance with the License.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Dependencies
- algebird - 0.13.4
- magnolia - 0.10.0
- jline - 3.12.1
- circe - 0.11.1
- spark-test - current
Links
- Univalence Web site
- Microsite
- Source code
- Video - DeltaQA introduction between 14:25 and 28:10