Home

Spark-Tools is a set of tools dedicated to providing more clarity for data engineers when working on Spark jobs.

Our tools are built around metadata, which gives greater visibility into code structures. This in turn provides more context when something breaks down and needs fixing.

Imagine you’re stumbling through your room in the dark and step straight onto a piece of Lego: you curse, turn on the light, and see the offending piece. Spark-Tools is that light: it won’t stop the pain, but at least you know what hurt you.

Getting started with Spark-Test

Spark-Test is the tool that improves your test reports.

We recommend starting with Spark-Test, because it’s the most accessible of the tools. Here’s an example of how to use Spark-Test from A to Z.

Include Spark-Test in your project by adding the following lines to your build.sbt:

resolvers += "spark-test" at "http://dl.bintray.com/univalence/univalence-jvm"

libraryDependencies += "io.univalence" %% "spark-test" % "0.2+245-09a064d9" % Test

Spark-Test provides an assertEquals function that compares two RDDs, Datasets, or DataFrames. It reports a SparkTestError if they are different.

import io.univalence.sparktest.SparkTest
import org.scalatest.FunSuiteLike

class MyTestClass extends FunSuiteLike with SparkTest {
  test("some test") {
    case class A(a: Int)

    val df = dataframe("{a:1, b:true}", "{a:2, b:false}")
    val ds = dataset(A(1), A(3))

    df.assertEquals(ds)
  }
}
Running this test fails with an error pinpointing the difference:

java.lang.AssertionError: The data set content is different :
in field a, 2 is not equals to expected value 3
dataframe("{ a: 2 , b: false}")
dataframe("{ a: 3 }")
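Conversely, when both sides hold the same data, the assertion passes silently. The sketch below reuses the dataframe helper and assertEquals shown above; the class name and test label are illustrative.

```scala
import io.univalence.sparktest.SparkTest
import org.scalatest.FunSuiteLike

class MyPassingTestClass extends FunSuiteLike with SparkTest {
  test("identical data passes") {
    // Both DataFrames contain the same rows in the same order,
    // so assertEquals finds no difference and the test succeeds.
    val expected = dataframe("{a:1, b:true}", "{a:2, b:false}")
    val actual   = dataframe("{a:1, b:true}", "{a:2, b:false}")

    actual.assertEquals(expected)
  }
}
```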

There are many other features! To learn more about Spark-Test, see the Spark-Test documentation.

The tools

Each tool is open source and available on GitHub:

  • Spark-Test, testing tools for Spark
  • Parka, a tool that applies deltaQA to Datasets
  • Plumbus, light miscellaneous utilities for Spark
  • Fenek, a DSL for semi-typed transformation in Scala for Spark
  • Spark-ZIO, Spark in a ZIO environment