fantastic-spork

fantastic-spork is a Scala library for Apache Spark that provides additional high-performance SQL functions for text and array operations. It extends the built-in Spark SQL and DataFrame capabilities.


Purpose

In real-world analytics, Spark users often need to count substrings, tally words in collections, or otherwise process text. These tasks are not always convenient with Spark's built-in SQL functions. fantastic-spork implements them as native Catalyst expressions, so they run inside Spark's query engine and can be used from both Spark SQL and the DataFrame API.


Setup

Artifacts are published on Maven Central.

SBT

libraryDependencies += "io.gitlab.rokorolev" % "fantastic-spork_2.12" % "0.1.2"

Maven

<dependency>
    <groupId>io.gitlab.rokorolev</groupId>
    <artifactId>fantastic-spork_2.12</artifactId>
    <version>0.1.2</version>
</dependency>

spark-submit

Pass the Maven coordinates with the --packages option:

io.gitlab.rokorolev:fantastic-spork_2.12:0.1.2

Usage

import io.gitlab.rokorolev.fantasticspork.functions._

Available functions:

The functions that appear in the example below are count_substring and count_words_in_array. Use these functions in your Spark SQL or DataFrame code.

Example

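import org.apache.spark.sql.functions.expr

// Spark SQL example: the query yields a single row containing 2
spark.sql("SELECT count_substring('abracadabra', 'abra')")

// DataFrame example: df is any existing DataFrame
df.select(expr("count_words_in_array(array('apple', 'banana', 'apple'))"))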
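
To make the expected results of the two example calls concrete, here is a minimal plain-Scala sketch of the semantics they appear to have. It is not the library's implementation (which this README describes as native Catalyst expressions). The names SemanticsSketch, countSubstring, and countWords are hypothetical. Counting non-overlapping matches and returning a word-to-count map are assumptions, since the README does not state either.

```scala
object SemanticsSketch {
  // Counts non-overlapping occurrences of `sub` in `text` (assumed semantics).
  // An empty `sub` is treated as matching zero times.
  def countSubstring(text: String, sub: String): Int = {
    @annotation.tailrec
    def loop(from: Int, acc: Int): Int = {
      val idx = text.indexOf(sub, from)
      if (idx < 0) acc else loop(idx + sub.length, acc + 1)
    }
    if (sub.isEmpty) 0 else loop(0, 0)
  }

  // Tallies how often each word occurs in the collection
  // (the word -> count result shape is an assumption).
  def countWords(words: Seq[String]): Map[String, Int] =
    words.groupBy(identity).map { case (word, hits) => word -> hits.size }

  def main(args: Array[String]): Unit = {
    println(countSubstring("abracadabra", "abra")) // prints 2
    println(countWords(Seq("apple", "banana", "apple")))
  }
}
```

For the inputs used above, both overlapping and non-overlapping counting give 2 for 'abra' in 'abracadabra', so the example does not show which one the library uses.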

Contributing

  1. Fork the repo on GitLab
  2. Create a branch: git checkout -b my-feature-branch
  3. Commit your changes: git commit -am 'Add new feature'
  4. Push the branch: git push origin my-feature-branch
  5. Open a Merge Request describing your changes

Building and Publishing

This project uses sbt for building and publishing. Artifacts go to Maven Central via Sonatype, using sbt-sonatype and sbt-pgp.

Build locally

sbt clean compile test

Publishing to Maven Central

Before publishing, set up your PGP keys for signing and add your Sonatype credentials to ~/.sbt/1.0/sonatype.sbt:

credentials += Credentials(
  "Sonatype Nexus Repository Manager",
  "central.sonatype.com",
  "<your-username>",
  "<your-password>")

Release steps:

  1. Clean, test, and publish signed artifacts for every cross-built Scala version:
    sbt +clean +test +publishSigned
  2. Upload and release the staged bundle:
    sbt sonatypeBundleRelease

See the sbt-sonatype and sbt-pgp documentation for details.
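
The release steps above depend on build settings that this README does not show. The following build.sbt fragment is a hedged sketch of what such settings commonly look like with sbt-sonatype. It assumes both plugins are already declared in project/plugins.sbt and that the credential host matches the one used in the credentials entry above. The project's actual build file may differ.

```scala
// build.sbt (sketch under the assumptions stated above, not the project's actual file)

// Host that sbt-sonatype authenticates against; assumed to match the credentials entry.
ThisBuild / sonatypeCredentialHost := "central.sonatype.com"

// Publish Maven-style POMs, as Maven Central requires.
publishMavenStyle := true

// Stage signed artifacts into a local bundle that sonatypeBundleRelease later uploads.
publishTo := sonatypePublishToBundle.value
```

With publishTo pointing at the local bundle, publishSigned only writes files locally, and nothing reaches Sonatype until sonatypeBundleRelease runs.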


Thank you for considering contributing to fantastic-spork!
Fantastic Spork Repository