cargoimp-spark-parser
cargoimp-spark-parser is a Scala library that adds native, high-performance parsing of IATA Cargo-IMP (International Air Transport Association Cargo Interchange Message Procedures) messages to your Apache Spark jobs. It makes Cargo-IMP message types such as FHL and FWB directly accessible, explorable, and analyzable within Spark DataFrames and SQL queries for users in air cargo, logistics, and data engineering.
Purpose
Cargo-IMP messages are widely used in the air cargo industry for manifest, shipment, and house bill information exchange. However, these messages are traditionally difficult to process at scale due to their legacy format and parsing complexity. cargoimp-spark-parser aims to bridge this gap by providing:
- Native, production-ready Spark SQL functions for efficient, schema-aware parsing of Cargo-IMP messages.
- Seamless integration with Spark SQL and DataFrame APIs, eliminating the need for manual UDFs or error-prone text parsing.
- Composable, ready-to-use Spark Catalyst expressions for robust, scalable ETL and analytics pipelines.
With cargoimp-spark-parser, Spark users and data engineers can:
- Decode raw Cargo-IMP message strings into structured, queryable Spark data.
- Integrate message parsing as a first-class operation for streaming and batch pipelines.
- Simplify and accelerate analytics on airline and logistics messaging data.
Setup
Artifacts are available from Maven Central: cargoimp-spark-parser
SBT
libraryDependencies += "io.gitlab.rokorolev" %% "cargoimp-spark-parser" % "0.1.0"
Maven
<dependency>
  <groupId>io.gitlab.rokorolev</groupId>
  <artifactId>cargoimp-spark-parser_2.12</artifactId>
  <version>0.1.0</version>
</dependency>
spark-submit
--packages io.gitlab.rokorolev:cargoimp-spark-parser_2.12:0.1.0
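For example, a complete spark-submit invocation could look like the following; the entry-point class and application jar are hypothetical placeholders for your own job:
# The --class and jar below are illustrative; substitute your own application.
spark-submit \
  --packages io.gitlab.rokorolev:cargoimp-spark-parser_2.12:0.1.0 \
  --class com.example.CargoImpJob \
  target/scala-2.12/my-cargo-job.jar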
Usage
import io.gitlab.rokorolev.cargoimpsparkparser.functions._
Available functions:
cargo_imp_parse_fhl()
: Returns a struct of the parsed FHL 4 message.
cargo_imp_parse_fwb()
: Returns a struct of the parsed FWB 16 message.
You can use these functions directly within Spark SQL or via the DataFrame API.
Example
The snippets below assume a registered table air_messages, or a DataFrame df, with a raw message column named AirMessageRaw:
spark.sql("SELECT cargo_imp_parse_fhl(AirMessageRaw) AS ParsedFhlMessage FROM air_messages")
df.withColumn("ParsedFwbMessage", cargo_imp_parse_fwb(col("AirMessageRaw")))
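Putting it together, here is a minimal end-to-end sketch. The sample payload string is a placeholder rather than a valid Cargo-IMP message, and the table and column names are the assumed ones from above:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import io.gitlab.rokorolev.cargoimpsparkparser.functions._

val spark = SparkSession.builder().appName("cargo-imp-demo").getOrCreate()
import spark.implicits._

// Placeholder payload; substitute a real raw FWB message string.
val df = Seq("FWB/16...").toDF("AirMessageRaw")
val parsed = df.withColumn("ParsedFwbMessage", cargo_imp_parse_fwb(col("AirMessageRaw")))

// Inspect the parser's struct schema before querying nested fields.
parsed.printSchema()
parsed.createOrReplaceTempView("air_messages")
spark.sql("SELECT ParsedFwbMessage FROM air_messages").show(truncate = false)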
Contributing
How to Contribute
- Fork the repository on GitLab.
- Create a feature branch:
git checkout -b my-feature-branch
- Make your changes and commit them:
git commit -am 'Add new feature'
- Push to your branch:
git push origin my-feature-branch
- Open a Merge Request on GitLab with a clear description of your changes.
Building and Publishing
This project uses sbt for building and publishing artifacts. Artifacts are published to Maven Central via Sonatype using the sbt-sonatype and sbt-pgp plugins.
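If you are replicating this setup, a project/plugins.sbt along these lines would wire in both plugins; the version numbers here are assumptions, so check each plugin's documentation for the current release:
// project/plugins.sbt -- plugin versions are illustrative assumptions
addSbtPlugin("org.xerial.sbt" % "sbt-sonatype" % "3.10.0")
addSbtPlugin("com.github.sbt" % "sbt-pgp" % "2.2.1")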
Build Locally
sbt clean compile test
Publishing to Maven Central
Before publishing:
- Make sure your PGP keys are available (see sbt-pgp documentation).
- Configure your Sonatype credentials in ~/.sbt/1.0/sonatype.sbt:
credentials += Credentials(
  "Sonatype Nexus Repository Manager",
  "central.sonatype.com",
  "<your-username>",
  "<your-password>")
Steps to Publish
- Clean, test, and sign the artifacts:
sbt +clean +test +publishSigned
The + prefix runs each command for all crossScalaVersions.
- Release the staging repository to Maven Central:
sbt sonatypeBundleRelease
For more details, refer to the sbt-sonatype docs and sbt-pgp docs.
Thank you for considering contributing to cargoimp-spark-parser!