cargoimp-spark-parser

cargoimp-spark-parser is a Scala library that adds native, high-performance parsing of IATA Cargo-IMP (International Air Transport Association Cargo Interchange Message Procedures) messages to Apache Spark jobs. It makes Cargo-IMP message types such as FHL and FWB directly accessible and analyzable within Spark DataFrames and SQL queries, for Spark users in air cargo, logistics, and data engineering.


Purpose

Cargo-IMP messages are widely used in the air cargo industry to exchange manifest, shipment, and house bill information. However, they are traditionally difficult to process at scale because of their legacy format and parsing complexity. cargoimp-spark-parser aims to bridge this gap by parsing these messages natively inside Spark.

With cargoimp-spark-parser, Spark users and data engineers can parse message types such as FHL and FWB and explore the results with DataFrames and SQL queries.


Setup

Artifacts are available from Maven Central: cargoimp-spark-parser

SBT

libraryDependencies += "io.gitlab.rokorolev" %% "cargoimp-spark-parser" % "0.1.0"
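
For context, a minimal build.sbt around that dependency might look like the sketch below. The Scala and Spark versions are assumptions chosen only for illustration (the Maven artifact in this README is published for Scala 2.12); use whichever versions your cluster runs.

```scala
// build.sbt (sketch; the version numbers below are assumptions, not requirements of the library)
scalaVersion := "2.12.18" // assumed: a 2.12.x release, matching the _2.12 artifact suffix

libraryDependencies ++= Seq(
  // Assumed Spark version. "provided" because the cluster supplies Spark at runtime.
  "org.apache.spark" %% "spark-sql" % "3.5.1" % "provided",
  // The parser itself, as given in this README.
  "io.gitlab.rokorolev" %% "cargoimp-spark-parser" % "0.1.0"
)
```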

Maven

<dependency>
    <groupId>io.gitlab.rokorolev</groupId>
    <artifactId>cargoimp-spark-parser_2.12</artifactId>
    <version>0.1.0</version>
</dependency>

spark-submit

--packages io.gitlab.rokorolev:cargoimp-spark-parser_2.12:0.1.0

Usage

import io.gitlab.rokorolev.cargoimpsparkparser.functions._

Available functions include cargo_imp_parse_fhl and cargo_imp_parse_fwb.

You can use these functions directly within Spark SQL or via the DataFrame API.

Example

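// SQL: `messages` is a placeholder for any table or view with an AirMessageRaw column
spark.sql("SELECT cargo_imp_parse_fhl(AirMessageRaw) AS ParsedFhlMessage FROM messages")
// DataFrame API: requires import org.apache.spark.sql.functions.col
df.withColumn("ParsedFwbMessage", cargo_imp_parse_fwb(col("AirMessageRaw")))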
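
To show how the DataFrame call fits into a complete job, here is a minimal sketch. The object name, the input path, and the one-message-per-file layout are hypothetical. Only the import and cargo_imp_parse_fwb come from this README. The schema of the parsed column is not documented here, so the sketch prints it instead of assuming field names.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import io.gitlab.rokorolev.cargoimpsparkparser.functions._

// Hypothetical driver object; run it with spark-submit so the master is supplied externally.
object ParseFwbExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cargoimp-fwb-example")
      .getOrCreate()

    // Hypothetical input location, assumed to hold one raw FWB message per file.
    val inputPath = args.headOption.getOrElse("/data/fwb-messages/")

    // wholetext=true loads each file as a single row in a column named "value".
    val raw = spark.read
      .option("wholetext", "true")
      .text(inputPath)
      .withColumnRenamed("value", "AirMessageRaw")

    // Apply the parser function from this README to the raw message column.
    val parsed = raw.withColumn("ParsedFwbMessage", cargo_imp_parse_fwb(col("AirMessageRaw")))

    // Inspect the result rather than assuming the structure of the parsed column.
    parsed.printSchema()
    parsed.show(5, truncate = false)

    spark.stop()
  }
}
```

Swapping in cargo_imp_parse_fhl should follow the same pattern for FHL messages.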

Contributing

How to Contribute

  1. Fork the repository on GitLab.
  2. Create a feature branch: git checkout -b my-feature-branch
  3. Make your changes and commit them: git commit -am 'Add new feature'
  4. Push to your branch: git push origin my-feature-branch
  5. Open a Merge Request on GitLab with a clear description of your changes.

Building and Publishing

This project uses sbt for building and publishing artifacts.
Artifacts are published to Maven Central via Sonatype using the sbt-sonatype and sbt-pgp plugins.

Build Locally

sbt clean compile test

Publishing to Maven Central

Before publishing, configure your Sonatype credentials in sbt:

credentials += Credentials(
  "Sonatype Nexus Repository Manager",
  "central.sonatype.com",
  "<your-username>",
  "<your-password>")

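If you would rather not keep the password in a file, one common alternative is to read the credentials from environment variables. This is only a sketch: the variable names SONATYPE_USERNAME and SONATYPE_PASSWORD are hypothetical and are not something this project prescribes.

```scala
// sbt settings sketch: same realm and host as above, values taken from the environment.
// SONATYPE_USERNAME and SONATYPE_PASSWORD are hypothetical variable names.
credentials += Credentials(
  "Sonatype Nexus Repository Manager",
  "central.sonatype.com",
  sys.env.getOrElse("SONATYPE_USERNAME", ""),
  sys.env.getOrElse("SONATYPE_PASSWORD", "")
)
```
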
Steps to Publish

  1. Clean, test, and sign the artifacts:
    sbt +clean +test +publishSigned
    The + runs the command for all crossScalaVersions.
  2. Release the staging repository to Maven Central:
    sbt sonatypeBundleRelease

For more details, refer to the sbt-sonatype docs and sbt-pgp docs.


Thank you for considering contributing to cargoimp-spark-parser!