cargoimp-spark-parser
cargoimp-spark-parser is a Scala library that adds native, high-performance parsing of IATA Cargo-IMP (International Air Transport Association Cargo Interchange Message Procedures) messages to your Apache Spark jobs. It makes Cargo-IMP message types such as FHL and FWB directly accessible, explorable, and analyzable within Spark DataFrames and SQL queries for users in air cargo, logistics, and data engineering.
Purpose
Cargo-IMP messages are widely used in the air cargo industry for manifest, shipment, and house bill information exchange. However, these messages are traditionally difficult to process at scale due to their legacy format and parsing complexity. cargoimp-spark-parser aims to bridge this gap by providing:
- Native, production-ready Spark SQL functions for efficient, schema-aware parsing of Cargo-IMP messages.
- Seamless integration with Spark SQL and DataFrame APIs, eliminating the need for manual UDFs or error-prone text parsing.
- Composable, ready-to-use Spark Catalyst expressions for robust, scalable ETL and analytics pipelines.
With cargoimp-spark-parser, Spark users and data engineers can:
- Decode raw Cargo-IMP message strings into structured, queryable Spark data.
- Integrate message parsing as a first-class operation for streaming and batch pipelines.
- Simplify and accelerate analytics on airline and logistics messaging data.
Setup
Artifacts are available from Maven Central: cargoimp-spark-parser
SBT
libraryDependencies += "io.gitlab.rokorolev" %% "cargoimp-spark-parser" % "0.1.0"
Maven
<dependency>
  <groupId>io.gitlab.rokorolev</groupId>
  <artifactId>cargoimp-spark-parser_2.12</artifactId>
  <version>0.1.0</version>
</dependency>
spark-submit
--packages io.gitlab.rokorolev:cargoimp-spark-parser_2.12:0.1.0
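For example, a complete spark-submit invocation could look like the following; the entry-point class and application jar are hypothetical placeholders for your own job:
# The --class and jar below are illustrative; substitute your own application.
spark-submit \
  --packages io.gitlab.rokorolev:cargoimp-spark-parser_2.12:0.1.0 \
  --class com.example.CargoImpJob \
  target/scala-2.12/my-cargo-job.jar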
Usage
import io.gitlab.rokorolev.cargoimpsparkparser.functions._
Available functions:
cargo_imp_parse_fhl()
: Returns a struct of the parsed FHL 4 message.
cargo_imp_parse_fwb()
: Returns a struct of the parsed FWB 16 message.
You can use these functions directly within Spark SQL or via the DataFrame API.
Example
The snippets below assume a registered table air_messages, or a DataFrame df, with a raw message column named AirMessageRaw:
spark.sql("SELECT cargo_imp_parse_fhl(AirMessageRaw) AS ParsedFhlMessage FROM air_messages")
df.withColumn("ParsedFwbMessage", cargo_imp_parse_fwb(col("AirMessageRaw")))
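Putting it together, here is a minimal end-to-end sketch. The sample payload string is a placeholder rather than a valid Cargo-IMP message, and the table and column names are the assumed ones from above:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import io.gitlab.rokorolev.cargoimpsparkparser.functions._

val spark = SparkSession.builder().appName("cargo-imp-demo").getOrCreate()
import spark.implicits._

// Placeholder payload; substitute a real raw FWB message string.
val df = Seq("FWB/16...").toDF("AirMessageRaw")
val parsed = df.withColumn("ParsedFwbMessage", cargo_imp_parse_fwb(col("AirMessageRaw")))

// Inspect the parser's struct schema before querying nested fields.
parsed.printSchema()
parsed.createOrReplaceTempView("air_messages")
spark.sql("SELECT ParsedFwbMessage FROM air_messages").show(truncate = false)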
Contributing
How to Contribute
- Fork the repository on GitLab.
- Create a feature branch:
git checkout -b my-feature-branch
- Make your changes and commit them:
git commit -am 'Add new feature'
- Push to your branch:
git push origin my-feature-branch
- Open a Merge Request on GitLab with a clear description of your changes.
Building and Publishing
This project uses sbt for building and publishing artifacts. Artifacts are published to Maven Central via Sonatype using the sbt-sonatype and sbt-pgp plugins.
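If you are replicating this setup, a project/plugins.sbt along these lines would wire in both plugins; the version numbers here are assumptions, so check each plugin's documentation for the current release:
// project/plugins.sbt -- plugin versions are illustrative assumptions
addSbtPlugin("org.xerial.sbt" % "sbt-sonatype" % "3.10.0")
addSbtPlugin("com.github.sbt" % "sbt-pgp" % "2.2.1")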
Build Locally
sbt clean compile test
Publishing to Maven Central
Before publishing:
- Make sure your PGP keys are available (see sbt-pgp documentation).
- Configure your Sonatype credentials in ~/.sbt/1.0/sonatype.sbt:
credentials += Credentials(
  "Sonatype Nexus Repository Manager",
  "central.sonatype.com",
  "<your-username>",
  "<your-password>")
Steps to Publish
- Clean, test, and sign the artifacts:
sbt +clean +test +publishSigned
The + prefix runs each command for all crossScalaVersions.
- Release the staging repository to Maven Central:
sbt sonatypeBundleRelease
For more details, refer to the sbt-sonatype docs and sbt-pgp docs.
Thank you for considering contributing to cargoimp-spark-parser!