Koheesio: Nike’s Python Framework for Data Pipeline Building

BY Mark Howell | 1 year ago | 3 MINS READ

Koheesio, named after the Finnish word for cohesion, is a robust Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.

Koheesio's structure lets developers assemble complex pipelines from simple, reusable components. The framework supports multiple implementations and works with various data processing libraries and frameworks, so the same pipeline structure can be applied regardless of the underlying technology or data scale.

Core Components of Koheesio

  1. Step: The fundamental unit of work in Koheesio, operating as a single operation in a data pipeline. Steps take inputs and produce outputs, creating building blocks for more complex processes.

  2. Context: This class configures the environment for a task, sharing variables across tasks and adapting task behavior based on the environment.

  3. Logger: A class for logging messages at different levels, ensuring streamlined debugging and traceability.

Koheesio introduces strong typing, data validation, and settings management through Pydantic, promoting high type safety and structured configurations within pipeline components. This structured approach makes pipeline execution predictable. Combined with well-tested code and a rich feature set, it makes Koheesio a strong choice for developers and organizations focused on building robust and adaptable data pipelines.
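To make the three components concrete, here is a minimal sketch of how a step, a context, and a logger can fit together. It uses only the Python standard library and does not import Koheesio: `Context`, `AddSuffixStep`, and the `suffix` key are hypothetical names chosen for illustration, and the `__post_init__` check is only a stand-in for the validation Pydantic would provide.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Dict

logging.basicConfig(level=logging.INFO)


@dataclass
class Context:
    """Hypothetical shared configuration, not Koheesio's Context class."""
    values: Dict[str, Any] = field(default_factory=dict)

    def get(self, key: str, default: Any = None) -> Any:
        return self.values.get(key, default)


@dataclass
class AddSuffixStep:
    """Hypothetical step: takes one input (text) and produces one output."""
    text: str
    context: Context

    def __post_init__(self) -> None:
        # Stand-in for the type validation a Pydantic model would perform.
        if not isinstance(self.text, str):
            raise TypeError("text must be a str")
        # Each step gets a logger named after its class for traceability.
        self.log = logging.getLogger(type(self).__name__)

    def execute(self) -> str:
        # The context supplies environment-specific settings to the step.
        suffix = self.context.get("suffix", "")
        self.log.info("appending suffix %r", suffix)
        return f"{self.text}{suffix}"


ctx = Context({"suffix": "-dev"})
result = AddSuffixStep(text="orders", context=ctx).execute()
print(result)  # orders-dev
```

Swapping the context (for example, a different `suffix` per environment) changes the step's behavior without touching the step itself, which is the idea behind sharing configuration across tasks.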

What Sets Koheesio Apart?

Koheesio encapsulates extensive data engineering expertise, fostering a collaborative and innovative community. Unlike other similar libraries, Koheesio focuses explicitly on data pipelines, integrating closely with PySpark, and targeting tasks like data transformation, ETL jobs, data validation, and large-scale data processing. This focus allows it to offer a wide variety of features, including readers, writers, and transformations suitable for any type of data processing. Instead of competing with similar libraries, Koheesio aims for integration and utility across various scenarios.
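The reader, transformation, and writer roles mentioned above can be illustrated with a small sketch. This is plain Python rather than Koheesio or PySpark code: `read_rows`, `camel_to_snake`, `write_rows`, and `run_pipeline` are hypothetical helpers, and rows are simple dictionaries held in memory.

```python
import re
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def read_rows() -> List[Row]:
    """Hypothetical reader: returns in-memory rows instead of a real source."""
    return [
        {"orderId": 1, "customerName": "Ada"},
        {"orderId": 2, "customerName": "Linus"},
    ]


def camel_to_snake(rows: Iterable[Row]) -> List[Row]:
    """Hypothetical transformation: renames camelCase keys to snake_case."""
    def rename(name: str) -> str:
        # Insert "_" before every uppercase letter except at position 0.
        return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

    return [{rename(key): value for key, value in row.items()} for row in rows]


def write_rows(rows: Iterable[Row], sink: List[Row]) -> None:
    """Hypothetical writer: appends rows to a list instead of a real target."""
    sink.extend(rows)


def run_pipeline(
    reader: Callable[[], List[Row]],
    transformations: List[Callable[[Iterable[Row]], List[Row]]],
    writer: Callable[[Iterable[Row]], None],
) -> None:
    """Read once, apply each transformation in order, then write."""
    rows = reader()
    for transform in transformations:
        rows = transform(rows)
    writer(rows)


sink: List[Row] = []
run_pipeline(read_rows, [camel_to_snake], lambda rows: write_rows(rows, sink))
print(sink[0])  # {'order_id': 1, 'customer_name': 'Ada'}
```

Because each stage is an independent callable, a reader or writer can be replaced without changing the transformations, which mirrors the reusable-component idea the article describes.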

Installation and Usage

You can install Koheesio using popular package management tools such as pip, Hatch, or poetry:

  • To install using `pip`, run:
    ```bash
    pip install koheesio
    ```

  • Using `Hatch`, add Koheesio to your `pyproject.toml`.

  • For `poetry`:
    ```bash
    poetry add koheesio==<desired_version>
    ```

Koheesio also provides various additional features and integrations, requiring extra dependencies that can be added as needed:

  • Box Integration: Available through `koheesio.steps.integration.box`.

  • SFTP Integration: Available through `koheesio.steps.integration.spark.sftp`.
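After installing, it can be useful to confirm that the package is visible to the current Python environment. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical, and the snippet makes no assumptions about Koheesio's own API.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("koheesio")
print(f"koheesio {found}" if found else "koheesio is not installed")
```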

Contributing to Koheesio

Koheesio encourages contributions from the community to foster collaboration and innovation.

Contributing Steps:

  • Code Standards: Ensure your code passes pylint, black, and mypy checks by running `make check`. No errors or warnings should be reported.

  • Testing: Use `pytest` for testing, and ensure all tests pass by running `make test`.

  • Release Process: The project aims for frequent releases, with admin developers creating new releases on GitHub and publishing them to PyPI regularly.

For more details, refer to the contribution guidelines. Koheesio adheres to Nike's Code of Conduct and Individual Contributor License Agreement.

Remember these 3 key ideas for your startup:

  1. Modularity and Versatility: With Koheesio's modular design, you can build complex data pipelines from reusable components, making it easier to adapt and scale your projects.

  2. Community and Collaboration: Embrace the collaborative community fostered by Koheesio to innovate and improve your data engineering tasks continuously.

  3. Robust Features and Integration: Take advantage of Koheesio's rich features and its seamless integration with various data processing libraries to enhance your data pipeline efficiency.

Explore more about how Koheesio can transform your data handling processes and tap into the future of efficient data pipelines with ease.

For more details, see the original source.


