Functional PySpark Workflows
Tidy Tools is a declarative programming library promoting functional PySpark DataFrame workflows. The package is an extension of the PySpark API and can be easily integrated into existing code.
Quick Links
- Installation: Set up tidy_tools in your environment.
- Getting Started: Learn how to build workflows step by step.
- API Reference: Explore all available functions and classes.
Key Features
- Functional: Packages verbose queries into iterative, declarative expressions.
- Feature-Rich: Extends the DataFrame API to include user-friendly features.
- Experimental: Continuously explores new ways to improve PySpark workflows.
Philosophy
The goal of Tidy Tools is to provide an extension of the PySpark DataFrame API that promotes declarative workflows.
On top of the proposed API, Tidy Tools offers experimental solutions that cannot be easily replicated using the PySpark API. Continue reading to learn more.
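To make "declarative workflows" concrete, here is a minimal sketch of the general idea, written without Tidy Tools itself. The names pipe, drop_missing, and with_total are hypothetical helpers invented for this illustration and are not part of the Tidy Tools API; the step functions rely only on standard PySpark DataFrame methods (filter, withColumn, and column indexing).

```python
from functools import reduce
from typing import Callable, TypeVar

T = TypeVar("T")


def pipe(value: T, *steps: Callable[[T], T]) -> T:
    """Apply each step to the result of the previous one, left to right."""
    return reduce(lambda acc, step: step(acc), steps, value)


def drop_missing(column: str):
    """Build a step that keeps only rows where `column` is not null."""
    def step(df):
        # df is expected to be a pyspark.sql.DataFrame
        return df.filter(df[column].isNotNull())
    return step


def with_total(df):
    """Add a `total` column computed as price * quantity."""
    return df.withColumn("total", df["price"] * df["quantity"])


# Usage with a hypothetical `orders` DataFrame that has `price` and
# `quantity` columns (requires a running Spark session):
#
#     result = pipe(orders, drop_missing("price"), with_total)
#
# The same chain can be written with PySpark's built-in DataFrame.transform:
#
#     result = orders.transform(drop_missing("price")).transform(with_total)
```

Each step is a plain function from DataFrame to DataFrame, so steps can be named, reused, and reordered independently of the query that uses them.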
Contributing
All contributions are welcome, from reporting bugs to implementing new features. Read our contributing guide to learn more.
License
This project is licensed under the terms of the MIT license.