
Functional PySpark Workflows

Tidy Tools is a declarative programming library that promotes functional PySpark DataFrame workflows. The package extends the PySpark API and integrates easily into existing code.



Key Features

  • Functional: Packages verbose queries into iterative, declarative expressions.
  • Feature-Rich: Extends the DataFrame API to include user-friendly features.
  • Experimental: Continuously explores new ways to improve PySpark workflows.

Philosophy

The goal of Tidy Tools is to provide an extension of the PySpark DataFrame API that promotes declarative workflows.
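To give a rough sense of what a declarative, functional workflow can look like, here is a minimal sketch written in plain Python and standard PySpark. It does not use or represent the Tidy Tools API: the `pipe` helper and the `spark_example` function are hypothetical names introduced only for illustration, and the column names are made up.

```python
from functools import reduce
from typing import Callable, TypeVar

T = TypeVar("T")


def pipe(data: T, *steps: Callable[[T], T]) -> T:
    """Apply each step to the result of the previous one, left to right.

    Hypothetical helper for illustration; not part of Tidy Tools or PySpark.
    """
    return reduce(lambda acc, step: step(acc), steps, data)


def spark_example():
    """Build a small DataFrame and run it through a list of transformations.

    Requires pyspark; the import is local so the helper above stays usable
    without a Spark installation.
    """
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(1, "a"), (2, "b"), (3, None)],
        ["id", "label"],  # illustrative column names
    )
    # Each step is a DataFrame -> DataFrame function, so the workflow reads
    # as an ordered list of transformations rather than nested calls.
    return pipe(
        df,
        lambda d: d.filter(F.col("label").isNotNull()),
        lambda d: d.withColumn("id_doubled", F.col("id") * 2),
    )
```

Because every step has the same shape (a function from DataFrame to DataFrame), steps can be named, reordered, and reused independently. PySpark's own `DataFrame.transform` method supports a similar chaining style.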

On top of the proposed API, Tidy Tools offers experimental solutions that cannot be easily replicated using the PySpark API. Continue reading to learn more.


Contributing

All contributions are welcome, from reporting bugs to implementing new features. Read our contributing guide to learn more.


License

This project is licensed under the terms of the MIT license.