Merge
merge
Merge an arbitrary number of DataFrames into a single DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`*data` | DataFrame | PySpark DataFrame. | () |
`func` | Callable | Reduce function that merges two DataFrames into one. | required |
`**kwargs` | dict | Keyword-arguments for merge function. | {} |
Returns:
Type | Description |
---|---|
DataFrame | Result of merging all DataFrames. |
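The reduce pattern described above can be sketched in plain Python. This is a minimal illustration, not the source of `merge`: the names `merge_sketch` and `append_rows` are hypothetical, and lists stand in for DataFrames so the snippet runs without a Spark session. It shows how a two-argument `func` is folded over `*data` from left to right and how `**kwargs` reach every call.

```python
from functools import reduce
from typing import Any, Callable


def merge_sketch(*data: Any, func: Callable, **kwargs: Any) -> Any:
    """Fold `func` over `data` left to right, forwarding `kwargs` to each call."""
    if not data:
        raise ValueError("expected at least one object to merge")
    return reduce(lambda left, right: func(left, right, **kwargs), data)


def append_rows(left: list, right: list, distinct: bool = False) -> list:
    """Hypothetical pairwise merge: append rows, optionally dropping duplicates."""
    combined = left + right
    # dict.fromkeys keeps the first occurrence of each value, in order.
    return list(dict.fromkeys(combined)) if distinct else combined


batches = ([1, 2], [2, 3], [3, 4])
merged = merge_sketch(*batches, func=append_rows)
deduplicated = merge_sketch(*batches, func=append_rows, distinct=True)
```

With PySpark DataFrames, `func` would typically be a method that takes two DataFrames, such as `DataFrame.unionByName` or `DataFrame.join`.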
Source code in src/tidy_tools/functions/merge.py
concat
Concatenate an arbitrary number of DataFrames into a single DataFrame.
By default, all objects are appended to one another by column name. An error will be raised if column names do not align.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`*data` | DataFrame | PySpark DataFrame. | () |
`func` | Callable | Reduce function to concatenate two DataFrames to each other. By default, this union resolves by column name. | unionByName |
`**kwargs` | dict | Keyword-arguments for merge function. | {} |
Returns:
Type | Description |
---|---|
DataFrame | Result of concatenating all DataFrames. |
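To make the by-name behaviour concrete, here is a small sketch that runs without Spark. It is an assumption-laden stand-in, not the library's code: each table is modelled as a `(columns, rows)` pair, and the hypothetical `union_by_name` helper imitates a union that aligns columns by name and fails when the names differ.

```python
from functools import reduce


def union_by_name(left, right):
    """Hypothetical stand-in for a by-name union of two (columns, rows) tables."""
    left_cols, left_rows = left
    right_cols, right_rows = right
    if set(left_cols) != set(right_cols):
        raise ValueError(f"column names do not align: {left_cols} vs {right_cols}")
    # Reorder the right-hand values so they follow the left-hand column order.
    order = [right_cols.index(name) for name in left_cols]
    reordered = [tuple(row[i] for i in order) for row in right_rows]
    return left_cols, left_rows + reordered


def concat_sketch(*data, func=union_by_name, **kwargs):
    """Concatenate any number of tables by folding `func` over them."""
    return reduce(lambda left, right: func(left, right, **kwargs), data)


a = (["id", "name"], [(1, "x")])
b = (["name", "id"], [("y", 2)])  # same columns, different order
c = (["id", "name"], [(3, "z")])
result = concat_sketch(a, b, c)

try:
    concat_sketch(a, (["id", "label"], [(4, "w")]))
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```

Column order is taken from the first table, and the mismatched call raises instead of silently misaligning values.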
Source code in src/tidy_tools/functions/merge.py
join
join(*data: DataFrame, on: str | Column, how: str = 'inner', func: Callable = join, **kwargs: dict) -> DataFrame
Join an arbitrary number of DataFrames into a single DataFrame.
By default, the DataFrames are combined with an inner join on the column name or expression given by `on`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`*data` | DataFrame | PySpark DataFrame. | () |
`on` | str \| Column | Column name or expression to perform join. | required |
`how` | str | Type of join to perform (for example, 'inner', 'left', or 'outer'). | 'inner' |
`func` | Callable | Reduce function to join two DataFrames to each other. | join |
`**kwargs` | dict | Keyword-arguments for merge function. | {} |
Returns:
Type | Description |
---|---|
DataFrame | Result of joining all DataFrames. |
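A chained join over several inputs can be sketched the same way. This is only an illustration under stated assumptions: tables are lists of dict rows rather than DataFrames, `inner_join` and `join_sketch` are hypothetical names, and only the default inner join is modelled (the `how` parameter is left out for brevity).

```python
from functools import reduce


def inner_join(left, right, on):
    """Hypothetical inner join of two tables stored as lists of dict rows."""
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in right
        if left_row[on] == right_row[on]
    ]


def join_sketch(*data, on, func=inner_join, **kwargs):
    """Join any number of tables on the same key by folding `func` over them."""
    return reduce(lambda left, right: func(left, right, on=on, **kwargs), data)


users = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
orders = [{"id": 1, "total": 10}, {"id": 3, "total": 5}]
regions = [{"id": 1, "region": "EU"}]
joined = join_sketch(users, orders, regions, on="id")
```

Only the row whose `id` appears in all three inputs survives, which is the behaviour expected from repeated inner joins on one key.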