Transformation Functions: Doing Something With the Data
These functions can be imported from the ppc_robot_lib.reporting.transformations
module.
Computations
- compute(table, columns)[source]
Computes one or more columns and add them to the table.
Columns are given as a dictionary, where key is a name of a new column and value is an expression. Expression can be either callable, or string with an expression for
pandas.eval()
. Syntax is briefly described at Expression evaluation via eval(). Please note that calllable is strongly preferred.The callable receives a
pandas.DataFrame
as the first positional argument. If the callable also contains thetask_ctx
argument, it will receive anppc_robot_lib.tasks.task_context.TaskContextInterface
as a keyword argument.The callable should return a value that is directly assignable to the column:
pandas.Series
or a single scalar value.Example:
>>> from ppc_robot_lib.reporting.transformation import compute >>> compute(data_frame, { ... 'Ctr1': lambda df: df['Clicks'] / df['Impressions'], ... 'Ctr2': 'Clicks / Impressions', ... })
Filtering and Splitting
- filter_table(table, filter_cb)[source]
Filters rows in the given table.
The filter can be either a callable, or list of callables. The callable receives a
pandas.DataFrame
as a single argument and must return a boolean vector that is used for selection.It you use the list of callables, their result vectors are “ANDed” together using the
&
operator. This means that the resulting rows must match all the conditions.Example:
>>> from ppc_robot_lib.reporting.transformation import filter_table >>> filter_table(urls_df, [ ... lambda df: df['http_status_code'] >= 200, ... lambda df: df['http_status_code'] <= 299, ... ])
- filter_column_is_in(input_table, input_column, ref_values, negative_match=False)[source]
Filters rows in the table, selects only rows with values in the given column that matches values in another column from the reference table. Imagine it like filtering using INNER JOIN without joining.
You can also pass a
pandas.Series
or any iterable as ref_table. In this case, you have toExample:
Let’s assume we have the two tables,
in_table
andout_table
:in_table Campaign
Tag
Campaign A
product
Campaign B
remarketing
Campaign C
dsa
Campaign D
product
Campaign E
banner
ref_table Id
Ref Tag
1
product
2
dsa
3
responsive
By executing the code:
>>> from ppc_robot_lib.reporting.transformation import filter_column_is_in >>> filter_column_is_in(in_table, "Tag", ref_values=ref_table["Ref Tag"])
We would get the following output:
out_table Campaign
Tag
Campaign A
product
Campaign C
dsa
Campaign D
product
If we set
negative_match=True
:>>> from ppc_robot_lib.reporting.transformation import filter_column_is_in >>> filter_column_is_in(in_table, "Tag", ref_values=ref_table["Ref Tag"], negative_match=True)
The resulting table would look like this:
out_table Campaign
Tag
Campaign B
remarketing
Campaign E
banner
- Parameters:
input_table (
DataFrame
) – Table that should be filtered.input_column (
str
) – Column in the input table which should be used for lookups in the reference table.ref_values (
Series
|Iterable
[Any
]) – Reference values: can be an column from another table or any iterable.negative_match (
bool
) – Set toTrue
if you need rows that DO NOT match.
- Return type:
- split_by_column_value(table, column, output_tables, default_table=None)[source]
Enables you to split one table to multiple tables according to value of a specified column.
output_tables
is a dictionary, key is the name of resulting table and value can be either:Value of the column to match.
Callable recieving a single value and returing
bool
. If the callable returnsTrue
, the row will be inserted into the given table.List of values or callables with the same semantics as in the previous step. The row will be inserted if at least one of the values or callables match.
Please note that row order might not be preserved, as this operation internally uses group by and then operates on disjunct segments of table.
Resulting tables are returned in a dictionary.
Example:
Let’s assume we have the following table:
in_table A
X
A1
1
A3
3
A2
2
A4
4
A7
7
A6
6
A5
5
A9
9
A0
10
A8
8
AA
11
By executing the function:
>>> from ppc_robot_lib.reporting.transformation import split_by_column_value >>> split_by_column_value(in_table, 'X', { ... 'o1': [1, 2, 3, lambda x: 8 <= x <= 10], ... 'o2': lambda x: 4 <= x <= 7 ... }, default_table='default') { 'o1': pandas.DataFrame(...) 'o2': pandas.DataFrame(...) 'default': pandas.DataFrame(...) }
We would get the following tables:
o1 A
X
A1
1
A3
3
A2
2
A9
9
A0
10
A8
8
o2 A
X
A4
4
A7
7
A6
6
A5
5
default A
X
AA
11
- Parameters:
table (
DataFrame
) – Input table.column (
str
) – Column to use for split.output_tables (
dict
[str
,Any
|list
[Any
] |Callable
[[Any
],bool
]]) – Dictionary of output tables. Key is name of the result table, value is either value to match, callable or list of values/callables to filter by.default_table (
str
) – Table to be used when the value does not match any of the tables specified in theoutput_tables
argument. If none is given, these rows are dropped.
- Return type:
- Returns:
Dictionary of the resulting tables.
Joining DataFrames
- join(left_table, right_table, on=None, left_on=None, right_on=None, join_type=JoinType.INNER, do_sort=True, ambiguous_column_suffixes=('', '_right'), left_index=False, right_index=False)[source]
Performs an SQL-like join on two tables.
This step supports 4 types of joins:
You can join either by columns with same name from both tables, or you can provide list of columns in both left and right table. The lists has to be of equal length.
When performing the join, values of columns on corresponding indexes must be equal in order to match the rows.
For details on joins in Pandas, see merge().
Example:
>>> from ppc_robot_lib.steps.transformations import JoinOnColumnStep, JoinType >>> JoinOnColumnStep("keywords", "adgroups", ... left_on=["CampaignName", "AdGroupName"], ... right_on=['CampaignName', "Name"], ... join_type=JoinType.INNER, ... output_table="keywords_with_adgroup_data")
- Parameters:
left_table (
DataFrame
) – Left table.right_table (
DataFrame
) – Right table.on (
str
|list
[str
]) – Column names that must match in both tables. Exclusive withleft_on
andright_on
.left_on (
str
|list
[str
]) – List of columns to match in the left table. Must be the same length withright_on
. Exclusive withon
.right_on (
str
|list
[str
]) – List of columns to match in the left table. Must be the same length withright_on
. Exclusive withon
.join_type (
JoinType
) – Type of the join. See merge() for explanation of individual types.do_sort (
bool
) – Set toTrue
if you would like to sort the tables by join columns.ambiguous_column_suffixes – Pair of column suffixes that is used when tables with conflicting column names are used. The first element is used for the left table, the second one for the right table.
left_index – Use index from the left table when matching the values.
right_index – Use index from the right table when matching the values.
- Return type:
- Returns:
Resulting table.
- concat(*tables)[source]
Concatenates rows of multiple tables into one. Internally uses
pandas.concat()
over the rows axes, this means that the same semantics applies. All indexes are ignored.Example:
>>> from ppc_robot_lib.reporting.transformation import concat >>> concat(table_a, table_b, table_c)
Sorting
- sort_values(table, order, inplace=True)[source]
Sorts the table by one or more columns.
Example:
>>> from ppc_robot_lib.steps.transformations import SortStep >>> from ppc_robot_lib.utils.types import Order, OrderBy >>> SortStep("in_table", order=[ ... OrderBy("Impressions", Order.DESC), ... OrderBy("CampaignName", Order.ASC), ... ])
- Parameters:
- Return type:
- Returns:
Sorted table. If
inplace=True
, the same instance is returned.
Renaming Columns
- suffix_columns(table, suffix, except_cols=None, inplace=True)[source]
Suffixes column names with the given
suffix
. If a column name is listed inexcept_cols
, its name is kept as-is.If the
inplace
parameter is set toTrue
(the default), it isExample:
>>> from ppc_robot_lib.reporting.transformation import suffix_columns >>> suffix_columns(performance, '_yday', except_cols=['Id', 'CampaignName'])
Grouping and Aggregations
Grouping and aggregations are very specific operations, so there are no universal steps for these operations. Use grouping operations from Pandas: Group By: split-apply-combine