Input Steps
All steps can be imported directly from the ppc_robot_lib.steps.input
module.
AdWords
Sklik
- class SklikReportStep(query, output_table, allow_empty_statistics=False, transformation=None, custom_columns=None)
Downloads a report from the Sklik API. This step uses the PPC Robot’s Sklik connector to download the report, so it automatically performs all necessary paging and data normalization.
Only reports defined in the ppc_robot_lib.sklik.reports module can be used. See Available Reports for more details.
The query parameter should be a ppc_robot_lib.sklik.query.Query instance with one of the reports above.
Example:
>>> from ppc_robot_lib.sklik import Query, Condition, Op, During, Granularity
>>> from ppc_robot_lib.sklik.types import StatusEnum
>>> from ppc_robot_lib.steps.input import SklikReportStep
>>> query = Query(
...     select=['Name', 'Status', 'Clicks', 'Impressions'],
...     from_report='group',
...     where=[
...         Condition('Status', Op.EQ, StatusEnum.ACTIVE),
...         Condition('Impressions', Op.GT, 0),
...     ],
...     during=During.LAST_30_DAYS,
...     granularity=Granularity.DAILY,
... )
>>> SklikReportStep(query, output_table='ad_groups')
- Parameters:
query (Query) – Query to execute.
output_table (str) – Name of the output table.
allow_empty_statistics – Should we include rows with zero impressions?
transformation (Callable[[Iterable], Iterable]) – Custom transformation function. The function receives an iterable (a generator, to be more precise, but this behaviour can change in the future) and must return another iterable that can be passed to pandas.DataFrame as the data argument.
custom_columns (list[str]) – Custom column names for the output table. If no columns are given, names from the query are used. Useful if you use a custom transformation that can change the columns.
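To make the transformation contract more concrete, here is a minimal sketch of a function with the documented shape: it takes an iterable of rows and returns another iterable. The row format (a dict keyed by column name), the sample data, and the added Ctr column are assumptions made for illustration only; the rows actually produced by the connector may be shaped differently.

```python
from typing import Any, Dict, Iterable, Iterator

# Assumption: each row is a dict keyed by column name. The real row
# format used by the Sklik connector is not specified in this document.
Row = Dict[str, Any]


def add_ctr(rows: Iterable[Row]) -> Iterator[Row]:
    """Lazily add a hypothetical 'Ctr' column (clicks / impressions)."""
    for row in rows:
        impressions = row['Impressions']
        # Avoid division by zero for rows without impressions.
        ctr = row['Clicks'] / impressions if impressions else 0.0
        yield {**row, 'Ctr': ctr}


# Hypothetical sample rows standing in for downloaded report data.
sample_rows = [
    {'Name': 'Group A', 'Clicks': 5, 'Impressions': 100},
    {'Name': 'Group B', 'Clicks': 0, 'Impressions': 0},
]
transformed = list(add_ctr(sample_rows))
```

Because a function like this changes the set of columns, it would likely be paired with custom_columns, for example custom_columns=['Name', 'Clicks', 'Impressions', 'Ctr'].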
- class SklikReportDetailsBatchStep(source_table, source_column, base_query, output_table, cond_column=None, allow_empty_statistics=False, transformation=None, custom_columns=None, batch_size=10000)
Downloads additional details for a given table. This step takes a column from the source table, takes the base query, and adds a new IN[] condition for the specified column (defaults to the source column name) to the query. It then uses this query to fetch report data. The data is fetched in multiple batches and then concatenated.
Example:
>>> from ppc_robot_lib.sklik import Query, During
>>> from ppc_robot_lib.steps.input import SklikReportDetailsBatchStep
>>> base_query = Query(
...     select=['Id', 'Name', 'Impressions', 'Clicks'],
...     from_report='campaign',
...     during=During.LAST_30_DAYS,
... )
>>> SklikReportDetailsBatchStep(
...     source_table='Campaigns',
...     source_column='CampaignId',
...     base_query=base_query,
...     cond_column='Id',
...     output_table='campaign_data',
...     batch_size=5000,
... )
- Parameters:
source_table (str) – Name of the source table.
source_column (str) – Name of the source column. Values of this column will be used for the IN[] condition.
base_query (Query) – Base Query used to build a query for each batch.
cond_column (str) – Name of the column that will be used in the condition. Defaults to source_column.
output_table (str) – Name of the output table.
allow_empty_statistics – Should we include rows with zero impressions?
transformation (Callable[[Iterable], Iterable]) – Custom transformation function. The function receives an iterable (a generator, to be more precise, but this behaviour can change in the future) and must return another iterable that can be passed to pandas.DataFrame as the data argument.
custom_columns (list[str]) – Custom column names for the output table. If no columns are given, names from the query are used. Useful if you use a custom transformation that can change the columns.
batch_size (int) – Number of values in the IN[] condition for each batch.
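The batching described above can be pictured with a small sketch. This is not the library's implementation, only an illustration of how a list of source values might be split into groups of at most batch_size items, each of which would feed one IN[] condition. The helper name split_into_batches and the sample IDs are hypothetical.

```python
from typing import Iterator, List, Sequence


def split_into_batches(values: Sequence[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of `values`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError('batch_size must be a positive integer')
    for start in range(0, len(values), batch_size):
        yield list(values[start:start + batch_size])


# Hypothetical campaign IDs taken from the source column.
campaign_ids = [101, 102, 103, 104, 105]
batches = list(split_into_batches(campaign_ids, batch_size=2))
```

With five values and a batch size of two, the sketch produces three batches, the last one holding the single remaining value.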
URL Check
- class CheckUrlsStep(input_table, output_table, use_group=False, column=None, parallel_tasks=4, max_conn=16, timeout=10, hide_robot_suffix=False, use_get_requests=False, force_https=False, check_autotagging_params=False)
Performs a URL check.
Input can be either a pandas.DataFrame or a pandas.core.groupby.DataFrameGroupBy. If you use a DataFrame, you have to set the column argument to a column with a valid URL. Please note that this step does not perform any kind of deduplication, so you might end up checking a single URL multiple times.
If you use a group-by result as the input, the index with group keys is always used as the URL. The index cannot be hierarchical, which means the group-by cannot be constructed with multiple grouping columns.
Output table will contain 4 columns:
url
URL being checked.
code
HTTP status code. Might be blank if the connection failed and the page was not retrieved. Either error or code will be present.
redirect_to
URL to which the user would be redirected, filled in if the server has returned a Location header.
error
Error code if the check failed. Either error or code will be present.
Example:
>>> from ppc_robot_lib.steps.input import CheckUrlsStep
>>> CheckUrlsStep("urls_grouped", output_table="check_result", use_group=True)
The urls_grouped table can be prepared using the ppc_robot_lib.steps.transformations.group_by_column.GroupByColumnStep:
>>> from ppc_robot_lib.steps.transformations import GroupByColumnStep
>>> GroupByColumnStep('urls', ['Url'], 'urls_grouped')
- Parameters:
input_table – Input table with URLs. Can be either a DataFrame or a group-by result.
output_table – Output table. Must not exist.
use_group – Set to True if the input table is a pandas.core.groupby.DataFrameGroupBy instance. Exclusive with the column parameter.
column (str) – Column with URL from the input table.
parallel_tasks (int) – Maximum number of URLs to check in parallel.
max_conn (int) – Maximum keep-alive connections.
timeout (int) – URL download timeout.
force_https (bool) – Whether to check HTTPS or not.
check_autotagging_params (bool) – Check redirect parameters in the URL.
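As an illustration of how the four output columns of CheckUrlsStep fit together, the following sketch summarizes a single result row. It relies only on the documented rule that either error or code is present. The dict-based row format, the use of None for blank values, and the sample error value are assumptions, not part of the library.

```python
from typing import Any, Dict, Optional


def describe_result(row: Dict[str, Optional[Any]]) -> str:
    """Return a short summary of one URL check result row."""
    # A row with an error has no HTTP status code.
    if row.get('error'):
        return f"{row['url']}: failed ({row['error']})"
    # redirect_to is filled in when the server returned a Location header.
    if row.get('redirect_to'):
        return f"{row['url']}: {row['code']}, redirects to {row['redirect_to']}"
    return f"{row['url']}: {row['code']}"


# Hypothetical rows; real error codes produced by the step may differ.
ok_row = {'url': 'https://example.com/', 'code': 200, 'redirect_to': None, 'error': None}
moved_row = {'url': 'http://example.com/', 'code': 301,
             'redirect_to': 'https://example.com/', 'error': None}
failed_row = {'url': 'https://example.invalid/', 'code': None,
              'redirect_to': None, 'error': 'timeout'}

summaries = [describe_result(r) for r in (ok_row, moved_row, failed_row)]
```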
PPC Robot’s Database
- class JoinAccountInfoStep(table, properties=list[str], client_id_column=None, context_client_id=False, suffix='_account')
Fetches account information and settings and joins them to the specified table. The accounts are fetched either by the value in the client_id_column column, or by the client ID specified in the task context. Lookups are always performed by the ext_id column in the ppc_robot_lib.models.client_account.ClientAccount model.
Properties is a list of columns to be fetched; any field from ppc_robot_lib.models.client_account.ClientAccount can be used. If a column already exists in the table, the specified suffix is appended to the result name (_account by default).
- Parameters:
table (str) – Table name to which the account information should be added.
properties – Properties of the ppc_robot_lib.models.client_account.ClientAccount model which should be added as columns to the table.
client_id_column (str) – Use the specified column values as input for the lookup.
context_client_id (bool) – Set to True if you would like to use the client_id stored in the task context.
suffix – Suffix to add to the columns if they already exist in the table.
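The suffix rule for JoinAccountInfoStep can be sketched as a small naming helper. This is an illustration of the documented behaviour (a requested property that collides with an existing column gets the suffix appended), not the library's code. The helper name joined_column_names and the sample column and property names are hypothetical.

```python
from typing import Dict, Iterable


def joined_column_names(existing_columns: Iterable[str],
                        properties: Iterable[str],
                        suffix: str = '_account') -> Dict[str, str]:
    """Map each requested property to the column name it would receive."""
    taken = set(existing_columns)
    return {
        # Append the suffix only when the name is already used in the table.
        prop: prop + suffix if prop in taken else prop
        for prop in properties
    }


# Hypothetical table columns and account properties.
names = joined_column_names(
    existing_columns=['CampaignId', 'name', 'Clicks'],
    properties=['name', 'currency'],
)
```

In this sketch, name collides with an existing column and becomes name_account, while currency is added unchanged.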