Input Steps


All steps can be imported directly from the ppc_robot_lib.steps.input module.

AdWords

Sklik

class SklikReportStep(query, output_table, allow_empty_statistics=False, transformation=None, custom_columns=None)

Downloads a report from the Sklik API. This step uses the PPC Robots Sklik connector to download the report, which means that it automatically performs all necessary paging and data normalization.

Only reports defined in the ppc_robot_lib.sklik.reports module can be used. See Available Reports for more details.

The query parameter should be a ppc_robot_lib.sklik.query.Query instance with one of the reports above.

Example:

>>> from ppc_robot_lib.sklik import Query, Condition, Op, During, Granularity
>>> from ppc_robot_lib.sklik.types import StatusEnum
>>> from ppc_robot_lib.steps.input import SklikReportStep
>>> query = Query(
...     select=['Name', 'Status', 'Clicks', 'Impressions'],
...     from_report='group',
...     where=[
...         Condition('Status', Op.EQ, StatusEnum.ACTIVE),
...         Condition('Impressions', Op.GT, 0),
...     ],
...     during=During.LAST_30_DAYS,
...     granularity=Granularity.DAILY,
... )
>>> SklikReportStep(query, output_table='ad_groups')
Parameters:
  • query (Query) – Query to execute.

  • output_table (str) – Name of the output table.

  • allow_empty_statistics (bool) – Whether to include rows with zero impressions.

  • transformation (Callable[[Iterable], Iterable]) – Custom transformation function. The function receives an iterable (currently a generator, but this behaviour can change in the future) and must return another iterable that can be passed to pandas.DataFrame as the data argument.

  • custom_columns (list[str]) – Custom column names for the output table. If no columns are given, names from the query are used. Useful if you use a custom transformation that can change the columns.
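
The following is a minimal sketch of what a transformation function could look like. It assumes that each item in the incoming iterable is a dict keyed by the selected column names; the actual row format is not documented here, so treat that as a hypothetical. The function name add_ctr, the Ctr column and the sample rows are illustrative only.

```python
from typing import Iterable, Iterator


def add_ctr(rows: Iterable[dict]) -> Iterator[dict]:
    """Yield each row with an extra 'Ctr' column (clicks / impressions).

    Assumption: rows arrive as dicts keyed by the selected column names.
    """
    for row in rows:
        impressions = row['Impressions']
        ctr = row['Clicks'] / impressions if impressions else 0.0
        # Build a new dict so the incoming row is left untouched.
        yield {**row, 'Ctr': ctr}


# Stand-in for the rows the step would pass in (hypothetical data).
sample_rows = [
    {'Name': 'Group A', 'Clicks': 5, 'Impressions': 100},
    {'Name': 'Group B', 'Clicks': 0, 'Impressions': 0},
]
transformed = list(add_ctr(iter(sample_rows)))
```

Because the function is a generator, it consumes the input lazily and works whether the step passes in a generator or any other iterable. Since it adds a column, such a function would typically be passed as transformation together with a matching custom_columns list, for example ['Name', 'Clicks', 'Impressions', 'Ctr'].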

class SklikReportDetailsBatchStep(source_table, source_column, base_query, output_table, cond_column=None, allow_empty_statistics=False, transformation=None, custom_columns=None, batch_size=10000)

Downloads additional details for a given table. This step takes a column from the source table, takes the base query, and adds a new IN[] condition for the specified column (defaults to the source column name) to the query. Then it uses this query to fetch report data. The data is fetched in multiple batches and then concatenated.

Example:

>>> from ppc_robot_lib.sklik import Query, During
>>> from ppc_robot_lib.steps.input import SklikReportDetailsBatchStep
>>> base_query = Query(
...     select=['Id', 'Name', 'Impressions', 'Clicks'],
...     from_report='campaign',
...     during=During.LAST_30_DAYS,
... )
>>> SklikReportDetailsBatchStep(
...     source_table='Campaigns',
...     source_column='CampaignId',
...     base_query=base_query,
...     cond_column='Id',
...     output_table='campaign_data',
...     batch_size=5000,
... )
Parameters:
  • source_table (str) – Name of the source table.

  • source_column (str) – Name of the source column - values of this column will be used for the IN[] condition.

  • base_query (Query) – Base Query used to build a query for each batch.

  • cond_column (str) – Name of the column that will be used in the condition. Defaults to source_column.

  • output_table (str) – Name of the output table.

  • allow_empty_statistics (bool) – Whether to include rows with zero impressions.

  • transformation (Callable[[Iterable], Iterable]) – Custom transformation function. The function receives an iterable (currently a generator, but this behaviour can change in the future) and must return another iterable that can be passed to pandas.DataFrame as the data argument.

  • custom_columns (list[str]) – Custom column names for the output table. If no columns are given, names from the query are used. Useful if you use a custom transformation that can change the columns.

  • batch_size (int) – Number of values in the IN[] condition for each batch.
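
The batching idea can be illustrated with a short standalone sketch. This is not the step's actual implementation: iter_batches, fetch_in_batches and fake_fetch are hypothetical helpers that only show how a list of column values might be split into chunks of batch_size, fetched chunk by chunk, and concatenated.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(values: Sequence, batch_size: int) -> Iterator[list]:
    """Yield consecutive slices of `values`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError('batch_size must be positive')
    for start in range(0, len(values), batch_size):
        yield list(values[start:start + batch_size])


def fetch_in_batches(values: Sequence, batch_size: int,
                     fetch: Callable[[list], List[dict]]) -> List[dict]:
    """Call `fetch` once per batch and concatenate the returned rows."""
    rows: List[dict] = []
    for batch in iter_batches(values, batch_size):
        rows.extend(fetch(batch))
    return rows


def fake_fetch(ids: list) -> List[dict]:
    """Hypothetical fetch function: pretends each ID yields one report row."""
    return [{'Id': i} for i in ids]


campaign_ids = list(range(1, 8))
batches = list(iter_batches(campaign_ids, 3))
result = fetch_in_batches(campaign_ids, 3, fake_fetch)
```

A smaller batch_size means more requests with shorter IN[] conditions; a larger one means fewer requests with longer conditions.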

URL Check

class CheckUrlsStep(input_table, output_table, use_group=False, column=None, parallel_tasks=4, max_conn=16, timeout=10, hide_robot_suffix=False, use_get_requests=False, force_https=False, check_autotagging_params=False)

Performs a URL check.

Input can be either a pandas.DataFrame, or pandas.core.groupby.DataFrameGroupBy. If you use a DataFrame, you have to set the column argument to a column with a valid URL. Please note that this step does not perform any kind of deduplication, so you might end up checking a single URL multiple times.

If you use a group-by result as the input, the index with the group keys is always used as the URL. The index cannot be hierarchical, so the group-by must not be constructed with multiple grouping columns.

Output table will contain 4 columns:

url

URL being checked.

code

HTTP Status code. Might be blank if the connection failed and page was not retrieved. Either error or code will be present.

redirect_to

URL to which the user would be redirected. Filled in if the server has returned a Location header.

error

Error code if the check failed. Either error or code will be present.
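
As a rough illustration of how the four output columns relate, the sketch below classifies a single result row. It assumes rows are available as plain dicts and that blank values are represented as None; both are assumptions, as is the 'timeout' error value used in the sample data. The helper describe_check is hypothetical and not part of the library.

```python
from typing import Any, Mapping


def describe_check(row: Mapping[str, Any]) -> str:
    """Return a short human-readable summary of one URL check result."""
    error = row.get('error')
    if error:
        # The connection failed, so no HTTP status code is expected.
        return f'failed: {error}'
    code = int(row['code'])
    if 300 <= code < 400:
        return f"redirect to {row.get('redirect_to')}"
    if code >= 400:
        return f'HTTP error {code}'
    return 'ok'


# Hypothetical result rows covering the main cases.
rows = [
    {'url': 'https://example.com/', 'code': 200, 'redirect_to': None, 'error': None},
    {'url': 'http://example.com/old', 'code': 301,
     'redirect_to': 'https://example.com/new', 'error': None},
    {'url': 'https://example.com/missing', 'code': 404, 'redirect_to': None, 'error': None},
    {'url': 'https://example.invalid/', 'code': None, 'redirect_to': None, 'error': 'timeout'},
]
summaries = [describe_check(row) for row in rows]
```

The error column is checked first because a row with an error has no usable status code.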

Example:

>>> from ppc_robot_lib.steps.input import CheckUrlsStep
>>> CheckUrlsStep("urls_grouped", output_table="check_result", use_group=True)

The urls_grouped can be prepared using the ppc_robot_lib.steps.transformations.group_by_column.GroupByColumnStep:

>>> from ppc_robot_lib.steps.transformations import GroupByColumnStep
>>> GroupByColumnStep('urls', ['Url'], 'urls_grouped')
Parameters:
  • input_table – Input table with URLs. Can be either a pandas.DataFrame or a pandas.core.groupby.DataFrameGroupBy.

  • output_table – Output table. Must not exist.

  • use_group – Set to True if the input table is a pandas.core.groupby.DataFrameGroupBy instance. Exclusive with the column parameter.

  • column (str) – Column with URL from the input table.

  • parallel_tasks (int) – Maximum number of URLs to check in parallel.

  • max_conn (int) – Maximum keep-alive connections.

  • timeout (int) – URL download timeout.

  • force_https (bool) – Whether to force HTTPS when checking URLs.

  • check_autotagging_params (bool) – Whether to check redirect parameters in the URL.

PPC Robot’s Database

class JoinAccountInfoStep(table, properties=list[str], client_id_column=None, context_client_id=False, suffix='_account')

Fetches account information and settings and joins them to the specified table. The accounts are fetched either by value in column client_id_column, or by client ID specified in task context. Lookups are always performed by the ext_id column in the ppc_robot_lib.models.client_account.ClientAccount model.

Properties is a list of columns to be fetched; any field from ppc_robot_lib.models.client_account.ClientAccount can be used. If a column already exists in the table, the specified suffix is appended to the result name (_account by default).

Parameters:
  • table (str) – Table name to which the account information should be added.

  • properties – Properties of the ppc_robot_lib.models.client_account.ClientAccount model which should be added as columns to the table.

  • client_id_column (str) – Use the specified column values as input for the lookup.

  • context_client_id (bool) – Set to True if you would like to use client_id stored in the task context.

  • suffix – Suffix to add to the columns if they already exist in the table.
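
The suffix rule described above can be sketched in plain Python. This only models the naming behaviour, not the database lookup, and the helper resolve_column_names as well as the property names name and currency are hypothetical examples rather than confirmed fields of the ClientAccount model.

```python
from typing import Dict, Iterable


def resolve_column_names(existing_columns: Iterable[str],
                         properties: Iterable[str],
                         suffix: str = '_account') -> Dict[str, str]:
    """Map each requested property to the column name it would receive.

    A property whose name collides with an existing column gets the suffix.
    """
    taken = set(existing_columns)
    return {
        prop: (prop + suffix) if prop in taken else prop
        for prop in properties
    }


# The table already has a 'name' column, so only that property is renamed.
names = resolve_column_names(['CampaignId', 'name'], ['name', 'currency'])
```

Only colliding names are renamed, so columns that do not clash keep the plain property name.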