Meterstick Documentation
The meterstick package provides a concise syntax to describe and execute
routine data analysis tasks. Please see meterstick_demo.ipynb for examples.
Disclaimer
This is not an officially supported Google product.
tl;dr
Modify the demo colab notebook and adapt it to your needs.
Building up an analysis
Every analysis starts with a Metric or a MetricList. A full list of Metrics
can be found below.
A Metric may be modified by one or more Operations. For example, we might
want to calculate a confidence interval for the metric, a treatment-control
comparison, or both.
Once we have specified the analysis, we pass in the data to compute the
analysis on, as well as variables to slice by.
Here is an example of a full analysis:
(MetricList([Ratio("Conversions", "Visits", "CvR"),
             Ratio("Bounce", "Visits", "BounceRate")])
 | PercentChange("Experiment", "Control")
 | Jackknife("CookieBucket", confidence=0.95)
).compute_on(df, ["Country", "Device"])
This calculates the percent change in conversion rate and bounce rate,
relative to the control arm, for each country and device, together with
95% confidence intervals based on jackknife standard errors.
Building Blocks of an Analysis Object
Metrics
A Meterstick analysis begins with one or more metrics.
Currently built-in metrics include:
Count(variable): calculates the number of (non-null) entries of the
variable column.
Sum(variable): calculates the sum of the variable column.
Mean(variable): calculates the mean of the variable column.
Quantile(variable, quantile): calculates the given quantile of the
variable column.
Dot(variable1, variable2, normalize=False): calculates the dot product
between the variable1 column and the variable2 column; normalize
determines whether to normalize the dot product using the length.
Max(variable): calculates the max of the variable column.
Min(variable): calculates the min of the variable column.
Ratio(numerator, denominator): calculates
Sum(numerator) / Sum(denominator).
Nth(variable, n, sort_by, ascending=True, dropna=True): computes the nth
value (0-based indexing) in the variable column after sorting by the
sort_by column.
Variance(variable, unbiased=True): calculates the variance of the
variable column; unbiased determines whether the unbiased (sample) or
population estimate is used.
StandardDeviation(variable, unbiased=True): calculates the standard
deviation of variable; unbiased determines whether the unbiased or MLE
estimate is used.
CV(variable, unbiased=True): calculates the coefficient of variation of
the variable column; unbiased determines whether the unbiased or MLE
estimate of the standard deviation is used.
Correlation(variable1, variable2): calculates the Pearson correlation
between variable1 and variable2.
Cov(variable1, variable2): calculates the covariance between variable1
and variable2.
All metrics have an optional name argument which determines the column name
in the output. If not specified, a default name will be provided. For instance,
the metric Sum("Clicks") will have the default name sum(Clicks).
Metrics such as Mean and Quantile have an optional weight argument that
specifies a weighting column. The resulting metric is a weighted mean or
weighted quantile.
To calculate multiple metrics at once, create a MetricList of the individual
Metrics. For example, to calculate both total visits and conversion rate,
we would write:
MetricList([Sum("Visits"), Ratio("Conversions", "Visits", "ConversionRate")])
When computing analyses involving multiple metrics, Meterstick will try to
cache redundant computations. For example, both metrics above require
calculating Sum("Visits"); Meterstick will only calculate this once.
You can also define custom metrics. See section Custom Metric below for
instructions.
Composite Metrics
Metrics are also composable. For example, you can:
Add metrics: Sum("X") + Sum("Y") or Sum("X") + 1.
Subtract metrics: Sum("X") - Sum("Y") or Sum("X") - 1.
Multiply metrics: Sum("X") * Sum("Y") or 100 * Sum("X").
Divide metrics: Sum("X") / Sum("Y") or Sum("X") / 2.
(Note that the first is equivalent to Ratio("X", "Y").)
Raise metrics to a power: Sum("X") ** 2 or 2 ** Sum("X") or
Sum("X") ** Sum("Y").
…or any combination of these: 100 * (Sum("X") / Sum("Y") - 1).
Common metrics can be implemented as follows:
Click-through rate (CTR): Ratio('Clicks', 'Impressions', 'CTR')
Conversion rate (CvR): Ratio('Conversions', 'Visits', 'CvR')
Bounce rate: Ratio('Bounce', 'Visits', 'BounceRate')
Cost per click (CPC): Ratio('Cost', 'Clicks', 'CPC')
Operations
Operations are defined on top of metrics. Operations include comparisons,
standard errors, and distributions.
Comparisons
A comparison operation calculates the change in a metric between various
conditions and a baseline. In A/B testing, the “condition” is
typically a treatment and the “baseline” a control.
AbsoluteChange(condition_column, baseline) : Computes the absolute change
(other - baseline).
MH(condition_column, baseline, stratified_by): Computes the
Mantel-Haenszel estimator.
The metric being computed must be a Ratio or a MetricList of Ratios.
The stratified_by argument specifies the strata over which the MH
estimator is computed.
CUPED(condition_column, baseline, covariates, stratified_by): Computes
the absolute change that has been adjusted using the
CUPED approach. See the
demo for details.
PrePostChange(condition_column, baseline, covariates, stratified_by):
Computes the percent change that has been adjusted using the
PrePost approach. See the
demo for details.
Example Usage: ... | PercentChange("Experiment", "Control")
Note that condition_column can be a list of columns, in which case baseline
should be a tuple of baselines, one for each condition variable.
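The arithmetic behind the two basic comparisons can be sketched in plain
pandas (toy data and illustrative column names, not the meterstick API):

```python
import pandas as pd

# Toy experiment data: one conversion-rate value per arm.
df = pd.DataFrame({
    "arm": ["Control", "Experiment"],
    "conversion_rate": [0.20, 0.25],
})

baseline = df.loc[df["arm"] == "Control", "conversion_rate"].iloc[0]
other = df.loc[df["arm"] == "Experiment", "conversion_rate"].iloc[0]

# PercentChange: (other - baseline) / baseline, expressed in percent.
percent_change = (other - baseline) / baseline * 100
# AbsoluteChange: other - baseline.
absolute_change = other - baseline
```

Meterstick performs this comparison for every non-baseline condition and
every slice at once.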
Standard Errors
A standard error operation adds the standard error of the metric
(or confidence interval) to the point estimate.
Built-in standard errors include:
Jackknife(unit, confidence): Computes a leave-one-out jackknife estimate
of the standard error of the child Metric.
unit is a string for the variable whose unique values will be resampled.
confidence in (0,1) represents the level of the confidence interval; it
is optional.
Bootstrap(unit, n_replicates, confidence, ci_method): Computes a
bootstrap estimate of the standard error or percentiles.
n_replicates is the number of bootstrap replicates, default is 10000.
unit is a string for the variable whose unique values will be resampled;
if unit is not supplied the rows will be the unit.
ci_method specifies the confidence interval method. Defaults to 'std',
which computes bounds using standard error and normal approximation. If set
to 'percentile', it computes the confidence interval using empirical
percentiles from the bootstrap distribution and also returns the median.
confidence in (0,1) represents the level of the confidence interval;
optional if ci_method is 'std', but required if ci_method is
'percentile' (because standard error is not well defined for percentile
bootstrapping).
PoissonBootstrap(unit, n_replicates, confidence, ci_method): Computes a
Poisson bootstrap estimate of the standard error or percentiles. It’s
identical to Bootstrap except that we use Poisson(1) instead of the
multinomial distribution in sampling. It’s faster than Bootstrap on large
data when computing in SQL. See the post on The Unofficial Google Data
Science Blog for a good introduction.
Example Usage: ... | Jackknife('CookieBucket', confidence=.95)
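The leave-one-out arithmetic behind Jackknife, for a Sum metric, can be
sketched in plain numpy (toy per-unit totals; not the meterstick
implementation):

```python
import numpy as np

# Per-unit totals of the metric, e.g. Sum("Clicks") by CookieBucket.
unit_totals = np.array([10.0, 12.0, 11.0, 9.0, 13.0])

n = len(unit_totals)
total = unit_totals.sum()
# Leave-one-out estimates: drop one unit at a time and recompute the metric.
loo = total - unit_totals
# Jackknife standard error of the point estimate.
se = np.sqrt((n - 1) / n * ((loo - loo.mean()) ** 2).sum())
```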
Transformations
Transformations are functions that can be applied to metrics to perform
element-wise operations on their results.
Currently supported transformations include:
ExponentialTransform(metric): Applies an exponential transformation to
the metric result.
LogTransform(metric, base='ln'): Applies a logarithmic transformation to
the metric result. The base can be ln or log10.
ExponentialPercentTransform(metric, base='ln'): Computes
100 * (base^metric - 1). If base='log10', it computes
100 * (10^metric - 1). It’s useful for converting log-transformed metrics
back to a percent scale. For example, when PercentChange(..., Sum(x)) is
skewed, you can instead take the AbsoluteChange of LogTransform(Sum(x))
and apply ExponentialPercentTransform to the result. This computes the
same percent change, while the confidence interval is calculated in the
log-transformed space, which often results in less skewness.
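The identity that makes this work can be checked directly: an absolute
change computed in log space and mapped back with 100 * (e^delta - 1)
equals the ordinary percent change (plain Python math, toy numbers):

```python
import math

a, b = 40.0, 50.0  # e.g. Sum(x) in the control and treatment arms

# Direct percent change: 100 * (other - baseline) / baseline.
direct_percent_change = (b - a) / a * 100
# Absolute change of the log-transformed metric...
delta = math.log(b) - math.log(a)
# ...mapped back to the percent scale with 100 * (e^delta - 1).
via_log = 100 * (math.exp(delta) - 1)
```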
Distributions
A distribution operation produces the distribution of the metric over
a variable.
Distribution(over): calculates the distribution of the metric over the
variables in over; the values are normalized so that they sum to 1. It has
an alias Normalize.
CumulativeDistribution(over, order=None, ascending=True, sort_by_values=False):
calculates the cumulative distribution of the metric over the variables in
over. Before computing the cumulative sum, we sort by the values if
sort_by_values=True, otherwise by the over column(s). If
sort_by_values=False, you can pass in a list of values as a custom order.
ascending determines the direction of the sort.
Example Usage: Sum("Queries") | Distribution("Device") calculates the
proportion of queries that come from each device.
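What Distribution and CumulativeDistribution compute can be sketched in
plain pandas (illustrative numbers, not the meterstick API):

```python
import pandas as pd

queries = pd.DataFrame({
    "Device": ["desktop", "mobile", "tablet"],
    "Queries": [60, 30, 10],
})

totals = queries.set_index("Device")["Queries"]
# Distribution: normalize the metric values so they sum to 1.
dist = totals / totals.sum()
# CumulativeDistribution: cumulative sum after sorting by the `over` column.
cum_dist = dist.sort_index().cumsum()
```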
Diversity
A diversity operation measures how diverse the child metric values are.
HHI(over): calculates the Herfindahl–Hirschman index of the metric values
over the variables in over. The metric values are first normalized over
over, then the HHI is computed.
Entropy(over): calculates the entropy of the metric values over the
variables in over. The metric values are first normalized over over, then
the entropy is computed.
TopK(over, k): calculates the total share of the top k contributors. The
metric values are first normalized over over, then the largest k values
are summed.
Nxx(over, x): calculates the minimum number of contributors needed to
reach a total share of x. The metric values are first normalized over
over, then we count how many of the largest contributors are needed to
make up a total share of x.
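All four diversity measures reduce to simple arithmetic on normalized
shares, sketched here in plain numpy (toy values; not the meterstick
implementation):

```python
import numpy as np

values = np.array([50.0, 30.0, 15.0, 5.0])  # metric values per slice
shares = values / values.sum()               # normalize over `over`

hhi = (shares ** 2).sum()                    # Herfindahl-Hirschman index
entropy = -(shares * np.log(shares)).sum()   # entropy of the shares
top2 = np.sort(shares)[::-1][:2].sum()       # TopK with k=2
# Nxx: smallest number of the largest contributors reaching x total share.
x = 0.8
nxx = int((np.cumsum(np.sort(shares)[::-1]) >= x).argmax() + 1)
```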
Models
A Meterstick Model fits a model on data computed by its children Metrics.
Model(y, x, groupby).compute_on(data) is equivalent to:
1. Compute y.compute_on(data, groupby) and x.compute_on(data, groupby).
2. Fit the underlying model on the results from #1.
We have built-in support for LinearRegression, Ridge, Lasso, ElasticNet
and LogisticRegression.
Example Usage: LinearRegression(Sum('Y'), Sum('X'), 'country') calculates
the sum of Y and X by country respectively, then fits a linear regression
between them.
Note that x, the 2nd arg, can be a Metric, a MetricList, or a list of Metrics.
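The two-step recipe can be sketched in plain pandas/numpy, with an
ordinary least-squares fit standing in for the built-in LinearRegression
(toy data, not the meterstick implementation):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "country": ["US", "US", "JP", "JP", "DE", "DE"],
    "X": [1.0, 2.0, 2.0, 3.0, 3.0, 4.0],
    "Y": [2.0, 4.0, 4.0, 6.0, 6.0, 8.0],
})

# Step 1: compute Sum('X') and Sum('Y') per country.
agg = df.groupby("country")[["X", "Y"]].sum()
# Step 2: fit a linear regression of the aggregated Y on the aggregated X.
slope, intercept = np.polyfit(agg["X"], agg["Y"], 1)
```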
Filtering
We can restrict our metrics to subsets of the data. For instance, to
calculate metrics for non-spam clicks, you can add a where clause to the
Metric or MetricList. This clause is a boolean expression that can be
passed to pandas’ query() method.
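Because the where clause is handed to pandas’ query(), any expression that
works there is valid. A minimal sketch of the equivalent filtering in
plain pandas (illustrative column names):

```python
import pandas as pd

clicks = pd.DataFrame({
    "Clicks": [10, 5, 8, 2],
    "IsSpam": [False, True, False, True],
})

# A `where` clause is a boolean expression understood by DataFrame.query();
# here we sum clicks restricted to non-spam rows.
non_spam_clicks = clicks.query("not IsSpam")["Clicks"].sum()
```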
Data and Slicing
Once we have specified the metric(s) and operation(s), it is time to
compute the analysis on some data. The final step is to pass in the data,
along with any variables we want to slice by. The analysis will be carried
out for each slice separately.
The data can be supplied in two forms:
a pandas DataFrame
a string representing a SQL table or subquery.
Example Usage: compute_on(df, ["Country", "Device"])
Example Usage:
compute_on_sql("SELECT * FROM table WHERE date = '20200101'", "Country")
Customizing the Output Format
When calculating multiple metrics, Meterstick will store each metric as a
separate column by default. However, it is sometimes more convenient to store
the data in a different shape: with one column storing the metric values and
another column storing the metric names. This makes it easier to facet by metric
in packages like ggplot2 and altair. This is known as the “melted”
representation of the data. To return the output in melted form, simply add the
argument melted=True in compute_on() or compute_on_sql().
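The melted shape corresponds to a pandas melt of the default wide output
(illustrative metric names and values):

```python
import pandas as pd

# Default ("wide") output: one column per metric.
wide = pd.DataFrame({"sum(Visits)": [100], "ConversionRate": [0.25]})

# Melted output: one row per metric, with name and value columns.
melted = wide.melt(var_name="Metric", value_name="Value")
```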
Visualization
If the last operation applied to the metric is Jackknife or Bootstrap with
confidence, the output can be displayed in a way that highlights significant changes by calling
.display().
You can customize the display. It takes the same arguments as the underlying
visualization
library.
The arguments return_pre_agg_df and return_formatted_df are particularly
valuable for advanced use cases. They offer nearly unlimited customization
of the output.
You can visualize the Metric tree by calling
visualize_metric_tree(rendering_fn), where rendering_fn is a function that
can render a string in DOT format. It can help you sanity-check complex
Metrics.
SQL
You can get the SQL query for all built-in Metrics and Operations by calling
to_sql(sql_data_source, split_by) on the Metric. sql_data_source could be a
table or a subquery. Very often what you actually need is the execution of
the SQL query; in that case you can call
compute_on_sql(sql_data_source, split_by, execute, mode)
directly, which will give you an output similar to compute_on(). execute is
a function that can execute a SQL query. mode can be None or 'mixed'. The
former is recommended and computes everything in SQL whenever possible,
while the latter only computes the leaf Metrics in SQL.
The default dialect it uses is GoogleSQL. You can use set_dialect() to choose
other dialects. Currently we support
PostgreSQL
MySQL and MariaDB
SQLite
Oracle
Microsoft SQL Server
Trino SQL
For other dialects, you can manually overwrite the default string templates
at the top of the sql.py file.
Apache Beam
There is also a compute_on_beam method, which takes a PCollection
with a schema
as input. The args are similar to those of compute_on_sql except that
execute now should evaluate a PCollection.
Under the hood, we generate SQL queries and pass them to SqlTransform.
As a result:
You need to choose a Beam runner that supports SqlTransform. For example,
the InteractiveRunner
does NOT.
The config of the pipeline that carries the PCollection is set up by you.
For example, your setup decides whether the pipeline runs in-process or in
Cloud.
Custom Metric
You can write your own Metric and Operation. Below is a Metric taken from the demo colab.
The Metric fits a LOWESS model.
import pandas as pd
import statsmodels.api as sm

lowess = sm.nonparametric.lowess


class Lowess(Metric):

  def __init__(self, x, y, name=None, where=None):
    self.x = x
    self.y = y
    name = name or 'LOWESS(%s ~ %s)' % (y, x)
    super(Lowess, self).__init__(name, where=where)

  def compute(self, data):
    lowess_fit = pd.DataFrame(
        lowess(data[self.y], data[self.x]), columns=[self.x, self.y])
    return lowess_fit.drop_duplicates().reset_index(drop=True)
As long as the Metric obeys some rules, it
will work with all built-in Metrics and Operations. For example, we can pass it
to Jackknife to get a confidence interval.