bigrquery
The bigrquery package makes it easy to work with data stored in Google
BigQuery by allowing you to
query BigQuery tables and retrieve metadata about your projects,
datasets, tables, and jobs. The bigrquery package provides three levels
of abstraction on top of BigQuery:
The low-level API provides thin wrappers over the underlying REST API.
All the low-level functions start with bq_, and mostly have the form
bq_noun_verb(). This level of abstraction is most appropriate if
you’re familiar with the REST API and you want to do something not
supported in the higher-level APIs.
The DBI interface wraps the low-level API and
makes working with BigQuery like working with any other database
system. This is the most convenient layer if you want to execute SQL
queries in BigQuery or upload smaller amounts (i.e. <100 MB) of data.
The dplyr interface lets you treat
BigQuery tables as if they are in-memory data frames. This is the most
convenient layer if you don’t want to write SQL, but instead want
dbplyr to write it for you.
Installation
The current bigrquery release can be installed from CRAN:
install.packages("bigrquery")
The newest development release can be installed from GitHub:
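One way to do this is with the pak package (a sketch that assumes the
development sources live in the r-dbi/bigrquery repository):
# install.packages("pak")
pak::pak("r-dbi/bigrquery")
Usage
Low-level API
The low-level bq_*() functions can be tried out against Google’s free
sample data. A minimal sketch, assuming your own Project ID (a
placeholder below) is used as the billing project:
library(bigrquery)
# Placeholder: replace with your own Project ID (see "BigQuery account" below)
billing <- "your-project-id"
sql <- "SELECT year, month, day, weight_pounds FROM `publicdata.samples.natality`"
tb <- bq_project_query(billing, sql)
bq_table_download(tb, n_max = 10)
DBI
Through DBI, a BigQuery connection behaves like a connection to any other
database. A minimal sketch, again with a placeholder billing project:
library(DBI)
con <- dbConnect(
  bigrquery::bigquery(),
  project = "publicdata",
  dataset = "samples",
  billing = "your-project-id"  # placeholder: your own Project ID
)
dbGetQuery(con, "SELECT year, weight_pounds FROM `publicdata.samples.natality` LIMIT 10")
dplyr
The dplyr interface reuses the DBI connection con created above: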
library(dplyr)
natality <- tbl(con, "natality")
natality %>%
select(year, month, day, weight_pounds) %>%
head(10) %>%
collect()
#> # A tibble: 10 × 4
#> year month day weight_pounds
#> <int> <int> <int> <dbl>
#> 1 2005 11 NA 8.88
#> 2 2005 1 NA 8.69
#> 3 2005 3 NA 7.08
#> 4 2005 7 NA 7.81
#> 5 2005 1 NA 8.56
#> 6 2005 1 NA 8.13
#> 7 2005 7 NA 8.50
#> 8 2005 9 NA 7.56
#> 9 2005 9 NA 8.14
#> 10 2005 4 NA 7.05
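If you want to inspect the SQL that dbplyr generates before running it,
show_query() prints it; a quick sketch using the same lazy table:
natality %>%
  select(year, month, day, weight_pounds) %>%
  head(10) %>%
  show_query()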
Important details
BigQuery account
To use bigrquery, you’ll need a BigQuery project. Fortunately, if you
just want to play around with the BigQuery API, it’s easy to start with
Google’s free public
data and the BigQuery
sandbox. This gives you
some fun data to play with along with enough free compute (1 TB of
queries & 10 GB of storage per month) to learn the ropes.
To get started, open https://console.cloud.google.com/bigquery and
create a project. Make a note of the “Project ID” as you’ll use this as
the billing project whenever you work with free sample data, and as
the project when you work with your own data.
Authentication and authorization
When using bigrquery interactively, you’ll be prompted to authorize
bigrquery in the
browser. You’ll be asked if you want to cache tokens for reuse in future
sessions. For non-interactive use, a service account token is preferred
where possible. More places to learn about auth:
bigrquery obtains a token with gargle::token_fetch(), which
supports a variety of token flows. This article provides full
details, such as how to take advantage of Application Default
Credentials or service accounts on GCE VMs.
The Non-interactive auth article explains how to set up a project when
code must run without any user interaction.
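For example, a deployed script might authenticate with a service account
key downloaded from the Google Cloud console (the file path below is a
placeholder):
library(bigrquery)
# Placeholder path: point this at your own service account JSON key
bq_auth(path = "/path/to/service-account-key.json")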
Note that bigrquery requests permission to modify your data, but it will
never do so unless you explicitly request it (e.g. by calling
bq_table_delete() or bq_table_upload()). Our Privacy
policy provides more
info.
Policies
Please note that the ‘bigrquery’ project is released with a Contributor
Code of Conduct. By
contributing to this project, you agree to abide by its terms.