Probably not. The majority of the code implements a high-volume, idiosyncratic
data pipeline on top of AWS services, and requires other services to work in
tandem with it. But feel free to pillage activedata_etl/imports or
activedata_etl/transforms for the transformation code.
Branches
Many branches are meant as stable versions for each of the processes involved
in the ETL. Ideally, they would be unified, but library upgrades can cause
unique instability: deployment of a branch does not happen until (manual)
testing has been done.
Here are the important branches:
dev - unstable - primary branch for accepting changes
etl - stable - for ETL machines
primary - stable - for the “primary” and “coordinator” ES nodes
codecoverage - unstable - for Code Coverage ETL development
pulse-logger - stable - for the PulseLogger
tc-logger - stable - for the TaskCluster logger
push-to-es - stable - code installed on ES spot instance machines for
final indexing.
beta - stable - of all branches for testing on the beta machines
manager - stable - installed on the ActiveData management machine for cron jobs
master - unstable - intermittently updated to track dev, eventually
intended as the single-stable-version
Install pycrypto. Hopefully, voidspace still provides pre-compiled binaries. Knowing the internet, it has probably moved by the time you read this, so I made a copy of pycrypto-2.6.win32-py2.7.exe.
pip install fabric again. This should be successful.
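A quick way to confirm the steps above worked is to try importing both packages. The sketch below is not part of this repository; `check_modules` is a hypothetical helper name, and it assumes only that pycrypto is imported as `Crypto` and Fabric as `fabric`.

```python
import importlib


def check_modules(names):
    """Return {module_name: bool} saying whether each module can be imported."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


if __name__ == "__main__":
    # "Crypto" is the import name of pycrypto; "fabric" is Fabric itself.
    for name, ok in sorted(check_modules(["Crypto", "fabric"]).items()):
        print("%-8s %s" % (name, "ok" if ok else "MISSING"))
```

If `Crypto` is reported as MISSING, the pycrypto install did not take, and the second `pip install fabric` will likely fail the same way the first did.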
Configuration Files
The configuration files, located in resources/settings, often point to a private.json config file outside the repository tree. This file holds the credentials and access info required, and looks something like this:
ActiveData-ETL
The ETL code responsible for filling ActiveData.
Sounds Exciting! Can I Use This?
Probably not. The majority of the code implements a high-volume, idiosyncratic data pipeline on top of AWS services, and requires other services to work in tandem with it. But feel free to pillage activedata_etl/imports or activedata_etl/transforms for the transformation code.
Branches
Many branches are meant as stable versions for each of the processes involved in the ETL. Ideally, they would be unified, but library upgrades can cause unique instability: deployment of a branch does not happen until (manual) testing has been done.
Here are the important branches:
Requirements
Installing Fabric
It is 2016, and Python is still hard on Windows. It would be a nice question for Stack Overflow, but apparently not.
pip install fabric - There will be errors
pip install fabric again. This should be successful.
Configuration Files
The configuration files, located in resources/settings, often point to a private.json config file outside the repository tree. This file holds the credentials and access info required, and looks something like this:
The exact properties will depend on the resources you are accessing.
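As a rough illustration of working with such a file, the sketch below loads a JSON credentials file from a path outside the repository and fails early if expected properties are absent. It is not how this project reads its settings; `load_private_settings`, the `aws_credentials` key, and the placeholder values are all assumptions made up for the example.

```python
import json
import os
import tempfile


def load_private_settings(path, required_keys=()):
    """Read a JSON credentials file and fail early if expected keys are absent."""
    path = os.path.expanduser(path)
    with open(path) as f:
        settings = json.load(f)
    missing = [k for k in required_keys if k not in settings]
    if missing:
        raise KeyError("private settings missing keys: %s" % ", ".join(missing))
    return settings


if __name__ == "__main__":
    # Hypothetical contents; the real property names depend on your resources.
    example = {"aws_credentials": {"aws_access_key_id": "PLACEHOLDER",
                                   "aws_secret_access_key": "PLACEHOLDER"}}
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(example, f)
    try:
        settings = load_private_settings(f.name, required_keys=["aws_credentials"])
        print(sorted(settings))
    finally:
        os.remove(f.name)
```

Checking for required keys up front means a missing credential surfaces as one clear error at startup rather than as a failure deep inside a job.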