Building a Simple Data Pipeline with Google App Engine: Part 1

This is part 1 of a two-part series

Part 1 focusses on the setup and structure of a PHP app for App Engine, how to manage authentication & environment variables for local development, validating endpoint calls come from App Engine Cron and, finally, logging and alerts.

Data is so important to modern business and ours is no different. For years we’ve worked towards being a data-driven business and making data-driven decisions for ourselves and our clients. But how do we efficiently process all this data? One thing I’ve learnt is that if the data is not readily available then any good intentions of building reports get lost in the day-to-day melee.

For a while now, we have been relying on Google Data Studio to build out Business Intelligence dashboards for ourselves and our customers. We use StitchData to load Jira, QuickBooks and HubSpot data into our BigQuery warehouse from where we can build custom views and reports in Data Studio. However, we were missing one vital dataset for which Stitch does not provide an integration: Tempo (time logs and plans). As I am taking the Google Cloud Architect certification I thought this would be a great opportunity to put into practice what I’d learn so far.

The Solution

I decided to use App Engine as it provides a server-less solution and would allow me to build in a familiar technology, PHP. The solution is based on Google’s sample app, using the Slim Framework and the logic outline is as follows:

  1. Hit the Tempo API to extract the required data
  2. Convert it to the NDJSON format expected by BigQuery
  3. Save the file to a Google Cloud Storage Bucket
  4. [Repeat steps 1-3 if multiple “pages” of data are returned by the API]
  5. Import each file into BigQuery

Let’s step through the solution:

Setup

I’m assuming you already have a Google Cloud project setup, if not do that first! Once you’ve done that you’ll need to initialise App Engine using the the CLI:

gcloud app create --project=[YOUR_PROJECT_ID]

Structure

The app is structured as follows:

src/
  controllers/
    main.php
    tempo.php
  BigQuery.php
  Bucket.php
  Logger.php
  Secrets.php
.gcloudignore
.gitignore
app.yaml
composer.json
cron.yaml
index.php
README.md

Most of this is standard PHP stuff, but there are a few special files required by App Engine:

  • .gcloudignore tells Google to ignore certain files when deploying to App Engine. By default it ignores your .git* files and vendor folder (not required because App Engine will run composer install for you when you deploy).
  • app.yaml contains app-specific information for App Engine. At a minimum you need to specify the runtime (PHP 7.3 in my case). I also use this file to specify environment variables – more on that later.
  • cron.yaml specifies the cron jobs I want App Engine to run. We’ll talk about that later too.

Authentication & Environment Variables for Local Development (and app.yaml)

In App Engine the app will run as the App Engine service account, which has access to Cloud Storage and BigQuery by default (if these resources are setup in the same project as your app). In a minute we’ll see how to grant access to the Secrets Manager too.

However, to run the app locally you will need to create a new service account for yourself with the required permissions on the Cloud Storage bucket, BigQuery and Secrets Manager. Make sure you download the JSON file with the credentials.

The Cloud SDK uses a hierarchy to determine the credentials used:

  1. Use the credentials supplied when an object is instantiated.
  2. Look for credentials in the GOOGLE_APPLICATION_CREDENTIALS environment variable.
  3. Use the credentials of the account under which the app is running.

In production we are using option 3, on local we are going to use option 2. We can use the following format to pass in environment variables when starting up the default PHP server:

GOOGLE_APPLICATION_CREDENTIALS=/Users/adam/Downloads/NolteMetrics-34cc803fcfc5.json \
   GC_PROJECT="nolte-metrics" \
   GC_BUCKET="nolte-metrics.appspot.com" \
   ALERT_EMAIL="afenton@wearenolte.com" \
   php -d variables_order=EGPCS -S localhost:8085 -t . index.php

Just replace the GOOGLE_APPLICATION_CREDENTIALS value with the path to your JSON file.

The other 3 variables, GC_PROJECT, GC_BUCKET and ALERT_EMAIL are read from the app.yaml file in App Engine, but for local use we need to define them ourselves. Here’s my complete app.yaml file for this project:

Secrets

I am using Google’s Secrets Manager to store my Tempo and PostMark authentication keys, these keys allow for complete control of our accounts and are not something we want to reveal in our source code (please don’t)!

In Secrets Manager you need to create the new secrets, give them each a name (I used TEMPO_AUTH_TOKEN and POSTMARK_AUTH_TOKEN) and enter the value. You will need to grant your local service account and App Engine’s service account, <project-name>@appspot.gserviceaccount.com, Secret Manager Secret Accessor permission on each secret individually. Once you’ve done this you can access them in your PHP code using the following function:

The Tempo Controller

I defined a GET controller which runs the job. The first thing I did is define an endpoint URL structure which will allow me to add more datasets in the future. The template of this structure is /[dataset]/[table]{/[optional_additional_params]}. Specifically for Tempo I used

/tempo/{table:worklogs|plans|accounts}[/{updatedFrom}]

Where:

  • tempo is the dataset name in BigQuery
  • table is either worklogs, plans or accounts which are the 3 tables I want to load from the Tempo API. The names match those used by Tempo and also the table names in BigQuery
  • updatedFrom is an optional parameter used for running the app locally (say to load all data going back a couple of years which I wanted to do the first time the job was run).

Validating Endpoint Calls come from App Engine Cron

The very first thing the controller does is validate the requester. If the app is running in App Engine then I only want App Engine Cron to be able to run it. If it is running locally then all good, anyone can run it. To check this:

More info on App Engine Cron header validation.

Logging and Alerts

My requirements are to have decent logs in the Google Cloud Logger in order to debug. And even more importantly to be alerted on job failure – the last thing we want is for some data to be missing days due to a silent failure.

The Google SDK provides a LoggingClient which we can easily use to write custom logs. However, Google does not provide a simple way of alerting on cron job failure. The best solution I could come up with was to use try/catch blocks to capture failures and send myself and email via PostMark so I can take action. It’s not perfect, but it should work (I hope)! Here’s the custom logger class I wrote to encapsulate this:

There are certainly a few things in this article which took me some time to figure out, so I hope this is helpful to you and maybe even saves you some time!

Read Part 2 here

In part 2 I explain how I used the PHP SDK to connect to Cloud Storage and BigQuery, and how to set up cron jobs to make it run automatically and how to deploy your app.

An Engineering Competency Matrix for a Digital Agency

Developing the skills of our engineers is a key element of achieving our vision. A high-quality software development practice requires highly skilled engineers. We also know the experience of our customers correlates to the abilities of our team (higher quality work, smoother process) and our company culture is built around learning (“aspire to greatness” is one of our values).

To be recognised as North America’s leader in high-quality software development, customer experience, and company culture.

The Nolte Vision

In summary great engineers = great work = happy team + happy clients.

The Matrix

The below table shows the skills required by Nolte engineers, these are very focussed on the technologies we use.

The last 4 columns show which skills are required for each of the product types we work on. This does not mean one single engineer needs to have all these skills to work on the project, but the project team as a whole will most likely need to be able to cover them all.

Usage

We run the company on a quarterly OKR cycle, which includes setting training objectives. Each engineer meets with their manager to identify the skills they need for their job. For each skill they evaluate their current level based on the matrix and assign a priority based on the need to improve the skill (high priority means it is urgent to improve). For example:

SkillCurrent LevelRequired LevelPriority
WordPressNone [little or no experience]IntermediateHigh [About to start a WordPress project]
PHPBeginnerIntermediateMedium
AnimationsNot requiredN/AN/A

Seniority

In the near future we plan to integrate this matrix into promotion and compensation discussions. A possible option is to set minimum requirements for each seniority level along these lines:

  • Senior: must be expert in at least 4 key competencies*.
  • Mid-level: must be expert in at least 1 key competency* and intermediate in at least 2 others.
  • Junior: does not yet meet the requirements of Mid-level or Senior.

* Todo: define key competencies.

Further Reading & Thanks

This matrix was inspired by the following articles:

Also a big shout out to all in the Nolte team who provided feedback along the way

Keeping Gutenberg’s Core Columns Block Aligned to the Grid.

The WordPress Gutenberg editor comes with a load of power to create custom layouts. However, as Spiderman said, with great power comes great responsiblity… so how do we reduce the risk of our meticulously designed site being messed up by an editor who is not design-savvy?

In this post we are going to focus on one aspect of this, the core column block. Our designs are normally based on a 12-column grid, however the column block allows the editor to enter any percentage width they like for each column. This means they can create columns which are 5% or even 98% wide, ignoring our grid completely. What a mess they could make!

In the Lean theme we came up with a simple solution for this – the columns are automatically snapped to the nearest width allowed on a 12-column grid (ie a multiple of 8.333%). Here’s how we did it:

First we hooked into the render_block filter and called functions we’ll use to add classes to all core/columns blocks (the wrapper block for a set of columns) and all core/column blocks (an individual column).

You’ll note we used this WordPress PHP hook instead of a hook from the JavaScript API. We felt this is better for maintainability, as the hook updates the block at render time only. The JavaScript hook updates the block data which is saved to the database, this means any further changes to the saved HTML could invalidate existing blocks meaning they have to be fixed or even re-created.

We only need to add two classes to the core/columns wrapper: md:flex -mx-1. Using TailWindCSS these apply the necessary CSS properties to set up a flex grid within the container (see the TailWind flex grid documentation):

For now we’re using strpos and substr_replace functions to manipulate the HTML string. We could of course use a PHP DOM parser library to simplify this further.

Now the real magic happens on the individual column blocks. On mobile we want them to be full width so the columns stack, and then switch the to column layout from the medium breakpoint upwards.

First we get the % width set in the editor from the $block properties. This can be null if no widths are set at all – in this case we assume the user wants the columns to auto-size, so we don’t add any width class at all and let flex auto-size them.

Then, assuming a width is set, we round it to the nearest factor of 12 / 100 to snap it to the closest width available on a 12 columns grid. In TailWind we use the w-1/12, w-2/12, … w-11/12 classes to achieve this.

Summary

This is definitely a quick fix, a better solution would be to customise the core/column block width controls (or replace the block with a completely custom one) so the user has to select some sort of “columns to span” amount. This is quite a bit more complex, as it involves building the logic to ensure the total is always 12, as well as some other probably gotchas.

I hope this was useful or at least a little interesting. You can contact me at afenton@wearenolte.com or via the comments section if you have any questions, comments or suggestions. Also check out these resources:

  • My Lightening Talk on Building a Gutenberg Enabled Theme for Non-designers.
  • The full gist of this code example.
  • A working example in the blocks section of the Lean Theme.

Slow Running Animations in Safari

I came across this problem whilst working on this CSS-only continuous ticker slider. It stumped me for a while and I couldn’t find any good information online, so I hope this is helpful.

The Problem

Checkout the animations below:

You can see the Safari version (on the right) runs much slower than specified in the CSS. On desktop, I found this occurs the 2nd or 3rd time the page is loaded after clearing the cache or making a change to the CSS animation property. On mobile it normally happened on the first page load.

The problem was tricky as it did not occur consistently. Some hunting around on the web gave me some clues that this was an optimisation, Safari is throttling animations which it determines as not important. The criteria are not clear, but it appears some sort of interaction with the element will decrease the risk of this happening.

The fact this happened more often on mobile reinforced this theory, I would expect any optimisations to be stricter on mobile devices which generally have less resources than laptops or desktops.

The Solution

After some trial and error, I found that activating the animation when the element scrolls into view fixed the problem. This makes some sense, as it’s more efficient to only animate an element when required and not as soon as the page loads (assuming the browser doesn’t optimise for this automatically). Some JavaScript is required for this:

import 'intersection-observer' // Polyfill for older browsers which haven't implemented the IntersectionObserver API.

/**
 * Function that triggers the image strip animation on scroll.
 */
export default function scroll () {
  const animTriggerClass = 'anim-image-strip-trigger'
  const animElems = document.getElementsByClassName( 'js-image-strip' )

  const io = new IntersectionObserver(
    entries => {
      entries.forEach( entry => {
        if ( entry.intersectionRatio > 0 ) {
          entry.target.classList.add( animTriggerClass ) 
          io.unobserve( entry.target )
        }
      })
      
    }
  )

  animElems.forEach( element => {
    io.observe( element )
  })
}

Then update my CSS to trigger the animation when the anim-image-strip-trigger class is added:

.anim-image-strip-trigger {
    animation: ticker-kf 18s linear infinite;
}

In Summary

The problem has a relatively simple solution, the hard part was figuring out what the root cause was. It would be nice if Apple documented this along with guidelines for how animations should be used in Safari for optimal performance. It would also have been really helpful if they output a warning in the console instead of leaving engineers guessing!

A Continuous Image Ticker Built With Pure CSS

When working on a continuous image ticker slider for a new project I was surprised to find no JavaScript slider libraries which support this feature. After some back and forth with my designer on some alternatives, I decided to implement it as designed. In fact, it’s possible to do with pure CSS if you know the width of each item in the slider.

The Theory

Some important info:

  • There are 6 items in my ticker and each is 16rem wide with a space of 2rem between them. So in total the ticker width is (16 + 2) * 6 = 108rem.
  • I want 10 items to scroll by every 30 seconds, or in other words each item should scroll through in 3s. In this example I want 1 cycle of the ticker animation to last 3 * 6 = 18s.

Once the animation is complete it will leave a big white space on the right, which we don’t want. I got round this by duplicating the slider items in my PHP template. This works on the assumption that the total slider width is greater than or equal to its container, however if this doesn’t hold true you could duplicate the slides 3 or more times as required.

Now once the ticker has slid one full length I reset it back to the beginning to start again. If my maths is correct the reset will be seamless and the ticker will appear to be continuous!

In Practice

First let’s build the HTML. I’m using TailWind to set the layout (check out their flexbox grid documentation):

<!-- Wrap the slider in div with overflow hidden to hide scrollbars -->
<div class="overflow-hidden">

  <!-- The slider itself is a flex grid -->
  <div class="flex -mx-4 img-ticker">

    <!-- Original set of images -->
    <!-- Each image is a grid column with width 16rem (w-64) and horiztonal margin 1rem (mx-4) -->
    <img class="w-64 mx-4 self-start flex-none" src="https://via.placeholder.com/256x341/f4ccccff/434343?text=1">
    <img class="w-64 mx-4 self-start flex-none" src="https://via.placeholder.com/256x192/f4ccccff/434343?text=2">
    <img class="w-64 mx-4 self-start flex-none" src="https://via.placeholder.com/256x341/f4ccccff/434343?text=3">
    <img class="w-64 mx-4 self-start flex-none" src="https://via.placeholder.com/256x192/f4ccccff/434343?text=4">
    <img class="w-64 mx-4 self-start flex-none" src="https://via.placeholder.com/256x341/f4ccccff/434343?text=5">
    <img class="w-64 mx-4 self-start flex-none" src="https://via.placeholder.com/256x192/f4ccccff/434343?text=6">

    <!-- Copy set of images -->
    <img class="w-64 mx-4 self-start flex-none" src="https://via.placeholder.com/256x341/f4ccccff/434343?text=1">
    <img class="w-64 mx-4 self-start flex-none" src="https://via.placeholder.com/256x192/f4ccccff/434343?text=2">
    <img class="w-64 mx-4 self-start flex-none" src="https://via.placeholder.com/256x341/f4ccccff/434343?text=3">
    <img class="w-64 mx-4 self-start flex-none" src="https://via.placeholder.com/256x192/f4ccccff/434343?text=4">
    <img class="w-64 mx-4 self-start flex-none" src="https://via.placeholder.com/256x341/f4ccccff/434343?text=5">
    <img class="w-64 mx-4 self-start flex-none" src="https://via.placeholder.com/256x192/f4ccccff/434343?text=6">

  </div>
</div>

Note how the slider items are repeated twice.

Now we can create our CSS keyframes using the slide distance I calculated above:

@keyframes ticker-kf {
  0% {
    transform: translate3d(0, 0, 0);
  }

  100% {
    transform: translate3d(-108rem, 0, 0);
  }
}

This makes the slider move one full slider width to the right. Note I’m using translate3d as it has been proven to be the most efficient translation across all browsers.

Now let’s apply it using the following CSS:

.img-ticker {
  animation: ticker-kf 18s linear infinite;
}

This will execute the animation over 18s using a linear easing curve so the speed remains the same throughout. Finally we tell the browser to reset and repeat on an infinite loop.

Et voila! We have a pure CSS ticker!

As I did, you can go on to make this solution more flexible by using your templating engine to calculate and output the required keyframes and CSS animation property based on the number of items.

Check out the complete solution on codepen:

See the Pen Pure CSS Ticker Slider by Adam Fenton (@adamf321) on CodePen.

A Note on Design Fidelity

The original design has the image widths follow our fluid 12 column grid. This meant some complex calculations (which are not possible to do on the server-side) in the case where the slider stretched outside the container to the full viewport width. However, we realised the constant movement makes it impossible to see any misalignment, so I simplified the slider by compromising on 2 fixed widths (one for mobile and another for desktop).