Building a Simple Data Pipeline with Google App Engine: Part 2

This is part 2 of a two-part series (part 1 is here).

Part 2 focusses on how I used the PHP SDK to connect to Cloud Storage and BigQuery, how to set up cron jobs to make the app run automatically and, finally, how to deploy the app.

In the first part I explained how to set up and structure a PHP app for App Engine, how to manage authentication & environment variables for local development, how to validate that endpoint calls come from App Engine Cron and, finally, logging and alerts. Now we’re going to dig into the meat of the app logic…

The Google PHP SDK

Google provides a PHP SDK which abstracts communication with Google Cloud via their REST API. The easiest way to add this to your project is with Composer; this is what my composer.json file looks like:
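Something along these lines (a sketch: the exact package versions are assumptions, based on the services used in this series):

{
    "require": {
        "slim/slim": "^3.12",
        "google/cloud-storage": "^1.20",
        "google/cloud-bigquery": "^1.16",
        "google/cloud-logging": "^1.19",
        "google/cloud-secret-manager": "^1.0",
        "wildbit/postmark-php": "^2.9"
    }
}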

Cloud Storage

I wrapped the Cloud Storage SDK in the Bucket object below:
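A minimal sketch of that wrapper (the method names follow the walkthrough below; everything else is an assumption):

<?php

use Google\Cloud\Storage\StorageClient;

class Bucket
{
    private $bucket;

    public function __construct()
    {
        // Credentials are resolved per the hierarchy described in part 1.
        $storage      = new StorageClient( [ 'projectId' => getenv( 'GC_PROJECT' ) ] );
        $this->bucket = $storage->bucket( getenv( 'GC_BUCKET' ) );
    }

    public function writeStringToGcs( string $contents, string $name )
    {
        // Write the string to a temporary file on the App Engine server.
        $tmpFile = tempnam( sys_get_temp_dir(), 'gcs' );
        file_put_contents( $tmpFile, $contents );

        // Open the file stream for reading...
        $stream = fopen( $tmpFile, 'r' );

        // ...and upload it to a new object in the bucket.
        return $this->bucket->upload( $stream, [
            'name'          => $name,
            'predefinedAcl' => 'projectPrivate',
        ] );
    }
}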

The constructor instantiates the StorageClient (remember it will use the credentials per the hierarchy described in part 1) and sets the bucket object.

The writeStringToGcs() method takes a string and writes it to a new object in the bucket. Let’s step through this:

  • First we have to write the string to a temporary file on the App Engine server. App Engine has its own implementation of the PHP tempnam function, which will write the file to a temporary disk space.
  • Then we use fopen to open the file stream for reading.
  • And finally, using the Bucket object’s upload function, we upload the file to an object on the bucket. I’m using the predefined projectPrivate ACL so that access to the file reflects users’ access to the GCP project.

BigQuery

In the same way, I wrapped the BigQuery SDK in a BigQuery class:
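A minimal sketch (names are assumptions; the polling pattern follows Google’s documented sample):

<?php

use Google\Cloud\BigQuery\BigQueryClient;
use Google\Cloud\Core\ExponentialBackoff;

class BigQuery
{
    private $bigQuery;

    public function __construct()
    {
        $this->bigQuery = new BigQueryClient( [ 'projectId' => getenv( 'GC_PROJECT' ) ] );
    }

    public function importFile( string $datasetId, string $tableId, string $gcsUri )
    {
        // First we get the table object.
        $table = $this->bigQuery->dataset( $datasetId )->table( $tableId );

        // Create the load job: NDJSON source, appending to the existing rows.
        $loadConfig = $table->loadFromStorage( $gcsUri )
            ->sourceFormat( 'NEWLINE_DELIMITED_JSON' )
            ->writeDisposition( 'WRITE_APPEND' );
        $job = $table->runJob( $loadConfig );

        // Poll the job until it completes, throwing if it times out.
        $backoff = new ExponentialBackoff( 10 );
        $backoff->execute( function () use ( $job ) {
            $job->reload();
            if ( ! $job->isComplete() ) {
                throw new \Exception( 'Job has not yet completed', 500 );
            }
        } );

        // Finally, check it succeeded, throwing if it did not.
        if ( isset( $job->info()['status']['errorResult'] ) ) {
            throw new \Exception( $job->info()['status']['errorResult']['message'] );
        }
    }
}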

Let’s take a closer look at the importFile method:

  • First we get the table object.
  • Then we create the job using the loadFromStorage function. The source format is NEWLINE_DELIMITED_JSON and we want to WRITE_APPEND to the table (ie add new records without touching the existing ones).
  • We then poll the job to check that it completed, throwing an exception if it times out.
  • Finally we check it succeeded, again throwing an exception if it did not.

Scheduling the jobs

When using App Engine, Google Cloud offers 2 alternatives for job scheduling:

  1. Cloud Scheduler is a general-purpose scheduler that can run jobs against a range of targets: any HTTP endpoint, an App Engine HTTP endpoint, or a Pub/Sub topic (by publishing a message). Scheduling is configured using the standard UNIX Cron format.
  2. App Engine Cron is specific to App Engine and is configured in a YAML file within your project. I chose this option to keep everything together in one place, which should make for easier maintenance and for someone (my future self included) to understand what’s going on in the future.

Here is my cron.yaml file:
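A sketch (the endpoint paths follow the URL structure from part 1; the descriptions are assumptions):

cron:
- description: import tempo worklogs
  url: /tempo/worklogs
  schedule: every day 00:01
- description: import tempo plans
  url: /tempo/plans
  schedule: every day 00:01
- description: import tempo accounts
  url: /tempo/accounts
  schedule: every day 00:01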

You can see it accepts only the path after the domain of the URL, as it will always run against the current app. The schedule is set using an English-like format, eg every day 00:01 will run daily at 1 minute past midnight UTC (you can also specify the timezone if required, but I wanted to use UTC as the Tempo timestamps are also in UTC). I’m unclear why App Engine Cron uses this English-like format instead of the more common UNIX Cron format used in Cloud Scheduler, but there you go.

Deploying

Once your app is ready to go, you can deploy it via the command line using the following command:

gcloud --project nolte-metrics app deploy app.yaml cron.yaml

This will deploy the files in the current folder to the app in the nolte-metrics project. You need to specify the app.yaml and cron.yaml files that you want to use. If all went well your dashboard should show something like this:

And the Cron jobs tab will show your jobs:

Putting it All Together

I’ve published the complete version of the app to GitHub, feel free to clone or copy and adapt to your needs.

You can reach me by commenting below or on Twitter, @adamf321, or email, adam@nolte.io. I’d love to hear your feedback, questions or suggestions for future articles.

How to measure and monitor site performance with Google’s Core Web Vitals

On May 5th, 2020 Google announced Web Vitals, billed as “Essential metrics for a healthy site”. It’s evident that performance has a direct impact on businesses and this initiative from Google reinforces that. Given that our mission is to co-create high-quality software with our clients, we definitely correlate performance with high quality.

Why does it matter?

Performance is about human experience. It’s about how people experience the web. For the vast majority of people, the web is a key touchpoint in their life. Performance can positively or negatively impact how people feel. To illustrate why performance matters, I’ve pulled some data from the web to show both the positive and negative impact of performance.

  • 15% higher conversion on sites that loaded in 2 seconds or less
  • BBC lost 10% of users for every additional second of load time
  • 43% of consumers expect a web page to load in 2 seconds or less
  • Pinterest saw a 15% increase in traffic and sign-ups when wait times were reduced by 40%

User Centric Performance Metrics

Given performance is about people, we need a method to test performance based on people’s perception. The only way to truly know how your site performs for your users is to actually measure its performance as those users are loading and interacting with it. This type of measurement is known as RUM (Real User Monitoring).

No single metric exists that can measure a site’s performance based on a user’s perception; instead, there are several types of metrics:

  • Perceived load speed – how quickly a page can load and render all of the visual elements to the screen.
  • Load responsiveness – how quickly a page can load and execute any required code in order for it to respond quickly to a user.
  • Runtime responsiveness – once a page is loaded, how quickly can the page respond to user interaction.
  • Visual stability – do elements on the page shift in ways that users don’t expect and potentially interfere with their interactions?
  • Smoothness – do transitions and animations render at a consistent frame rate and flow fluidly from one state to the next?

Metrics To Measure

Based on Google’s announcement of Web Vitals, the following metrics are the set that they deem most necessary when measuring user-centric performance. Google has mentioned that these metrics will evolve over time.

LCP: Largest Contentful Paint (loading)
FID: First Input Delay (interactivity)
CLS: Cumulative Layout Shift (visual stability)

Definition & Target Speed

  • Largest Contentful Paint (LCP): measures loading performance. To provide a good user experience, LCP should occur within 2.5 seconds of when the page first starts loading.
  • First Input Delay (FID): measures interactivity. To provide a good user experience, pages should have a FID of less than 100 milliseconds.
  • Cumulative Layout Shift (CLS): measures visual stability. To provide a good user experience, pages should maintain a CLS of less than 0.1.

Field & Lab Measurement Methods

There are many instances where measuring performance from an actual user is not possible, such as when a site is in testing or before moving into production. Given this fact, two methods for measurement have been established:

  • Lab – Before production, pre-release, local testing, etc
  • Field – Real User Monitoring, based on how the actual user experiences the performance

Tools for Field Measurement:

  • PageSpeed Insights
  • Chrome UX Report
  • Search Console
  • Firebase Performance Monitoring

Tools for Lab Measurement:

  • Lighthouse
  • Chrome Dev Tools

Simply Measure Your Site’s Performance

The simplest way to get started in measuring your site’s performance is to use the Chrome User Experience Report in Google Data Studio. This report draws on a publicly available BigQuery dataset of real-user performance data gathered from websites across the web.

Setting up your dashboard:

  1. Go to Chrome UX Dash – the connector for the Google Data Studio Dashboard.
  2. Input your site’s domain in the origin URL field.
  3. Click next.
  4. Choose create report.

Chrome UX Dashboard for WeAreNolte.com

A talk I did during Nolte’s weekly Lightning talks on this topic.

Source: Think With Google, Google Developers: Why Performance Matters

Building a Simple Data Pipeline with Google App Engine: Part 1

This is part 1 of a two-part series.

Part 1 focusses on the setup and structure of a PHP app for App Engine, how to manage authentication & environment variables for local development, how to validate that endpoint calls come from App Engine Cron and, finally, logging and alerts.

Data is so important to modern business and ours is no different. For years we’ve worked towards being a data-driven business and making data-driven decisions for ourselves and our clients. But how do we efficiently process all this data? One thing I’ve learnt is that if the data is not readily available then any good intentions of building reports get lost in the day-to-day melee.

For a while now, we have been relying on Google Data Studio to build out Business Intelligence dashboards for ourselves and our customers. We use StitchData to load Jira, QuickBooks and HubSpot data into our BigQuery warehouse, from where we can build custom views and reports in Data Studio. However, we were missing one vital dataset for which Stitch does not provide an integration: Tempo (time logs and plans). As I am taking the Google Cloud Architect certification, I thought this would be a great opportunity to put into practice what I’d learnt so far.

The Solution

I decided to use App Engine as it provides a serverless solution and would allow me to build in a familiar technology, PHP. The solution is based on Google’s sample app, using the Slim Framework, and the logic outline is as follows:

  1. Hit the Tempo API to extract the required data
  2. Convert it to the NDJSON format expected by BigQuery (see the sketch after this list)
  3. Save the file to a Google Cloud Storage Bucket
  4. [Repeat steps 1-3 if multiple “pages” of data are returned by the API]
  5. Import each file into BigQuery
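NDJSON is simply one JSON object per line, so step 2 can be as small as this sketch ($records being a hypothetical array of rows returned by the Tempo API):

$ndjson = implode( "\n", array_map( 'json_encode', $records ) );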

Let’s step through the solution:

Setup

I’m assuming you already have a Google Cloud project set up; if not, do that first! Once you’ve done that you’ll need to initialise App Engine using the CLI:

gcloud app create --project=[YOUR_PROJECT_ID]

Structure

The app is structured as follows:

src/
  controllers/
    main.php
    tempo.php
  BigQuery.php
  Bucket.php
  Logger.php
  Secrets.php
.gcloudignore
.gitignore
app.yaml
composer.json
cron.yaml
index.php
README.md

Most of this is standard PHP stuff, but there are a few special files required by App Engine:

  • .gcloudignore tells Google to ignore certain files when deploying to App Engine. By default it ignores your .git* files and vendor folder (not required because App Engine will run composer install for you when you deploy).
  • app.yaml contains app-specific information for App Engine. At a minimum you need to specify the runtime (PHP 7.3 in my case). I also use this file to specify environment variables – more on that later.
  • cron.yaml specifies the cron jobs I want App Engine to run. We’ll talk about that later too.

Authentication & Environment Variables for Local Development (and app.yaml)

In App Engine the app will run as the App Engine service account, which has access to Cloud Storage and BigQuery by default (if these resources are set up in the same project as your app). In a minute we’ll see how to grant access to the Secrets Manager too.

However, to run the app locally you will need to create a new service account for yourself with the required permissions on the Cloud Storage bucket, BigQuery and Secrets Manager. Make sure you download the JSON file with the credentials.

The Cloud SDK uses a hierarchy to determine the credentials used:

  1. Use the credentials supplied when an object is instantiated.
  2. Look for credentials in the GOOGLE_APPLICATION_CREDENTIALS environment variable.
  3. Use the credentials of the account under which the app is running.

In production we are using option 3; on local we are going to use option 2. We can use the following format to pass in environment variables when starting up the default PHP server:

GOOGLE_APPLICATION_CREDENTIALS=/Users/adam/Downloads/NolteMetrics-34cc803fcfc5.json \
   GC_PROJECT="nolte-metrics" \
   GC_BUCKET="nolte-metrics.appspot.com" \
   ALERT_EMAIL="afenton@wearenolte.com" \
   php -d variables_order=EGPCS -S localhost:8085 -t . index.php

Just replace the GOOGLE_APPLICATION_CREDENTIALS value with the path to your JSON file.

The other 3 variables, GC_PROJECT, GC_BUCKET and ALERT_EMAIL, are read from the app.yaml file in App Engine, but for local use we need to define them ourselves. Here’s my complete app.yaml file for this project:
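A sketch, with the values matching the local example above and the runtime per my setup (PHP 7.3):

runtime: php73

env_variables:
  GC_PROJECT: nolte-metrics
  GC_BUCKET: nolte-metrics.appspot.com
  ALERT_EMAIL: afenton@wearenolte.com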

Secrets

I am using Google’s Secrets Manager to store my Tempo and PostMark authentication keys. These keys allow for complete control of our accounts and are not something we want to reveal in our source code (please don’t!).

In Secrets Manager you need to create the new secrets, give them each a name (I used TEMPO_AUTH_TOKEN and POSTMARK_AUTH_TOKEN) and enter the value. You will need to grant your local service account and App Engine’s service account, <project-name>@appspot.gserviceaccount.com, the Secret Manager Secret Accessor permission on each secret individually. Once you’ve done this you can access them in your PHP code using the following function:
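A sketch of such a function (the helper name is an assumption):

<?php

use Google\Cloud\SecretManager\V1\SecretManagerServiceClient;

function get_secret( string $name ) : string
{
    $client = new SecretManagerServiceClient();

    // Build the resource name of the latest version of the secret.
    $secretName = $client->secretVersionName( getenv( 'GC_PROJECT' ), $name, 'latest' );

    // Access the secret and return its payload as a plain string.
    $response = $client->accessSecretVersion( $secretName );

    return $response->getPayload()->getData();
}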

The Tempo Controller

I defined a GET controller which runs the job. The first thing I did was to define an endpoint URL structure which will allow me to add more datasets in the future. The template of this structure is /[dataset]/[table]{/[optional_additional_params]}. Specifically for Tempo I used:

/tempo/{table:worklogs|plans|accounts}[/{updatedFrom}]

Where:

  • tempo is the dataset name in BigQuery
  • table is either worklogs, plans or accounts, which are the 3 tables I want to load from the Tempo API. The names match those used by Tempo and also the table names in BigQuery.
  • updatedFrom is an optional parameter used for running the app locally (say to load all data going back a couple of years which I wanted to do the first time the job was run).
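Putting that together, the Slim route registration looks something like this sketch (the handler body is elided):

$app->get( '/tempo/{table:worklogs|plans|accounts}[/{updatedFrom}]', function ( $request, $response, $args ) {
    $table       = $args['table'];
    $updatedFrom = $args['updatedFrom'] ?? null; // Optional, used for local back-fills.

    // ... validate the caller, pull from Tempo, write to GCS, load into BigQuery ...
} );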

Validating Endpoint Calls come from App Engine Cron

The very first thing the controller does is validate the requester. If the app is running in App Engine then I only want App Engine Cron to be able to run it. If it is running locally then anyone can run it. To check this:
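A sketch of the check (assuming Slim’s PSR-7 request object). App Engine strips the X-Appengine-Cron header from external requests, so its presence proves the call came from App Engine Cron:

// GAE_ENV is set to 'standard' when running on App Engine standard.
$onAppEngine = getenv( 'GAE_ENV' ) === 'standard';
$fromCron    = $request->getHeaderLine( 'X-Appengine-Cron' ) === 'true';

if ( $onAppEngine && ! $fromCron ) {
    // Reject anything that isn't App Engine Cron.
    return $response->withStatus( 403 );
}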

More info on App Engine Cron header validation.

Logging and Alerts

My requirements are to have decent logs in Google Cloud Logging in order to debug and, even more importantly, to be alerted on job failure – the last thing we want is for data to go missing for days due to a silent failure.

The Google SDK provides a LoggingClient which we can easily use to write custom logs. However, Google does not provide a simple way of alerting on cron job failure. The best solution I could come up with was to use try/catch blocks to capture failures and send myself an email via PostMark so I can take action. It’s not perfect, but it should work (I hope)! Here’s the custom logger class I wrote to encapsulate this:
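A minimal sketch (the constructor arguments and method names are assumptions):

<?php

use Google\Cloud\Logging\LoggingClient;
use Postmark\PostmarkClient;

class Logger
{
    private $logger;
    private $postmark;
    private $alertEmail;

    public function __construct( string $postmarkToken, string $alertEmail )
    {
        // A PSR-3 logger writing to a custom log in Cloud Logging.
        $logging          = new LoggingClient( [ 'projectId' => getenv( 'GC_PROJECT' ) ] );
        $this->logger     = $logging->psrLogger( 'data-pipeline' );
        $this->postmark   = new PostmarkClient( $postmarkToken );
        $this->alertEmail = $alertEmail;
    }

    public function info( string $message )
    {
        $this->logger->info( $message );
    }

    public function error( string $message )
    {
        $this->logger->error( $message );

        // Email the alert via PostMark so a failed job is never silent.
        $this->postmark->sendEmail(
            $this->alertEmail,          // From.
            $this->alertEmail,          // To.
            'Data pipeline job failed', // Subject.
            $message                    // HTML body.
        );
    }
}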

There are certainly a few things in this article which took me some time to figure out, so I hope this is helpful to you and maybe even saves you some time!

Read Part 2 here

In part 2 I explain how I used the PHP SDK to connect to Cloud Storage and BigQuery, how to set up cron jobs to make the app run automatically and how to deploy it.

An Engineering Competency Matrix for a Digital Agency

Developing the skills of our engineers is a key element of achieving our vision. A high-quality software development practice requires highly skilled engineers. We also know the experience of our customers correlates to the abilities of our team (higher quality work, smoother process) and our company culture is built around learning (“aspire to greatness” is one of our values).

To be recognised as North America’s leader in high-quality software development, customer experience, and company culture.

The Nolte Vision

In summary: great engineers = great work = happy team + happy clients.

The Matrix

The table below shows the skills required by Nolte engineers; these are very focussed on the technologies we use.

The last 4 columns show which skills are required for each of the product types we work on. This does not mean one single engineer needs to have all these skills to work on the project, but the project team as a whole will most likely need to be able to cover them all.

Usage

We run the company on a quarterly OKR cycle, which includes setting training objectives. Each engineer meets with their manager to identify the skills they need for their job. For each skill they evaluate their current level based on the matrix and assign a priority based on the need to improve the skill (high priority means it is urgent to improve). For example:

| Skill      | Current Level                  | Required Level | Priority                                  |
|------------|--------------------------------|----------------|-------------------------------------------|
| WordPress  | None [little or no experience] | Intermediate   | High [About to start a WordPress project] |
| PHP        | Beginner                       | Intermediate   | Medium                                    |
| Animations | Not required                   | N/A            | N/A                                       |

Seniority

In the near future we plan to integrate this matrix into promotion and compensation discussions. A possible option is to set minimum requirements for each seniority level along these lines:

  • Senior: must be expert in at least 4 key competencies*.
  • Mid-level: must be expert in at least 1 key competency* and intermediate in at least 2 others.
  • Junior: does not yet meet the requirements of Mid-level or Senior.

* Todo: define key competencies.

Further Reading & Thanks

This matrix was inspired by the following articles:

Also a big shout out to everyone in the Nolte team who provided feedback along the way.

Keeping Gutenberg’s Core Columns Block Aligned to the Grid

The WordPress Gutenberg editor comes with a load of power to create custom layouts. However, as Spiderman said, with great power comes great responsibility… so how do we reduce the risk of our meticulously designed site being messed up by an editor who is not design-savvy?

In this post we are going to focus on one aspect of this, the core column block. Our designs are normally based on a 12-column grid, however the column block allows the editor to enter any percentage width they like for each column. This means they can create columns which are 5% or even 98% wide, ignoring our grid completely. What a mess they could make!

In the Lean theme we came up with a simple solution for this – the columns are automatically snapped to the nearest width allowed on a 12-column grid (ie a multiple of 8.333%). Here’s how we did it:

First we hooked into the render_block filter and called functions we’ll use to add classes to all core/columns blocks (the wrapper block for a set of columns) and all core/column blocks (an individual column).
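A sketch of the hook (the function names are assumptions):

add_filter( 'render_block', 'lean_columns_add_classes', 10, 2 );

function lean_columns_add_classes( $block_content, $block ) {
    // The wrapper block for a set of columns.
    if ( 'core/columns' === $block['blockName'] ) {
        return lean_add_columns_classes( $block_content );
    }

    // An individual column.
    if ( 'core/column' === $block['blockName'] ) {
        return lean_add_column_classes( $block_content, $block );
    }

    return $block_content;
}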

You’ll note we used this WordPress PHP hook instead of a hook from the JavaScript API. We felt this is better for maintainability, as the PHP hook updates the block at render time only. The JavaScript hook updates the block data which is saved to the database, meaning any further changes to the saved HTML could invalidate existing blocks, which would then have to be fixed or even re-created.

We only need to add two classes to the core/columns wrapper: md:flex -mx-1. Using TailWindCSS these apply the necessary CSS properties to set up a flex grid within the container (see the TailWind flex grid documentation):
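A sketch of the wrapper function (names assumed):

function lean_add_columns_classes( $block_content ) {
    // Find the first class attribute and prepend our grid classes to it.
    $pos = strpos( $block_content, 'class="' );

    if ( false !== $pos ) {
        $block_content = substr_replace( $block_content, 'md:flex -mx-1 ', $pos + 7, 0 );
    }

    return $block_content;
}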

For now we’re using the strpos and substr_replace functions to manipulate the HTML string. We could of course use a PHP DOM parser library to simplify this further.

Now the real magic happens on the individual column blocks. On mobile we want them to be full width so the columns stack, and then switch to the column layout from the medium breakpoint upwards.

First we get the % width set in the editor from the $block properties. This can be null if no widths are set at all – in this case we assume the user wants the columns to auto-size, so we don’t add any width class at all and let flex auto-size them.

Then, assuming a width is set, we round it to the nearest multiple of 100 / 12 (ie 8.333%) to snap it to the closest width available on a 12-column grid. In TailWind we use the w-1/12, w-2/12, … w-11/12 classes to achieve this.
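Put together, the column function might look like this sketch (the names, and the px-1 gutter matching the wrapper’s -mx-1, are assumptions):

function lean_add_column_classes( $block_content, $block ) {
    $width = $block['attrs']['width'] ?? null;

    if ( null === $width ) {
        // No width set: let flex auto-size the columns.
        $classes = 'w-full px-1';
    } else {
        // Snap the percentage to the nearest column count on a 12-column grid.
        $cols    = max( 1, min( 11, (int) round( $width * 12 / 100 ) ) );
        $classes = "w-full md:w-{$cols}/12 px-1";
    }

    $pos = strpos( $block_content, 'class="' );

    if ( false !== $pos ) {
        $block_content = substr_replace( $block_content, $classes . ' ', $pos + 7, 0 );
    }

    return $block_content;
}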

Summary

This is definitely a quick fix; a better solution would be to customise the core/column block width controls (or replace the block with a completely custom one) so the user has to select some sort of “columns to span” amount. This is quite a bit more complex, as it involves building the logic to ensure the total is always 12, as well as some other probable gotchas.

I hope this was useful or at least a little interesting. You can contact me at afenton@wearenolte.com or via the comments section if you have any questions, comments or suggestions. Also check out these resources:

  • My Lightning Talk on Building a Gutenberg Enabled Theme for Non-designers.
  • The full gist of this code example.
  • A working example in the blocks section of the Lean Theme.

Slow Running Animations in Safari

I came across this problem whilst working on this CSS-only continuous ticker slider. It stumped me for a while and I couldn’t find any good information online, so I hope this is helpful.

The Problem

Check out the animations below:

You can see the Safari version (on the right) runs much slower than specified in the CSS. On desktop, I found this occurs the 2nd or 3rd time the page is loaded after clearing the cache or making a change to the CSS animation property. On mobile it normally happened on the first page load.

The problem was tricky as it did not occur consistently. Some hunting around on the web gave me some clues that this was an optimisation: Safari throttles animations which it determines to be unimportant. The criteria are not clear, but it appears some sort of interaction with the element will decrease the risk of this happening.

The fact this happened more often on mobile reinforced this theory; I would expect any optimisations to be stricter on mobile devices, which generally have fewer resources than laptops or desktops.

The Solution

After some trial and error, I found that activating the animation when the element scrolls into view fixed the problem. This makes some sense, as it’s more efficient to only animate an element when required and not as soon as the page loads (assuming the browser doesn’t optimise for this automatically). Some JavaScript is required for this:

import 'intersection-observer' // Polyfill for older browsers which haven't implemented the IntersectionObserver API.

/**
 * Function that triggers the image strip animation on scroll.
 */
export default function scroll () {
  const animTriggerClass = 'anim-image-strip-trigger'
  const animElems = document.querySelectorAll( '.js-image-strip' ) // A NodeList supports forEach, unlike the HTMLCollection returned by getElementsByClassName.

  const io = new IntersectionObserver(
    entries => {
      entries.forEach( entry => {
        if ( entry.intersectionRatio > 0 ) {
          entry.target.classList.add( animTriggerClass ) 
          io.unobserve( entry.target )
        }
      })
    }
  )

  animElems.forEach( element => {
    io.observe( element )
  })
}

Then update my CSS to trigger the animation when the anim-image-strip-trigger class is added:

.anim-image-strip-trigger {
    animation: ticker-kf 18s linear infinite;
}

In Summary

The problem has a relatively simple solution; the hard part was figuring out the root cause. It would be nice if Apple documented this behaviour along with guidelines for how animations should be used in Safari for optimal performance. It would also have been really helpful if they output a warning in the console instead of leaving engineers guessing!

A Continuous Image Ticker Built With Pure CSS

When working on a continuous image ticker slider for a new project I was surprised to find no JavaScript slider libraries which support this feature. After some back and forth with my designer on some alternatives, I decided to implement it as designed. In fact, it’s possible to do with pure CSS if you know the width of each item in the slider.

The Theory

Some important info:

  • There are 6 items in my ticker and each is 16rem wide with a space of 2rem between them. So in total the ticker width is (16 + 2) * 6 = 108rem.
  • I want 10 items to scroll by every 30 seconds, or in other words each item should scroll through in 3s. In this example I want 1 cycle of the ticker animation to last 3 * 6 = 18s.

Once the animation is complete it will leave a big white space on the right, which we don’t want. I got round this by duplicating the slider items in my PHP template. This works on the assumption that the total slider width is greater than or equal to its container, however if this doesn’t hold true you could duplicate the slides 3 or more times as required.

Now once the ticker has slid one full length I reset it back to the beginning to start again. If my maths is correct the reset will be seamless and the ticker will appear to be continuous!

In Practice

First let’s build the HTML. I’m using TailWind to set the layout (check out their flexbox grid documentation):

<!-- Wrap the slider in div with overflow hidden to hide scrollbars -->
<div class="overflow-hidden">

  <!-- The slider itself is a flex grid -->
  <div class="flex -mx-4 img-ticker">

    <!-- Original set of images -->
    <!-- Each image is a grid column with width 16rem (w-64) and horizontal margin 1rem (mx-4) -->
    <img class="w-64 mx-4 self-start flex-none" src="https://via.placeholder.com/256x341/f4ccccff/434343?text=1">
    <img class="w-64 mx-4 self-start flex-none" src="https://via.placeholder.com/256x192/f4ccccff/434343?text=2">
    <img class="w-64 mx-4 self-start flex-none" src="https://via.placeholder.com/256x341/f4ccccff/434343?text=3">
    <img class="w-64 mx-4 self-start flex-none" src="https://via.placeholder.com/256x192/f4ccccff/434343?text=4">
    <img class="w-64 mx-4 self-start flex-none" src="https://via.placeholder.com/256x341/f4ccccff/434343?text=5">
    <img class="w-64 mx-4 self-start flex-none" src="https://via.placeholder.com/256x192/f4ccccff/434343?text=6">

    <!-- Copy set of images -->
    <img class="w-64 mx-4 self-start flex-none" src="https://via.placeholder.com/256x341/f4ccccff/434343?text=1">
    <img class="w-64 mx-4 self-start flex-none" src="https://via.placeholder.com/256x192/f4ccccff/434343?text=2">
    <img class="w-64 mx-4 self-start flex-none" src="https://via.placeholder.com/256x341/f4ccccff/434343?text=3">
    <img class="w-64 mx-4 self-start flex-none" src="https://via.placeholder.com/256x192/f4ccccff/434343?text=4">
    <img class="w-64 mx-4 self-start flex-none" src="https://via.placeholder.com/256x341/f4ccccff/434343?text=5">
    <img class="w-64 mx-4 self-start flex-none" src="https://via.placeholder.com/256x192/f4ccccff/434343?text=6">

  </div>
</div>

Note how the slider items are repeated twice.

Now we can create our CSS keyframes using the slide distance I calculated above:

@keyframes ticker-kf {
  0% {
    transform: translate3d(0, 0, 0);
  }

  100% {
    transform: translate3d(-108rem, 0, 0);
  }
}

This makes the slider move one full slider width to the left. Note I’m using translate3d as it promotes the element to its own compositing layer, which generally gives the smoothest animation across browsers.

Now let’s apply it using the following CSS:

.img-ticker {
  animation: ticker-kf 18s linear infinite;
}

This will execute the animation over 18s using a linear easing curve so the speed remains the same throughout. Finally we tell the browser to reset and repeat on an infinite loop.

Et voila! We have a pure CSS ticker!

As I did, you can go on to make this solution more flexible by using your templating engine to calculate and output the required keyframes and CSS animation property based on the number of items.
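As a sketch, that PHP might look like this (the variable names are assumptions; 16rem image width plus 2rem gap and 3s per item, as above):

$n        = count( $images );  // Number of unique slides (they are output twice in the HTML).
$distance = ( 16 + 2 ) * $n;   // Total distance to translate, in rem.
$duration = 3 * $n;            // Seconds for one full cycle.

printf(
    '@keyframes ticker-kf { 0%% { transform: translate3d(0, 0, 0); } 100%% { transform: translate3d(-%drem, 0, 0); } } .img-ticker { animation: ticker-kf %ds linear infinite; }',
    $distance,
    $duration
);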

Check out the complete solution on codepen:

See the Pen Pure CSS Ticker Slider by Adam Fenton (@adamf321) on CodePen.

A Note on Design Fidelity

The original design has the image widths follow our fluid 12-column grid. This meant some complex calculations (which are not possible to do server-side) in the case where the slider stretched outside the container to the full viewport width. However, we realised the constant movement makes it impossible to see any misalignment, so I simplified the slider by compromising on 2 fixed widths (one for mobile and another for desktop).