Technology 22

DataProc - a (near) complete guide

This post is the summation of months of debugging and optimizing ML and AI workflows on Google Cloud DataProc clusters using Cloud Composer / Apache Airflow for orchestration.

If you would prefer to jump directly to a chapter:

  1. What is DataProc? And …

DataProc - Passing runtime variables

An explainer on how to pass runtime variables to Google Cloud DataProc for Python AI and ML applications.

DataProc - Cluster startup optimization

An explainer on how to optimize Google Cloud DataProc startup for speed and cost with Python AI and ML workflows.

DataProc - Cluster configuration

An explainer on how to best set up Google Cloud DataProc environments for Python AI and ML workflows.

DataProc - Variables

An explainer on how variables can be inserted into the Google Cloud Composer and DataProc environments.

DataProc - Understanding environments: shared vs local

An explainer on how Google Cloud Composer and DataProc environments interact with each other and share resources.

DataProc - Everything You Need to Know

A beginner's guide to DataProc on Google Cloud and calling it from Composer

Dumpster diving & retrieval of WordPress content

Backing things up

Xepr license workaround

Danger

Disclaimer: the author does not condone piracy, but this may be used as a temporary measure.

As readers of my work-sided blog posts will be aware, I’ve had some recent problems with …

HTC Desire headphone jack fix

This morning on the way to work I got down to the road, plugged my earphones into my phone and hit play on Spotify, only for some horrible, tinny, quiet music to come out. It seemed that the phone simply would not recognise that anything had …

Troubleshooting a non-connecting Bruker ELEXSYS spectrometer

Recently in the lab we’ve had some problems thanks to a "helpful" software update that made it impossible to connect to a spectrometer. This troubleshooting guide provides a record of my steps in diagnosing the problem. Hopefully shortening the process …

Logarithmic backups on a Linux server

This is the second part of the lab server setup guide. Mainly for my own memory purposes, but if you find it useful then so much the better.

Synchronization

As previously discussed, all of the machines in the lab are mounted onto the server using NFS …

Lab server setup - Ubuntu 12.04

With a recent change of machines in the lab, I thought that I should update the backup server to reflect the new machines. However, upon inspection, the backup server’s OS hard drive had died long ago; but then what do you expect when you use …

Moving apps from the Android phone memory to SD card: HTC Desire

So I’ve had my HTC Desire for approximately 2 years now, and whilst it has served me well, I’ve had a bugbear with it for a long time. Essentially the phone has only 148 MB of internal storage and after the basic Android OS has been …

Clone an (openSUSE) machine and make it functional

In the last week I’ve needed to set up some new PCs to control some of our machines in the lab. The problem is that the control interface between machine and PC is quite complicated and uses some pretty niche software, on the openSUSE …

Getting an openSUSE machine online

This week I’ve been cloning machines, but each new machine has slightly different hardware, even if it’s just a MAC address or serial number.

So, a quick disclaimer: I’m using openSUSE 11.3 and so things will be slightly different …

PyMOL with Ubuntu 11.10 rendering problems

Recently my (Ubuntu 11.10 64-bit) machine stopped playing nice with PyMOL, and whenever I tried to render or move an object I was facing a >40 second render time, while both of my CPU cores jumped to >90%.

When I paid close attention to the …

Rosetta 3.2 with Ubuntu 11.04

Rosetta is a very useful program for the prediction of protein structure, folding and interactions, be that docking with proteins or ligands.

However, the current version of Rosetta (3.2.1) uses SCons to compile itself on your system. This is usually …

Ubuntu 11.04 and MATLAB x64 C, C++ and Fortran libraries

MATLAB requires the use of libraries (think translation dictionaries) to run code that is outside of its native realm (a hybrid C).

The most common of these languages are C and Fortran (one of the oldest high-level languages).

The problem is that …

Recovering a RAID1 disk from a corrupt array

I recently had our backup server at work die. Upon investigation, the Kubuntu 8 server had had its OS drive and one of the RAID1 data disks corrupted to such an extent it couldn’t boot. The bad data disk was so bad that it couldn’t even be assigned a …

Western Digital Green Drives (WD20EARS) and spin downs

Recently I bought myself 2 new Western Digital (WD) drives for my home NAS box. Doing the good thing for the planet and not wanting massive electricity bills, I thought I’d try the recommended Western Digital Green Drives. Drives that are touted …

[Geek-post] Why I choose open source

I’m currently using an awful lot of software and getting to the stage where I’m having to develop my own. For this reason I thought I’d keep a little track of what I’m using and why. I’m in the process of moving all of …