This page last updated: 8 October 2018.
Back in 2017, at a Cambridge Wireless meeting, some colleagues of mine happened to be talking to Rob Morland, who is involved in something called the A1 Steam Locomotive Trust. The trust has built a brand new £3m steam locomotive, the Tornado, which runs on the national rail network. Rob was wondering whether it was possible to stream the live sound of the steam engine to those on the track side etc. waiting for it to come past. For some reason my colleagues pointed him at me and hence a project was born.
The design is dictated by these requirements:
The architecture of the system looks something like this:
I'm a software engineer and so I wanted to, as far as
possible, avoid any potential issues with hardware
design. By far the simplest approach, especially since
there was an intention to show Internet Of Things behaviours,
is to use an I2S microphone such as the InvenSense
ICS43434. Not much bigger than a grain of rice, this
microphone can be powered from 3.6 to 1.8 Volts and
provides a completely standard Philips format I2S digital
output that can be read by any microcontroller with an I2S
interface. Audio is 24 bit and capture rates can be
at least 44 kHz.
I experimented with various bit depths, capture frequencies
and coding schemes. From a capture point of view,
24 bit is somewhat high so I compromised at
16 bit. In terms of capture frequencies,
44 kHz is CD audio quality but that is going to be too
high for a cellular network and so I compromised at
16 kHz. With a raw PCM transport, ignoring
overheads, this would require a constant 256 kbits/s
uplink on the cellular interface. This is definitely on
the large side: cellular networks may offer links of this
bandwidth on the downlink but the uplink is an entirely different
matter. However, I didn't want to go any lower than this
in quality terms and so the next variable is the audio coding
scheme.
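The bit-rate sums in the paragraph above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name is my own, and the 44.1 kHz figure is the standard CD sampling rate (rounded to 44 kHz in the text).

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bits per second of uncompressed PCM, ignoring any transport overhead."""
    return sample_rate_hz * bits_per_sample * channels


# What the microphone could deliver: 24 bit samples at CD rate (assumed 44.1 kHz).
full_rate = pcm_bit_rate(44_100, 24)

# The compromise described above: 16 bit samples at 16 kHz, mono.
chosen_rate = pcm_bit_rate(16_000, 16)

print(f"full quality: {full_rate / 1000:.1f} kbits/s")
print(f"chosen:       {chosen_rate / 1000:.1f} kbits/s")
```

Even the compromise rate is four or so times smaller than the full-quality figure, yet still a lot to ask of a cellular uplink, which is why the coding scheme matters.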
While it would be theoretically possible to MP3 encode at
source, that is a processor intensive operation and MP3 is
neither stream oriented nor loss tolerant; it is coded in
blocks of 1152 samples and the audio content is
interleaved across many blocks so losing a single block has a
large effect.
Jonathan Perkins at work suggested I adopt a NICAM-like
approach. NICAM
was the first scheme used by the BBC for broadcasting digital
multi-channel audio at a controlled quality, allowing stereo
audio to be broadcast for the first time. It also
happens to be very well suited to embedded systems. Basically
a chunk of samples is taken and the peak is worked out.
Then all the samples are shifted down so that every sample in
the block can fit into the desired NICAM bit-width. The
amount of shifting that was performed is included with the
coded block. At the far end the block is reconstructed;
any loss will always be in the lower bits of the block.
With a relatively short block the "gain window" moves such
that the loss is not noticeable. I chose an 8 bit
NICAM width and a block duration of 1 ms
(16 samples). For a 16 kHz sampling rate this
results in an uplink rate of 132 kbits/s, which (by
experiment) is bearable.
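The block coding described above can be sketched in Python as follows. This is an illustration of the idea rather than the code running on the device: the function names are invented, and the 4 bit shift field is an assumption, chosen because 16 samples of 8 bits plus 4 bits of shift gives the 132 bits per millisecond quoted above.

```python
from typing import List, Tuple

BLOCK_SAMPLES = 16   # 1 ms of audio at 16 kHz
CODED_BITS = 8       # width of each coded sample
SHIFT_BITS = 4       # assumption: bits used to carry the shift value


def encode_block(samples: List[int]) -> Tuple[int, List[int]]:
    """Code one block of signed 16 bit samples as (shift, 8 bit samples)."""
    low = -(1 << (CODED_BITS - 1))
    high = (1 << (CODED_BITS - 1)) - 1
    shift = 0
    # Shift right until the peak of the block fits in the coded width.
    while any(not (low <= (s >> shift) <= high) for s in samples):
        shift += 1
    return shift, [s >> shift for s in samples]


def decode_block(shift: int, coded: List[int]) -> List[int]:
    """Rebuild the block; only the lowest `shift` bits of each sample are lost."""
    return [c << shift for c in coded]


loud = [1000, -1000, 250, -3] + [0] * 12
shift, coded = encode_block(loud)
rebuilt = decode_block(shift, coded)

bits_per_block = BLOCK_SAMPLES * CODED_BITS + SHIFT_BITS
uplink_bits_per_second = bits_per_block * 1000   # 1000 blocks per second
```

A quiet block needs no shift at all and so is carried without loss, while a loud block loses only its least significant bits, where the error is masked by the loudness.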
In addition to the audio stream itself I borrowed from the
likes of RTP and included a sequence number, microsecond
timestamp and coding scheme indicator in the block header; I
called this URTP (u-blox Real Time Protocol).
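To make the header idea concrete, here is a minimal sketch of packing and unpacking such a block in Python. The field order, field widths and coding scheme values below are all assumptions made for illustration; they are not the actual URTP layout, which is defined in the project's source code.

```python
import struct
from typing import NamedTuple, Tuple

# Hypothetical layout, big-endian with no padding:
#   1 byte   coding scheme indicator
#   2 bytes  sequence number (wraps at 65536)
#   8 bytes  timestamp in microseconds
HEADER_FORMAT = ">BHQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

CODING_PCM_16 = 0     # hypothetical indicator values
CODING_NICAM_8 = 1


class Header(NamedTuple):
    coding: int
    sequence: int
    timestamp_us: int


def pack_block(header: Header, payload: bytes) -> bytes:
    """Prefix the coded audio with the header fields."""
    return struct.pack(
        HEADER_FORMAT, header.coding, header.sequence & 0xFFFF, header.timestamp_us
    ) + payload


def unpack_block(data: bytes) -> Tuple[Header, bytes]:
    """Split a received block back into header and coded audio."""
    coding, sequence, timestamp_us = struct.unpack_from(HEADER_FORMAT, data)
    return Header(coding, sequence, timestamp_us), data[HEADER_SIZE:]


def blocks_missing(previous_seq: int, current_seq: int) -> int:
    """Number of blocks lost between two received sequence numbers."""
    return (current_seq - previous_seq - 1) % 65536


sent = pack_block(Header(CODING_NICAM_8, 7, 1_530_000_000_000_000), b"\x01\x02\x03")
header, payload = unpack_block(sent)
```

The sequence number lets the receiver spot lost blocks and the timestamp lets it place the audio in time, which is the part borrowed from RTP.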
I initially thought about using RTP or something similar but
I really did NOT want to have to write a mobile application
for this; it had to work out of the box with existing mobile
devices. The answer to this turns out to be HTTP
Live Streaming. This protocol, originally
developed by Apple, chops up an audio stream into segment
files each a few seconds long, which are MP3 encoded but with
a very specific header added so that the browser can
reconstruct them. There is then an index file which
lists the segments to the browser. No client application
is required, just a browser; the browsers of all Apple phones
include native HLS support while all Android phones and
desktop browsers can be supported with the marvellous hls.js.
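As an illustration of what the index file looks like, the sketch below builds a minimal live HLS playlist in Python. The tags are standard HLS playlist tags; the function name and the segment file names are invented for the example and are not taken from the project's server code.

```python
import math
from typing import List


def build_index(first_sequence: int, segment_names: List[str],
                segment_duration_s: float) -> str:
    """Return the text of a live HLS index listing the current segment files."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        # The target duration must be an integer no smaller than any segment.
        f"#EXT-X-TARGETDURATION:{math.ceil(segment_duration_s)}",
        # Sequence number of the first segment still listed in the window.
        f"#EXT-X-MEDIA-SEQUENCE:{first_sequence}",
    ]
    for name in segment_names:
        lines.append(f"#EXTINF:{segment_duration_s:.3f},")
        lines.append(name)
    # No #EXT-X-ENDLIST tag: its absence tells the browser the stream is live.
    return "\n".join(lines) + "\n"


index = build_index(42, ["seg42.mp3", "seg43.mp3", "seg44.mp3"], 1.0)
print(index)
```

For a live stream the server rewrites this file as each new segment arrives, dropping the oldest entry and incrementing the media sequence number, and the browser polls it to discover what to fetch next.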
In the original Internet Of Things plan I had assumed that
DTLS was going to be the security scheme of choice.
However, I experimented with sending the uplink audio stream over
UDP and found that there were relatively significant losses,
several percent. Hence I decided that TCP was a better
bet for the audio stream. Then there's also the issue of
cellular networks: they sometimes perform deep packet
inspection and deny service to things they decide don't meet
their tariff model, they have quite active and unpredictable
firewalls and they don't allow incoming TCP connections (which
will be needed for control operations).
Jonathan came to my rescue again here with the answer to all
of these problems: SSH. SSH comes built into all Linux
platforms and allows the setting up of secure tunnels between
servers, even multi-hop, provided that you have an account on
each of the machines, which can use key-based authentication.
You generate an SSH key on the Raspberry Pi and then push the public key
to the server. The Raspberry Pi can then use SSH to set
up tunnels from its port X to port Y on the server and, also,
set up tunnels in the reverse direction, from port A on the
server to port B on the Raspberry Pi. The tunnels are
secure, can be configured to include keep-alives and restarts,
and, should the private key on the Raspberry Pi ever be
exposed, the server can simply remove the public key from its
lists.
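The tunnel arrangement described above maps onto standard OpenSSH options. The Python sketch below only assembles the command line; it does not run it. The server name, key path and port numbers are placeholders, and the keep-alive values are my own guesses rather than the settings used on the device (those are in the README.md).

```python
import shlex
from typing import List, Tuple


def ssh_tunnel_command(server: str, key_file: str,
                       forward: Tuple[int, int],
                       reverse: Tuple[int, int],
                       keepalive_s: int = 30) -> List[str]:
    """Build an ssh command with one forward and one reverse tunnel."""
    pi_port_x, server_port_y = forward
    server_port_a, pi_port_b = reverse
    return [
        "ssh",
        "-N",                                        # tunnels only, no remote shell
        "-i", key_file,                              # private key held on the Pi
        "-o", f"ServerAliveInterval={keepalive_s}",  # keep-alive probe interval
        "-o", "ServerAliveCountMax=3",               # give up after 3 missed probes
        "-o", "ExitOnForwardFailure=yes",            # exit if a tunnel cannot be set up
        "-L", f"{pi_port_x}:localhost:{server_port_y}",  # Pi port X -> server port Y
        "-R", f"{server_port_a}:localhost:{pi_port_b}",  # server port A -> Pi port B
        server,
    ]


# Placeholder values throughout.
command = ssh_tunnel_command("pi@server.example.com", "/home/pi/.ssh/id_rsa",
                             forward=(5060, 5061), reverse=(5062, 22))
print(shlex.join(command))
```

Because ssh exits when the keep-alives fail or a forward cannot be established, a supervising process can simply start it again, which is one way to get the restart behaviour mentioned above.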
At the simplest level the server can include a certificate
so that an HTTPS connection is made but that doesn't answer the
problem of how permissions may be encoded for a paid-for
service. This needs some thinking.
There are a few sources of latency:
Hence, in the case where there are no cellular outages the
delay is largely dependent upon the duration of an MP3 segment
file plus some browser/HTTP behaviour uncertainty. By
experiment, with the segment duration S set at 1 second (a 3.3 kbyte MP3
segment file, about the same length as the HLS index file in
fact), a best case end to end latency of around 3 seconds can
be achieved (tested using Chrome on a PC as the receiving
browser).
As soon as there is a cellular outage the effect is to
increase the latency; however, testing (see below) has shown
that hls.js is sufficiently clever to re-sync the stream using
the timestamps inside each MP3 segment file so this is not an
issue.
For initial testing, the hardware consists of a Raspberry Pi
B+ (which I happened to have in my cupboard), a microphone on a
flexible strip evaluation board connected via a break-out board,
and a u-blox 2G/3G modem board from Hologram called the
Nova. A 2G/3G modem draws more current than the Pi can
provide (close to 3 Amps peak) and so I used a Y cable that
allows me to provide separate power to the modem while
testing. Then I moved all of this to a Pi Zero W since
that should have sufficient processing power but is smaller and
more robust. Here I used a USB hub with an Ethernet
connector built in as I wanted the flexibility of being able to
switch on/off an auxiliary network connection to take over from
cellular (and there's no physical switch to disable Wifi on the
Pi Zero W).
I used a Giff Gaff (Telefonica network) SIM: they offer an
unlimited pay-as-you-go data package for £20 per month, which
works out at about 3 pence per hour of audio streaming if
streaming constantly. Bandwidth is limited to 384 kbits/s
from 8 a.m. to midnight once 9 Gbytes have been consumed, which
is still more throughput than I need.
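The tariff sums above are easy to reproduce. The sketch below assumes a 30 day month and uses the 132 kbits/s coded audio rate from earlier, ignoring protocol overheads; the variable names are my own.

```python
MONTHLY_COST_PENCE = 20 * 100
HOURS_PER_MONTH = 30 * 24            # assumption: a 30 day month

# Cost of one hour of streaming if the device streams around the clock.
pence_per_hour = MONTHLY_COST_PENCE / HOURS_PER_MONTH

UPLINK_KBITS_PER_S = 132             # coded audio rate, overheads ignored
THROTTLED_KBITS_PER_S = 384          # limit applied after 9 Gbytes

# How many times over the throttled link could carry the audio stream.
headroom = THROTTLED_KBITS_PER_S / UPLINK_KBITS_PER_S

print(f"{pence_per_hour:.2f} pence per hour, headroom x{headroom:.1f}")
```

So even when the bandwidth limit applies there is a comfortable margin over what the audio stream requires.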
The software comes in three parts (available on GitHub with
comprehensive READMEs):
There is, of course, quite a lot of configuration required on
the Raspberry Pi side (setting up SSH tunnels etc.), all of
which is covered in the README.md.
In order to meet the requirement that the only control of the
recording device is power on/off, the Raspberry Pi is also
configured to run from a read-only file system, preventing
potential SD card corruption from a disorganised shut-down.
For testing I wanted to be free of the need for a power supply
and so I powered everything from a Tracer 22 Ah LiPo battery
that I had lying around for other purposes; this only needs
recharging every few days even under heavy use.
I also needed audio capabilities as follows:
I began by testing the stability of the connection and the
worst case latency in the following scenarios (where the early
rows took a few weeks of constant testing and debugging to get
right); testing began with hls.js version 0.9.1.
Scenario | ioc-client connectivity | Browsing device | Outcome | |||||
PC (Chrome 66.0.3359.139) |
Android Phone (Samsung Galaxy A3, Chrome 66.0.3359.126) |
Apple Phone | ||||||
Cellular | Ethernet | Ethernet | Wifi | Cellular | Wifi | Cellular | ||
Overnight (8 hours) constant streaming, ioc-client stationary. | x | x | With 333 ms segment files, HLS liveSyncDurationCount = 1 and liveMaxLatencyDurationCount = 3, the browser generally maintained a ~3 second latency over the period, though I have seen it extend up to 6 seconds on some occasions. |
x | x | As above, though I think that the latency was at ~6 seconds for more of the time. |
x | x | With 333 ms segment files the mobile browser failed to keep up: after some time, handset dependent (maybe 20 to 40 minutes), the browser started cancelling downloads because it perceived that they would be out of date, resulting in gaps in the buffered stream, which also fragmented browser memory. Switching to 1 second segment files, however, the stream was maintained with a ~3 second latency; my suspicion is that the HTTP overhead was too large on such short-duration fetches and the larger segment file doesn't actually take any longer to get. At this time I also switched to using the Openfresh modified HLS (described here) and, with the #EXT-X-FRESH-IS-COMING tag added, this kept the maximum latency down to ~5 seconds. |
x | x | As above: ~3 second delay using 1 second segment files, sometimes falling back to ~5 seconds. The browser shows 125 kbytes downloaded per minute so, for 1 Mbyte of mobile data, you get 8 minutes of listening time. Interesting to compare this with the uplink data volume, which, at 140 kbits/s, is uploading just over 1 Mbyte per minute. A clear gain from the very complex processing behind MP3. hls.js maintained the audio stream down as far as E-GPRS coverage, recovering from gaps in the stream without incurring delays, until the issue reported here was encountered. |
x | x | As above. |
x | x | As above, interestingly showing a similar issue with the break-up of audio at the browser-end after an overnight run (hls.js not being used in the case of Safari as it has native HLS support), though in apparently good Wifi coverage. This was with the kind help of my sister in south Wales, as I don't possess an iPhone, so I am unable to testify as to the nature of the breaking-up of the audio directly. |
Scenario | Outcome |
Time to begin streaming from power-on in good coverage conditions. | Approximately 65 seconds; ioc-client LED begins to flash at 0.5 Hz after about 15 seconds (indicating boot), Hologram Nova blue LED goes solid blue (indicating a data connection) at about 25 seconds, ioc-client LED begins 2 Hz flashing (meant to indicate network up) at about 55 seconds and streaming begins 10 seconds after that. |
Time to begin streaming after powering-on in a no service condition; power on with antenna unscrewed, wait 60 seconds, screw antenna on. | Hologram Nova blue LED goes solid blue (indicating a data connection) within 10 to 20 seconds of screwing on the antenna and streaming begins about 10 to 20 seconds after that. |
Dropping off the network; unscrew antenna, check that streaming stops by watching the ioc-client LED indicator, screw antenna back on again. | Hologram Nova blue LED goes solid blue (indicating a data connection) within 5 seconds of screwing the antenna back into place and streaming begins 10 to 12 seconds after that. |
SSH tunnel outage; unscrew the antenna for greater than 60 seconds. | Streaming recovers within 30 to 60 seconds of screwing the antenna back into place again. |
This did the trick without noticeably affecting the quality of
the wanted audio.
Exciting! Made me feel like a kid, a kid who gets to
clamber on an enormous steam engine.
As of mid 2018 the Tornado was housed at the Nene Valley Railway, not far from me, for repairs. Rob Morland took my Chuff Box, improved it with some additional stand-offs to make sure the Pi was secured and the USB modem was not going to shake loose, mounted the low-profile high-gain antenna I had purchased on an aluminium bracket and fitted the lot to an aluminium plate which, in turn, mounts on a steel plate on top of the locomotive's tender.
On a very sunny Saturday, 30th June, Rob was there to install the thing and so I joined him to test it and take some photographs.
Now the Nene Valley Railway is not that far from the A1 and we
found that there was a constant traffic rumble in the distance,
sufficiently strong that the gain of the system was never high
enough to hear the "clickety-click" of the DC to DC
converter. We tried shouting inside the cab and around the
tender but all we got over the chuff box stream was indistinct
mumbling; this is good, we don't want to overhear
conversations. There was a small amount of chuffing on the
Nene Valley Railway, about 20 metres to the left of the "view
towards the chuffing-end" picture above, and this was definitely
audible but I think that until we get some chuffing that is not
near a main road we won't be able to tell how the system really
behaves.
While I was there, Rob gave me a tour of the rest of the
engine and the support coach (which you can see on the left of
the "view towards the chuffing end" picture above).
There's a fiendish amount of electronics on both the loco and
the support coach: LED strip lighting, battery banks, sensors,
multiple safety systems and generators of various forms.
The Bluetooth temperature sensor in the cam that you can see
above has to stand more than 20 G of acceleration and continue
to provide readings without fail at all times. Quite a
thing.
The outcome of initial testing was not auspicious. First
of all, the track where the engine was test-steaming did not
appear to be well covered by the O2 network. Though the
device logs showed continuous attempts to make a connection, the
connection could not be established for long periods of
time. When it could be established no chuffs were audible,
which we think may be due to loud noises in the vicinity of the
tender causing the gain control to turn right down. We decided
on the following plan of action: