Remember the Network

featureImage:
  src: '/cables.jpg'
  alt: Colorful network cables
seo:
  image:
    src: '/cables.jpg'

Supercomputers have always fascinated me: the rooms filled with computer racks, forming a hive mind of silicon. My attention was always drawn to that wealth of processing power. I later learned that one of the biggest challenges was contained in the neatly arranged network cables. Converting data between internal representations and transportable formats (aka serialization) and moving bits over the wire can be a real speed challenge in a system. In our modern cloud compute world, we haven't escaped this.

I am old enough to remember my surprise that my Windows PC could talk to Linux machines without some operating system glitch. Agreed-upon protocols and data serialization are key. The process of converting data into a format suitable for transmission consumes valuable time and resources. We play a balancing act: converting and compressing information just enough that translation isn't too hard and doesn't take too long, while keeping as few bits on the wire as possible. If you think about 4K video streaming, we've come a long way. And yet as developers we get sloppy, because it's not our focus.
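To make that balancing act concrete, here is a minimal Python sketch; the payload shape and field names are invented for illustration. It compares a human-friendly JSON encoding with a compact, gzipped one: the verbose form is cheap to produce and easy to debug, the compact form costs more CPU to encode but puts far fewer bits on the wire.

```python
import gzip
import json

# A hypothetical device reading, just to give the encoders something to chew on.
reading = {
    "device_id": "abc-123",
    "timestamp": 1700000000,
    "heart_rate": [72, 74, 71, 73, 75] * 200,  # a burst of samples
}

# Human-friendly and verbose: easy to inspect, more bytes on the wire.
verbose = json.dumps(reading, indent=2).encode("utf-8")

# Same data with compact separators plus gzip: more CPU spent encoding,
# far fewer bytes to move.
compact = gzip.compress(
    json.dumps(reading, separators=(",", ":")).encode("utf-8")
)

print(f"verbose JSON: {len(verbose):>6} bytes")
print(f"compact+gzip: {len(compact):>6} bytes")
```

The exact numbers don't matter. The point is that every exchange pays both the translation cost and the transfer cost, and the balance between them is a choice we make, deliberately or not.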

It became my focus when I was working with a team on a health data project. Huge volumes of data streamed in from devices. Every stream needed processing with heuristics, algorithms, and a hint of machine learning. We did the obvious thing: we stored the data, triggered processing, produced new values, saved the results, and sent them to the user. It sounds fine, but let's expand that. We received the values and converted them for storage. We retrieved the raw data, computed new values, and stored those. We retrieved the new summary results and sent them back to the user. Going deeper, we even had multiple compute blocks of different types, each retrieving and storing. The process looked like a streaming design in the simplified diagrams, but once expanded we could see the huge amount of data moving back and forth. It was expensive in terms of latency and dollars.
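In rough Python, the "obvious" pipeline looked something like the sketch below. The store, the heuristics, and the data shapes are stand-ins for illustration, not our actual services; the thing to notice is that every call that touches the store is a serialization round trip over the network.

```python
import json


class Store:
    """In-memory stand-in for the data store. In the real system, every call
    here was a network round trip plus serialization."""

    def __init__(self):
        self._tables = {}

    def write(self, table, payload):
        self._tables.setdefault(table, []).append(payload)

    def read(self, table):
        return self._tables.get(table, [])


def run_heuristics(readings):
    # Placeholder for the real heuristics / ML step.
    rates = [r["heart_rate"] for r in readings]
    return {"avg_heart_rate": sum(rates) / len(rates)}


def handle_device_stream(events, store, send_to_user):
    # 1. Receive the values and convert them for storage (serialize, ship out).
    for event in events:
        store.write("raw", json.dumps(event))

    # 2. Retrieve the raw data, compute new values, store the result
    #    (deserialize on the way back in, serialize on the way out again).
    raw = [json.loads(row) for row in store.read("raw")]
    summary = run_heuristics(raw)
    store.write("summary", json.dumps(summary))

    # 3. Retrieve the summary and send it back to the user (one more round trip).
    result = json.loads(store.read("summary")[-1])
    send_to_user(result)


handle_device_stream(
    [{"heart_rate": hr} for hr in (72, 74, 71)],
    Store(),
    print,
)
```

Multiply the middle step by every compute block that retrieves and stores on its own, and the hidden traffic adds up quickly.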

So how did we fix it? There were plenty of ideas, all well documented around the internet. Reduce the hops. Move algorithm processing into groups of compute working on the same retrieved data set. Create algorithms that could use previously computed data to reduce the amount of raw data retrieval, as sketched below. The implementations were not all easy, but the main thing was to identify what was hiding in the simplified architecture diagrams, and to think holistically. We have a reflex to build teams around services and to build new services around teams. If you don't look at the big picture, you might not notice the wasteful connection points you have built by organically aligning to an org structure.
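Here is a small sketch of those ideas; the running average is my stand-in for the real algorithms. Several compute blocks share a single retrieved data set, and an incrementally updated summary folds new readings into previously computed state, so the full raw history never has to come back over the wire.

```python
def process_once(raw, compute_blocks):
    # One retrieval feeds every compute block, instead of each block
    # pulling the same data set over the wire on its own.
    return {block.__name__: block(raw) for block in compute_blocks}


class RunningSummary:
    # Folds new readings into previously computed state, so later updates
    # never need to re-retrieve the full raw history.
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, readings):
        for value in readings:
            self.count += 1
            self.total += value

    @property
    def average(self):
        return self.total / self.count if self.count else 0.0


def mean_rate(raw):
    return sum(raw) / len(raw)


def max_rate(raw):
    return max(raw)


first_batch = [72, 74, 71, 73]
print(process_once(first_batch, [mean_rate, max_rate]))

# A later batch updates the summary without touching the stored raw data.
summary = RunningSummary()
summary.update(first_batch)
summary.update([75, 78])
print(summary.average)
```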

As a side note, this lesson applies to people too. One of our biggest challenges is the human-to-human exchange of information: documents, email, Slack. Watch a company grow, and the parallels appear.