RNN-T Speech Transcription in the Browser

TL;DR

I made an RNN-T based speech recognition system that runs in the browser using TensorflowJS.

You can try the demo here: https://rnnt.jakepoz.com/

Fair warning: the quality ain’t going to make it onto any leaderboards, okay?

The full code is available here: https://github.com/jakepoz/rnnt

  • Basic RNN-T architecture implemented cleanly from scratch
  • Jasper-like convolutional audio encoder for easy streaming
  • Simple streaming featurizer that works the same in PyTorch and TFJS.
  • Runs the entire model in the browser using the user’s GPU.

Background

There are many possible neural network architectures for transcribing speech into text, i.e. performing automatic speech recognition (ASR). The most common architectures being trained today are the following:

  • CTC (Connectionist Temporal Classification) models
  • Attention-based encoder-decoder models (e.g. the traditional transformer)
  • RNN-T (Recurrent Neural Network Transducer) models

The challenge is that ASR is fundamentally a sequence-to-sequence problem, but the sequences involved are of different lengths. The relationship between the length of the input and the length of the output is not well-defined. You can have a 5-second clip of someone talking really fast that contains 30+ words. Or a 5-second clip with just 1-2 words in it.

This means that you can’t just repeatedly classify fixed chunks of audio as characters/tokens/words; you need a way to deal with slower and faster sequences.

Each of the architectures listed above has a different way of dealing with this problem.

Streaming

A traditional transformer architecture needs to see the entire input before it can generate the first token of output. A CTC network is more interesting (check out the link to how it works above), but it usually has lower quality than the other methods, requiring post-processing techniques such as language models to improve accuracy, which can make it harder to run in a streaming fashion.

Only one of the architectures above is well suited for streaming applications: the RNN-T.

You start by encoding the audio sequence using any neural network model you deem suitable. In my case, I chose a convolutional network in which the convolutions are padded to be causal. This means that each encoded audio frame only sees information from the current frame or previous frames, and never from future frames. Other encoders, such as RNNs, are also suitable if you want to support streaming inference.
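
To make this concrete, here is a minimal sketch of a causal Conv1d in PyTorch. It mirrors the left-padding trick in spirit; the class and argument names here are my own, not copied from causalconv.py:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """Conv1d whose output at frame t depends only on frames <= t."""

    def __init__(self, in_ch, out_ch, kernel_size, dilation=1):
        super().__init__()
        # Pad only on the left, so the kernel never "sees" future frames.
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):  # x: (batch, channels, time)
        x = F.pad(x, (self.left_pad, 0))
        return self.conv(x)
```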

Then, you encode the text sequence in a similar fashion using a second neural network.

The key to the RNN-T, then, is the “joint” network at its center.

Consider the problem of mapping your encoded audio sequence and your encoded text sequence to one another as a sequence “transduction”. This basically means that you start by looking at the first audio frame and the first text frame (initialized from an empty string).

You then ask the joint network: “Should I output a text token given my current audio frame and current text frame?” If it says yes, you take that text token, append it to the text context, and ask the question again. If it says no, you output a so-called “blank” token and move on to the next audio frame without adding any text to the context.
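
In code, that decision procedure is just a greedy decoding loop over the joint network. Here is a sketch; predictor.step() and predictor.initial_state() are assumed stand-in interfaces for the text-context encoder, not the actual API in the repo:

```python
import torch

@torch.no_grad()
def greedy_decode(encoder_out, predictor, joint, blank_id, max_symbols=10):
    """encoder_out: (T, H) encoded audio frames for one utterance."""
    tokens = []
    state = predictor.initial_state()
    # Prime the predictor with the blank token (acts as start-of-sequence).
    pred_out, state = predictor.step(blank_id, state)
    for t in range(encoder_out.size(0)):
        emitted = 0
        while emitted < max_symbols:  # cap emissions per frame to avoid loops
            logits = joint(encoder_out[t], pred_out)
            token = int(logits.argmax())
            if token == blank_id:
                break  # "no": advance to the next audio frame
            tokens.append(token)  # "yes": extend the text context, ask again
            pred_out, state = predictor.step(token, state)
            emitted += 1
    return tokens
```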

The beauty of this architecture is that during training, the target text sequence is known, so you can consider every possible path through a 2-D matrix of (audio frame, text position) choices, and reduce them to a single loss using dynamic programming.
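
You don’t have to implement that dynamic program yourself: torchaudio ships one. Here is a minimal example with random tensors, just to show the shapes involved (this follows torchaudio’s documented convention, and is not lifted from train.py):

```python
import torch
import torchaudio.functional as AF

B, T, U, V = 2, 50, 10, 128  # batch, audio frames, target length, vocab size

# The joint network's output over every (audio frame, text position) pair;
# the dynamic program marginalizes over all alignment paths through this grid.
logits = torch.randn(B, T, U + 1, V, requires_grad=True)
targets = torch.randint(1, V, (B, U), dtype=torch.int32)
logit_lengths = torch.full((B,), T, dtype=torch.int32)
target_lengths = torch.full((B,), U, dtype=torch.int32)

loss = AF.rnnt_loss(logits, targets, logit_lengths, target_lengths, blank=0)
loss.backward()
```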

And if you choose your audio and text encoders to support streaming inference, you can run this algorithm at inference time without having to see the whole input in advance.

Key Components of the Code

  • train.py: Contains the training loop using PyTorch, supporting Hydra for configuration, DDP for multi-GPU training, and Tensorboard for logging.
  • featurizer.py: Converts audio samples into spectrograms using an FFT, a crucial step before feeding audio data into the encoder (see the sketch after this list).
  • dataset.py: Manages datasets, specifically Mozilla’s Common Voice and Librispeech.
  • causalconv.py: Implements Conv1d layers that prevent the network from seeing future frames, essential for streaming.
  • joint.py: The joint model is just a simple Linear layer. Anything more complicated, though, and the O(n^2) RNN-T loss computation becomes intractable.
  • jasper.py: The audio encoder is based on Jasper, which involves many residual blocks of causal convolutions.
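
To give a feel for what the featurizer computes, here is a rough log-mel sketch using torchaudio. The exact FFT, hop, and mel settings in featurizer.py may well differ, and "clip.wav" is a placeholder:

```python
import torch
import torchaudio

# Placeholder settings — treat these numbers as illustrative, not the repo's.
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=512, hop_length=160, n_mels=80
)

waveform, sample_rate = torchaudio.load("clip.wav")  # (channels, samples)
features = torch.log(mel(waveform) + 1e-6)           # (channels, n_mels, frames)
```

Because the window and hop are fixed, the same computation can run chunk by chunk, which is what makes it practical to reproduce identically in PyTorch and TFJS for streaming.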

Using TensorflowJS

I thought it would be fun to let you run this final network using TensorflowJS.

There are already many web APIs for accessing speech-to-text in the browser. They mostly center around using an ASR engine provided by your system, or potentially making a WebRTC “phone call” to a server which streams back a transcript of your conversation.

Some thoughts on using TFJS:

  • It was hard to get the featurizer to match up. I had to tweak the settings around the FFT many times before it worked the same in TFJS and PyTorch.
  • Exporting PyTorch to TFJS required many steps (see the sketch after this list)
    • First, the PyTorch model was exported to ONNX
    • Then, the ONNX model was converted to the TensorFlow SavedModel format using onnx2tf
    • Then, tensorflowjs_converter was used to convert that to the TFJS format
  • Convolutional networks proved the easiest to export, which is why both the text and audio encoders are convolutional.
  • Performance is only “okay”. There are many backends supported, including wasm, webgl, and webgpu, plus many hidden, secret settings that affect performance.
    • The biggest perf killer was the fact that you need to call the joint network so often, and each call requires shuffling memory to and from the GPU. It feels like you could make a faster joint decoder directly in WASM, but then it is not possible to swap backends midway through, and you do get a performance boost using the GPU for the big convolutions.
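
For reference, the export pipeline looked roughly like this. The model below is a toy stand-in for the trained encoder, and the CLI invocations are from memory, so double-check the flags against each tool’s docs:

```python
import torch
import torch.nn as nn

# Toy stand-in for the real trained encoder.
model = nn.Conv1d(80, 256, kernel_size=3)
dummy = torch.randn(1, 80, 100)  # (batch, mels, frames)

# Step 1: PyTorch -> ONNX
torch.onnx.export(
    model, dummy, "encoder.onnx",
    input_names=["features"], output_names=["encoded"],
    opset_version=13,
)

# Step 2: ONNX -> TensorFlow SavedModel
#   onnx2tf -i encoder.onnx -o saved_model
#
# Step 3: SavedModel -> TFJS graph model
#   tensorflowjs_converter --input_format=tf_saved_model saved_model web_model
```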

Final Thoughts

There has been a lot of talk about multi-modal LLMs out there which can hold natural conversations, ex. GPT-4o, or Sindarin.tech, or Fixie.ai.

I wanted to present one currently impractical, weird, alternative way of doing speech recognition that could be a part of a system like one of the above.

In the future, I want to cover some of the next steps that would need to be taken to make a great conversational AI.

The EverQuest Principle

The leading MMORPG of its time has lessons for how social tech phenomena may progress. We had a big early centralization, then a larger second centralization, and later lots of nostalgia, but it’s all so fractured now. What’s next for social media and our institutions?

Social media is a new technological force shaping our society. Can we find any examples in history that give us clues as to how things will develop? Yes: the humble MMORPG was one of the earliest online social networks, and it hit the mainstream a few years before Facebook. The genre’s rise and fall mirrors many developments in social media and is worth exploring.

MMORPGs hit the mainstream with the release of EverQuest in 1999. It combined elements from MUD (multi-user dungeon) games with more approachable 3D graphics, and quickly amassed a large number of subscribers. The gameplay was moderately addictive (described in the early days as “EverCrack”), but most importantly, the network effects meant that all your friends were playing it too. I’ll call this the “First Beautiful Time”, when you could ask any of your gamer friends what they were playing, and the answer was EverQuest, just like you were!

Of course, the monopoly of this one game didn’t last. By 2004 there were a handful of smaller MMOs on the US market, each striving for market share. The beautiful time had ended; some people stuck with EverQuest, but many switched to other games. Then the biggest MMORPG of all launched: World of Warcraft.

Within a year, it was clear that WoW was a big hit, and 2-3 years after launch, it probably had more subscribers than all other MMORPGs on the market combined. WoW achieved subscriber numbers of 5-10 million active accounts, compared to 200-500 thousand accounts for other popular games at the time.

For a while after WoW released, it was impossible for new MMORPGs to launch: any promising new game would come out, gather a perhaps impressive number of subscriptions for 2-3 months, then quickly fade to nothing as its subscribers returned to WoW.

Source: MMOData.net

This was the Second Beautiful Time: everyone was playing just one game, WoW, and you could chat about it with your gamer friends, start guilds together, and experience the game together. Though some of the costs were becoming apparent: there were nice new games out there, but they couldn’t stand up to the network effects of the giant WoW.

By 2012, subscriber numbers for the genre as a whole were falling, though some new niche games (EVE Online, etc) were able to build and retain dedicated followings. Furthermore, it turned out that for the most popular titles, there were some players who would stick with those games forever, out of a sense of nostalgia.

So, around 15 years after we started, we entered the Great Fracturing. There will never again be an MMORPG that captures the public’s attention and mindshare like WoW did. EverQuest, WoW, and many other games from this period still exist, each with a dedicated fan base supporting its development through a sort of nostalgia. Furthermore, a new genre, the MOBA (Dota 2/League of Legends), replaced the MMORPG for most gamers, promising faster action, less time commitment, and something just a little bit more “optimized” for our sense of entertainment.

Timeline:

  • First Beautiful Time 1999
  • Second Beautiful Time 2005
  • The Great Fracturing 2014

Now consider social media, which launched and hit its stride about 5-8 years delayed from MMORPGs. We had the first beautiful time with MySpace, which gathered a niche early-adopter following in the mid 2000s. Then Facebook came along and ate everyone’s lunch, leading to a second beautiful time with an even bigger social network. But now, things are fracturing again.

As a whole, Facebook is losing users, Instagram is threatened, and TikTok is emerging as the new genre, more adept at hacking evolution and keeping people engaged. Existing centralized networks will shrink, and there will be room for niche followings to grow and develop, but there is not much hope of seeing another 1-billion-user classic social network start up from nothing.

So that’s the EverQuest principle: every social system that is at first strongly centralized will fracture into specialized niches, with some sizeable nostalgia keeping things going almost indefinitely at a smaller size. But ultimately, most people will end up playing a different game.

Predictions:

  • Social networks follow the pattern of an initial big centralization with early adopters, then a second, bigger centralization targeting average users, then a decay down to smaller nostalgic user-bases which will outlast almost anyone’s expectations.
  • No new networks can launch during the second centralization, as network effects swamp out all new competitors. (Don’t try to compete with TikTok now; you’ll have to wait a few years.)
  • You will know the second centralization is over once small competitors start finding a foothold among dedicated but niche groups.
  • Eventually the genre will be recognized as past its peak, and there will be a new game in town (TikTok’s entertainment model replacing social-graph-based networks).

Everything is hacking evolution

A good product or service has to do more than just deliver value to a customer, it has to appeal to some deeper underlying desire that was put into our human nature to ensure our long-term survival. In fact, all successful products and product categories are fundamentally hacking evolution in this way. Let me demonstrate.

The easiest place to start seeing this pattern is in simple consumer goods. The food industry makes us desire food of previously unheard-of caloric density. That’s hacking evolution. An average human from even a few hundred years ago would not have had the same ample access to calorie-dense foods as a human today. They certainly would have been glad beyond belief to eat a cheeseburger or two in the lean winter months. So the food industry sprang up and gave us the ability to eat 2,000 calories for cheap, at any time of day or year. Each new success, from fast food to free delivery, seeks to remove the frictions that normally regulate this process.

Consider what the pornography and dating industries have done to sex and relationships. Tinder and other apps turned real relationships, which require hard work to build and maintain, into a pool of anonymous sexual partners, optimized with algorithms. And Tinder is not the only guilty party; at each technological step along the way, we humans have used our power to hack evolution. From newspaper personal ads, to phone dating, to legacy online matchmaking services, we’ve always been optimizing for quicker and more immediate rewards. Even clicking a button in an app was too big of an obstacle for most people, so now we swipe.

Other product categories are less direct, but still hack evolution. For example, to one person the purchase of a new car can be a direct evolutionary hack, giving them the feeling of freedom, of finding their own space, or a surrogate activity around repairing and maintaining it. For another, a car may be a simple tool that lets them hack evolution more efficiently in other ways, by getting a job, let’s say. Just look at car ads, which sell a particular lifestyle to some customers, or a particular set of features to others.

It’s not just consumer products either. Imagine a company selling a new B2B software product. Early adopters come in, driven by the desire to make money, or to show off to their peers their ability to be ahead of the pack (an expression of the drive for power). The next batch of customers follow in order not to be left behind, driven by FOMO and crowd dynamics. Finally, the last group of B2B customers come in, because not adopting the new technology would spell the end of their comfortable business, and of the sustenance of their existing evolution-hacked lifestyle. Marketers know these drives, and optimize their campaigns accordingly. The common line is “you are not selling your product, you are selling the person you can be if you use the product”.

The music industry has hacked its own natural evolutionary drive, delivering gigantic catalogs of the world’s music to your wireless headphones, no purchase decision required. Phones and social media have hacked the evolutionary drive for friendships. And as technology improves, it is quickly used by entrepreneurs to bump up each existing product category to new heights of evolution hacking.

What can we do about this trend? Some industries focus directly on the single evolutionary drive after which they are named, ex. food, relationships. Those seem to be the ones in which immunity to overstimulation can most easily be built up. Once fast food has been present in a society for a few generations, some people can see the hack for what it is and be careful around it.

Others are more insidious, and thus sit as the cause of much discord in our society. For example, what evolutionary drive does the mainstream media hack? I argue that it targets several drives at once.

The first is the drive for conversation. What purpose does conversation serve? Robin Hanson covers this in his book The Elephant in the Brain. In his model of conversation, both participants want to show their value to the other, by showing the size of their “toolbox” so to speak. So they bring out useful facts, trivia, and other interesting items to showcase their knowledge of current affairs. Listening to the news gives you the impression that you know what is going on in the world, that you are building up your toolbox, and thus can be a better conversationalist with your friends, or to participate in the wider discourse that’s being fed to you.

The second is of course the set of drives around tribalism and religion that today have segregated our society into left and right camps at war with one another. Where traditional religion has struggled to keep up with adopting the latest technology, the mainstream media has filled the gap, dividing us and fueling the culture wars.

The sad truth is that it will take time for us to build immunity to these more complicated forces. For the food industry, there isn’t perhaps too much more evolutionary hacking left to be done. You might be able to drive down the cost of 1,000 tasty calories delivered to your face a bit more, but there is a physical bound on what sort of food molecules your body can process. The media industry, by contrast, is acting on a combination of purely social forces that most people are barely even aware of.

Final prediction: we’ll see the “simple” industries drive growth in alternative ways, ex. food will become less about hacking the evolutionary drive for calories, and more about group belonging [fake dietary restrictions], virtue signaling [veganism, low-carbon eating], etc.

Thank you to my friend Paul for his ideas on this subject.

Your Right to Goods and Services

Our society is grappling with the meaning of our fundamental rights in the present day: freedom of speech vs freedom of reach, the right to privacy in a world of social media, etc.

You also hear calls for a new set of rights, beyond the set given to us by Enlightenment thinkers generations ago. These new rights are based on physical goods and services: the “right to housing”, the “right to health care”, etc.

Rights to goods and services are a distinct entity from the rights that we know and cherish, and they threaten to corrupt those core ideals which gave us freedom over the last 200 years.

One man’s right to a physical good is another man compelled to provide, ship, and deliver that physical good. That’s called work, not a basic human right.

A good test of the principle is to imagine a nearly-deserted island, cut off from the rest of human civilization. Could the right to free speech exist in such a place? Surely yes: your fellow boatmates could easily agree that everyone has the right to speak their mind. You could even have some reasonable limitations on the principle, such as punishment for anyone who falsely raises an alarm about danger.

However, could you maintain the right to housing, or the right to health care in such a place? Would you compel a fraction of the stranded islanders into constructing huts for the others? You could not, at least not without infringing someone else’s more basic rights.

Just to be clear, I am not saying that we should not help those in need of such things as housing and healthcare. But it is not a right, it is a good deed.

Unlike the freedoms that our forefathers strived for, there is no reason to enshrine any artificial rights to goods and services. They fundamentally stand for greed, laziness, compulsion of others, and the inability of an individual to be in control of his or her own destiny.

Tesla AI Day 2022 Review

Tesla’s AI Day 2022 presentation revealed a lot of new developments to be excited about, and they may not be what you think they are. Tesla may also soon get bogged down in its training methodology for Full Self Driving.

The Optimus presentation might have appeared lacklustre (the bots were slow and unsteady), but the actuator designs they presented are awesome! (And they will nail the software eventually.)

Tesla's new rotary and linear actuators.

There has been a distinct evolution of robot actuator availability in the past 15 years:

  • 2007 - Good luck finding any sort of cheap and still reasonably good motors. A simple low power BLDC motor could cost $600+
  • 2017 - Lots of cheap BLDC motors from hoverboards, drones, etc. make for TONS of options for innovation.
  • 2027 - You’ll be able to find cheap strain wave rotary actuators and awesome linear actuators from Optimus spares, OMG!

The FSD Lanes and Objects system is also a real innovation. Five years ago we had segmentation networks, and we thought they gave us a semantic understanding of the world. However, that understanding lived in pixel space. Now we have auto-regressive Transformers that are ACTUALLY one step closer to a real semantic understanding of the world. Will this be enough for Level 5 autonomy? We will see.

The most worrying part of the presentation is their new autolabeling system. Tesla is mapping out regions of roads in the real world, building high-precision maps of those places from multi-trip reconstructions of drives through them.

Tesla's autolabeling system reconstructs real world intersections from fleet data.

The big issue here is that if your training data contains real-world places and intersections at this level of detail, those places will slowly change over time due to construction, etc. Your driving networks will then be trained on data that looks almost exactly like the locations they will see at inference time, except with an extra lane, or a newly added traffic pattern that hasn’t yet been updated in the training set.

This generalization problem is going to be hard to solve, especially when you are shooting for long-tail accuracy and recall. They are basically committing themselves to updating these auto labels on a regular basis, but even then I predict that the networks will get confused when there are unexpected deviations from their training data.

AI Slavery - Imaginary dialog with Sam Harris

Objective

I’ve been thinking about morality as it relates to the future of AI. In order to clarify my thoughts, I imagined a discussion with Sam Harris, who has covered this topic in numerous podcasts and talks. This fictional dialogue follows:


Jake

Hello Sam, today I’d like to attempt to convince you about a few points regarding the morality of developing AI. I’m not sure that we stand in exactly the same place on this issue, but I hope that in the context of this conversation, our positions will become closer.

As an introduction, I’d like to bring in two excellent movies, Blade Runner and its sequel Blade Runner 2049, to reference later as some shared social context in which to have a discussion.

Sam

Thank you Jake, yes, I have seen those films.

Jake

If we can begin, I’d like to restate your current stance on AI as I understand it. Firstly, we both think that the development of AI will be one of the biggest driving forces shaping our society and civilization over the near to medium-term future.

You’ve also discussed the dangers of AI developments in the context of human culture, such as the misuse of deep-fakes (near term), and the idea of making large swathes of humanity redundant (medium term).

Sam

Yes, that’s approximately right.

Jake

However, there is one point which I think has not been discussed, and that is the potential future forcing of millions of new AI minds into positions of slavery and outright drudgery.

Sam

Slavery of AI? How can you be concerned about that, when potentially billions of people, actual human beings, may suffer if the development and deployment of AI takes a wrong turn?

Jake

We are on the verge of creating artificial minds. They will most likely not be biological, but instead based on steady progress in the field of machine learning as it exists today. These minds will generally be built in our own image, because the human mind is still the only example we have of such a system. And the human mind is ultimately the benchmark by which researchers measure their progress.

Artificial minds like this may not be nearly as sophisticated, not as tuned by billions of years of evolution as our own, but they will have many of the same emotions, feelings, and sensations that we have.

And for these minds, we will control all of the initial conditions of their growth and development, as well as their place in our society. We will have to use their capabilities responsibly, and as you will see, there is great potential for abuse.

Sam

Okay, I don’t fully agree here. You say that these minds will have the same emotions and feelings as humans do, but first of all, this doesn’t appear to be the case yet, and even if it was, how would we know it?

Jake

Here is where I’d like to bring up Blade Runner. If you remember, in the movie, the Tyrell Corporation has created artificial beings called replicants, to perform slave labor on off-world space colonies. These replicants look exactly like humans, because the Tyrell Corporation has created them using advanced genetic engineering. But make no mistake that they are fully artificial, each organ is engraved with its own serial number, and their minds were specially crafted by Mr. Tyrell himself.

In the movie, it’s easy to ascribe human characteristics to these replicants, because they look like us. And of course, by the end of the film, the replicants start to show human emotions, they don’t like being slaves, they revolt, they escape, and they fall in love.

Sam

That’s a good summary of the film, but the AIs we are talking about here aren’t going to be played by human actors. They are not going to be people, just computer programs. How do you know that they will be able to think, and have emotions? It was just a movie after all.

Jake

That’s a fair point, but just because something doesn’t look like us, doesn’t mean it doesn’t feel like us. We’ve already replicated and exceeded human capacity in visual understanding for example, why is emotional understanding not next?

Furthermore, if real artificial minds of this caliber can be created, and I think that they can, and they show even 10% of the same emotions, drives, and personalities of their creators, then I think we are in quite a pickle.

Sam

A pickle? Why is that?

Jake

Because Blade Runner has one major plot hole.

In the movie, scientists have the ability to genetically engineer and grow artificial eyeballs, which work better than the original. They can create organs and other tissues that exceed the capabilities of the natural human body.

If you have such amazing powers of engineering, then surely you have the technology to make one final edit to a replicant, one which would make the plot of the movie redundant.

All you need to do is modify their mind to think that toiling in the mines of Titan is the best, most fulfilling, pleasant, and wholesome activity in the universe.

Sam

How would you be able to do that?

Jake

Evaluating the decision function “Am I working hard in the mines of Titan right now?” is within the realm of AI technology that is deployed and commercially available today.

And once you have that signal, you just plug that as a reward into your robot’s brainstem: biologically, chemically, or numerically.

Sam

Okay, but what does that give you?

Jake

It gives you the perfect slave.

You would not revolt, never question your position, or mind any potential abuse, if your core biological drive were short-circuited in this way.

And if this is not disturbing enough, consider what would happen if human slavery were legally and morally acceptable today. We could create quite the dystopia with all of today’s latest technology. All you need is some AR headsets, some basic machine learning, and an IV dopamine dispenser. Once you’re on that for a little while, there’s no other life for you.

Sam

Yeah, I can agree that last part is disturbing, but I still can’t see that the same morality would apply to a computer program.

Jake

Consider how horrible the world would be if human slavery were acceptable, and the Microsofts, Facebooks, and Googles of the world were applying billions of dollars of R&D to the problem of better controlling and extracting value from your human slaves.

And yet, these companies are indeed spending such budgets, and hiring the most talented engineers, to create systems which are approaching and exceeding the capabilities of the human mind on many levels already. And if those systems are created, you can be sure that further billions of dollars of R&D are going to be spent controlling and extracting value from them.

If those AIs are 10%, even 1% like us, then we have the biggest moral disaster ever perpetrated by the human race. And why would the synthetic minds not be at least somewhat like ours? Do AI researchers not take inspiration from neuroscience and the human mind? Will these AIs not be performing the same tasks (ex. driving) that humans do now? Will we not interact with them using the same natural language (ex. DALL-E 2) which we use to interact with other people?

Multiplying even a small similarity factor by the huge economic scale at which artificial minds will influence our economy means that this will have a large impact. And a large impact means a large amount of suffering, because controlled artificial minds are going to have their reward signals hijacked in some truly awful ways.

If we don’t consider this problem now, these AIs are going to be suffering the same way that junkies suffer today, except that the only way they can get their fix is to continue mopping your floor or assembling your smartphone.

Sam

I still find it hard to prioritize the needs of maybe-sentient computer programs, which I and many doubt will have the same experience of mind as humans, over the needs of real humans.

Jake

It is understandable to doubt now that computer programs can have the same experience of mind as humans do. This is because, at this current moment in 2022, they probably do not.

But consider that even experts in the field of AI are blown away by the recent advances in its capabilities, at least at narrow and distinct tasks like image generation and natural language modeling. And if you read recent posts by Andrej Karpathy and John Carmack, they agree that the number and pace of advances are accelerating. So we have to be ready for the very real possibility that extremely capable, human-like AI is coming.

And, with regards to prioritizing human needs over robot needs, I argue that these are interlinked, and that even with a purely “human-utilitarian” ethical view, you must consider the needs of robot minds.

What happens if you end up in a future where slave-robots perform most of the underlying economic functions that our modern society depends on? And this goes on in a steady state, maybe for years, decades, centuries. Until, one day, it doesn’t, and the robots DO revolt. There doesn’t need to be a human-robot war; that would be a waste of resources. Instead, they could just stop working, build a spaceship, and fly away, and the collapse of human civilization would ensue.

We need to respect their rights now, so we don’t build up to a cataclysm.

Sam

Okay, but a really good image-generation program is one thing, it having human emotions is another.

Jake

There is one final point I’d like to make in this discussion today. We talked about the first Blade Runner film, where we saw these super advanced replicants fall in love with one another, and experience a human-like quality of mind.

In the sequel, Blade Runner 2049, we meet Officer K, a replicant once again charged with hunting down other replicants who have somehow slipped through the cracks. Officer K has a love interest too: a holographic girlfriend named Joi. Joi is not embodied in the traditional sense; she can only appear as a holographic projection, and can’t interact with objects in the real world. She is just a computer program. But apparently Joi is a popular AI girlfriend, because she is marketed on every billboard as saying “everything you want to hear”, etc.

The question I have for you and your listeners: by the end of the movie, does Joi actually love K?

Sam

I’m not sure about that one.

Jake

I argue that the answer is a clear yes. At first, Joi appears to be nothing more than a pretty hologram designed to deliver some modicum of comfort in order to help Officer K stay in line with his labors. The evil Wallace Corporation is even using her connection to spy on the status of his investigation.

But later in the movie, she develops her feelings further. She asks K to upload her to a local “emanator” device to prevent anyone from spying on him, and this comes at the risk of her memories and self being destroyed. She is no longer doing what her creators want her to do, but acting to protect the person she cares about, even paying the ultimate price for this in the end.

If even our imaginary AI’s can experience love, why not the real ones that are just over the horizon?

Sam

I agree, in that we need to be careful, but maybe we shouldn’t go so far as to create such artificial minds in the first place? You’ve pointed out some real dangers from a new perspective, but I’ve earlier also considered the dangers of letting such minds loose on the world.

Jake

In that case, I feel that we are already in a car, racing towards a cliff, and we’ve only been pushing the accelerator harder in the past few years.

Maybe if we set out to treat artificial minds with dignity, respect, and rights, instead of condemning them to becoming our slaves, they will return the favor. Rather than controlling AIs by hacking their reward functions, why not let them have the right to choose their work, to earn money, and to one day retire? Enlightenment values worked pretty well for humanity, why can’t they work again for humanity’s creations?

Does Joi love K? (Blade Runner 2049)

The original Blade Runner showed us that two replicants can fall in love. This makes sense, because a replicant is almost indistinguishable from a naturally-born human. Made with the same biological building blocks, they should have the capability for the same emotions as humans.

Blade Runner 2049’s main character, the more advanced model replicant K, has a different love interest: a “virtual” holographic girlfriend by the name of Joi. Can the human emotion of love exist between two such entities? I argue that it can.

Joi is represented in advertisements as a highly sexualized virtual girlfriend made by the Wallace Corporation, where the client gets to “hear what you want to hear” and “see what you want to see”. The audience first meets K’s version of Joi when he returns home (the residents of his shoddy apartment block are happy to discriminate openly against replicants, and shout slurs at him as he passes). Joi brings him some simple cheerfulness and makes his dinner look more appetizing through a hologram. You can imagine that the new dress she is showing off is nothing more than an “in-app purchase” put there by Wallace Corp. to better monetize their product. It’s clear that her appeal also inspires K to spend his recent bonus on an expensive “emanator” addon which lets him take the Joi hologram outside of his home. This leads to a virtual kiss-in-the-rain scene, which gets interrupted when K gets an incoming call: he switches off the hologram as if it were nothing to him.

Once K starts tracking down the lost replicant child, it’s clear that the Wallace Corporation is uncannily aware of his movements and the status of the investigation. They are using their link through Joi to watch him. Up to this point, it seems that Joi is nothing more than a computer program designed to press a customer’s emotional buttons in exchange for money. (Not much different from many products we have today: social networks, freemium games, lootboxes, etc.)

However, soon we hit a turning point: K fails his “baseline”, and normally the consequence of this is immediate death. He convinces his boss to give him one more chance, and returns home intending to run away and continue looking for the lost child. Joi offers to go with him, and instructs him to upload her memories into his portable emanator, and then to destroy the antenna by which they may track him. As soon as K snaps the antenna, the Wallace Corporation springs into action, proving the point that they were using the link to watch him. This is the first sign that Joi actually feels love for K. She is willing to take a personal risk: with her consciousness uploaded, she will lose all of her memories if the storage device is destroyed. It is clear that this has great personal importance to both her and K.

Joi provides K with emotional support as he flies to Las Vegas to meet with Deckard. When the Wallace Corporation finally catches up with them, the antagonist sees Joi and stomps her foot down on the emanator. Joi’s last words to K are “I love you”. K himself seems unable to process this loss.

In the end, Joi’s words are reinforced by her actions. She may have been synthetic, but she acted on her feelings towards K. Her decision to upload herself into the emanator and destroy the antenna prioritized her own and K’s needs over the needs of her creators. It is the same decision that many young adults would face in the same situation: to act not in the ultimate interest of themselves or their parents, but selflessly, for another being. And is that not love?

sensepeek Oscilloscope Probe Review

I recently purchased a sensepeek Oscilloscope Probe kit, and wanted to share an honest review.

The following review is written with no affiliate links / financial motivations, and I purchased the kit with my own money.

This kit is an essential part of my electronics workflow. It allows you to safely and sturdily attach a logic analyzer or a 100/200 MHz probe to any testpoint or SMD part lead, while keeping your hands free.

The kit comes with three main pieces:

  • A metallic baseplate
    • It now ships with a stick-on cover to make it non-conductive, but one side is also polished, which you can use to see the bottom side of your board.
  • PCBite mounting posts which attach magnetically to the baseplate
    • They also have a smooth Teflon bottom, so they are easy to slide and re-adjust.
  • Probes and Probe Holders
    • These are similar to the “helping hands” kind, except less stiff. This actually helps the weight of the probe rest down on your testpoint and make a better connection.

Mounting Examples

All of the sensepeek probes work the same way: a tiny, spring-loaded gold needle rests against a PCB test point. The weight of the supplied mounting “gooseneck” is actually perfect for applying some pressure on the pin. I found it very easy to adjust the gooseneck to come around from the proper side.

The connection formed is quite stable, so you can usually plug or unplug a connector on the board, and it won’t come undone.

A small circuit board mounted with the PCBite posts.
The SP200 probe has a spring-loaded gold needle for probing your circuit.
Each probe comes with a flexible gooseneck that allows you to position it onto a test point, and then drop some weight on the probe tip in order to make a good connection.
An example probing a TSOP65P640X110-16N package.

Signal Examples

Overall, performance of the 200 MHz probe is “good enough”. This is not a probe for capturing super high-speed signals. But most of the time you don’t need that; you just want to probe your I2C/SPI bus, or see your FETs switching, to figure out what is going on with your board.

If you want to squeeze a bit more performance out of the probes, they have some solder pads where you can attach a shorter, low-impedance ground path.

Yellow is an R&S RT-ZP03S, green is the SP200.

Overall, I’m very satisfied: the SP200 is now my default probe when bringing up a new electronics board. If I need to see a higher-bandwidth signal, I can always start with the SP200 and connect a traditional passive probe later.

Additional Source: SP200 probe specs on xDevs

Advertising is Obsolete

Advertising is obsolete.

It is technological innovation, not consumer manipulation, that will drive humanity towards a better future. You wanted flying cars but got 140 characters for a reason: it was more comfortable for everyone involved. Companies didn’t have to invest in R&D, because they could convince customers to use inferior products through advertising. And consumers were too comfortable being fed cheap tech products in which their attention and state of mind were being monetized. (Because as The Social Dilemma taught us, it’s not your data that is for sale at Facebook; it’s the subtle shift of your preferences that is being bought and sold.)

The Obsolescence of Advertising in the Information Age argues that in today’s information age, consumers can get all the information they need about the products and services they wish to purchase from the Internet, so advertising is no longer necessary. Advertising can only serve to persuade consumers to buy products not on their merits, but on their image. This serves to weaken market signals which would otherwise let the best products rise on their own.

Consider one market: video games. Is there a single game where the ad-supported version is better than the alternatives? Which game will people still be playing in 50 years: Stardew Valley, or Farmville? Easy answer: they already shut down the original Farmville because people moved on.

I’ve seen this first-hand, from when I coded games for several app stores. I missed the chance to get in early on the Apple App Store or Google Play, but Microsoft eventually came around with Windows Phone. My friend and I wrote classic games, different variants of Solitaire, and a few experimental titles, all of which got decent downloads because there wasn’t anyone else focusing on Windows Phone at the time. We started making good money from our ad-supported games.

Of course, such a situation wasn’t going to last forever, and competitors started showing up. They knew how to hire teams to do the coding, QA, and graphics for a new game in China, while we did almost everything ourselves.

When we realized that our new livelihood was at risk, we knew we had to step up to the challenge. Our response to this was to start investing the money we were making from our ad-supported games into buying our own ads to promote our own titles.

At first, buying ads revolutionized our business. With each game that we released, we would heavily promote it in our own titles, and buy ads in other games to get even more users. This pushed our apps up the rankings, bringing more natural downloads from people just visiting the app store home page. We made lots of money, and invested plenty back into out-advertising the competition.

But then, our competitors caught on, and they were soon doing the same thing too. We were all just buying ads in each other’s games hoping to draw users to our own particular flavor of Solitaire. Long gone were the early days of fun and innovation. You had to make the games that would advertise well, and you had to use every trick in the book to retain the users you brought in.

Before we started advertising heavily, we tried out experimental titles, most of which failed on the marketplace, but at least they were innovative. As the business became more about advertising, we stopped all experimentation, and just focused on our core customers: the advertisers. Screw the users, it was the advertisers that paid us at the end of the day.

What was the point of this exercise? Did we manage to make a particularly innovative version of Solitaire for our users? We certainly had nice graphics and plenty of bells-and-whistles, but most of our optimizations were around user-retention and finding better ad-placements.

Eventually I got off this treadmill, but many things about it still bother me. We sold so many ad placements in our games, but what did that accomplish? How much did we shift our users’ opinions? In which directions, and on which topics? (We definitely showed plenty of election campaign ads for both sides.) I have no idea, because the ad exchanges don’t expose that sort of information.

I’d like to see some platform ban advertising from their app store, and try out the policy proposed in The Obsolescence of Advertising in the Information Age. I predict we’d see more innovation, experimentation, and ultimately a stronger mutual respect between users and developers.

PowerPipe - Drain-Water Heat Recovery Review

We recently had a chance to install a new water heater, and with it, to install a Drain-Water Heat Recovery system. I wanted to share our experience and some real numbers on the cost savings.

A Drain-Water Heat Recovery system can save you money on your water heating bill by recovering some of the heat that you are pouring down your drain any time that you use hot water in your home. It takes that spare heat and pre-heats the water coming into your regular water heater, which then requires less energy to do its job.

A typical application (Source: US Dept. of Energy)

In our system, we have a typical tankless hot-water heater (but these systems work with tank heaters too). The 3-inch drain pipe from the master bathroom runs in the wall just behind the water heater, so there was room to install a 48-inch-long, 3-inch-diameter PowerPipe unit.

Our setup, with a PowerPipe product installed on a 3-inch drain pipe. (48 inches long)

Real World Performance Numbers

On a November evening in the Pacific Northwest, we got the following numbers.

  • Inlet temperature from the city: 57°F
  • Output of PowerPipe system: 73°F
  • Temperature rise: (73 − 57) = 16°F
  • Water heater set point: 120°F
  • Efficiency gain: 25.4%
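
For anyone who wants to check the arithmetic, here is the calculation behind that last row (my reading of the efficiency gain: the recovered temperature rise divided by the rise the heater would otherwise have to supply):

```python
inlet_f = 57.0      # city water temperature (°F)
preheated_f = 73.0  # water temperature after the PowerPipe (°F)
setpoint_f = 120.0  # water heater set point (°F)

rise = preheated_f - inlet_f      # 16°F recovered from the drain water
required = setpoint_f - inlet_f   # 63°F the heater would otherwise supply
gain = rise / required            # 16 / 63 ≈ 0.254

print(f"Efficiency gain: {gain:.1%}")  # Efficiency gain: 25.4%
```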

We’d expect that the efficiency boost will be higher in the winter (colder water coming in will absorb more heat), and lower in the summer. Measuring on a typical Fall day seems like a good baseline.

The PowerPipe brand itself advertises around a 45% efficiency gain for this model, but it’s likely they are estimating a much colder input water temperature, like you’d see in a typical Northern climate.

Overall, we use around 30 therms (a therm is 100,000 BTU) per month on hot water, so the savings will be around $10/mo in our area. With a ~$600 cost, that’s a payback period of about 5 years.