First, a few notes about why, generally, there is an opportunity in this space right now (or soon).
- Digital media consumption has skewed strongly towards our eyes. One reason behind this is that the technology for information distribution via our eyes has been ahead of our other senses (and has been for a very long time).
- As computers get stronger and smarter, we’re opening up new opportunities for inputs and outputs with computers. This will create unique opportunities for new companies and services.
- Audio is undergoing the most significant transformation, led by a few factors: ubiquitous microphones and speakers (AirPods and many others), NLP starting to work, and handheld computing power.
- Lots of public experimentation in this space, not a lot is really sticking. Still looking for the killer apps.
As a hobby for the past few years, I’ve been immersing myself in this space, trying to better understand it at a personal level and to learn what technology is available. The following list covers themes of utility that I feel are emerging and of interest:
- Music / Song Capture
- Audio Journalling
- Short Term Memory
- Long Form Audio Messages
- Meeting Notes
Music / Song capture
I’m a musician. Often the act of writing a song involves me needing to use my hands and capture an audio moment that’s evolving organically. You never really know when the “good stuff” will happen, but you need to be prepared for it. I started using Voice Memos to capture this process to try to extract the moment in which the song comes to life (as an aside, this is very important to capture: usually the song itself is trying to re-create the energy of this moment).
- Pressing record takes you away from what you are doing. It can completely kill the moment. I created a Siri shortcut “Siri, Record this” to record more easily, but it was hard to make this seamless enough (for instance, if my phone is locked, I have to hover my face over it to unlock it).
- I would therefore be more likely to record the entire session (which could last hours and be with multiple people). This was good because the technology wouldn’t get in the way, but I would be left with a significant amount of “editing” if I wanted to extract any value from the session.
- Audio quality is as good as your microphone setup.
- Ability to annotate timeline via realtime input — after doing this for a while, I wanted to be able to flag or annotate the audio timeline via the audio I was capturing. This could be in the form of notes for myself in the audio process, but could also be used to edit without a computer. For instance “delete the last few minutes of silence,” could be used after leaving the room for a few minutes and realizing I left the recording on, or “split audio file here” could split the file in 2.
- Ability to combine multiple microphones in a room for better quality audio — AirPods, multiple iPhones, and iPad would be present, but I could only record using 1. It would be really great if I was able to record all at the same time then stitch the files back together to increase the fidelity.
- Smart editing — Voice Memos should follow Google’s lead and provide transcription and tools to easily cut silence or minimize noise.
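
The "delete the last few minutes of silence" and "cut silence" ideas above could be approached with simple energy thresholding. The following is a minimal sketch, assuming the recording is already available as a list of float samples in the range -1 to 1. The function name `find_silences` and the threshold, window, and minimum-duration defaults are illustrative assumptions, not taken from any existing tool.

```python
from typing import List, Tuple


def find_silences(
    samples: List[float],
    sample_rate: int,
    threshold: float = 0.02,     # assumed RMS level below which audio counts as silent
    window_s: float = 0.5,       # assumed analysis window length in seconds
    min_silence_s: float = 2.0,  # assumed shortest silence worth reporting
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans whose RMS level stays below the threshold."""
    window = max(1, int(sample_rate * window_s))
    silences = []
    start = None  # start time of the silent run currently being tracked
    for i in range(0, len(samples), window):
        chunk = samples[i:i + window]
        rms = (sum(x * x for x in chunk) / len(chunk)) ** 0.5
        t = i / sample_rate
        if rms < threshold:
            if start is None:
                start = t
        elif start is not None:
            # A loud window ends the silent run; keep it only if long enough.
            if t - start >= min_silence_s:
                silences.append((start, t))
            start = None
    # Close a silent run that extends to the end of the recording.
    end = len(samples) / sample_rate
    if start is not None and end - start >= min_silence_s:
        silences.append((start, end))
    return silences
```

An editor could then offer each returned span as a one-tap "cut" suggestion, or a voice command could remove the most recent span without touching a computer.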
Audio journalling
I like to talk but I type all day. I know journalling is important, but I’ve always had a hard time incorporating it into a daily practice. That is, until I started audio journalling. The act of talking to myself as a form of self-reflection feels really natural. It took a moment, but once I got over the initial hurdle of the stigma of talking to myself, I found this to be a great way to explore thoughts and feelings.
Again, similar to music capture, management of audio is labour intensive. So while I am technically capturing these journal entries, it is challenging to extract any value from them unless I am as diligent in editing as I am in journalling. Also, just like music, there may be an epiphany or a discrete snippet of value, but I had no way to easily isolate that value while recording.
And, when it comes to editing, written text is much more mutable than audio (with our current toolset). Descript is trying to change that, but the workflow still feels limiting.
- Ability to execute ‘programs’ while journalling — the music capture use case was often set-it-and-forget-it (my focus would be mostly on the music), but while journalling I’m much more aware of the tool. This means I could not only annotate the timeline but also execute programs while speaking. For instance, I’d like to be able to add something to a task list while journalling. Or, I’d like to capture an idea in written form in my notes. Or, perhaps I’d want to capture a few links to Google searches I’d like to perform at a future date. Or, I’d like to track (rate) my mood, sleep, and / or how my body feels in the beginning or middle of a journal. All of this could be done if the audio capture service could recognize keywords.
- Text-based editing like Descript’s is a great feature, but it should be the default editing mode of an audio capture tool rather than an entire workflow.
- It would be interesting to integrate this type of journalling feature into a health-based app. Something like Headspace.
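
To make the keyword idea concrete, here is a rough sketch of how spoken commands might be picked out of a journal entry once it has been transcribed. It assumes the transcript already exists as plain text. The trigger phrases ("add task", "note idea", "search later", "rate mood") and the function name `extract_commands` are hypothetical choices for illustration only.

```python
import re
from typing import Dict, List

# Hypothetical trigger phrases; a real service would likely make these configurable.
COMMANDS = {
    "task": re.compile(r"\badd task[:,]?\s+(.+?)(?:\.|$)", re.IGNORECASE),
    "idea": re.compile(r"\bnote idea[:,]?\s+(.+?)(?:\.|$)", re.IGNORECASE),
    "search": re.compile(r"\bsearch later[:,]?\s+(.+?)(?:\.|$)", re.IGNORECASE),
    "mood": re.compile(r"\brate mood[:,]?\s+(\d+)", re.IGNORECASE),
}


def extract_commands(transcript: str) -> Dict[str, List[str]]:
    """Collect the payload of every recognized trigger phrase in the transcript."""
    results: Dict[str, List[str]] = {name: [] for name in COMMANDS}
    for name, pattern in COMMANDS.items():
        for match in pattern.finditer(transcript):
            results[name].append(match.group(1).strip())
    return results


entry = (
    "Slept badly. Rate mood 6. I keep thinking about the studio. "
    "Add task: call the contractor. Note idea, plywood walls for the studio."
)
print(extract_commands(entry))
```

Each extracted item could then be handed to a task list, a notes app, or a mood tracker, while the full audio stays available as the journal entry itself.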
Short term memory
As I have been immersing myself in the design space of audio, I have become very sensitive to “pressing the red button.” There are two effects to describe: 1) if you are in a scenario and you wish you were recording what was happening (conversation, music, personal train of thought), the act of initializing recording disrupts the scenario, often taking away from whatever flow was happening. This phenomenon is not exclusive to audio, as any time we reach for our phones to do something we are taking away from the thing that instigated that action. 2) knowledge of being recorded often changes your actions. It takes some time to build comfort with the idea that every sound you are making is being archived. This, too, can get in the way of flow, because some part of your mental energy is being spent on self-awareness that could be spent on the mental task at hand.
So, I asked myself the question, what if we could remove the red button? What if capture was a retroactive process? If we were capturing everything, we could just go back to the moment we wanted to archive and do something to “commit it to long term memory.” I started recording more and longer sessions. In meetings, or in conversations with my staff or friends, I would start recording at the beginning and let it run.
- We would need to give visibility to people that are being recorded to allow them to opt out. We are all sensitive to what happens with our personal information, and the voice feels very personal. This precursor to the conversation would instigate effect #2 from above (visibility of recording) and sometimes change the conversation for its duration.
- Recording often and in long sessions has the same challenges as observed in the song capture and journalling use cases (signal to noise).
- Recording everything creates a lot of excess data.
- This idea is more or less being done by today’s smart speakers. They are listening to everything and using keyword recognition to execute commands, but they aren’t giving us the ability to use the audio they are capturing. Taking this type of technology and re-positioning it for a different use case could lead to more interesting outcomes.
- Of course, this would be more valuable if it was on your mobile rather than needing a device in your home, though there would then be platform limitations.
- Short term memory app
- OS infrastructure
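
One way to picture "removing the red button" is a rolling buffer that always records but only keeps a fixed window of recent audio, discarding older data unless it is explicitly committed. The sketch below is an illustration under assumptions: audio arrives as timestamped byte chunks, and the class name `ShortTermMemory` and its methods are made up for this example. A real app would also need to handle storage, permissions, and platform background-recording limits.

```python
from collections import deque
from typing import Deque, Tuple


class ShortTermMemory:
    """Keeps only the most recent `retention_s` seconds of audio chunks."""

    def __init__(self, retention_s: float) -> None:
        self.retention_s = retention_s
        self._chunks: Deque[Tuple[float, bytes]] = deque()

    def append(self, timestamp: float, chunk: bytes) -> None:
        """Store a new chunk and forget anything older than the retention window."""
        self._chunks.append((timestamp, chunk))
        cutoff = timestamp - self.retention_s
        while self._chunks and self._chunks[0][0] < cutoff:
            self._chunks.popleft()

    def commit_last(self, now: float, duration_s: float) -> bytes:
        """Return the audio from the last `duration_s` seconds so it can be saved."""
        start = now - duration_s
        return b"".join(c for t, c in self._chunks if start <= t <= now)


# Usage: one chunk per second, keeping only the last 10 seconds.
memory = ShortTermMemory(retention_s=10)
for second in range(20):
    memory.append(float(second), bytes([second]))
saved = memory.commit_last(now=19.0, duration_s=3)
```

Because old chunks are dropped on every append, the excess-data concern is bounded by the retention window, and nothing persists unless someone asks for it.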
Long form audio messages
Send long form messages to others (or yourself).
- As these types of memos are all one sided, the recipient doesn’t have a chance to interject and ask questions.
- Time capsule — Send messages to an address in the future.
Meeting notes
Recording meetings to add accessible functionality to the meeting.
- Disambiguation and identification of different voices.
- Audio interface getting in the way of the conversation.
- Ability to annotate slides in a design review with dictated audio from discussion.
Various overarching technical considerations
- Audio Monitoring
- Audio Ducking
- Multi-microphone recording
- Noise reduction
Various audio functions
“Hey Alfred, save and transcribe the last 30 mins of audio.”
“Would you like me to tag it with anything?”
“Take the tags from this sentence: Conversation with Kristina about our secondary dwelling.”
“Hey Alfred, let me listen to audio from last Thursday.”
“I’m sorry, it doesn’t seem that you’ve saved anything from last Thursday. Please keep in mind, my memory only lasts 24 hours.”
“Tag—idea: clad the inside of the studio space in plywood. This will enable modularity in what we do inside. End tag.”
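
The spoken tag in the last example could be pulled out of a transcript by treating "Tag" and "End tag" as delimiters. This is only a sketch: it assumes the speech has already been transcribed to text, that the first word after "Tag" is the label, and that the function name `extract_tags` and the exact spoken grammar are hypothetical.

```python
import re
from typing import List, Tuple

# "Tag <label> <body> End tag", tolerant of whatever punctuation the transcriber inserts.
TAG_PATTERN = re.compile(
    r"\btag\W+(\w+)\W+(.*?)\W*\bend tag\b",
    re.IGNORECASE | re.DOTALL,
)


def extract_tags(transcript: str) -> List[Tuple[str, str]]:
    """Return (label, body) pairs for each spoken tag found in the transcript."""
    return [(m.group(1).lower(), m.group(2).strip()) for m in TAG_PATTERN.finditer(transcript)]


text = (
    "Tag idea: clad the inside of the studio space in plywood. "
    "This will enable modularity in what we do inside. End tag."
)
print(extract_tags(text))
```

A pattern like this would misfire if the word "tag" came up in ordinary conversation, so a real assistant would probably need a more distinctive phrase or a confirmation step.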