Gaussian blur was the first Photoshop filter I tried applying to sound. I was not sure what to expect. I imagined the signal might be distorted or over-driven in some way. A Gaussian blur is created by applying a Gaussian function, the same bell-shaped function that describes the normal distribution in statistics. Instead of hearing distortion, the audio, to my ears, actually sounds blurred. The effect is like a chorus, but the envelope is less clear. The attack of each note seems slower or spread out while the frequency is blended or even slightly warbled. All in all, a pleasantly surprising experiment.
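To make the "slower attack" concrete, here is a minimal sketch of a one-dimensional Gaussian blur applied to a made-up volume envelope. It is not how Photoshop implements its filter; the function names, the 3-sigma kernel radius, and the zero padding at the edges are all assumptions chosen for illustration.

```python
import math

def gaussian_kernel(sigma, radius=None):
    """Return a normalized 1-D Gaussian kernel (weights sum to 1)."""
    if radius is None:
        # Assumption: cutting the bell curve at about 3 sigma is close enough.
        radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_1d(values, kernel):
    """Convolve values with kernel, treating samples outside the list as zero."""
    radius = len(kernel) // 2
    out = []
    for n in range(len(values)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = n + k - radius
            if 0 <= idx < len(values):
                acc += w * values[idx]
        out.append(acc)
    return out

# Hypothetical envelope: silence, then a note that starts at full level.
envelope = [0.0] * 10 + [1.0] * 10
blurred = blur_1d(envelope, gaussian_kernel(sigma=2.0))
```

In `blurred` the level starts rising a few steps before the original onset and only reaches full level a few steps after it, which is one way to picture a spread-out attack.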
Gaussian Blurred Electric Piano Pattern
I’m really enjoying this whole Photoshop as an effect processor idea. I’d like to try it out myself, but I don’t own the programs and I don’t pirate software (not implying that you do, just saying that I don’t).
One thing I thought about, though, was using Photoshop layers and masking. For example, take a sound and add lots of different VST plugin effects. Print each effect to a new audio file and process each as a layer in Photoshop. Then use the visual effects as well as masking between the layers.
Just a thought…. Great blog
This is a wonderful series.
I never wondered what Gaussian Blur sounded like – but I’m glad to find out.
I haven’t messed around with Photosounder yet, but I know a little about FFTs and IFFTs. If you do any processing at all in the frequency domain, you’re going to have blurred transients. To test this, run a drum loop through Photosounder — convert to an image and then back again, and compare with the original.
Part of the problem is that the photo resolution, while it might be huge in terms of your computer screen, is not ‘wide’ enough to accurately represent sounds. FFTs will do a near-perfect job of identifying the harmonic content of a sound, but if you have, e.g., a 2 second sound converted to a 2000-pixel-wide image, you will get out 2000 one-millisecond slices, each containing the spectral content of a millisecond of the original sound.
Given that the transients that represent percussive note attacks are sometimes only a fraction of a millisecond long, some smearing is inevitable, even if you don’t monkey with the image in Photoshop.
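The arithmetic behind those figures can be sketched in a few lines. The function names are made up for this example, and the 44100 Hz sample rate is an assumption (the usual CD rate); the comment above does not state one.

```python
def column_duration_ms(sound_seconds, image_width_px):
    """Length of audio, in milliseconds, covered by one pixel column."""
    return 1000.0 * sound_seconds / image_width_px

def samples_per_column(sound_seconds, image_width_px, sample_rate=44100):
    """Number of audio samples squeezed into one pixel column."""
    return sample_rate * sound_seconds / image_width_px

# The example from the comment: a 2 second sound in a 2000-pixel-wide image.
slice_ms = column_duration_ms(2.0, 2000)
samples = samples_per_column(2.0, 2000)
```

Anything that happens inside one column's worth of samples ends up in the same pixel column, so a click shorter than `slice_ms` cannot be placed more precisely than that.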
But yeah, this is great for getting all sorts of crazy vague blurry effects. There’s a bunch of spectral processing programs out there that can do creative stuff with sound; this is just one particular application.
Interesting. There are really two things the Gaussian blur does, which I believe are two separate effects. Firstly, it blurs horizontally, which can be a pretty weird effect if you push it quite a bit (that is, until you have the equivalent of less than 10 pixels per second): it sounds like the sound is slowed down, but if you have an idea what the original sound is you can tell it’s going at its normal rate. I believe that effect works well on speech.
Secondly there’s vertical blurring. I haven’t tested that in a while, but if I recall correctly if you push the blurring quite a bit then a “regular” instrument such as this electric piano should sound a bit more like a flute or like strings. Not quite sure..
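The two directions can be separated in a small sketch. It assumes the image is stored as a list of rows, with each row one frequency band and each column one moment in time; the helper names and the single-bright-pixel test image are invented for illustration. A full two-dimensional Gaussian blur gives the same result as running the horizontal pass and then the vertical pass.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel, cut off at about 3 sigma (an assumption)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_line(line, kernel):
    """Blur one row or column, treating pixels outside the line as black."""
    radius = len(kernel) // 2
    return [sum(w * line[n + k - radius]
                for k, w in enumerate(kernel)
                if 0 <= n + k - radius < len(line))
            for n in range(len(line))]

def blur_horizontal(image, sigma):
    """Blur along time: every row (one frequency band) is smoothed."""
    kernel = gaussian_kernel(sigma)
    return [blur_line(row, kernel) for row in image]

def blur_vertical(image, sigma):
    """Blur along frequency: every column (one time slice) is smoothed."""
    kernel = gaussian_kernel(sigma)
    columns = [blur_line(list(col), kernel) for col in zip(*image)]
    return [list(row) for row in zip(*columns)]

# Hypothetical image: one bright pixel, i.e. one short tone at one pitch.
image = [[0.0] * 9 for _ in range(9)]
image[4][4] = 1.0
smeared_in_time = blur_horizontal(image, 1.0)
smeared_in_pitch = blur_vertical(image, 1.0)
```

In `smeared_in_time` the tone keeps its pitch but starts earlier and ends later; in `smeared_in_pitch` it keeps its timing but leaks into neighbouring frequency bands.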
chaircrusher : You’re right about transient sounds being the first casualties in that sort of processing, but it’s not quite as simple as “pitch is preserved, transients are lost”. It’s actually not really like FFTs, I mean sure, Photosounder uses FFTs, but only for the sake of speed, it doesn’t have to, and doesn’t use the concept directly. You’re right in that FFTs will do a great job at identifying the harmonic content of a sound, but the thing is, FFTs are run on entire chunks of sound, which is not exactly the case here.
Basically, while you can get all the frequency resolution you want with a FFT on the whole chunk of sound, you’ll have no time resolution at all, i.e. you can’t tell when something happens in time. Therefore, in such programs as Photosounder, you have to balance frequency resolution with time resolution, and find a satisfying compromise. Photosounder 1.1 actually only has a frequency resolution (or should I say pitch resolution) of 24 pixels per octave, and the time resolution varies depending on the frequency, which is why bass sounds are usually poorly retranscribed, but overall it’s the best one size fits all compromise I could find.
You seem to think that 1,000 pixels per second are not enough, this is what I originally assumed as well, but much to my surprise, this is actually completely overkill and useless, and believe it or not, only 100 pixels per second are actually plenty, 300 being about as much as you might decently require, and 30 being as low as you can go to keep something like intelligible speech. Which is quite fascinating when you think about it, you could embed audio in a video by putting one line in the image, and you could hear an intelligible sound out of it.
So the good news is that our ears/brain aren’t so demanding in time resolution, to the point that only 100 pixels/second ought to be enough for anyone, but the bad news is that there are technical limitations that make it hard to even reach anywhere near that figure for lower frequency sounds while keeping a decent pitch resolution.
Martin : That’s a very valid thought; there are lots of classical audio effects that can be reproduced visually by masking in Photoshop. I intend to document how such effects can be reproduced graphically.
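A rough way to put numbers on that trade-off, as a sketch only: these are textbook rules of thumb, not Photosounder's actual figures. The function names and the 44100 Hz sample rate are assumptions, and the "about 1/bandwidth seconds" estimate is an order-of-magnitude guide rather than an exact limit.

```python
def fft_tradeoff(window_samples, sample_rate=44100):
    """Return (window length in ms, spacing between frequency bins in Hz)."""
    duration_ms = 1000.0 * window_samples / sample_rate
    bin_hz = sample_rate / window_samples
    return duration_ms, bin_hz

def band_width_hz(center_hz, bands_per_octave=24):
    """Approximate width of one band when an octave is split into equal pitch steps."""
    return center_hz * (2.0 ** (1.0 / bands_per_octave) - 1.0)

def min_duration_ms(center_hz, bands_per_octave=24):
    """Rule of thumb: telling apart bands df Hz wide takes on the order of 1/df seconds."""
    return 1000.0 / band_width_hz(center_hz, bands_per_octave)

# Longer FFT windows give finer frequency bins but coarser timing.
for window in (256, 1024, 4096):
    print(window, fft_tradeoff(window))

# At 24 bands per octave, low notes need far longer slices than high ones.
for pitch in (55.0, 440.0, 3520.0):
    print(pitch, band_width_hz(pitch), min_duration_ms(pitch))
```

Under these assumptions a band around 55 Hz is under 2 Hz wide and needs roughly 0.6 seconds of sound to resolve, while a band around 3520 Hz needs only about 10 milliseconds, in line with the remark that bass is the hard case.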
Michel> I think I’ll have to get Photosounder soon — this month I paid for a bunch of software and hardware already but I promise not to bootleg it ;-)
So to clarify: if you convert a sound to a picture, then convert the picture back to the sound, at 100 pixels/second, how close will the result sound to the original? Will, e.g., a violin convert better than a drum loop?
I assumed what you were doing was using an FFT to convert windows of 512 or 1024 samples and filling a column of the image with the result. Then, to convert back to sound, you use an IFFT. Is this _not_ what you’re doing, and how much are you willing to disclose about your algorithm? Not that I would want to implement this myself: I work full time on medical image processing software, and have banished all compilers & such from my home computers.
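For reference, the windowed scheme guessed at here could be sketched as below. This illustrates the guess only, not Photosounder's algorithm. To stay dependency-free it uses a slow textbook DFT instead of a real FFT, a 64-sample window instead of 512 or 1024, and non-overlapping windows with no tapering; all names are made up for the example.

```python
import cmath
import math

def dft(frame):
    """Naive discrete Fourier transform, O(N^2); fine for a short demo."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse of dft(); returns the real part, since the input signal was real."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def to_columns(signal, window):
    """Cut the signal into back-to-back windows and transform each into one column."""
    return [dft(signal[i:i + window])
            for i in range(0, len(signal) - window + 1, window)]

def from_columns(columns):
    """Inverse-transform every column and glue the windows back together."""
    out = []
    for col in columns:
        out.extend(idft(col))
    return out

window = 64
# Test tone: exactly 5 cycles per window, four windows long.
signal = [math.sin(2 * math.pi * 5 * t / window) for t in range(4 * window)]
columns = to_columns(signal, window)
rebuilt = from_columns(columns)
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))

# What a grayscale image could hold: one brightness (magnitude) per bin, no phase.
pixels = [[abs(value) for value in col] for col in columns]
```

With the complex values kept, `rebuilt` matches `signal` up to rounding error. The `pixels` version drops the phase, so going back from an image to sound means estimating or inventing it, which is one reason a picture round trip is not lossless even before any blurring.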
chaircrusher : It’s hard to tell you how close the resulting sound will be to the original. It goes from pretty faithful (such things as violins are usually well reproduced, although the compromise of parameters used in Photosounder isn’t optimal for that type of sound, as of now) to kind of damaged (in the case of drums, for example). However, hopefully things will be improved in the upcoming version 1.2.
The analysis algorithm is actually open sourced, you can find the source (function anal() in dsp.c) on http://arss.sf.net as well as an explanation of what it does. You can also find there an earlier version of the synthesis algorithm, however it’s about 200 times slower than the current one (before I created Photosounder I would let a batch script run during the night so that in the morning I’d have a few sounds to listen to, because an image could easily take 20 minutes to synthesise. Now you can start listening almost instantly, you kids these days have it too easy ;-).)