Playing with Fourier transforms of images
Yesterday I got inspired to start playing around with Fourier transforms of images, and I'd like to share some of the results. Most are intended to just be artistic, although playing around has also given me a little more insight into how the frequency domain relates to the spatial domain. There's also a git repo so that you can reproduce these images and video yourself, and for many of the images I'll link to the version of the code that produced it.
In many of these, I've transformed a grayscale image to the frequency domain, messed around with the amplitude or phase information, and then transformed it back into the spatial domain. In others, I've just plotted the amplitude or phase, and then sometimes post-processed the plots in GIMP.
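The basic round trip can be sketched in a few lines of NumPy. This is just a minimal sketch of the idea, not the actual fimg code; the random array stands in for a grayscale photo.

```python
# Decompose an image into amplitude and phase, perturb, and invert.
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64)).astype(float)  # stand-in for a grayscale photo

F = np.fft.fft2(img)
amplitude = np.abs(F)
phase = np.angle(F)

# Mess with the phase (here: rotate every phase by a constant angle)...
phase += 0.5

# ...then recombine and transform back to the spatial domain.
out = np.fft.ifft2(amplitude * np.exp(1j * phase)).real
```

With the perturbation line removed, `out` reconstructs `img` exactly (up to floating-point error), which is a handy sanity check.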
I'll start off with one of my favorites so far; many more explorations under the cut.
Warning: This is sort of a stream-of-consciousness post. Feel free to just look at the pretty pictures and skim the text.
I really like the textile-like appearance that you sometimes get when messing about with the phase information! I suspect it has something to do with essentially randomizing the phases of the dominant frequencies. Looking at the source image (a sample in the scikit library) I bet that the "fabric" appearance comes partly from relatively high frequencies in the grass, and that the "wrinkles" come from the legs of the tripod. The rectilinear lines might come from the buildings in the background.
This was made with code version 2da3e373, although the generation script below differs from the one in the commit message:
- I reduced the iteration parameter sequence from 0..100 to 0..99 so that I would get a smooth loop. (This was an off-by-one error.)
- I removed the framerate and compression args from ffmpeg, and hid the startup banner.
- Most importantly: I specified that the RGB information in the PNGs should be converted to YUV color space, because apparently that was a major compatibility issue.
for i in {0..99}; do
c=$(bc -l <<<"8*a(1) * $i / 100")
python -m fimg $c
mv out.png out/`printf %02d $i`_phase_set_$c.png
done
ffmpeg -hide_banner -f image2 -pattern_type glob -i 'out/*' -vcodec libx264 -pix_fmt yuv420p video.mp4
(Input and output paths were still hardcoded, and phase angle was accepted as radians instead of a fraction in [0, 1]. Later versions of the code differ.)
I had originally set out to make animations of alterations to the frequency domain, but I quickly found myself fascinated by just single images.
Messing around in the frequency domain quickly corrupts images, I suspect through two mechanisms: 1) Phases getting out of alignment, and 2) amplitude sums exceeding the [0, 255] intensity bounds of the image format, resulting in clipping. I may at some point check on that clipping hypothesis; if I do some rescaling before saving as an image, maybe I can avoid that issue.
I've also played with "rolling" the amplitude and phase arrays along one or both axes (independently or together), "blurring" the arrays (adding neighboring pixels), and swapping them. Some of the effects are interesting, others aren't.
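Those manipulations are all cheap array operations on the amplitude and phase planes. A hedged sketch of what I mean by each (the names and details here are mine, not fimg's):

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(64, 64)).astype(float)
F = np.fft.fft2(img)
amplitude, phase = np.abs(F), np.angle(F)

# "Rolling": shift the amplitude array along one axis, wrapping around.
rolled_amp = np.roll(amplitude, shift=5, axis=0)

# "Blurring": average each entry with its four neighbors (wrapping at edges).
blurred_amp = (amplitude
               + np.roll(amplitude, 1, axis=0) + np.roll(amplitude, -1, axis=0)
               + np.roll(amplitude, 1, axis=1) + np.roll(amplitude, -1, axis=1)) / 5

# "Swapping": use the phase array as amplitude and vice versa.
swapped_F = phase * np.exp(1j * amplitude)

out = np.fft.ifft2(rolled_amp * np.exp(1j * phase)).real
```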
The typical thing I'm aware of people doing in the frequency domain is masking out (setting to zero) regions with higher frequencies, creating a blurring effect, or regions with lower frequencies, creating an edge-highlighting effect. I haven't added code to do masks, but I can recommend this demo I found written in JavaScript.
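Masking is easy to sketch too. This is my own minimal version (not from the linked demo); the 0.1 cutoff radius is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(2)
img = rng.integers(0, 256, size=(64, 64)).astype(float)
F = np.fft.fft2(img)

# Build a radial frequency grid; fftfreq puts low frequencies near index 0.
fy = np.fft.fftfreq(img.shape[0])[:, None]
fx = np.fft.fftfreq(img.shape[1])[None, :]
radius = np.sqrt(fx**2 + fy**2)

low_pass = np.fft.ifft2(np.where(radius < 0.1, F, 0)).real    # blurs
high_pass = np.fft.ifft2(np.where(radius >= 0.1, F, 0)).real  # highlights edges
```

Because the two masks partition the frequency plane and the transform is linear, the two results sum back to the original image.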
I originally kicked off on this project after watching Ben Krasnow's Intro to Fourier Optics and the 4F correlator. It turns out that light passing through a black-and-white transparency is naturally diffracted in a way that produces frequency-domain information, and all you need is a simple lens to focus that information onto a plane. Except it turns out to require fairly precise equipment, and it works best if the transparency has micron-level details, near the wavelength of visible light. Ben has some pretty advanced equipment and know-how, but still struggled to get good images. On the Huygens Optics channel, Jeroen managed to have a bit more luck and was able to produce some fairly clean transforms. Fourier transforms at the speed of light, imagine that! Pretty wild stuff.
OK, more pictures. Here's the result of throwing away frequency amplitude information, and just preserving phase. Specifically, I ran with const_amp --value 10000 to set the amplitudes all to an arbitrary 10k. This gives an image made mostly of graininess, I guess because high frequencies such as sensor noise are normally low-amplitude, and have been boosted here.
I also tried "speckling" the amplitude, scaling every frequency's amplitude by a random number from 0 to 1. This gives an interestingly textured effect to the playground equipment photo, somewhere between the source image and the constant-phase transform.
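Both amplitude experiments are one-liners once you have the decomposition. A sketch of what I believe const_amp and the "speckling" do (this is my reconstruction, not the actual fimg code):

```python
import numpy as np

rng = np.random.default_rng(3)
img = rng.integers(0, 256, size=(64, 64)).astype(float)
F = np.fft.fft2(img)
phase = np.angle(F)

# const_amp --value 10000: throw away the amplitudes, keep only phase.
const_out = np.fft.ifft2(10000 * np.exp(1j * phase)).real

# "Speckling": scale each frequency's amplitude by a uniform random factor in [0, 1].
speckled_amp = np.abs(F) * rng.uniform(0.0, 1.0, size=F.shape)
speckle_out = np.fft.ifft2(speckled_amp * np.exp(1j * phase)).real
```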
(All of these can be reproduced with the given arguments in version 098feaac of the script.)
I plotted the phase information and it was mostly a speckly, streaky mess, even harder to interpret than the amplitude plot. However, it does have some structure if you look at it in full 1:1 pixel view, so I decided to throw the phase plot back in as a source image and extract an amplitude plot of that. (The amplitude plot has been run through log2 and remapped to a [0, 255] range. Normally the amplitudes for these photos run into the millions. The phase plot has just been remapped from angles to [0, 255] in the way you'd expect.)
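The log2 remapping might look something like this; a sketch under my own assumptions (the +1 offset to dodge log2(0) is my choice, not necessarily fimg's):

```python
import numpy as np

rng = np.random.default_rng(4)
img = rng.integers(0, 256, size=(64, 64)).astype(float)
amplitude = np.abs(np.fft.fft2(img))

# Compress the huge dynamic range with log2, then rescale to [0, 255].
log_amp = np.log2(amplitude + 1)  # +1 avoids log2(0) at empty frequencies
plot = (255 * (log_amp - log_amp.min())
        / (log_amp.max() - log_amp.min())).astype(np.uint8)
```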
And... it appears to have, to some degree, reconstructed the source image! (It's all quartered-up because the plot outputs are rolled by 1/2 along both axes, as is traditional.)
Here's a bowl of silica gel beads and a heavily post-processed amplitude plot. The plot has been cropped and had the levels globally adjusted to highlight a hexagon around the origin. I think the hexagon represents the roughly hexagonal close-packing of the spherical beads, even though that packing is fairly imperfect.
Here's a weird photo from NYC of a giant crowd of fire extinguishers behind a hotel.
There are a ton of geometric patterns in this image, so it's great for FFT. (Not just the extinguishers—we got bricks, fence, pipes, etc. It's also different in X and Y directions.)
Previously I've only used grayscale images, but given an RGB image the code will run the requested transform on each color channel separately and then recompose. The output isn't very colorful, though, is it? Well, there's no reason it should be! Most of the FFT output is high-frequency stuff like sensor noise. Those big color blocks in the input are probably represented in a relatively small number of low-frequency pixels in the output.
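The per-channel recomposition is straightforward; here's a sketch of the idea (with an identity transform standing in for whatever manipulation you'd actually run, and not the real fimg code):

```python
import numpy as np

rng = np.random.default_rng(5)
rgb = rng.integers(0, 256, size=(32, 32, 3)).astype(float)

channels = []
for c in range(3):
    F = np.fft.fft2(rgb[:, :, c])
    # ...mess with amplitude/phase here; identity shown for brevity...
    channels.append(np.fft.ifft2(F).real)
out = np.stack(channels, axis=-1)
```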
(Note: This FFT is on a scaled-down version of the image.)
If I just blast the bejeezus out of the saturation and contrast on that last FFT, the first image shows the result. Note the concentric rings of red and cyan around the origin—that's probably the fire extinguishers! The orangey cast farther out, I'm not sure... maybe it's due to the bricks and the weird red haze over the upper left of the input image.
For comparison, the second image is when I took the FFT of the full-resolution image, then downscaled that. Notice how it appears to be zoomed out? That's because there's a broader range of frequencies to represent in a larger image. (It was also about 7x larger in file size, ~15 MB, which is why I downscaled it after everything else—just to optimize.) Now there's a greenish cast to the high frequency areas, and I don't know why! Maybe something about green-pixel resolution being higher in cameras, but that doesn't seem right either. Let me know if you think you know the answer!
Additions 2022-04-29
I've confirmed that the harsh inversion bands and patches in these images are due to clipping. The amplitudes of some of the frequencies are quite large, sometimes as high as 3e6. In the inverse transform, destructive interference results in all of the waves adding up to values in [0, 255] at every location—because they were constructed to do so, of course. But if they are pushed out of phase, that destructive interference no longer results in the large waves being nearly cancelled out, and so the image data ends up with intensities that are far out of range.
As of version f9210c51 there is now a global --out-of-range option with three choices:
- mod takes the intensity modulo 255, resulting in banding and bright/dark patches. This was the default behavior of the image library I'm using, but now it's made explicit.
- clip clips the too-low or too-high values to 0 or 255. This can mean the entire image simply goes completely black or white.
- percentile-pull-clip is more complicated. If the 10th or 90th percentile brightness is out of range, the image is linearly rescaled. (The threshold can be controlled with --clip-percentile.) If the opposing high or low percentile is within range, then 255 or 0 is used as the other endpoint of the linear scaling, so as not to warp the intensities of the image unnecessarily.
Here are some sample graphs that might help with intuition:
And here's an animation of rotating the phase angle of the tower image, with each method displayed side by side:
Code for making the animation:
for i in {0..99}; do
c=$(bc -l <<<"scale=3; $i / 100")
for oor in mod clip percentile-pull-clip; do
python -m fimg --out-of-range="$oor" \
~/tmp/tower-400x300.jpg "out/$(printf %03d $i)_phase_rotate_angle_${oor}_${c}.png" \
phase_rotate_angle --circle-fraction "$c"
done
done
for i in {000..099}; do
convert \
"out/${i}"_phase_rotate_angle_mod_* \
"out/${i}"_phase_rotate_angle_clip_* \
"out/${i}"_phase_rotate_angle_percentile-pull-clip_* \
+append "out/${i}_horiz.png"
done
ffmpeg -hide_banner -f image2 -pattern_type glob -i 'out/*_horiz.png' \
-vcodec libx264 -pix_fmt yuv420p tower__phase_rotate_angle__oor-multi_v.f9210c51.mp4
I'm still not satisfied with the options. I want a remapping function that will leave an image mostly alone if just a few pixels are way out of range, and that won't rescale an image to use the full intensity range if it didn't originally occupy it. (If an image is almost entirely in [50, 200] with just 1% of pixels in [200, 4000], I want the output to be in [50, 255]—not to have the low end stretched down to zero.) But if the whole thing is way out of range, like [-4000, -3000], I want it scaled and shifted to [0, 255], maximizing the contrast.
But most importantly, if two inputs only have slightly different intensity distributions, the outputs should only differ slightly! This seems obvious, but it's relevant when one of the images is entirely in-range and the other has some out-of-range pixels. If there's a dramatic jump between the two, my animations will look ugly. :-P
So I need something non-linear, and I need it to be tunable based on image statistics—mean, median, percentiles, etc. It's possible I need something that can handle both symmetric and skewed distributions. And it needs to be tunable in a way that allows smooth animations.
The best I've got so far is percentile-pull-clip. It usually meets these criteria, but theoretically if I had an image with more than 10% out-of-range pixels, and if they were orders of magnitude out of range, the rest of the image could be flattened during the rescale. (Setting --clip-percentile=0 so that a rescale always happens for out-of-range pixels quickly illustrates this.) So I'll keep looking.
Anyway, now I can revisit some of the images that were "corrupted" by out-of-range values. These use --clip-percentile 3:
And I can redo the original animation to avoid the clipping. Much more pleasant, although the clipping did produce an interesting effect itself.
Something is bothering me, though—why does rotating the phase angle cause leftward movement? I've double-checked my phase/amplitude math, but I don't see anything wrong with it. If I rotate the source image 90° first, the resulting animation is also rotated... but there's still a leftward drift, rather than switching to upwards. Might just be something I'm missing about the transform.
Additions 2022-05-02
Ah! I've misunderstood something about 2D FFTs. They are, in fact, run one axis at a time. This means that for certain kinds of messing around in the frequency domain, the effects will be particular to one of the axes. (The first one? The second one? Not sure!)
Really, I just need to go back and re-learn all this stuff. In college we only went over the general theory, looked at FFT of audio data, ignored all the phase information, and didn't cover the fine details of things like "why complex numbers".
I've also noticed that I'm using fft2 and irfft2, which may not be a matched pair! So all of the above could be "wrong" mathematically, which isn't a disaster since this is mostly about art, but it would be much more satisfying if the art could be based on more accurate math. :-)
Anyway, that phase rotation (now invoked via phase_shift instead of phase_rotate_angle) looks really cool on color images. Here it is on a headshot of myself, showing off the difference between clipping out-of-range colors vs rescaling them to a percentile:
There are more vibrant colors with clipping, just before the fade to black, but you lose detail. By rescaling to keep at least 90 percent of the pixels in range, you lose contrast but gain detail. Notice how in the second one you can really see the color shift on the yellow flowers as they turn dark blue. And it's only in the second one that you can see the very cool effect of the colors bleeding leftwards even as the details stay in place.
Generation code
Code version 74048755, ran:
for i in {0..74}; do
  c=$(echo "$i / 75" | bc -l)
  python -m fimg --out-of-range percentile-pull-clip --clip-percentile 5 \
    headshot-512.jpg "out/$(printf %03d $i).png" phase_shift --turns "$c"
done
ffmpeg -hide_banner -f image2 -pattern_type glob -i 'out/*' -vcodec libx264 -pix_fmt yuv420p headshot__phase_shift_oor_ppc_5.mp4