Playing with Fourier transforms of images

Yesterday I got inspired to start playing around with Fourier transforms of images, and I'd like to share some of the results. Most are intended to just be artistic, although playing around has also given me a little more insight into how the frequency domain relates to the spatial domain. There's also a git repo so that you can reproduce these images and videos yourself, and for many of the images I'll link to the version of the code that produced it.

In many of these, I've transformed a grayscale image to the frequency domain, messed around with the amplitude or phase information, and then transformed it back to the spatial domain. In others, I've just plotted the amplitude or phase, and then sometimes post-processed the plots in GIMP.
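
In numpy terms, the round trip looks roughly like this (a minimal sketch of the idea, not the actual fimg code; I'm using the scikit-image cameraman sample as the input):

import numpy as np
from skimage import data  # the cameraman sample image lives in skimage.data

img = data.camera().astype(float)          # grayscale, values in [0, 255]

freq = np.fft.fft2(img)                    # complex frequency-domain array
amp, phase = np.abs(freq), np.angle(freq)  # split into amplitude and phase

phase[:] = 1.0                             # example tweak: constant phase angle

back = np.fft.ifft2(amp * np.exp(1j * phase)).real
out = np.clip(back, 0, 255).astype(np.uint8)   # naive handling of out-of-range values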

I'll start off with one of my favorites so far; many more explorations under the cut.

An animation where I just set all the phase information for a sample photo to a constant angle value, then swept that angle through a full circle over the course of the video.

Warning: This is sort of a stream-of-consciousness post. Feel free to just look at the pretty pictures and skim the text.

I really like the textile-like appearance that you sometimes get when messing about with the phase information! I suspect it has something to do with essentially randomizing the phases of the dominant frequencies. Looking at the source image (a sample in the scikit-image library) I bet that the "fabric" appearance comes partly from relatively high frequencies in the grass, and that the "wrinkles" come from the legs of the tripod. The rectilinear lines might come from the buildings in the background.

This was made with code version 2da3e373, although the generation script below differs from the one in the commit message:

  • I reduced the iteration parameter sequence from 0..100 to 0..99 so that I would get a smooth loop. (This was an off-by-one error.)
  • Removed the framerate and compression args from ffmpeg, and hid the startup banner.
  • Most importantly: specified that the RGB information in the PNGs should be converted to the YUV color space, because apparently that was a major compatibility issue.
# Sweep the constant phase angle from 0 to just under 2*pi over 100 frames.
for i in {0..99}; do
  c=$(bc -l <<<"8*a(1) * $i / 100")   # 8*atan(1) = 2*pi
  python -m fimg "$c"
  mv out.png "out/$(printf %02d $i)_phase_set_$c.png"
done
# Stitch the frames into an H.264 video, converting to YUV 4:2:0 for compatibility.
ffmpeg -hide_banner -f image2 -pattern_type glob -i 'out/*' -vcodec libx264 -pix_fmt yuv420p video.mp4

(Input and output paths were still hardcoded, and phase angle was accepted as radians instead of a fraction in [0, 1]. Later versions of the code differ.)

I had originally set out to make animations of alterations to the frequency domain, but I quickly found myself fascinated by just single images.

Grayscale photo of a man taking video using a camcorder on a
tripod, against a background of what might be an athletic field or
university quad. He is facing to our right. The image has some
sideways blurring, but the blur is darker behind the man, to our left,
and brighter to the right. There is some black and white pixelated
fringing on objects, but most strikingly the man's entire back side is
in photo negative (and thus very bright, against his actually-dark
clothing).
Here I largely preserve the phase information, but rotate the phase angle by 1/10th of a circle. [code]
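
The rotation itself is just one complex multiplication applied to the whole spectrum. Roughly (this is a sketch, not the fimg source):

import numpy as np
from skimage import data

img = data.camera().astype(float)
freq = np.fft.fft2(img)

rotated = freq * np.exp(2j * np.pi * 0.1)   # amplitudes untouched, every phase advanced by 1/10 turn
back = np.fft.ifft2(rotated).real           # the result can now fall outside [0, 255]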

Messing around in the frequency domain quickly corrupts images, I suspect through two mechanisms: 1) Phases getting out of alignment, and 2) amplitude sums exceeding the [0, 255] intensity bounds of the image format, resulting in clipping. I may at some point check on that clipping hypothesis; if I do some rescaling before saving as an image, maybe I can avoid that issue.

I've also played with "rolling" the amplitude and phase arrays along one or both axes (independently or together), "blurring" the arrays (adding neighboring pixels), and swapping them. Some of the effects are interesting, others aren't.
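
"Rolling" here just means np.roll applied to the amplitude and/or phase arrays. A sketch of the kind of shift used in the next image (not the actual roll_freq implementation):

import numpy as np
from skimage import data

img = data.camera().astype(float)
freq = np.fft.fft2(img)
amp, phase = np.abs(freq), np.angle(freq)

# Shift both arrays by 2 along X (columns) and 1 along Y (rows), with wrap-around.
amp = np.roll(amp, shift=(1, 2), axis=(0, 1))
phase = np.roll(phase, shift=(1, 2), axis=(0, 1))

back = np.fft.ifft2(amp * np.exp(1j * phase)).real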

The cameraman image, with bands of color inversion going up and
to the right at a 2-to-1 pitch. The edges of the bands fade to black
or white, then flip over to the next band. The bands are not perfectly
straight, but tend to warp around the edges of objects in the image.
There's some horizontal streaking that might be echoed from the
landscape.
I've shifted the frequency information (both amplitude and phase, together) by 2 along the X axis and 1 along the Y axis.

The typical thing I'm aware of people doing in the frequency domain is masking out (setting to zero) the regions with higher frequencies, creating a blurring effect—or the lower frequencies, creating an edge-highlighting effect. I haven't added code to do masks, but I can recommend this demo I found written in JavaScript.
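
For completeness, that kind of masking only takes a few lines of numpy. A sketch (again, not something fimg does yet):

import numpy as np
from skimage import data

img = data.camera().astype(float)
freq = np.fft.fftshift(np.fft.fft2(img))     # put the zero frequency at the center

rows, cols = img.shape
y, x = np.ogrid[:rows, :cols]
dist = np.hypot(y - rows / 2, x - cols / 2)  # distance of each bin from the origin

freq[dist > 30] = 0                          # keep only low frequencies -> blur
# (masking dist <= 30 instead keeps only high frequencies -> edge highlighting)

blurred = np.fft.ifft2(np.fft.ifftshift(freq)).real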

I originally kicked off on this project after watching Ben Krasnow's Intro to Fourier Optics and the 4F correlator. It turns out that light passing through a black and white transparency is naturally diffracted in a way that produces frequency-domain information, and all you need is a simple lens to focus that information onto a plane. Except it turns out to require fairly precise equipment, and it works best if the transparency has micron-level details, near the wavelength of visible light. Ben has some pretty advanced equipment and know-how, but still struggled to get good images. On the Huygens Optics channel, Jeroen managed to have a bit more luck and was able to produce some fairly clean transforms. Fourier transforms at the speed of light, imagine that! Pretty wild stuff.

OK, more pictures. Here's the result of throwing away frequency amplitude information, and just preserving phase. Specifically, I ran with const_amp --value 10000 to set the amplitudes all to an arbitrary 10k. This gives an image made mostly of graininess, I guess because high frequencies such as sensor noise are normally low-amplitude, and have been boosted here.

I also tried "speckling" the amplitude, scaling every frequency's amplitude by a random number from 0 to 1. This gives an interestingly textured effect to the playground equipment photo, somewhere between the source image and the constant-phase transform.
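
Roughly what those two operations amount to, as I'd write them in plain numpy (a sketch, not the actual const_amp/speckle_amp code):

import numpy as np
from skimage import data

img = data.camera().astype(float)
freq = np.fft.fft2(img)
amp, phase = np.abs(freq), np.angle(freq)

grainy = np.fft.ifft2(10000.0 * np.exp(1j * phase)).real       # constant amplitude, phase preserved
speckled = np.fft.ifft2(amp * np.random.rand(*amp.shape)       # each amplitude scaled by a
                        * np.exp(1j * phase)).real             # uniform draw from [0, 1)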

(All of these can be reproduced with the given arguments in version 098feaac of the script.)

Photo of some tall playground structure, faced with vertical wooden slats.
There are trees against sky in the background.
Source photo: Tall playground structure
The result of setting uniform amplitudes: An image made entirely of graininess.
Constant amplitude, an image writ in noise: const_amp --value 10000
With constant phase, a cloudy/ripply image, less textile-like
than with the cameraman source image.
Constant phase, for comparison: const_phase --circle-fraction 0
Gray, fine-grained background with fine light lines emanating
from the origin. One along each axis, sprays in the positive and negative X
directions, and another pair of opposing sprays at about 100°/280°. Some white
dots near the origin, a bit like grid points. There's other faint texturing
that's hard to describe—patches of very short light streaks.
Amplitude plot: plot_amp
The playground equipment again. The image is darker and is overlaid
with a texturing, almost cloth-like but fuzzy. The grain of the texturing seems
to be about in alignment with the slats.
Speckled amplitude (nicest output of several): speckle_amp
Like the speckled amplitude example, but lots of inverse-color patches.
Speckled phase (typical output): speckle_phase

I plotted the phase information and it was mostly a speckly streaky mess, even harder to interpret than the amplitude plot. However, it does have some structure if you look at it in full 1:1 pixel view, so I decided to throw the phase plot back in as a source image and extract an amplitude plot of that. (The amplitude plot has been run through log2 and remapped to a [0, 255] range. Normally the amplitudes for these photos run into the millions. The phase plot has just been remapped from angles to [0, 255] in the way you'd expect.)
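
Concretely, the remapping for the plots is something like this (a sketch of the idea, not the exact plot_amp/plot_phase code):

import numpy as np
from skimage import data

img = data.camera().astype(float)
freq = np.fft.fftshift(np.fft.fft2(img))   # roll by 1/2 along both axes, as is traditional

# Amplitude plot: log2, then linearly remapped to [0, 255].
amp = np.log2(np.abs(freq) + 1)            # +1 avoids log2(0)
amp_plot = (255 * (amp - amp.min()) / (amp.max() - amp.min())).astype(np.uint8)

# Phase plot: angles in (-pi, pi] remapped linearly to [0, 255].
phase = np.angle(freq)
phase_plot = ((phase + np.pi) / (2 * np.pi) * 255).astype(np.uint8)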

And... it appears to have, to some degree, reconstructed the source image! (It's all quartered-up because the plot outputs are rolled by 1/2 along both axes, as is traditional.)

The phase plot, with an inset showing a zoomed in 1:1 view.
Mostly just grainy, but there are strong horizontal and vertical streaks
along the outer third of the X and Y axes. If you zoom in all the way,
there's some fine-grained structure, looking a bit like a cellular automaton.
Phase plot with inset showing full resolution: plot_phase
The amplitude plot of the phase plot! I can just barely make out
four quarters of the image, each containing a ghost of 1/4 of the original image.
Sort of negative and edge-detect effects, but very grainy. Half the image, below
a diagonal, has been contrast-enhanced to show off the effect.
An amplitude plot *of* the phase plot, with contrast adjustment on half the image: plot_amp

Here's a bowl of silica gel beads and a heavily post-processed amplitude plot. The plot has been cropped and had the levels globally adjusted to highlight a hexagon around the origin. I think the hexagon represents the spherical close-packing of the beads, even though that packing is fairly imperfect.

A bowl of small glassy beads.
Source image: Silica gel beads, in grayscale
An amplitude plot of the beads with an inset showing
a contrast-enhanced, enlarged image of the center: A grainy white hexagon
around the origin against a black background, a white dot on the origin,
and a white line along the Y axis.
Amplitude plot with contrast-enhanced inset

Here's a weird photo from NYC of a giant crowd of fire extinguishers behind a hotel.

There are a ton of geometric patterns in this image, so it's great for FFT. (Not just the extinguishers—we got bricks, fence, pipes, etc. It's also different in X and Y directions.)

Previously I've only used grayscale images, but given an RGB image the code will run the requested transform on each color channel separately and then recompose. The output isn't very colorful, though, is it? Well, no reason it should be! Most of the FFT output is high-frequency stuff like sensor noise. Those big color blocks in the input are probably represented in a relatively small number of low-frequency pixels in the output.
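
The per-channel handling amounts to something like the following sketch (using scikit-image's astronaut sample as a stand-in RGB image):

import numpy as np
from skimage import data

rgb = data.astronaut().astype(float)       # stand-in RGB sample image

channels = []
for c in range(3):
    freq = np.fft.fftshift(np.fft.fft2(rgb[:, :, c]))
    amp = np.log2(np.abs(freq) + 1)                          # same amplitude plot as before
    channels.append(255 * (amp - amp.min()) / (amp.max() - amp.min()))

amp_plot_rgb = np.stack(channels, axis=-1).astype(np.uint8)  # recompose into an RGB plot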

(Note: This FFT is on a scaled-down version of the image.)

Photo from an upper floor of a hotel, showing about 100 large
red fire extinguishers clustered together in a parking lot. There's a brick
wall along the left of the photo, a construction site in the back (separated
from the lot by a chain link fence with vertical plastic strips), and a big
pile of shiny metal pipes that might belong to a sprinkler system.
Source photo: Fire extinguishers in a parking lot
FFT of the photo. Gray grainy background with light rays
coming out from the origin (mostly in the 60° direction and opposing) and
regularly spaced light patches (mostly to the left and right). Extremely
faint haze of varying color.
Per color amplitude plot: plot_amp (of a scaled-down version of the image)

If I just blast the bejeezus out of the saturation and contrast on that last FFT, the first image shows the result. Note the concentric rings of red and cyan around the origin—that's probably the fire extinguishers! The orangey cast farther out, I'm not sure... maybe it's due to the bricks and the weird red haze over the upper left of the input image.

For comparison, the second image is when I took the FFT of the full-resolution image, then downscaled that. Notice how it appears to be zoomed out? That's because there's a broader range of frequencies to represent in a larger image. (It was also about 7x larger in file size, ~15 MB, which is why I downscaled it after everything else—just to optimize.) Now there's a greenish cast to the high frequency areas, and I don't know why! Maybe something about green-pixel resolution being higher in cameras, but that doesn't seem right either. Let me know if you think you know the answer!

The FFT of the fire extinguishers again, but color-enhanced.
Concentric cyan/red in center, surrounded by larger greenish blue annulus,
then orange fading out to edges.
The same plot, but with saturation and color dramatically enhanced
FFT of full-size image, appearing "zoomed out" by
about 2.7x. The orange haze has been replaced by a sickly green.
Amplitude plot of the original, then color-enhanced, and then downscaled

Additions 2022-04-29

I've confirmed that the harsh inversion bands and patches in these images are due to clipping. The amplitudes of some of the frequencies are quite large, sometimes as high as 3e6. In the inverse transform, destructive interference results in all of the waves adding up to values within [0, 255] at every location—because they were constructed to do so, of course. But if they are pushed out of phase, that destructive interference no longer nearly cancels out the large waves, and so the image data ends up with intensities that are far out of range.
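
A quick numpy check makes the problem visible (a sketch, not exactly the check I ran):

import numpy as np
from skimage import data

img = data.camera().astype(float)
back = np.fft.ifft2(np.fft.fft2(img) * np.exp(2j * np.pi * 0.05)).real

print(back.min(), back.max())   # far outside [0, 255] once the phases are rotated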

As of version f9210c51 there is now a global --out-of-range option with three choices:

  • mod takes modulo 255 of the intensity, resulting in banding and bright/dark patches. This was the default behavior of the image library I'm using, but now it's made explicit.
  • clip clips the too-low or too-high values to 0 or 255. This can mean the entire image simply goes completely black or white.
  • percentile-pull-clip is more complicated. If the 10th or 90th percentile brightness is out of range, the image is linearly rescaled. (The threshold can be controlled with --clip-percentile.) If the opposing high or low percentile is within range, then 255 or 0 is used as the other endpoint of the linear scaling, so as not to warp the intensities of the image unnecessarily.
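
In rough numpy terms, the three behaviors are approximately the following (my reconstruction from the descriptions above, not the fimg source):

import numpy as np

def oor_mod(a):
    # Wrap out-of-range intensities around, uint8-style: banding and inverted patches.
    return np.mod(a, 256).astype(np.uint8)

def oor_clip(a):
    # Saturate at the ends: an image far out of range goes entirely black or white.
    return np.clip(a, 0, 255).astype(np.uint8)

def oor_percentile_pull_clip(a, pct=10):
    p_lo, p_hi = np.percentile(a, [pct, 100 - pct])
    if p_lo < 0 or p_hi > 255:
        # Linearly rescale so the offending percentile lands back in range; if the
        # opposite end was already fine, anchor it at 0 or 255 so it isn't warped.
        src_lo = p_lo if p_lo < 0 else 0.0
        src_hi = p_hi if p_hi > 255 else 255.0
        a = (a - src_lo) * 255.0 / (src_hi - src_lo)
    return np.clip(a, 0, 255).astype(np.uint8)   # clip whatever is still out of range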

Here are some sample graphs that might help with intuition:

Diagram of original problem and the results of three clipping mods
The x-axis on each example represents a single spatial dimension, perhaps a row of an image. The y-axis represents the intensity of the image (in a single channel).

And here's an animation of rotating the phase angle of the tower image, with each method displayed side by side:

Three methods of handling out-of-range brightness information: --out-of-range is set to mod, clip, and percentile-pull-clip from left to right.

Code for making the animation:

# Render each frame three times, once per out-of-range mode.
for i in {0..99}; do
  c=$(bc -l <<<"scale=3; $i / 100")
  for oor in mod clip percentile-pull-clip; do
    python -m fimg --out-of-range="$oor" \
        ~/tmp/tower-400x300.jpg "out/$(printf %03d $i)_phase_rotate_angle_${oor}_${c}.png" \
        phase_rotate_angle --circle-fraction "$c"
  done
done

# Stitch the three variants of each frame side by side.
for i in {000..099}; do
  convert \
      "out/${i}"_phase_rotate_angle_mod_* \
      "out/${i}"_phase_rotate_angle_clip_* \
      "out/${i}"_phase_rotate_angle_percentile-pull-clip_* \
      +append "out/${i}_horiz.png"
done

# Encode the combined frames into a video.
ffmpeg -hide_banner -f image2 -pattern_type glob -i 'out/*_horiz.png' \
    -vcodec libx264 -pix_fmt yuv420p tower__phase_rotate_angle__oor-multi_v.f9210c51.mp4
A few hand-drawn lines showing the kind of mapping function I want
I want something like this

I'm still not satisfied with the options. I want a remapping function that will leave an image mostly alone if just a few pixels are way out of range, and that won't rescale an image to use the full intensity range if it didn't originally occupy it. (If an image is almost entirely in [50, 200] with just 1% of pixels in [200, 4000], I want the output to be in [50, 255]—not to have the low end stretched down to zero.) But if the whole thing is way out of range, like [-4000, -3000], I want it scaled and shifted to [0, 255], maximizing the contrast.

But most importantly, if two inputs only have slightly different intensity distributions, the outputs should only differ slightly! This seems obvious, but it's relevant when one of the images is entirely in-range and the other has some out-of-range pixels. If there's a dramatic jump between the two, my animations will look ugly. :-P

So I need something non-linear, and I need it to be tunable based on image statistics—mean, median, percentiles, etc. It's possible I need something that can handle both symmetric and skewed distributions. And it needs to be tunable in a way that allows smooth animations.

The best I've got so far is percentile-pull-clip. It usually meets these criteria, but theoretically if I had an image with more than 10% out-of-range pixels, and if they were orders of magnitude out of range, the rest of the image could be flattened during the rescale. (Setting --clip-percentile=0 so that a rescale always happens for out-of-range pixels quickly illustrates this.) So I'll keep looking.

Anyway, now I can revisit some of the images that were "corrupted" by out-of-range values. These use --clip-percentile 3:

Cameraman with phase_rotate_angle --circle-fraction 0.1
Cameraman photo with soft, rolling black and white diagonal bands overlaid on the image, which is otherwise compressed towards middle gray.
Cameraman with roll_freq --x 2 --y 1, showing heavy intensity compression.

And I can redo the original animation to avoid the clipping. Much more pleasant, although the clipping did produce an interesting effect itself.

There's a slight "bounce" effect that might be related to how the rescaling behaves in an animation—it is continuous in the first derivative, but not the second.

Something is bothering me, though—why does rotating the phase angle cause leftward movement? I've double-checked my phase/amplitude math, but I don't see anything wrong with it. If I rotate the source image 90° first, the resulting animation is also rotated... but there's still a leftward drift, rather than switching to upwards. Might just be something I'm missing about the transform.

Additions 2022-05-02

Ah! I've misunderstood something about 2D FFTs. They are, in fact, run one axis at a time. This means that for certain kinds of messing around in the frequency domain, the effects will be particular to one of the axes. (The first one? The second one? Not sure!)

Really, I just need to go back and re-learn all this stuff. In college we only went over the general theory, looked at FFT of audio data, ignored all the phase information, and didn't cover the fine details of things like "why complex numbers".

I've also noticed that I'm using fft2 and irfft2, which may not be a matched pair! So all of the above could be "wrong" mathematically, which isn't a disaster since this is mostly about art, but it would be much more satisfying if the art could be based on more accurate math. :-)
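
For reference, the pairs that actually match in numpy are fft2/ifft2 and rfft2/irfft2. Both round-trip cleanly; irfft2 just expects the half-size output of rfft2 rather than the full spectrum from fft2:

import numpy as np
from skimage import data

img = data.camera().astype(float)

full = np.fft.ifft2(np.fft.fft2(img)).real              # full complex spectrum, shape (512, 512)
real = np.fft.irfft2(np.fft.rfft2(img), s=img.shape)    # rfft2 gives the half-size (512, 257) spectrum

assert np.allclose(full, img) and np.allclose(real, img)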

Anyway, that phase rotation (now invoked via phase_shift instead of phase_rotate_angle) looks really cool on color images. Here it is on a headshot of myself, showing off the difference between clipping out-of-range colors vs. rescaling them to a percentile:

Animating phase_shift with clipping
Animating phase_shift with 5th percentile rescale

There are more vibrant colors with clipping, just before the fade to black, but you lose detail. By rescaling to keep at least 90 percent of the pixels in range, you lose contrast but gain detail. Notice how in the second one you can really see the color shift on the yellow flowers as they turn dark blue. And it's only in the second one that you can see the very cool effect of the colors bleeding leftwards even as the details stay in place.

Generation code, version 74048755:
for i in {0..74}; do
  c=$(echo "$i / 75" | bc -l)
  python -m fimg --out-of-range percentile-pull-clip --clip-percentile 5 \
      headshot-512.jpg "out/$(printf %03d $i).png" phase_shift --turns "$c"
done
ffmpeg -hide_banner -f image2 -pattern_type glob -i 'out/*' -vcodec libx264 -pix_fmt yuv420p headshot__phase_shift_oor_ppc_5.mp4


Self-service commenting is not yet reimplemented after the Wordpress migration, sorry! For now, you can respond by email; please indicate whether you're OK with having your response posted publicly (and if so, under what name).