6.5 Making a difference


Now to see some examples of the psychoacoustical approach applied to questions specifically about musical instruments. These questions divide into two categories: the ones we would really like to ask, and the ones that are straightforward to answer. The first category includes all questions involving judgements of quality: “which violin do you prefer?”; “which sound is more rich?” (or “lively” or “nasal” or “shrill”); “is this an old Italian violin or a modern one?” We will put off all such questions until the next section.

For now, we will concentrate on the second category. This involves questions about the threshold of perception, or “just-noticeable difference”, when some parameter of the sound is varied. It is clearest to explain through an example. Back in section 5.4 we met the idea that a real musical string does not have perfectly harmonic overtones, because of the effect of bending stiffness (the details were given in section 5.4.3). Does this effect matter for the sound of real strings? A natural way to start exploring that question is to ask “what is the smallest bending stiffness that produces an effect people can hear?”

The good thing about this question is that there is a reasonably straightforward way to answer it. We generate a lot of sounds with varying inharmonicity corresponding to different levels of bending stiffness, then persuade a lot of volunteers to listen to pairs of these sounds, one with and one without the bending stiffness. They are asked to say which is which, then the level of bending stiffness is reduced, and they do it again. Eventually, the stiffness will get so small that their responses will be random: they can’t actually hear the difference, and they are simply guessing. This is the threshold of perception.

To give an idea of what kind of sounds might be used, Sound 1 gives a sequence of vaguely string-like synthesised sounds with different levels of inharmonicity due to bending stiffness. The first note, and every alternate one after that, has perfectly harmonic overtone frequencies. In between each of these pairs is a sound with some inharmonicity, and the amount increases each time. You probably can’t hear any difference with the first pair, but by the end of the sequence you probably hear a very clear difference. Somewhere in between would be your threshold of perception, at least for a note with this particular frequency, amplitude, decay time and relative energy level between the overtones. If any of those things were changed, the threshold might be different.

Sound 1. A sequence of string-like sounds with increasing inharmonicity from bending stiffness, alternating with perfectly harmonic sounds.
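To make the idea of "levels of inharmonicity" concrete, here is a minimal sketch of how the overtone frequencies of a stiff string might be computed. It assumes the standard stiff-string approximation $f_n = n f_0 \sqrt{1 + B n^2}$; the symbol `B` for the inharmonicity coefficient, the fundamental of 196 Hz and the particular values of `B` are illustrative assumptions, not the values used to generate Sound 1.

```python
import math

def stiff_string_partials(f0, B, n_partials):
    """Overtone frequencies f_n = n * f0 * sqrt(1 + B * n**2) of a stiff string.

    f0 is the fundamental of the ideal (perfectly flexible) string and
    B is a dimensionless inharmonicity coefficient; B = 0 is harmonic.
    """
    return [n * f0 * math.sqrt(1.0 + B * n * n) for n in range(1, n_partials + 1)]

def sharpening_in_cents(B, n):
    """How far overtone n lies above its harmonic value n * f0, in cents."""
    return 1200.0 * math.log2(math.sqrt(1.0 + B * n * n))

f0 = 196.0  # hypothetical fundamental frequency in Hz
for B in (0.0, 1e-5, 1e-4, 1e-3):  # hypothetical inharmonicity coefficients
    partials = stiff_string_partials(f0, B, 10)
    print(f"B = {B:g}: 10th overtone at {partials[-1]:.1f} Hz, "
          f"{sharpening_in_cents(B, 10):.1f} cents sharp")
```

The sharpening grows with overtone number, so a given `B` affects the upper overtones of a note far more than the lower ones.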

To do this experiment seriously requires much more care, of course. The threshold is not a sharply-defined thing. As it is approached, you get progressively worse at recognising whether there is a difference or not. In order to arrive at a particular value for the threshold, which can be compared with estimates made by other people, an agreed definition must be adopted which is based on statistical distributions and the probability of making a correct judgement in a particular case. Usually, an ingenious procedure is followed in which the sounds are presented to the test subject in a carefully-constructed sequence, which automatically converges to an estimate of the threshold. Levitt [1] gives the statistical analysis to define precisely what that estimate means in terms of the probability of success.

In a typical test, at each step the subject hears three sounds, and has to pick the odd one out: a so-called “three-alternative, forced-choice” test. The order of the three sounds is randomised. If the subject gets it right three times in a row, the magnitude (of bending stiffness in our example) is reduced. If they get it wrong just once, it is increased. The factor by which the bending stiffness changes will also be varied, so that once the sounds are in the vicinity of the threshold, finer gradations are used. After a predetermined number of up-and-down reversals, the experiment is stopped and a suitable average taken of the values of bending stiffness in the final stages.
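The procedure just described can be sketched as a short simulation. This is only an illustration under stated assumptions: the simulated listener `p_correct`, its slope, the step factors, the number of reversals and the use of a geometric mean are all hypothetical choices, not the settings of any study cited here. What does follow from Levitt's analysis is that a "three right to go down, one wrong to go up" rule settles near the level at which the probability of a correct answer is $0.5^{1/3} \approx 0.794$.

```python
import math
import random

TARGET = 0.5 ** (1.0 / 3.0)  # probability tracked by a 3-down, 1-up rule

def p_correct(level, threshold, slope=4.0, chance=1.0 / 3.0):
    """Hypothetical listener: probability of picking the odd one out of three."""
    detect = 1.0 / (1.0 + (threshold / level) ** slope)
    return chance + (1.0 - chance) * detect

def run_staircase(true_threshold, start_level, rng, n_reversals=12,
                  factors=(2.0, 1.25), n_average=8):
    """Simulate a 3-down, 1-up staircase; return the estimated threshold."""
    level = start_level
    run_of_correct = 0
    direction = 0            # -1 while descending, +1 while ascending
    reversal_levels = []
    while len(reversal_levels) < n_reversals:
        correct = rng.random() < p_correct(level, true_threshold)
        if correct:
            run_of_correct += 1
            if run_of_correct < 3:
                continue     # not yet three in a row: same level again
            step = -1
        else:
            step = +1
        run_of_correct = 0
        if direction != 0 and step != direction:
            reversal_levels.append(level)   # the track changed direction
        direction = step
        # Coarse steps at first, finer ones after the first few reversals.
        factor = factors[0] if len(reversal_levels) < 4 else factors[1]
        level = level * factor if step > 0 else level / factor
    tail = reversal_levels[-n_average:]
    # Geometric mean, because the level is changed by factors.
    return math.exp(sum(math.log(v) for v in tail) / len(tail))

estimate = run_staircase(true_threshold=1.0, start_level=16.0,
                         rng=random.Random(1))
print(f"estimated threshold: {estimate:.2f} (tracks p = {TARGET:.3f})")
```

Note that the estimate is the level giving about 79% correct, which is somewhat above the nominal `true_threshold` parameter of the simulated listener: this is exactly why an agreed statistical definition of "threshold" is needed before results can be compared.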

A careful study of this kind was carried out by Järveläinen et al. [2], using string-like synthesised sounds somewhat similar to the ones in Sound 1 here. They covered the range of playable notes on a guitar, and compared the deduced thresholds with the measured inharmonicity of notes on a classical guitar. The result was that inharmonicity in the real guitar exceeded the threshold for perception on all notes and all strings, most strongly for notes on the 3rd and 6th strings of the guitar.

But by now you should not be surprised to learn that there are complications. The same team carried out another study, using sounds that were based much more accurately on the played guitar notes. This brought in factors like the nature of the starting transient, and the frequency dependence of the decay rates of the different overtones. Using these more complicated but more realistic sounds, they found that the thresholds of perception were higher [3]: sufficiently so that the inharmonicity of most played notes was below threshold.

These later results do not invalidate the earlier ones, but they show that considerable care is needed over the interpretation of such tests. We can give arguments to suggest that both results are of interest: it all depends on exactly what aspect of perception you are trying to probe. To argue for the importance of the original results, we first remind ourselves that a musician may develop very finely-tuned perception for sound details on their own instrument. In the context of the inharmonicity question, we must remember that the player can vary the way they play the notes on a particular string, and as they strive for the sound they “hear in their head” they will hear many variants of the sound, not just the particular sounds chosen for the psychoacoustical test.

It can thus be helpful to know the threshold of perception for a particular change, when we arrange the details of the test procedure to give the listener the best possible chance of hearing it. This should result in what we might think of as the ultimate threshold: beyond that point, it is not humanly possible to perceive this particular change, however subtle and finely-tuned the musician’s feature detectors may be.

On the other hand, the second inharmonicity experiment shows that under different circumstances the realistic threshold of perception may be significantly higher. The subject cannot achieve the ultimate threshold because of what is called informational masking: other aspects of the sound compete for the attention of your hearing system, and somehow confuse your ability to hear the particular thing under study.

What other questions could we apply this kind of approach to? Well, by recalling something from Chapter 2 we can explore an important issue for instrument makers, to do with the effect of changes they may make to the constructional details of a stringed instrument body. Provided the vibration amplitude is small enough, as it usually is, we can treat the body using linear theory. We then showed in section 2.2 that all we need to know about the body is the behaviour of the vibration modes: natural frequencies, mode shapes and damping factors. Constructional details influence the modes, and the modes determine the sound. So it makes sense to explore the perception threshold for changes in these modal parameters. In the light of the discussion in section 5.3, it would also be interesting to explore perception thresholds associated with any formant-like features.

Some initial studies on this question have been done, for the guitar and for the violin. These two instruments require rather different approaches, so we will discuss them separately. Back in section 5.4 we already met some of the computer-synthesised sounds used in the guitar investigation [4]: they are repeated below as Sounds 2, 3 and 4. Based on the measured response of a particular guitar, these sounds show the effect of raising or lowering all the body mode frequencies by one semitone (6%). This change is clearly audible, and it is not surprising to learn that the threshold for such a shift of all mode frequencies turned out to be significantly lower: about a 1% shift for the most acute listeners, and the most acute of them all could detect a change as small as 0.3% under the best conditions. Furthermore, these most acute listeners could detect a frequency shift around 1% for moving just a few of the modes: specific tests were carried out shifting some “signature modes” in the frequency range 150–250 Hz, and a cluster of modes lying in the band 500–1000 Hz.

Sound 2. All frequencies reduced by 6%
Sound 3. Reference case
Sound 4. All frequencies increased by 6%

When a similar threshold experiment was carried out in which the modal damping factors (or Q factors) rather than their frequencies were changed, listeners were much less sensitive. The threshold for hearing a change in modal damping turned out to be around 20%. To give an idea of the effect on sound of changing the damping of all body modes, Sounds 5–8 give examples. Sound 6 is the reference case. Sounds 5 and 7 have all the Q factors changed by a factor of 2, down and up respectively. Even with this large change, the sound is not very different. Finally, Sound 8 illustrates the effect of multiplying all modal Q factors by 4, and at last there is a clear change in sound, to something rather “boomy”. But we can conclude that modal damping factors are far less important to sound than modal frequencies.

Sound 5. All Q factors halved
Sound 6. Reference case
Sound 7. All Q factors doubled
Sound 8. All Q factors quadrupled
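The two kinds of modification used in these guitar tests, scaling all modal frequencies or scaling all modal Q factors, can be sketched with a toy modal model. This is an illustration only, not the synthesis method of [4]: the function names and the list of modes are hypothetical, each mode is represented as a simple decaying sinusoid with amplitude envelope $e^{-\omega t/2Q}$, and the small difference between damped and undamped frequency is ignored.

```python
import math

SEMITONE = 2.0 ** (1.0 / 12.0)  # frequency ratio of one semitone, about 6%

def amplitude_time_constant(freq_hz, q_factor):
    """Time for a mode's amplitude to fall by a factor e: tau = Q / (pi * f)."""
    return q_factor / (math.pi * freq_hz)

def modal_impulse_response(modes, n_samples, sample_rate,
                           freq_scale=1.0, q_scale=1.0):
    """Sum of decaying sinusoids, one per (frequency_hz, q_factor, amplitude)."""
    out = [0.0] * n_samples
    for freq_hz, q_factor, amplitude in modes:
        omega = 2.0 * math.pi * freq_hz * freq_scale
        decay = omega / (2.0 * q_factor * q_scale)
        for i in range(n_samples):
            t = i / sample_rate
            out[i] += amplitude * math.exp(-decay * t) * math.sin(omega * t)
    return out

# Hypothetical body modes: (frequency in Hz, Q factor, relative amplitude).
modes = [(100.0, 30.0, 1.0), (200.0, 40.0, 0.5), (410.0, 50.0, 0.3)]

reference = modal_impulse_response(modes, 800, 8000.0)
raised = modal_impulse_response(modes, 800, 8000.0, freq_scale=SEMITONE)
q_doubled = modal_impulse_response(modes, 800, 8000.0, q_scale=2.0)

print(f"one semitone is a shift of {100.0 * (SEMITONE - 1.0):.2f}%")
print(f"200 Hz mode, Q = 40: tau = {amplitude_time_constant(200.0, 40.0):.4f} s")
```

In this simple picture, doubling Q doubles the ringing time of each mode without moving it in frequency, whereas a frequency scale factor moves every resonance relative to the fixed overtones of the string.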

To explore the same question about changing the body modes of a violin requires a different method. The problem is that realistic synthesis of violin playing is much more challenging than was the case for guitar playing. The essential reason is that plucking a string can be treated quite well using linear theory, but bowing a string definitely can’t be. Violin playing is strongly nonlinear, introducing all manner of complications, as we will discuss in detail in Chapter ?. For the moment, we just need to note that attempts to do psychoacoustical testing using synthesised violin sounds result in a severe case of informational masking: listeners are so disturbed by the fact that it doesn’t really “sound like a violin” that they do not give their best judgements about the thing the experimenter is trying to test.

However, we can rescue our experiment by noting that while the motion of a bowed string is nonlinear, the resulting vibration and sound radiation by the violin body is probably not. So we can address questions about the sensitivity of sound to modifying the body modes by splitting the system into these two components. Instead of trying to synthesise the string motion, we can measure it directly using a laboratory version of an electric violin. Small force-measuring sensors can be embedded into the bridge of a violin, just underneath each string notch: Fig. 1 shows what it looks like.

Figure 1. A violin bridge equipped with a force-measuring sensor under each string, to use in the “virtual violin” experiment.

A violinist can then play in the normal way, and the signal from the force sensors can be recorded. An example of what this sounds like is given in Sound 9. It is quite recognisable as violin playing, but the sound is rather muffled and characterless. We can then take the measured response of a violin body, or a simulated response after some desired modification, and combine it with the measured force to give a prediction of the “sound” of the violin. This combining process is called convolution: it was described, with an example, back in section 2.2.8. The virtue of this approach is that it allows many different “virtual violins” to be heard, while the input from the violinist remains exactly the same.

Sound 9. A snatch of violin playing as recorded using the bridge-force sensors shown in Fig. 1. This particular passage is played entirely on the G string.
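The "virtual violin" idea can be sketched in a few lines: one recorded force signal is convolved with the impulse response of each body in turn. The sketch below uses a plain direct-form convolution and very short made-up sequences (`force`, `body_a` and `body_b` are hypothetical stand-ins, not measured data); for real recordings at audio sample rates an FFT-based convolution routine would normally be used instead.

```python
def convolve(force, impulse_response):
    """Direct-form discrete convolution of two sequences of samples."""
    if not force or not impulse_response:
        return []
    out = [0.0] * (len(force) + len(impulse_response) - 1)
    for i, f in enumerate(force):
        if f == 0.0:
            continue  # a zero input sample contributes nothing
        for j, h in enumerate(impulse_response):
            out[i + j] += f * h
    return out

# Hypothetical stand-in for the recorded bridge-force signal.
force = [1.0, 0.0, -0.5, 0.0]

# Hypothetical impulse responses of two different "virtual violin" bodies.
body_a = [1.0, 0.5, 0.25]
body_b = [1.0, -0.3, 0.1]

# The same "performance" heard through two different bodies.
output_a = convolve(force, body_a)
output_b = convolve(force, body_b)
print(output_a)
print(output_b)
```

The design point is that `force` is identical in both calls, so any difference between `output_a` and `output_b` is attributable entirely to the body response.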

I put “sound” in quotes in the previous paragraph, because what you get from the convolution calculation depends on what kind of body response has been measured. If it is the bridge admittance, like the examples discussed in Section 5.3, then what would be calculated is the waveform of structural vibration at the bridge. On the other hand, if the body response has been measured with a microphone at some particular position, then what will be computed will represent the sound that same microphone would pick up when the violin is played.

It might seem more natural to use responses measured by microphone, so that we really do model “sound” by the convolution process. However, as we have seen in earlier problems, there is a snag: if we want to do experiments in which virtual changes are made to the instrument response, we must use a type of response for which we have a good theoretical model. This is the case for an admittance, or other structural response, but it is not true for a microphone response. So for psychoacoustical experiments, admittance is the best response to use.

Examples of the output of convolution using measured bridge admittance of three different violins are given in Sounds 10, 11 and 12. They are all based on the recorded string signal from Sound 9. None of these violins was of particularly high quality, but the sounds are distinctly different in the three cases. All three sound significantly different from the original string sound.

Sound 10. The force signal from Sound 9, processed with the response of a violin.
Sound 11. The force signal from Sound 9, processed with the response of a second violin.
Sound 12. The force signal from Sound 9, processed with the response of a third violin.

This “virtual violin” approach has been used to perform a range of threshold experiments [5], somewhat similar to the guitar-based tests described earlier. Both the amplitude and frequency of body modes were changed: for individual modes at low frequency (the “signature modes” A0, B1- and B1+ shown in Section 5.3), and also for blocks of modes lying in different frequency bands. Some examples of the effect of shifting all mode frequencies can be heard in Sounds 13–16. The emphasis throughout was to fine-tune the details of the tests in order to obtain the best possible discrimination, with a view to estimating “ultimate thresholds”. It was found that lower thresholds were obtained from tests based on single notes, rather than more extended snatches of music. Less surprisingly, it was found that listeners with musical training did better than others.

Sound 13. A short extract from the “music” of Sound 9, convolved with the response of a violin used as the datum case for the threshold tests.
Sound 14. As Sound 13, but with all body frequencies raised by 1%.
Sound 15. As Sound 13, but with all body frequencies raised by 5%.
Sound 16. As Sound 13, but with all body frequencies raised by 10%.

Results for the best 5 listeners in each test were analysed. In broad summary, the thresholds were in the region of 2–4 dB in amplitude and 1–4% in frequency: towards the high end of those ranges for individual modes, towards the low end for bands. These results for frequencies are comparable with the values found in the guitar tests, although consistently a little higher. Perhaps the more complex sounds of violin playing, compared to synthesised guitar “playing”, led to some informational masking. Encouragingly, it was shown that good predictions for the various thresholds could be obtained based on differences between excitation patterns, as discussed in Section 6.4.
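To put these threshold figures on more familiar scales, a short conversion may help. This is simple arithmetic, assuming the usual convention that a level difference in dB is $20\log_{10}$ of an amplitude ratio; the function names are illustrative. On that basis 2–4 dB corresponds to changing a mode amplitude by a factor of roughly 1.26 to 1.58, and a 1–4% frequency shift corresponds to roughly 17 to 68 cents, where 100 cents is one semitone.

```python
import math

def db_to_amplitude_ratio(db):
    """Amplitude ratio for a level difference in dB (20 log10 convention)."""
    return 10.0 ** (db / 20.0)

def percent_to_cents(percent):
    """Musical interval in cents for a given percentage frequency shift."""
    return 1200.0 * math.log2(1.0 + percent / 100.0)

for db in (2.0, 4.0):
    print(f"{db:.0f} dB -> amplitude ratio {db_to_amplitude_ratio(db):.2f}")

for percent in (0.3, 1.0, 4.0):
    print(f"{percent:g}% -> {percent_to_cents(percent):.1f} cents")
```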

The examples described in this section have demonstrated that some interesting things can be learned by measuring thresholds of perception. The tests are somewhat laborious to carry out, but there is a well-established methodology: it is mostly a matter of taking endless care over details, and putting in the hours of work. Without a doubt, there is considerable scope for more studies of this kind: for example, the approach could be applied to the various parametric changes illustrated in Section 5.5 for a model of the banjo. But it is time to turn to the second category of questions we would like to attack by psychoacoustical tests.


[1] H. Levitt. “Transformed up-down methods in psychoacoustics”; Journal of the Acoustical Society of America 49, 467–477 (1971).

[2] H. Järveläinen, V. Välimäki and M. Karjalainen. “Audibility of the timbral effects of inharmonicity in stringed instrument tones”; Acoustics Research Letters Online 2, 79–84 (2001).

[3] H. Järveläinen and M. Karjalainen. “Perceptibility of inharmonicity in the acoustic guitar”; Acta Acustica united with Acustica 92, 842–847 (2006).

[4] J. Woodhouse, E. K. Y. Manuel, L. A. Smith and C. Fritz. “Perceptual thresholds for acoustical guitar models”; Acta Acustica united with Acustica 98, 475–486 (2012). DOI 10.3813/AAA.918531

[5] C. Fritz, I. Cross, B. C. J. Moore and J. Woodhouse. “Perceptual thresholds for detecting modifications applied to the acoustical properties of a violin”; Journal of the Acoustical Society of America 122, 3640–3650 (2007).