6.6 Testing received wisdom

It is time to return to the difficult questions raised in section 6.1. Ultimately, the quality of a musical instrument is not determined by acoustical measurements or perceptual thresholds: judgements by people are what matter. There are psychoacoustical techniques for putting such judgements on a quantitative and verifiable footing, but the experimenter is entering a minefield. The experiments are hard to design and carry out. The requirement for sufficient data to obtain statistically significant conclusions imposes one set of constraints, while the effort to identify possible sources of bias and then work around them imposes another, even more challenging, set.

Meanwhile, the experimenters may have to brace themselves for a storm of protest at the results. There are pieces of received wisdom, “well-known facts”, surrounding many types of musical instrument. There is quite often a “myth of a golden age” of classic, never-equalled instruments, whether it concerns violins made by Antonio Stradivari or Gibson mandolins signed by Lloyd Loar. Of course, these are precisely the things we may be most interested to explore with scientific rigour. But there will be some who believe in this received wisdom with quasi-religious fervour, and they often react with vociferous outrage if the test results do not agree with their prior views.

The experimenters then need to tread a very careful line. Musical judgements are subtle, and it is always likely that there is at least a grain of truth in the received wisdom. If a first attempt to test one of these ideas finds no evidence, then good experimenters will try to set aside the aggressive tone of the criticism that comes their way but listen very carefully to the ideas that are suggested about what was wrong with their experiment. Scientific progress often requires patience and persistence, and they may design a second version of the experiment in the light of comments, and see if that gives a different result. And so on… But if a sequence of such efforts still fails to give clear evidence in favour of the received wisdom, things begin to look black for the traditionalists.

We will look at an example where a sequence of studies illustrates this process, but before that we look at a simpler example [1], although one that is not without some controversy. Makers of traditional acoustic guitars, whether classical or steel-strung folk-style, have favoured tropical hardwoods for the back and sides of the soundbox. A particular favourite is Brazilian rosewood, Dalbergia nigra. Nowadays any use of tropical hardwoods is viewed with suspicion, and in any case many species, including Brazilian rosewood, are now on the CITES list, which places very severe limits on any international trade. Guitar makers are exploring many alternative materials, and it is obviously of interest to them to know whether they will be losing out in terms of sound if they stop using the traditional timber species.

To do a systematic experiment, you first need to involve a friendly guitar maker with sufficient experience to make a set of instruments like the ones illustrated in Figs. 1 and 2. The six guitars seen here each use a different timber for the back and sides, while everything else was matched as closely as possible across the whole set. The selected timbers have a variety of appearances, as can be seen clearly in Fig. 2. They also cover a wide range of price and sustainability credentials.

Figure 1. The six guitars for the back-wood experiment. Image copyright Michael English, reproduced by permission.
Figure 2. Back view of the six guitars, showing the different timbers. Image copyright Michael English, reproduced by permission.

A large number of experienced guitarists were recruited to perform playing tests of two different types. The tests were “blinded”: a lot of care was taken to make it hard for the player to identify the guitars other than through playing and listening. Following a procedure pioneered in the violin studies to be described a little later, the players wore welder’s goggles and performed in a dimly-lit space. The idea was to allow them to see just enough to be able to handle the instruments safely, but not to be able to recognise the back wood despite the strong differences of colour and pattern revealed in Fig. 2.

In the first test, the players were handed each guitar in turn, given time to explore and get used to it, then asked to give numerical ratings for various qualities: “overall sound”, playability, and then a list of specific qualities like brightness, warmth and richness. Some of the guitarists repeated the whole test after having a rest, to allow the consistency of their judgements to be checked. The second test was an “ABX” test. The player would be handed guitar “A”, given a bit of time with it, then it would be swapped for guitar “B”. Finally, they would be handed one or other of these guitars, the unknown “X”, and asked to decide whether it was A or B. All six guitars were used for the rating test, but because of time constraints only three were used for the ABX test (selected to represent the range of price and sustainability).

The results of both tests were subjected to a battery of statistical analyses: the details are given in [1]. The result? The players were not convincingly able to distinguish the guitars at better than chance levels, in either test. Now, we need to be careful in interpreting a statement like this. The experiment does not prove that no player under any circumstances would be able to discriminate between any of these instruments. But it does say that any such discriminatory ability is subtle or rare, to the extent that it was not revealed convincingly by this fairly careful test involving 53 different guitarists.
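To give a feel for what “better than chance” means in an ABX test, here is a minimal sketch in Python. It is not the analysis used in [1], which was considerably more elaborate. It simply assumes that every trial is an independent guess with probability 0.5 of being correct, and computes the exact one-sided binomial probability of scoring at least as well as observed. The function name and the trial counts are hypothetical, chosen purely for illustration.

```python
from math import comb

def abx_p_value(n_correct, n_trials, p_chance=0.5):
    """Exact one-sided binomial p-value: the probability of getting at
    least n_correct answers right in n_trials if every answer is a guess."""
    return sum(
        comb(n_trials, k) * p_chance**k * (1 - p_chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )

# Hypothetical scores, NOT taken from the study in [1]
for correct in (14, 19):
    print(correct, "correct out of 24: p =", abx_p_value(correct, 24))
```

A small p-value says the score would be surprising under pure guessing. A large one says only that the data are consistent with guessing, which is not the same as proving that no audible difference exists.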

However, this is not quite the end of the story. We already saw, in section 6.5, that estimates have been made of the “ultimate” threshold for discriminating guitars on the basis of changes to the body modes. These estimates were made using synthesised sounds based on measured body behaviour. The same approach was applied to the six guitars: acoustical measurements were made of them all, in the way described earlier (see section 5.1), and then snatches of music were synthesised for each of them. Those synthesised sounds could be used for another version of the ABX test, but this time with a clear expectation about whether differences should be perceptible or not.

The set of measured bridge admittances is shown in Fig. 3. Also included is a guitar of generically similar kind, but not part of the set of six. The plot shows that the set of six were all clearly different from the extra instrument shown in the blue curve, but rather similar to each other: a tribute to the skills of the guitar maker. For the purposes of the ABX listening test, the measured admittances were not used directly. Instead, attention was focussed purely on the three “signature modes” giving strong peaks in the frequency range below 400 Hz. We have already seen in section 5.3 what the mode shapes corresponding to these peaks look like. The frequency, amplitude and Q factor of each of these three modes were extracted from the measurements. A reference guitar response was chosen, and then modified to produce six versions matching the signature modes of the six guitars, while all other details were kept the same. These were used to generate synthesised sound files.
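The idea of swapping the signature modes while holding everything else fixed can be sketched in Python. The sketch assumes the standard textbook form in which each mode contributes a damped-resonator term to the admittance, characterised by its frequency, amplitude and Q factor. The function names and all numerical values are hypothetical, not the measured data of Fig. 3, and the synthesis procedure actually used in [1] and [2] may differ in detail.

```python
import math

def modal_admittance(freq_hz, modes):
    """Admittance at one frequency as a sum of damped-resonator terms.

    modes is a list of (frequency in Hz, amplitude, Q factor) tuples.
    """
    w = 2 * math.pi * freq_hz
    total = 0j
    for f_k, a_k, q_k in modes:
        w_k = 2 * math.pi * f_k
        total += 1j * w * a_k / (w_k**2 - w**2 + 1j * w * w_k / q_k)
    return total

def swap_signature_modes(reference_modes, new_signature):
    """Replace the three lowest modes of the reference, keep all the rest."""
    rest = sorted(reference_modes)[3:]
    return sorted(new_signature) + rest

# Hypothetical values for illustration only
reference = [(100.0, 1.0, 30.0), (200.0, 1.5, 40.0), (390.0, 0.8, 50.0),
             (520.0, 0.5, 60.0), (680.0, 0.4, 60.0)]
guitar_signature = [(97.0, 1.1, 28.0), (195.0, 1.4, 42.0), (380.0, 0.8, 48.0)]

modified = swap_signature_modes(reference, guitar_signature)
for f in (97.0, 195.0, 380.0):
    print(f, "Hz: |Y| =", abs(modal_admittance(f, modified)))
```

Because only the first three entries change, any audible difference between two such modified responses can be attributed to the signature modes alone.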

Figure 3. Bridge admittances of the six guitars (red). For comparison, the admittance of an unrelated guitar of the same general type is shown in blue.

The thresholds found in the earlier study [2] give clear predictions of whether the differences in signature modes are sufficient to be audible. The result is that the extra guitar should be clearly different from any of the set of six, but that differences within the set are quite small and barely above the threshold of perception. Among the six, two stood out, with signature mode frequencies that were systematically about half a semitone lower in one case, and about half a semitone higher in the other, than the average for the group. These represent the extremes, and the difference between them should be big enough to be perceptible by a skilled listener. These happened to be the guitars built using Indian rosewood and Sapele for the backs and sides.
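For a sense of scale, a shift of half a semitone corresponds to a frequency ratio of 2^(1/24), or roughly 3%. A short Python sketch makes the arithmetic explicit. The function name and the 200 Hz example frequency are assumptions for illustration, not measured values from the six guitars.

```python
def shift_by_semitones(freq_hz, semitones):
    """Frequency after shifting by a number of equal-tempered semitones."""
    return freq_hz * 2 ** (semitones / 12)

nominal = 200.0  # hypothetical mode frequency in Hz
low = shift_by_semitones(nominal, -0.5)
high = shift_by_semitones(nominal, +0.5)
print(f"{low:.1f} Hz, {high:.1f} Hz, ratio {high / low:.4f}")
```

On the reading that each of the two guitars lies about half a semitone from the group average, their signature mode frequencies differ from each other by about a full semitone.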

The formal listening test confirmed these predictions. The extra guitar could be distinguished from the others reliably. The Indian rosewood and Sapele guitars could be distinguished from each other, but not as reliably. The other members of the set of six could not be distinguished above chance levels. You can hear some of the synthesised sounds in Sounds 1, 2 and 3. The music is a short extract from the tune “Tears in heaven” (a favourite of one of the students who did the work for the threshold project [2]).

Sound 1. Synthesised music based on the guitar with Indian rosewood back and sides.
Sound 2. Synthesised music based on the guitar with Sapele back and sides.
Sound 3. Synthesised music based on an entirely different guitar, not part of the set of six. Its bridge admittance was shown in the blue curve of Fig. 3.

There is a final twist to the story. Do we conclude that Indian rosewood and Sapele backs and sides will produce different-sounding guitars? No, that would probably be misleading. Looking back at the discussion of signature modes of guitars in section 5.3, we can see that while the first two of these modes might plausibly be affected by the behaviour of the back, this is not the case for the highest of the three modes. That mode has motion largely confined to the top plate, and it produces virtually no net volume change so that it should not couple very well to back plate motion. But for the two guitars in question, all three of these modes were a little higher or a little lower than the average, by a similar factor.

That suggests that the origin of the difference on which the perceptual result was based does not lie in the backs at all. More likely, it is due to small differences in the top plates (including the bridge, and the braces glued to the underside of each plate). The guitar maker has done a masterly job in using the six different back woods to make very similar guitars, despite significant differences in density and stiffness properties of the woods. We should not be too surprised: instrument makers are used to the variability of wood, and part of their skill set is to know how to compensate for those variations to produce a consistent end product. We should also note that this skilled guitar maker is still convinced that back wood does make a difference to the sound. The results show that any such influence is subtle, and most players are incapable of hearing it during normal playing, but we should certainly not dismiss his opinion hastily. Nevertheless, it is clear that guitar makers can safely experiment with more sustainable materials, without fear of acoustical disaster.

We now turn to a more controversial subject. The popular perception of a “secret of Stradivari” is very widespread, and the pattern of market values supports the idea that there is something special about certain old Italian violins. There is a long history of public “tests” in which a Stradivari and something else were played behind a curtain, and the audience was asked to vote on which was better. None of this would qualify as serious science, which requires double-blind testing, sufficient test subjects and repeat tests, and results verifiable by independent researchers.

Only in recent years have convincing experiments begun to be performed. The first experiment took advantage of the availability of high-class instruments and violinists at a violin competition, in Indianapolis in 2010 [3]. It involved six violins: two by Stradivari, one by Guarneri “del Gesù”, and three by contemporary makers. These were used for playing tests that were somewhat similar to the guitar tests just described. The experiment needed to allow players to handle the different instruments under test in a safe and natural way, but without being able to see which was which and thus bring in additional information and bias. This is where the dim lighting and welder’s goggles come in: that procedure was first used in this experiment. The goggles can be seen in action in Fig. 4, showing one of the participants in the second experiment (to be described shortly).

Figure 4. Violinist Ilya Kaler taking part in the second experiment. Image copyright Stefan Avalos, reproduced by permission.

The experimenters took the very reasonable view that the most acute discrimination between instruments is likely to come from players, rather than from external listeners, however expert. There is a simple reason for this, which we will explore in some detail in Chapter ?: the player is inside a feedback loop, able to adjust details of bowing to try to create the sound they want, but the listener only hears the end result. A skilled player can coax a good sound, at least on certain notes, from more or less any violin, but the player will still be well aware that they have to try a lot harder on some violins than others to get this effect.

The players were given several different tasks: to choose their favourite and to rate the instruments against one another based on various different criteria. The results were a surprise, not least to the experimenters themselves. At that time most people expected to find some degree of preference for the famous old instruments: the debate was about how big the difference would prove to be. But the cautious conclusion of the authors after this first experiment was that no statistically significant difference between the two groups, old and modern, was found in terms of preference. In fact the raw data suggested that certain of the modern instruments were slightly preferred to any of the old ones.

These results caused a media storm. Some famous names weighed in with negative comments. Among the invective, they raised a number of perfectly reasonable objections to the details of the experiment. The tests were carried out in a hotel room, whereas the natural home of the great old instruments is the concert hall. There was no opportunity to try the instruments in an ensemble or with an accompaniment. There weren’t very many instruments, and the set of participating violinists was rather a mixed bag in terms of standard and experience.

To address some of these objections, the team organised a second experiment involving 10 first-rate soloists in a rehearsal room and then in a concert hall, with the option of piano accompaniment [4]. This time, six old Italian violins (including five by Stradivari) and six modern instruments were pitted against each other. A careful process of selection was used for both groups of instruments, choosing the preferred ones from large initial pools (the authors have kept the exact identities of all instruments secret).

This second experiment was more carefully conducted than the first, and the results were in complete agreement with the previous findings. They also gave some additional information. When asked to choose a violin that might plausibly replace their own for an upcoming tour, six of the soloists chose new violins and four chose Stradivaris. A single new violin was easily the most-preferred of the 12. On average, soloists rated their favourite new violin more highly than their favourite old one for playability, articulation, and projection, and at least equal in terms of timbre. Finally, the 10 soloists failed to distinguish new from old at better than chance levels.

The doubters were still not convinced. This time, they raised an objection based on a long-held belief about “projection”. It is often claimed that the classic Italian instruments can be quiet under the ear, but that their audibility improves for a distant listener, out in the concert hall. To explore this idea, a third experiment was organised: in fact, two versions of the experiment were run, in concert halls in Paris and New York [5]. In each case, three contemporary violins were pitted against three by Stradivari. In the Paris experiment, the instruments were tested both with and without orchestral accompaniment.

The key difference in these experiments was that the audience out in the concert hall rated the various instruments. There were 55 and 82 participating listeners in the Paris and New York experiments, respectively. Figure 5 shows a view of the scene during the New York experiment. Pairs of instruments were played, and the listeners had to vote on which was heard better, and which was preferred. Another stage of the experiment involved presenting new/old pairs of instruments, and asking the listeners to decide which was which.

Figure 5. A scene in the New York phase of the third experiment. Image copyright Hubert Raguet, reproduced by permission.

The results were very clear. The listeners preferred the new violins over the old, and also found that the new violins projected better. Results for projection with and without the orchestral accompaniment were strongly correlated. Furthermore, preferences expressed by the audience were in good agreement with those from the players. Finally, the audience members could not distinguish new from old instruments at better than chance levels.

So there seems to be no secret of Stradivari. The best of the classic Italian instruments are still very good, of course, but the best of contemporary instruments can hold their own when given a level playing field on which to compete, and even sometimes out-perform the old instruments. Here is what the authors wrote after the third experiment: “A belief in the near-miraculous qualities of Old Italian violins has preoccupied the violin world for centuries. It may be that recent generations of violin-makers have closed the gap between old and new, or it may be that the gap was never so wide as commonly believed. Either way, the debate about old versus new can perhaps be laid aside now in favor of potentially more fruitful questions. What, for example, are the physical parameters determining the playing qualities of any violin, regardless of its age or country of origin?” [5].


[1] S. Carcagno, R. Bucknall, J. Woodhouse, C. Fritz and C. J. Plack. “Effect of back wood choice on the perceived quality of steel-string acoustic guitars”. Journal of the Acoustical Society of America 144, 3533-3547 (2018). DOI 10.1121/1.5084735. The article may be found here: https://doi.org/10.1121/1.5084735

[2] J. Woodhouse, E. K. Y. Manuel, L. A. Smith, A. J. C. Wheble and C. Fritz. “Perceptual thresholds for acoustical guitar models”. Acta Acustica united with Acustica 98, 475-486 (2012). DOI 10.3813/AAA.918531.

[3] C. Fritz, J. Curtin, J. Poitevineau, P. Morrel-Samuels and F.-C. Tao. “Player preferences among new and old violins”. Proceedings of the National Academy of Sciences of the USA 109, 760-763 (2012).

[4] C. Fritz, J. Curtin, J. Poitevineau, H. Borsarello, I. Wollman, F.-C. Tao and T. Ghasarossian. “Soloist evaluations of six Old Italian and six new violins”. Proceedings of the National Academy of Sciences of the USA 111, 7224-7229 (2014).

[5] C. Fritz, J. Curtin, J. Poitevineau and F.-C. Tao. “Listener evaluations of new and Old Italian violins”. Proceedings of the National Academy of Sciences of the USA 114, 5395-5400 (2017). DOI 10.1073/pnas.1619443114.