MEASURING COURT PERFORMANCE
ADDRESS BY THE HONOURABLE J J SPIGELMAN AC
CHIEF JUSTICE OF NEW SOUTH WALES
TO THE ANNUAL CONFERENCE OF THE
AUSTRALIAN INSTITUTE OF JUDICIAL ADMINISTRATION
ADELAIDE, 16 SEPTEMBER 2006
In mid 18th century London a mathematical prodigy called Jedediah Buxton was taken to see David Garrick perform in Shakespeare’s Richard III at the Drury Lane theatre. When asked whether he had enjoyed the play, his reply was that it contained 12,445 words. His analysis did seem to miss some significant things: the sarcasm of “Now is the winter of our discontent, made glorious summer by this sun of York” and the desperation of “A horse, a horse, my kingdom for a horse”.
The purpose of my address is not to deny the beauty of numbers. Nor their utility. My purpose is to emphasise, as Jedediah Buxton’s reaction manifested, that numbers cannot always identify what matters. Today Jedediah would be diagnosed as autistic. What I will be discussing could be called the autistic school of management.
However, as that will, no doubt, offend somebody, I will revive a word that has fallen into disuse: pantometry, which means universal measurement – the belief that everything can be counted.
At the time of my appointment as Chief Justice over eight years ago I became aware of a range of proposals for performance measurement of courts, in accordance with the managerialist ideology that has come to dominate so many aspects of the public sector. Furthermore, statutory tribunals responsible for judicial salary determination were indicating an interest in linking salaries to performance. I regarded these developments as a challenge to judicial independence and potentially corrosive of the rule of law.
I was concerned about proposals to apply to the legal system the kinds of performance indicators that had, in my opinion, significantly distorted policy and administration in areas such as education and health, where the indicators had been entrenched in funding formulas, budget processes and remuneration decisions. Over a period of two or three years, I delivered a number of published speeches on the subject.
As it turned out my fears were not realised. The pantometry school did die down. The statutory tribunals responsible for judicial salaries, which had proposed to develop some kind of measure of performance in the course of deciding changes in judicial salaries, realised that that was nonsense and, perhaps, realised that it was pernicious. In any event they stopped talking about it. Other public sector pantometrists, who appeared to entertain expansive notions of the capacity of performance indicators and national benchmarking to drive changes in court practices, also seemed to have their ambitions tempered. Perhaps they realised that the principles underlying the institutional heritage of the judiciary and the operation of the rule of law were in conflict with their proposals.
The focus turned to the compilation and publication of statistics – key performance indicators in the argot of the trade – relating to matters such as delay and costs. These are matters which are both capable of assessment in quantitative terms and which provide information that is useful to the courts and the publication of which serves to enhance the accountability of the courts.
I have left these issues alone for some years now. However, I have become aware that the idea of developing a so called “quality indicator” has re-emerged both in the AIJA and in the Court Administration Working Group, which oversees the preparation of material relating to courts that is published by the Productivity Commission in the annual Report on Government Services.
One of the themes of the speeches I used to give on this subject was the limited significance that could be attributed to the matters that were capable of quantitative measurement. I did not doubt the importance of measurement of such matters as delay or cost. However, I emphasised that the most important aspects of the work of the courts are qualitative and cannot be measured.
My central proposition was really quite a simple one: not everything that counts can be counted. Some matters can only be judged – that is to say, they can only be assessed in a qualitative way. Most significantly, there are major differences between one area of government activity and another in the importance of those matters that are capable of being measured. In some spheres of governmental decision making the things that can be measured are the important things. In other spheres the things that are important are simply not measurable. The law is at the latter end of the spectrum.
I was aware from the managerial literature on performance measurement that many things, particularly with respect to quality of a product or service, were difficult to measure. I was also aware that in the Productivity Commission reports, there was a standard form template which required the development of quality indicators for all sectors of government. However, that remained an empty box in the template with respect to courts. Indeed, in the annual Report on Government Services, on one count, it remains an empty box for seven of the thirteen different sectors surveyed. In those where this box was not empty, the indicators appear to me to be of tangential significance, bordering on the trivial. They are, at best, proxy indicators which bear no necessary or even direct relationship to what matters.
Seven or eight years ago there was a proposal for what was called a national Client Satisfaction Survey of the Australian court system with a view to what was then described as “measuring the quality of court services”. The proposal was put before the Council of Chief Justices where, to some degree it received a bemused response. Questions such as “Precisely who is a client of the criminal justice system?” were asked. An assurance was given that there was no intention to survey satisfaction with judicial decisions.
The Council of Chief Justices nominated Chief Justice Nicholson of the Family Court, Chief Justice Miles of the ACT Supreme Court and myself to pursue discussions on this proposed survey. In the event, the proponents did not speak to anyone other than Chief Justice Nicholson. The Family Court is, so far as I am aware, the only Court that has ever shown much interest in performance indicators.
The detail of the survey then proposed was in large measure unexceptionable. The matters were of a character which have been the subject of surveys undertaken by a number of courts, including my own. I refer to matters such as the performance of counter staff, the utility of signage, the adequacy of facilities such as toilets and telephones, discussion and preparation rooms, and the standard of access for people with physical disability. A proposal that the survey extend to satisfaction with court processes, beyond matters such as availability of interpreters, was rejected by the Council of Chief Justices.
No Chief Justice thought the terminology of “client” satisfaction was in any way appropriate. However, the Council of Chief Justices approved the general structure of the survey directed as it was to matters that were administrative, without trespassing on the judicial function. Nothing was proposed which involved “satisfaction” with what could be called judicial administration as distinct from court administration. I include case management in judicial administration.
One of the documents provided to the Council of Chief Justices said that data about “client satisfaction”, to be derived from the survey, was not intended for purposes of comparisons between jurisdictions but to track changes in a particular jurisdiction. It is not possible, however, to prevent inappropriate comparisons, whatever the intention.
There the matter rested for some five or six years, until this year. The development of a quality indicator for what is called “court administration” is back on the agenda. Past expressions of concern, which may well have included my own, are said to have been based on an incorrect assumption that measurement related to the quality of judicial decisions. However, previous suggestions for possible “quality” indicators have included reference to appeal rates, which obviously involves judicial decision-making.
It is by no means clear to me that the pantometric ideology that is dominant in the contemporary public sector will draw a boundary between judicial administration and court administration at the same place that judges would determine that boundary. Accordingly, it is appropriate to return to this subject.
Some of you will have either heard or read me on these matters before. I apologise for the repetition. That’s the problem with fundamental principle: it doesn’t change over time.
Persons responsible for organisation in the public sector used to refer to their vocation as “public administration”. Most of the university courses and academic journals were so entitled. For some decades, however, this sphere of discourse has referred to itself as “public management”. The change in terminology is significant. It represents a development in which “managers” have made a claim, of a professional character, to the universal applicability of their vocation to all spheres of organised activity, whether in private corporations or in the not for profit sector or in the public sector.
At the time when the focus was on “administration” there was validity in the proposition that the skill set required for administration was similar from one sphere to another. The same claim of universality has however been carried over when “administration” became “management”. The “managers” purport to be able to determine a much wider range of organisational conduct than they did when they were “administrators”. This claim to institutional territory extends to the way in which the objectives of an organisation are determined and how they are to be achieved. With respect to such matters, however, the “one size fits all” approach is simply a conceit. More significantly it is a conceit which, in my opinion, can undermine fundamental values in many spheres of discourse, including the administration of justice.
As it emerged over about the last two decades, public management turned on a requirement for a hierarchy of documentation. At the highest level of generality or abstraction is a document described variously as a strategic plan, a corporate plan, a charter or mission statement. I have even seen references to a “vision statement”. Below that level is a document focused on the process of annual implementation, variously called a business plan or a performance plan or the like. These plans are required to contain goals, objectives, targets or standards at a level of generality that is implementable and, preferably, capable of measurement and stated in quantitative form. The next level down in the hierarchy is what is frequently called performance indicators, which are required to be measurable, concrete, collectable at reasonable cost and comparable, either between institutions or over time for the one institution. Finally, the process must be capable of independent evaluation, both within the unit of public administration and by investigatory and regulatory bodies, such as finance departments, auditors general and parliamentary committees.
At the level of strategic plans, mission statements, charters and the like, one generally finds the broadest platitudes, unlikely, by their nature, to have any effect on actual behaviour. No doubt there are some areas of public administration for which clarification of objectives at this level of generality performs some useful function. I do not myself accept the proposition that one cannot plan for the future or know what one is doing unless one writes it down. However, in the immortal words of an English footballer: “If you have the courage to look far enough ahead, you too can see the carrot at the end of the tunnel”.
I detect a significant decline in the enthusiasm for strategic plans, charters and mission statements. I have never seen one that wasn’t a waste of time and paper. That seems to be quite widely accepted today. However, in the not too distant past, the desirability of such strategic plans was advanced as if it constituted the only rational way of approaching organisational activity.
Management is, and has manifestly been for some considerable time, a fashion industry. For example, in the United States, with respect to budgeting processes, there has been a succession of passions each stated with complete certitude at the relevant time: in the 1950s there was “performance budgeting”; in the 1960s it became “programme budgeting”; in the 1970s there was “management by objectives”; and in the 1980s, the current approach for the hierarchy of documentation to which I have referred, came to be required by statute. Each of the previous approaches was accepted, on each such succession, to have been a failure.
To similar effect, one American author identified a range of consecutive management fads which were applied in higher education in the United States over the course of the last two or three decades: the Planning Programming Budgeting System was replaced by Zero-Based Budgeting, which was replaced by Management by Objectives, which was replaced by the emphasis on strategic planning and benchmarking and which has since had competition from Total Quality Management and then Business Process Re-engineering. Since then we have moved to the Triple Bottom Line and, no doubt, other transient enthusiasms of which I am not aware.
These are simply the more abiding of the managerial fads of recent decades. As any visit to the burgeoning management sections of bookshops will show, these fads come and go as rapidly as the equally large and burgeoning sections on personal diets. Scarcely a week goes by without some new volume proclaiming the abiding utility for managers of the insights to be found in an obscure author whose work is available, if at all, only in the Penguin Classic series, or from some other set of insights which are to be deduced from a catchy aphorism.
The one thing that appears to be stable is the emphasis on measurement and on its universal applicability – pantometry. There is little recognition that what is capable of measurement is not necessarily what matters. Nor that this capacity varies considerably from one sphere to another. Furthermore, experience in many spheres of discourse now establishes that the process of measurement often has significant dysfunctional, indeed perverse, effects. An important reason for this is that it is very difficult to measure the quality of governmental activity.
Measures of quality are available in many spheres of conduct. For example, quality can be calculated in terms of defect rates in manufactured goods. Customer complaint statistics may also prove indicative. However, there are significant areas of public decision-making, and the law is one of them, in which there is no measurable indicator of quality, even at the level of defect rates or numbers of complaints. There is simply no escaping qualitative assessment for purposes of evaluation. What this means is that decision-making processes which are based only on quantitative measurement are so defective as to be irrational.
At the heart of managerialism is the assumption that something called “management” is universally applicable to all areas of organised life. This is not a neutral assumption. Nor is the belief in pantometry. The managerialist focus is on matters capable of measurement, like efficiency and effectiveness. This does not, however, represent the full range of values which are of significance for public decision-making. Other values such as accessibility, openness, fairness, impartiality, legitimacy, participation, honesty and rationality are also of significance. They are not capable of measurement, not even by proxy indicators.
Our system for the administration of justice is not the most efficient mode of dispute resolution. Nor is democracy the most efficient mode of government. We have deliberately chosen inefficient ways of decision-making in the law in order to protect rights and freedoms. We have deliberately chosen inefficient ways of government decision-making in order to ensure that the government operates with the consent of the governed.
Managers, especially those who believe in pantometry, of course accept that considerations of “quality” matter as well as quantity. Indeed it is obvious that these two dimensions are often inversely related to each other.
Experience suggests that these incantations about the importance of quality often do not rise above the ritualistic. Quality considerations receive lip service, and the matters capable of quantification more often than not determine the actual outcome. The search for a measurable indicator of “quality” is, to a significant degree, a recognition that the claim to universal applicability of managerialism is contestable.
Quantitative measurement, by reason of its very concreteness, acquires a disproportionate and inappropriate influence over considerations of quality, which appear to be amorphous.
Decisions that plainly call for judgment are now often made in various areas of the public sector on the basis of partial, purportedly objective considerations, with dramatic consequences which, probably, no-one would have chosen in a more comprehensive decision-making process. Measurement is implemented in the name of rationality. However, it is often, by its very nature, partial and incomplete. It confers no more than a pseudo-scientific precision. Such partial rationality, by reason of the incompleteness, often proves to be fundamentally irrational.
At the heart of these issues is a power struggle between the proponents of the “new public management” like Treasury officials, departmental finance officers and auditors (to whom I find it convenient to refer as “the managers”) and persons like teachers, doctors or lawyers involved in public decision-making processes (to whom I will refer as the “professionals”). Professionals involved in public sector decision-making tend to emphasise the significance of qualitative considerations. Managers tend to emphasise measurable indicators and objective formulae.
It is perfectly understandable why this should be so. To the extent to which qualitative considerations, that cannot be reduced to numbers, are given weight, the professionals will have the greater say. Unless matters can be reduced to measurable standards and indicators, the managers will not be able to exert significant influence. Managers do not have the capacity to make qualitative judgments. Accordingly, they have an inbuilt institutional bias to downgrade the significance of quality or to attempt to measure it by some kind of proxy indicator. As a regrettably anonymous pundit once put it: “Where you stand depends on where you sit”.
Public managers have an image of themselves as the custodians of the objectives of an organisation and, often, as the representatives of the taxpayer in the interests of ensuring accountability, minimising expenditure and maximising efficiency. They sometimes resent the high degree of autonomy of professionals – like teachers, doctors and lawyers – and categorise their pre-occupation with matters of quality as rent seeking activity. They tend – sometimes with reason – to regard professionals as particularly liable to engage in self-serving conduct and to manifest no capacity to prioritise or to regard professional standards as anything but absolute. In a world where choices have to be made about the allocation of resources, there are no such absolutes.
This power struggle can be a creative tension with positive effects. There is however, a very real possibility, based on experience in areas such as education and health, that the managerialist approach will force its one size fits all template on the administration of justice. I think that would be disastrous for the quality of our legal system.
The experience of the collapse of Communism should have taught us, if we did not understand it before, that a society which is organised on a single institutional principle is fundamentally unstable. A diversity of organising principles for social institutions is as significant for the health of our society as biodiversity is for our ecology. A monoculture is inherently unstable.
A major defect of managerialists who believe in pantometry is that their approach tends to reduce citizens to consumers.
A person’s interest as a consumer is only one part of the person’s status as a citizen. The consumer analogy has become, in many respects, a feral metaphor that has acquired a disproportionate degree of prominence.
Consumers have desires or needs. Citizens have rights and duties. The perspective of citizenship is of greater significance for many areas of public activity than the perspective of consumerism. This is the case with the administration of justice.
The proposal for a “quality indicator” designed to elicit “client satisfaction”, to be applicable to what is called “court administration”, manifests this problem in the very choice of terminology. Courts do not have “clients”. Litigants are not consumers. Litigants have rights. They come to court to assert their rights, not to exercise some form of consumer choice.
There is nothing new about a focus on citizens as consumers. Such a focus is inherent in utilitarian philosophy. This philosophy requires, when one assesses the value of institutions in any sphere of conduct, that nothing matters but the state of mind of the persons who will be affected by that conduct. Utilitarianism focuses only on the calculation of pleasure and pain. This is an exceptionally impoverished view of human nature.
Utilitarianism is a moral philosophy based upon the calculation of consequences. This approach has always involved mechanical and quantitative thinking. It has no place, although some have tried to argue the contrary, for any other moral rule or for the idea that some conduct by its very nature is immoral. Nor does it have any place for the idea that justice must be administered in accordance with law, irrespective of the consequences in the state of happiness or otherwise of the persons who are involved in the administration of justice. Jeremy Bentham famously described inalienable human rights as “nonsense on stilts”. Rights, he correctly understood, were inconsistent with pantometry. Bentham was the world’s first pantometrist. Indeed, pantometry acquired a religious quality for utilitarians. It still does.
Two centuries ago Bentham and his acolytes went around England measuring everything they thought could manifest the state of happiness of the population: they counted the number of cesspits (which was an indicator of ill health); they counted the number of pubs (an indicator of immorality); and they counted the number of hymns that children could recite from memory (then regarded as a form of educational attainment). The Benthamites spent an enormous amount of completely unproductive time trying to identify the precise way in which pleasure and pain could be measured.
Their direct successors are still with us, still searching for proxy indicators of quality.
A critical reason why a consumer focus is inadequate, indeed borders on the irrelevant, for the administration of justice is that courts are not merely a publicly funded dispute resolution service. To treat them as if that is all they are is far too narrow. Indeed, in my opinion, it is potentially subversive of the rule of law. It sets at nought the constitutional function of the courts to preserve the integrity of institutions, especially the mechanisms of governance. It sets at nought the role of the courts to protect society. It sets at nought the role of the courts to prevent abuse of power.
Courts do resolve disputes. However, they do so as an arm of government which manifests the public interest in the peaceful and fair resolution of private disputes. Court processes are not, and have never been, a facility that the government makes available to serve a private purpose.
I have no doubt that the courts serve the people. However, they do not provide services to the people. This distinction is not merely semantic; it is fundamental. The courts do not deliver a “service”. Courts administer justice in accordance with law. They no more deliver a “service” in the form of judgments and decisions than a parliament delivers a “service” in the form of debates and statutes.
This is perhaps clearest in the context of the criminal justice system or the civil enforcement, whether by a public authority or by a private litigant, of publicly proclaimed standards. Such standards have been developed by the common law, although increasingly they are expressed in statutory form. These standards manifest a public statement of proper behaviour. Individuals employ such standards to resolve their private disputes, but they remain publicly proclaimed standards designed to serve public purposes.
There are some judicial contexts in which the primary objective is the actual resolution of a dispute, e.g. litigation involving families. Generally, however, the objective is not merely to resolve the dispute as such, but to serve public purposes by the process of resolving disputes. The enforcement of legal rights and obligations, the articulation and development of the law, the resolution of private disputes by a public affirmation of who is right and who is wrong, the denunciation of conduct in both criminal and civil trials, the deterrence of conduct by a public process with public outcomes – these are all public purposes served by the courts in the resolution of private disputes.
The judgments of courts are part of a broader public discourse by which a society and polity affirms its core values, applies them and adapts them to changing circumstances. This is a governmental function of a broadly similar character to one of the functions performed by a parliament. This has no relevant parallel in most other spheres of public activity, let alone in private activity. That is why, whatever its relevance to other sectors, a consumer perspective is inapplicable to the administration of justice.
It is important to recognise that measurement has consequences. It is not neutral in its effects. As some have put it: “What gets measured gets managed”. Where what is capable of measurement is not the only thing that matters, the results are often malign. I am concerned that if courts come to be judged by something called “client satisfaction”, then the administration of justice could be perverted in a search for popularity.
The pathology of measurement arises when the indicators are targeted. Because performance measurements, particularly of a qualitative nature, are necessarily partial, targeting the indicator can have disastrous consequences for the true performance of an organisation.
In the former Soviet Union, the only thing of which there was no shortage was performance indicators. They called it a five year plan. Every area of activity had a formal measurable target, the achievement of which had significant implications for both the organisations and the persons running them. In one period, the five year plan for nail manufacturers identified output in terms of tonnes. Every manufacturing plant in the country made large nails and there was a shortage of small nails. Accordingly, in the next five year plan the target was stated in terms of numbers of nails. The inevitable happened: everyone made small nails and there was a shortage of big nails.
There is a direct line from Jeremy Bentham to Frederick Winslow Taylor, who invented “scientific management” with its pantometric preoccupation with measurement, to Lenin, who much admired Taylor.
Examples of the pathology of measurement are not limited to Soviet planning systems. Such examples can be identified in many different areas, including in the private sector. In the disaster that was Enron it was the focus on the share price as a measure of success and, in the form of stock options, as a determinant of the remuneration of the executives, that led to a process of systematic distortion. The Enron house of cards was built on the pathology of performance measurement.
Over the years, I have collected numerous examples of the perverse effects of targeting the indicator – variously described as “gaming the system”, a “moral hazard” and the like. A few examples will suffice:
- A United States job training scheme allocated funds on the basis of results in finding jobs. Agencies maximised their funding by refusing to accept for training people who were unlikely to get jobs, i.e. the very people who needed help most.
- When comparative success rates for cardiac surgeons began to be published in New York and Pennsylvania, mortality rates in both states declined significantly because heart surgeons refused to operate on the risky cases, which were referred to adjoining States.
- Police stations in Paris, which were assessed on crime levels in their districts, refused to make a formal record of crime reported to them.
- Publication of performance data and league tables of English and Scottish schools had dysfunctional effects when schools concentrated on achieving the indicators and reduced emphasis on other school objectives, such as the development of personal and social skills or the allocation of time for subjects such as physical education and art, which were not measured.
- UK prisons, which are assessed, as one of a bewildering range of indicators, on the proportion of drug free prisoners, have no difficulty in ensuring a good result from the supposedly random process of testing, by selecting the sample of inmates who are tested.
- English hospitals were judged on whether they admitted 90 percent of emergency patients within four hours. Whenever the annual measurement was due, hospitals cancelled operations and flooded their emergency departments with doctors and nurses.
- Telephone services for ambulances in Victoria were outsourced to a private contractor which had to answer 90 percent of incoming calls within 30 seconds. Its performance was achieved by a systematic programme of numerous so-called “test calls” undertaken by its own employees, all of which were answered within the time.
Distortions arise because the things that can be measured are not the only things that matter. Insofar as external judgments are made on an information base which is too narrow, then the incentives created by performance indicators will operate perversely. The more significant the consequences of the measured results, the greater the perversity.
I should emphasise that the internal use of information for the purposes of managing the organisation does not create any significant risk of such perverse reactions. It is the use of this information for purposes of evaluation or allocation of resources or remuneration that creates the likelihood of distortion.
Of course the problem of perverse reaction does not apply if what is being measured really is important and maximising performance of that indicator does not involve compromise of other values. In the public sector that is actually a rare combination. I have no doubt it does exist. It is not, however, true of the administration of justice. (I pause to note that “maximise” was one of many words invented by Bentham.)
As the defects of performance indicators become obvious, the pantometrists almost always respond by multiplying the number of indicators, seeking to block the latest distortion. As a result – most noticeably in the UK and New Zealand – ever increasing resources are devoted to compiling and publishing statistics which the law of diminishing returns has long since rendered pointless. I believe that in the not too distant future, contemporary public management will be treated with the same bemused contempt that is accorded to the Benthamites and the Soviet planners.
Just in case, contrary to the proposals of which I am aware, there is any suggestion that appeal rates are being taken seriously as a quality indicator by anyone, I will make some brief observations.
Over a period of time, the fact that a particular court or even a particular judge is frequently overturned on appeal may indicate something. That, however, is only true at the extremes and, in no case, is what happened in a single year an indicator of anything. The idea that some form of aggregation can produce a number that indicates quality, except at the extremes, can only be advanced by someone who is ignorant of the judicial process.
Appeals are allowed for a wide range of reasons which have nothing to do with the quality of the decision. Appeals are dismissed in a wide range of cases, often in the exercise of an appellate discretion, which do not constitute any kind of endorsement. The assumption that in some way these variations will come out in the statistical wash can only be held by someone who believes in pantometry.
To serve as any kind of indicator of quality, appeal rates have to be expressed by reference to the total number of cases from which appeals could be brought, not by reference to the number of appeals or, even, by reference to applications for leave to appeal. However, the number of appeals is actually quite small – rarely exceeding five percent of cases decided. The so-called indicator would actually look quite trivial. For example, only about two to three percent of New South Wales intermediate appeal court cases go to the High Court. What does anyone do with a number such as: “Successful appeals from New South Wales to the High Court doubled last year from one to two percent”?
More significantly, there is in this, as in all such cases, a real risk that over time the process will be distorted by the limited factor capable of measurement. Judges in trial courts or intermediate courts of appeal should make judgments and exercise discretions uninfluenced by what a court of appeal might think. Appellate courts should do the same without thinking about how the court or the judge from which an appeal is brought might be faring in the tables.
It now appears that the focus of attention in terms of the development of a quality indicator, to fill the apparently embarrassingly empty box in the template, is “client satisfaction”. However, questions of “satisfaction” are only of significance once one has determined the most fundamental question: satisfaction by whom and about what.
It may be useful for some purposes to develop some kind of index of “satisfaction”. That is not the case if the questioning involves aspects of the judicial process, where the principal focus must be on fidelity to the law, the fairness of the outcomes and the fairness of the procedures.
Over the course of many centuries we have established a series of institutional arrangements that are designed to ensure that the administration of justice is not conducted with a view to popularity. Any focus on “satisfaction” with respect to judicial decisions is inconsistent with the principle that judges should not be concerned with popularity.
The courts must, and do, collect information about the experience of persons involved in the administration of justice. The courts receive such information in various ways, e.g. user groups and other forms of communication, albeit primarily from the legal profession. Some general surveys acquire useful information. Often, the most useful information is of a qualitative character expressed in narrative terms, not capable of being reduced to a number.
What is proposed by way of a “quality indicator”, as I understand it, is some sort of survey, probably of the tick a box kind, on the basis of which a number can be generated. I doubt that such numbers will be of more than marginal use. Limited to matters of court administration of the kind I have described – signage, facilities, etc. – they may do some good and will do no harm. Beyond that, they can do harm.
Frankly, there are times where the ‘one size fits all’ template that requires all component parts in the public sector to report in the same way with quality and quantity indicators becomes a joke. To give only one example, the Department of Foreign Affairs and Trade, because it is required to have a quality indicator, identifies matters such as the following: “Satisfaction of portfolio Ministers with the Department’s policy advice …” and “Satisfaction of portfolio Ministers with the protection and advancement of Australia’s national interests …”. This Department has a quite narrow perspective of who – to use the argot of managerialism – its “stakeholders” are. In any event the reports do not pretend to provide a statistical measurement of their ministers’ state of “satisfaction”.
Beyond the matters I have identified as involving court administration, it is difficult to identify any purpose, other than the fulfilment of an ideological programme, that is served by attempting to measure perceptions about court performance. In many areas of the public sector the quality of the performance is capable of measurement. It is possible to measure mortality rates or rates of recovery from medical procedures. It is possible to measure, in a reasonably objective way, the results of education by way of examination. Whether or not measurements of this character ought to determine allocation of resources, remuneration and matters of that character is a different question. Nevertheless, something meaningful about the quality of the performance of certain sections of the public sector can be measured, albeit in a partial way. That is not true of the law.
The outcomes of the judicial decision-making process can be variously stated. The administration of justice in accordance with law is one way. The attainment of a fair result arrived at by fair procedures is another. Such outcomes are not measurable. They can only be judged. There is no proxy indicator of the quality of these matters. “Client satisfaction” has nothing useful or interesting to say about them.
I should make it clear that the judicial process does not involve only the final decision in the form of a judgment. The whole of the process, including preparation for trial and case management is part of the judicial process. It is by no means clear to me that the current proposal for a quality indicator with respect to client satisfaction about “court administration” is so limited. If it is not, it is fundamentally unacceptable. If it is, then I doubt it will add much to what courts already do.
In any event, it is wrong to call “consumer satisfaction” a “quality indicator”. It would be seriously misleading to pretend that any such survey could ever measure the “quality” of a court’s conduct with respect to the important functions performed by courts. At best it relates to tangential matters. To pretend that this fills an empty box for a quality indicator in a template, applicable to all sectors of public management, is self-deception.
In all important respects, the quality of the administration of justice is not capable of being reduced to numbers. In particular, some kind of index of “client satisfaction”, does not measure quality.
The state of “satisfaction” will often bear no relationship of any character, let alone a direct linear relationship, to the actual quality of the relevant decision-making.
Obviously about half of all litigants go away dissatisfied with the outcome and that dissatisfaction also often impinges on their state of satisfaction with the process. This inbuilt bias is not confined to the parties. It is, more often than not, also manifested by their legal practitioners.
There are many cases in which both parties go away dissatisfied. Fidelity to the law sometimes requires this. Fair procedures must be fair to both sides. Fairness to one side may be regarded by another as unfairness to them, or at least as contrary to their interests. Fair outcomes arrived at by fair procedures are not necessarily perceived to be fair in either respect by the litigants or, in many cases, by their legal representatives. That is because the point of the process is not their satisfaction or their perception of fairness.
Judicial decisions must be determined by objective standards. The satisfaction of the participants is not one of the purposes to be served. It is not an objective of courts to be popular. Courts must maintain public confidence in their integrity. However, that has nothing to do with “satisfying” persons as consumers of a “service”.
States of satisfaction or dissatisfaction are not naturally expressed in quantitative terms. In order to translate a narrative into quantitative terms some kind of artificial categorisation or ranking, of a tick a box character, has to be imposed by the designer of the survey. How that is done can influence and may determine the results.
Opinion surveys about quality are based on perceptions. Such surveys are notoriously unreliable, particularly on matters about which the interviewees have limited understanding.
There are countless studies which show the wide range of distortion that arises from surveys of opinion. The results are often determined by the nature of the form and the precise wording of the question. Perhaps more significantly, perceptions are often systematically inaccurate: people believe they witnessed events when they were not there or which did not happen at all.
One study found that 44 percent of persons claimed to have seen a non-existent film of the car crash in which Princess Diana died. Indeed, they were able to provide details about the event. Similarly, 55 percent of people claimed to have seen a television broadcast of a non-existent film of an air crash. The litigation process, as well as the survey process, contains numerous occasions for misdirecting the suggestibility of participants.
The courts deal with the plasticity of memory on a daily basis. The collective experience of the judiciary is that perception often diverges from reality and that is so not only for querulous litigants who appear in person.
To give only one example, a United States study of student evaluations of university lecturers showed that the results were determined in significant measure by how good looking the lecturer was. The “quality” of the lecturer was determined in this way, with real effects on promotion and remuneration. Other studies show that attractive witnesses in court are regarded as being more credible.
Surveys of opinions directed to issues of quality depend on the understanding, knowledge, experience and capacity of the persons expressing those opinions. In order to assess the value of the opinions it is necessary to know this. That never happens.
Often surveys are little more than surveys of reputation. Reputation is not necessarily related in any direct, or even rational way, to the matter sought to be assessed in a qualitative manner. They are an unreliable way of assessing quality.
Such surveys may give the appearance of being democratic. It may be unfashionable to say so, but quality is by its nature not susceptible to democratic assessment. Quality is hard to judge. It requires knowledge and experience. In the case of the judicial process, surveys will be of no value at all. However, my real concern is that the effects of such a process may be to pervert the decision making process it proposes to improve.
To the degree to which a so-called “quality indicator” is generated and acquires a level of public prominence – as has occurred for example with league tables for schools in England – courts will be given significant incentives to target the indicator in the perverse manner I have suggested is endemic to such a system. If courts come to be judged by the degree of satisfaction of the persons surveyed, then they may do what they can to increase the level of “satisfaction” about which questions happen to be asked.
Performance indicators are always partial and are always manipulable. The persons who administer the measurement system always have superior private information about how their own actions influence the measured results, than do the persons to whom the results are reported. Strategies of targeting the indicator, rather than doing the job properly, are always capable of being adopted. Doing so rarely has adverse consequences for those responsible because of the difficulty of auditing the distortions which occur. Indeed, the audit process itself often focuses on exactly the same performance indicators. The objectives that have been distorted are often either not measurable or not measured.
So it could be with “client satisfaction” if it extends beyond matters of administration. It is the only thing that looks like a quality indicator that anyone can think of. However, it does not measure anything that is of real importance about the quality of court performance. There is no necessary or direct relationship between perceptions about the quality of judicial process and that quality assessed on any objective standard.
The Report on Government Services sets out comparative tables giving the same figures for all States and Territories and Commonwealth institutions. This is a critical aspect of the exercise. Such “benchmarking” is adopted to create some kind of rivalry, by a process of invidious comparison which, in the absence of any kind of market, is thought to create incentives to improve performance. Nothing remotely like that has happened as a result of the Productivity Commission reports. Nor is any such effect likely.
Courts, including my own court, have developed their own set of statistics for purposes of internal administration. The principal advantage of the Productivity Commission processes has been to refine the statistics that are compiled for our internal decision-making processes. The work undertaken for the Report on Government Services has improved our own internal management. Comparisons with other States are irrelevant for that purpose. Nor, in my opinion, do they produce anything that is actually useful for purposes of accountability.
Every year, at the time of publication of the Productivity Commission report, there is a spate of articles in the media based on the tables comparing the different Supreme Courts or the different District Courts on the published indicators of backlog, clearance rates, cost per finalisation and the like. Even a cursory glance at the footnotes and qualifications to these tables would make it perfectly clear that no valid comparison of any character can be drawn. Nevertheless, the Report continues to publish tables setting out the information as if it were comparable. If such a table were published in trade or commerce, it would be injuncted for being in breach of the Trade Practices Act’s prohibition of misleading or deceptive conduct.
The case mix of different Supreme Courts is completely different. The Supreme Court of New South Wales has a caseload which is, in general terms, broadly comparable with that of the Supreme Court of Victoria. There is, however, no similarity between our caseload and that of any other Supreme Court or Federal Court. For example, a substantial portion of the criminal jurisdiction of the Supreme Court of Queensland involves minor offences of a character which, in New South Wales, are dealt with in the Local Court.
Nevertheless, we have to tolerate an annual burst of publicity, which purports to compare performance between jurisdictions on issues of delay or cost. To continue to publish tables which invite such comparisons, despite the knowledge that the qualifications contained in notes to the tables will be ignored, is not particularly responsible. I expect the same will occur if comparisons can be made about “client satisfaction”.
I can see little purpose to be served by way of benchmarking with respect to “client satisfaction”. As long as the surveys are limited to matters of court administration, as I have defined it, they can do little harm, other than the inevitable media attempt to make comparisons. We have to put up with much worse.
The ubiquity of opinion polls has introduced fundamental distortions to the political process. I for one will resist any attempt to degrade the judicial process in the same way.
1. The published versions of my addresses are: “The Qualitative Dimension of Judicial Administration” (1999) 4 The Judicial Review 179; “Seen to be Done: The Principle of Open Justice” (2000) 74 Australian LJ 290, 378; “Economic Rationalism and the Law” (2001) University of NSW LJ 200; “Citizens, Consumers and Courts” (2001) 60 Australian Journal of Public Administration 5; “Judicial Accountability and Performance Indicators” (2002) Civil Justice Quarterly 18; “The ‘New Public Management’ and the Courts” (2001) 75 ALJ 748; “Quality in an Age of Measurement” Quadrant (March 2002) p9; “The Maintenance of Institutional Values” (2002) 33 LASIE 91. All of these addresses are also accessible at the New South Wales Supreme Court website http://www.lawlink.nsw.gov.au/sc.
2. See Robert Birnbaum Management Fads in Higher Education: Where They Came From, What They Do, Why They Fail Jossey-Bass, San Francisco, 2000.
3. See references in my above article 75 ALJ at 757.
4. See Wilson Quarterly Summer 2003 p163 reporting on The New York Times Magazine Mar 16, 2003.
5. See Dalrymple “The Barbarians at the Gates of Paris” City Journal Autumn 2002, accessible at www.city-journal.org.
6. Wiggins & Tymms “Dysfunctional Effects of League Tables: A Comparison Between English and Scottish Primary Schools” (2002) 22 Public Money and Management p43.
7. London Review of Books, 18 December 2003 p 10, review of David Ramsbotham Prisongate Free Press 2003.
8. Australian Financial Review, 16 May 2003, referring to articles in The Observer.
9. Report of the Metropolitan Ambulance Service Royal Commission Vol 1, Parliament of Victoria, May 2001 at par [5.5.2].
10. See R McNally Remembering Trauma Harvard Uni. P. (2003) at 68-69.
11. See Hamermesh and Parker “Beauty in the Classroom: Instructors Pulchritude and Putative Pedagogical Productivity” Economics of Education Review accessible at http://papers.nber.org/papers/w9853.
12. See C Fife-Shaw “The Influence of Witness Appearance and Demeanour on Witness Credibility” (1995) 35 Med Sc Law 127.