This course allows educators to engage with contemporary literature on effective evaluation and connect it to their own practice.
Mode of delivery: online
Accredited hours: 2
myPL course code: RG03000
Learn more in the 5 essentials for effective evaluation publication.
A program of professional development designed to consolidate and strengthen evaluative thinking and practice in your school. Read more about this course on the Evaluation Resource Hub.
Mode of delivery: face-to-face
Accredited hours: 6
myPL course code: RG01452
Themes: evaluation, leadership
These reports are part of an ongoing evaluation of Great Teaching, Inspired Learning (GTIL). GTIL is the NSW government's plan to improve the quality of teaching in NSW schools. Learn more about GTIL.
CESE evaluated key reforms under GTIL relating to:
• school leadership initiatives
• cadetship and internship programs
• professional experience.
School leadership initiatives
CESE evaluated three key reforms under GTIL that aim to support leadership development among existing, new and aspiring leaders. The reforms evaluated were the:
• NSW Public School Leadership and Management Credential (action 15.3)
• Leadership Development Initiative (actions 14.1 and 14.2)
• Principal, School Leadership Initiative (action 15.2).
Cadetship and internship programs
CESE conducted an evaluation of two GTIL actions designed to attract high-achieving students into the teaching profession in areas of workforce need. The department introduced the Cadetship and Internship Programs in 2014 to address these actions.
Cadets and interns are employed on a part-time basis during their teacher education studies to provide support to classroom teachers. They are guaranteed a permanent teaching position in a NSW public school upon completion of their studies.
CESE conducted an evaluation of the key GTIL actions designed to improve the quality of professional experience placements for pre-service teachers. The report presents the findings in relation to the implementation and early impacts of:
• Closer matching of supply and demand for graduate teachers through the introduction of Professional Experience Agreements (action 4.2)
• Establishment of specialist professional experience schools (action 4.3)
• Professional learning for professional experience supervisors (action 4.4)
• Highly Accomplished and Lead Teachers leading professional experience activities (action 4.5).
A final GTIL evaluation report is due in 2019.
In 2012, the NSW Department of Education launched the Local Schools, Local Decisions (LSLD) education reform. LSLD aims to give NSW government schools more authority to make local decisions about how best to meet the needs of their students. LSLD focuses on five interrelated reform areas: making decisions, managing resources, staffing schools, working locally and reducing red tape. A cornerstone element of LSLD is the introduction of a new needs-based approach to school funding through the Resource Allocation Model (RAM).
CESE is conducting an evaluation of LSLD. The evaluation began in mid-2016 and will conclude in mid-2020. The evaluation includes a process evaluation that investigates the implementation of LSLD, and an outcome evaluation focussing on the impact of the reform on school and student outcomes.
This LSLD interim evaluation report presents interim findings on three key evaluation questions:
1. How have schools spent their RAM equity loadings?
In 2016, schools spent their RAM equity loadings on four main spending categories: employing key staff, enhancing learning support, planning and developing programs, and building staff capacity.
2. What has been the impact of LSLD on school management and local decision-making practices?
In four of the five LSLD reform areas, principals perceive the impact of LSLD to have been positive. In the fifth reform area, reducing red tape, more than two-thirds of principals said that LSLD has not had a positive impact on simplifying administrative processes.
3. What has been the impact of LSLD and RAM funding on school and student outcomes?
The five student engagement measures included in this report (attendance, suspension, social engagement, institutional engagement and aspirations to complete Year 12) showed only very small to small overall changes over time. In terms of differential change over time, we found no relationship between changes over time in these engagement measures and levels of need, with the notable exception that students in higher-need schools typically showed less positive change over time in levels of social engagement than students in lower-need schools. On these findings alone, there is not yet evidence to support the idea that higher-need schools benefit more from the RAM equity loadings than lower-need schools.
A final evaluation report will be published by CESE in mid-2020. This report will include an analysis of educational outcomes, including in-depth statistical modelling of NAPLAN results from 2012 to 2018, which will help us better understand the longer term effects of the reform.
Thinking evaluatively is important for identifying what works and encouraging continuous improvement. This audio paper sets out five conditions for effective evaluation in education, giving practical advice for both educators and policy makers.
Read by Rydr Tracy, CESE.
Go to the Effective evaluation report.
The Literacy and Numeracy Action Plan 2012-2016 was developed to address the widespread inequalities in learning outcomes known to exist from the earliest years of schooling in NSW schools serving low socio-economic status communities. This report presents the findings of an evaluation of the NSW Literacy and Numeracy Action Plan 2012-2016. It examines the extent to which student literacy and numeracy improved, the factors that may have led to any improvement, and the extent to which any improvement achieved was cost-effective.
Authors: Evalynn Mazurski, James Finn, Andrew Goodall, Wai-Yin Wan
Evaluator company/business: Centre for Education Statistics and Evaluation, NSW Department of Education
URL or Pdf: Download the Rural and Remote Education Blueprint - interim monitoring and evaluation report (PDF, 3MB)
Summary: The evaluation considered the implementation, impact and changes in indicators of student engagement and performance, and the quality of teaching. There is evidence that some of the actions are already achieving their objectives. However, it is important to note that any observed outcomes for rural and remote children and young people may be due, at least partially, to other reforms and initiatives being concurrently delivered. Similarly, where desired outcomes for rural and remote children and young people have not been observed, the failure cannot be solely attributed to the Blueprint.
All education programs are well-intentioned and many of them are highly effective. However, there is usually more than one way to achieve good educational outcomes for students. When faced with this scenario, how do educators and education policymakers decide which alternative is likely to provide the most ‘bang for buck’?
There’s also an uncomfortable truth that educators and policymakers need to grapple with: some programs are not effective and some may even be harmful. What is the best way to identify these programs so that they can be remediated or stopped altogether?
Program evaluation is a tool to inform these decisions. More formally, program evaluation is a systematic and objective process to make judgements about the merit or worth of our actions, usually in relation to their effectiveness, efficiency and appropriateness (NSW Government 2016). Evaluation and self-assessment are at the heart of strong education systems, and evaluative thinking is a core competency of effective educational leadership. Teachers, school leaders and people in policy roles should all apply the principles of evaluation to their daily work.
Research shows that:
It may sound obvious, but understanding whether program activities have been effective requires a clear understanding of what the program is trying to achieve. The objectives also need to be measurable.
For some programs or activities this is very easy. For example, reading interventions like Reading Recovery aim to improve students’ ability to read. In these instances it is easy to start with a clear statement of objectives (i.e. to improve students’ ability to read). It is also quite easy to measure outcomes because reading progression is relatively easy to measure (although the issue of causal attribution is important – more on that later).
However, for some programs, it can be more difficult to develop a clear statement of objectives and it is even more difficult to measure whether they have been achieved. Take the Bring Your Own Device (BYOD) policy as an example. The objective of BYOD is often described as using technology to ‘deepen learning’, ‘foster creativity’ or ‘engage students’. These are worthy objectives. The challenge for schools and systems is to work out whether they have been achieved. What does ‘deep learning’ look like and how can it be measured? How will teachers know if a student is more ‘creative’ or ‘engaged’ now than they were before? How much of that gain is due to the program or policy (BYOD) and how much is due to other factors?
Figure 1 provides some examples of common objectives and possible measures that will inform whether they have been achieved. These are highly idealised examples and the problems that educators are trying to solve are usually more multi-faceted and complex than these. In some cases it may not even be possible to robustly measure outcomes. In other cases, there may be more than one outcome resulting from a set of activities. However, no matter how hard and complex the problem, if there is no clarity about what the problem is, there is also no chance of measuring whether it has been solved.
Effective programs have a clear line of sight between the needs they are responding to, the resources available, the activities undertaken with those resources, and how activities will deliver outcomes. Logic modelling is one way to put these components on a piece of paper. Wherever possible, this should be done by those who are developing and implementing a program or policy, in conjunction with an experienced evaluator. At its most simple, a logic model looks like that shown in Figure 2.
The needs are about the problem at hand and why it is important to solve it. Inputs are the things put in to address the need (usually a combination of money, time and resources). Activities describe the things that happen with the inputs. Outcomes are usually expressed as measures of success. A logic model is not dissimilar to the processes used in school planning. Needs are usually the strategic priorities identified in the plan. Inputs are the resources allocated to address those needs. Activities are often referred to as processes or projects. Outcomes and impacts are used interchangeably. Figure 3 gives some common examples of needs, inputs, activities and outcomes.
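The need–inputs–activities–outcomes chain can be sketched as a simple data structure. The following Python sketch is illustrative only; the field names and the reading-intervention example are assumptions for demonstration, not part of any departmental template.

```python
from dataclasses import dataclass, field

@dataclass
class LogicModel:
    """A minimal logic model: need -> inputs -> activities -> outcomes."""
    need: str
    inputs: list[str] = field(default_factory=list)
    activities: list[str] = field(default_factory=list)
    outcomes: list[str] = field(default_factory=list)

# Hypothetical example: a small-group reading intervention.
model = LogicModel(
    need="Year 1 students reading below the expected level",
    inputs=["teacher time", "intervention funding"],
    activities=["daily 30-minute small-group reading sessions"],
    outcomes=["students reach the expected reading level by end of year"],
)
```

Writing the model down this way forces each component to be stated explicitly, which is the point of the paper exercise the logic model describes.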
Some of these examples are ‘add-on’ activities to business-as-usual (e.g. speech pathology) and some simply reflect the way good teachers organise their classroom (e.g. differentiated instruction). Figure 3 merely serves to illustrate that the evaluative process involves thinking about the resources going into education, how those inputs are organised and how they might plausibly lead to change.
Good evaluation will make an assessment of how well the activities have been implemented (process evaluation) and whether these activities made a difference (outcome evaluation). If programs are effective, it might also be prudent to ask whether they provide value for money (economic evaluation).
A simple logic modelling worksheet can be found in the Appendix.
Process evaluation is particularly helpful where programs fail to achieve their goals. It helps to explain whether that occurred because of a failure of implementation, a design flaw in the program, or because of some external barrier in the operating environment. Process evaluation also helps to build an understanding of the mechanisms at play in successful programs so that they can be replicated and built upon.
Outcome evaluation usually identifies average effects: were the recipients better off under this program than they would have been in its absence? However, when viewed in combination with process evaluation, it can provide a more nuanced overview of the program. It can explore who the program had an impact on, to what extent, in what ways, and under what circumstances. This is important because very few programs work for everyone. Identifying people who are not responding to the program helps to target alternative courses of action.
Economic evaluations help us choose between alternatives when we have many known ways of achieving the same outcomes. In these circumstances, the choice often comes down to what is the most effective use of limited resources. If programs are demonstrably ineffective, there is little sense in conducting economic evaluations. Ineffective programs do not provide value for money.
While repeating a school year is relatively uncommon in NSW, it is quite common in some countries such as the United States. It is a practice that has considerable intuitive appeal – if a student is falling behind (need) the theory is that an additional year of education (input) will afford them the additional instruction (activity) required to achieve positive educational outcomes (outcome). Evidence suggests that this is true only for a small proportion of students who are held back. In fact, after one year, students who are held back are on average four months further behind similar-aged peers than they would have been had they not been held back.
According to research conducted by the UK Education Endowment Foundation, the reason that repeating a year is not effective is that it “just provides ‘more of the same’, in contrast to other strategies which provide additional targeted support or involve a new pedagogical approach. In addition, it appears that repeating a year is likely to have a negative impact on the student’s self-confidence and belief that they can be an effective learner”. In other words, for most recipients of the program the activities are poorly suited to the students’ needs. In situations like this, well-intentioned activities can actually have a negative impact on a majority of students.
Once a clear problem statement has been developed, the inputs and activities are identified, and intended outcomes have been established, coherent evaluation questions can be developed.
Good evaluation will ask questions such as:
All too often educational researchers get hung up on using ‘qualitative’ versus ‘quantitative’ methods when answering these questions. This is a false dichotomy. The method employed to answer the research question depends critically on the question itself.
Qualitative research usually refers to semi-structured techniques such as in-depth interviews, focus groups or case studies. Quantitative research usually refers to more structured approaches to data collection and analysis where the intention is to make statements about a population derived from a sample.
Both approaches will have merit depending on the evaluation question. In-depth interviews and focus groups are often the best ways of understanding whether a program has been implemented as intended and, if not, why not. These methods have limitations when trying to work out impact because, by definition, information is only gleaned from the people who were interviewed. Unless something is known about the people who weren’t interviewed, these sorts of methods can be highly misleading. For example, people who didn’t respond well to the intervention might also be less likely to participate in interviews or focus groups. This is where quantitative methods are more appropriate because they can generalise to describe overall effects across all individuals. However, combining both qualitative and quantitative methods can be useful for identifying for whom and under what conditions the program will be effective. For example, CESE researchers investigating the practices of high-growth NSW schools used quantitative analysis to identify high-growth schools and analyse survey results, and qualitative interviews to find out more about the practices these schools implemented.
The possible sources of data to inform evaluation questions are endless. The key issue is to think about the evaluation question and adopt the data and methods that will provide the most robust answer to that question.
The number one question that most evaluations should set out to answer is: did the program achieve what it set out to achieve? This raises the vexing problem of how to attribute any observed outcomes to program activities.
No single evaluation approach will give a certain answer to the attribution question. However, some research designs will allow for more certain conclusions that the effects are real and are linked to the program. CESE uses a simple three-level hierarchy to classify the evidence strength, as shown in Figure 4. There are many variations on this hierarchy, most of which can be found in the health and medical literature.
Taking before (pre) and after (post) measures is a good start and is often the only way to measure outcomes. However, simple comparisons like this need to be treated cautiously because some outcomes will change over time without any special intervention by schools. For example, if a student’s reading level was measured at two time points, they would usually be at a higher level at the second time point just through the course of normal class and home reading practice.
This is where reference to benchmarks or comparison groups is critical. For example, if the typical growth in reading achievement over a specified period of time is known, it can be used to benchmark students against that expected growth. Statements can then be made about whether growth is higher or lower than expected as a result of program activities.
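The benchmark comparison described above amounts to a short calculation: compare observed growth against expected growth over the same period. The sketch below uses made-up scores and an assumed benchmark of 4 points; it illustrates the arithmetic only, not any real cohort or instrument.

```python
# Pre and post reading scores for a hypothetical group of five students.
pre_scores = [12, 14, 11, 15, 13]
post_scores = [18, 19, 16, 21, 17]

# Assumed benchmark: typical growth over the same period is 4 points.
EXPECTED_GROWTH = 4.0

# Average observed growth across the group.
observed_growth = sum(b - a for a, b in zip(pre_scores, post_scores)) / len(pre_scores)

# Growth above (or below) what would be expected without the program.
excess_growth = observed_growth - EXPECTED_GROWTH

print(f"observed growth: {observed_growth:.1f} points")
print(f"growth beyond benchmark: {excess_growth:.1f} points")
```

Even this simple comparison is more informative than a raw pre/post difference, because it subtracts out the change that would have happened anyway.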
An even stronger design is when students (or schools, or whatever the target group comprises) are matched like-for-like with a comparison group. This design is more likely to ensure that differences are due to the program and not due to some other factor or set of factors. These designs are referred to as 'quasi-experiments' in Figure 4.
Even better are randomised controlled trials (RCTs) where participants are randomly allocated to different conditions. Outcomes are then observed for the different groups and any differences are attributed to the experience they received relative to their peers. RCTs can also be conducted using a wait-list approach where everyone gets the program either immediately or after a waiting period. RCTs allow for strong causal attributions because the random assignment effectively balances the groups on all of the factors that could have influenced those outcomes.
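The allocation step of an RCT is itself very simple. The sketch below randomly splits a hypothetical participant list into treatment and control groups; it illustrates random assignment only, not a full trial design, and the participant names and seed are assumptions.

```python
import random

# Hypothetical participant list.
participants = [f"student_{i}" for i in range(20)]

rng = random.Random(42)  # fixed seed so the allocation is reproducible
shuffled = participants[:]
rng.shuffle(shuffled)

# Split the shuffled list in half: first half treatment, second half control.
half = len(shuffled) // 2
treatment = shuffled[:half]
control = shuffled[half:]

print(len(treatment), len(control))  # 10 10
```

Because the split is random, the two groups are balanced on average across both observed and unobserved factors, which is what licenses a causal reading of any later difference in outcomes.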
RCTs have a place in educational research but they will probably always be the exception rather than the rule. RCTs are usually reserved for large-scale projects and wouldn't normally be used to measure programs operating at the classroom level. Special skills are required to run these sorts of trials and most of the programs run by education systems would be unsuited to this research design. In the absence of RCTs, it is still important to think about ways to measure what the world looked like before the activity began and what it looked like after some period of activity has been undertaken. This requires taking baseline and follow-up measures and comparing these over time.
As a rule, the less rigorous the evaluation methodology, the more likely we are to falsely conclude that a program has been effective. This suggests that stronger research designs are required to truly understand what works, for whom and under what circumstances.
In all of the above, it is crucial for educators to be open-minded about what the results of the evaluation might show and be prepared to act either way. Evaluation should not be a tool for justifying or ‘evidence washing’ a predetermined conclusion or course of action. The reason for engaging in evaluation is to understand program impact in the face of uncertainty. It provides the facts (as best they can be estimated) to help make decisions about how to structure programs, whether they should be expanded, whether they need to be adjusted along the way, or whether they need to stop altogether.
Evaluation not only asks ‘what is so?’ – it also asks ‘so what?’ In other words, evaluation is most useful if it will lead to meaningful change. Before embarking on any evaluation, it is important to think about what can reasonably be achieved from the research. If continuation of the program is not in question, it may be better to focus on process questions bearing on program efficiency or quality improvement. It is also important to think about stakeholders, how they might react to the evaluation and what needs to happen to keep them informed along the way.
In accordance with the NSW Government Program Evaluation Guidelines (NSW Government 2016), evaluation should be conducted independently of program delivery and it should be publicly available for transparency. Independence might not always be possible where no budget exists or where activity is business-as-usual or small in scale (e.g. classroom-level or school-level programs). Evaluative thinking is still critical in these circumstances as part of ongoing quality improvement.
Where a formal evaluation has been conducted, transparency is a critical part of the process. Stakeholders need to understand the questions the evaluation sought to answer, the methods employed to answer them, any assumptions that were made, what the evaluation found and the consequences of those findings. Transparency also helps people in later times or in other schools or jurisdictions to identify what works.
To embed the sort of evaluative thinking described above into activity across education requires everyone to be evaluative thinkers in one way or another. Everyone designing or implementing a program needs to be clear on what problem they are trying to solve, how they are planning to solve it and how success will be measured.
For smaller, more routine programs and policies, performance should be monitored using the sort of benchmarking described above to determine the effectiveness, efficiency and appropriateness of expenditure. This could be done by an early childhood service Director, a school teacher, a principal, a school leadership group, or by Directors Public Schools or Principals School Leadership. If more technical assistance is required, it may be better to bring in that technical expertise.
Centre for Education Statistics and Evaluation 2015, ‘Six effective practices in high growth schools’, Learning Curve Issue 8, Centre for Education Statistics and Evaluation, Sydney.
NSW Government 2016, ‘NSW Government Program Evaluation Guidelines’, Department of Premier and Cabinet, NSW Government, Sydney.
OECD 2013, ‘Synergies for better learning: An international perspective on evaluation and assessment’, OECD Publishing, Paris.
Robinson, V, Lloyd, C & Rowe, K 2008, ‘The impact of leadership on student outcomes: An analysis of the differential effects of leadership types’, Educational Administration Quarterly, vol. 44, no. 5, pp. 635-674.
Timperley, H & Parr, J 2009, ‘Chain of influence from policy to practice in the New Zealand literacy strategy’, Research Papers in Education, vol. 24, no. 2, pp. 135-154.
This evaluation (PDF, 1.8MB) examined the impact of Reading Recovery (RR) on students' outcomes in NSW government schools. The evaluation found some evidence that RR has a modest short-term effect on reading skills among the lowest performing students. However, RR does not appear to be an effective intervention for students who begin Year 1 with more proficient literacy skills. In the longer term, there was no evidence of any positive effects of RR on students' reading performance in Year 3.
Related: Learning Curve 11 - Reading Recovery
Reading Recovery: A sector-wide analysis (PDF, 1MB) briefly describes the results of an evaluation examining the impact of Reading Recovery on students' outcomes in NSW government schools. You can also read the Reading Recovery evaluation.