All state-funded mainstream schools that meet all of the following criteria are included in *Schools Like Yours*:

- based on an August 2018 extract downloaded from Get Information About Schools:
  - with a lower age limit below 11;
  - with an upper age limit of 9 or higher;
  - classified as ‘Open’ or ‘Open, but proposed to close’;
- included in the ‘Schools, pupils and their characteristics’ dataset, January 2018.

All state-funded special schools that meet all of the following criteria are included in *Schools Like Yours*:

- based on an August 2018 extract downloaded from Get Information About Schools:
  - with a lower age limit below 16;
  - classified as ‘Open’ or ‘Open, but proposed to close’;
- included in the ‘Schools, pupils and their characteristics’ dataset, January 2018.
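The inclusion rules for mainstream and special schools can be expressed as a single filter. This is a minimal sketch, not the site's own code: the `School` record and its field names are hypothetical stand-ins for columns in the Get Information About Schools extract, and it assumes the records have already been restricted to state-funded schools.

```python
from dataclasses import dataclass

# Status labels as given in the inclusion criteria.
OPEN_STATUSES = {"Open", "Open, but proposed to close"}


@dataclass
class School:
    # Hypothetical fields; real extract column names may differ.
    urn: int
    is_special: bool
    lower_age: int
    upper_age: int
    status: str
    in_census_jan_2018: bool  # appears in 'Schools, pupils and their characteristics', January 2018


def is_included(school: School) -> bool:
    """Apply the mainstream or special-school inclusion criteria to one record."""
    if school.status not in OPEN_STATUSES or not school.in_census_jan_2018:
        return False
    if school.is_special:
        return school.lower_age < 16
    return school.lower_age < 11 and school.upper_age >= 9
```

For example, a mainstream school with an age range of 11 to 16 is excluded, because its lower age limit is not below 11.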

Schools are generally identified by a three-digit local authority (LA) code and a four-digit establishment (estab) code. The LA code changes when local authority boundaries change, and the estab code changes when the nature of a school changes (e.g. when an academy sponsor changes).

We use data on school links (predecessors and successors) available through Get Information About Schools to identify a school consistently over time. We do not include data on schools that amalgamate.
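One way to follow school links is to walk each chain of successors to its latest school and to drop any school involved in an amalgamation. This is a hypothetical sketch, not the site's method: it assumes links arrive as `(predecessor_id, successor_id)` pairs, that each school has at most one successor, and that a successor with two or more predecessors represents an amalgamation.

```python
from collections import defaultdict


def resolve_links(links):
    """Map each linked school id to the id of its most recent successor.

    Returns (current, excluded): `current` maps linked school ids to their
    latest id; `excluded` holds every school involved in an amalgamation.
    """
    predecessors = defaultdict(set)
    for old, new in links:
        predecessors[new].add(old)

    # A successor with several predecessors is treated as an amalgamation.
    excluded = set()
    for new, olds in predecessors.items():
        if len(olds) > 1:
            excluded |= olds | {new}

    successor = dict(links)
    current = {}
    for school in set(successor) | set(predecessors):
        chain = [school]
        # Follow successor links forward, guarding against accidental cycles.
        while chain[-1] in successor and successor[chain[-1]] not in chain:
            chain.append(successor[chain[-1]])
        if excluded.isdisjoint(chain):
            current[school] = chain[-1]
    return current, excluded
```

Schools that appear in no link at all are absent from both results and simply keep their own identifier.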

The site makes use of the following publicly available school-level datasets:

- Get Information About Schools, August 2018 extract
- Final Key Stage 2 results 2018
- Final Key Stage 2 results 2017
- Final Key Stage 2 results 2016
- Revised Key Stage 4 results 2018
- Final Key Stage 4 results 2017
- Final Key Stage 4 results 2016
- Schools, pupils and their characteristics, January 2018
- School capacity, 2017 to 2018
- Ofsted ratings as at 31 August 2018
- School workforce, November 2017
- Pupil absence, full year 2017 to 2018
- Maintained school consistent financial reporting, 2017
- Maintained school consistent financial reporting, 2016
- Maintained school consistent financial reporting, 2015
- Academies income and expenditure, 2017
- Academies income and expenditure, 2016
- Academies income and expenditure, 2015

Capacity figures are expressed as a percentage: the number of pupils on roll divided by the school's recorded capacity. Figures below 100% mean a school has some spare capacity; figures above 100% mean a school is operating above capacity.
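The capacity percentage is a single ratio. A minimal sketch, with hypothetical parameter names:

```python
def capacity_percentage(pupils_on_roll: int, capacity: int) -> float:
    """Pupils on roll as a percentage of the school's recorded capacity."""
    if capacity <= 0:
        raise ValueError("recorded capacity must be positive")
    return 100 * pupils_on_roll / capacity
```

A school with 180 pupils on roll and a recorded capacity of 200 gives 90%, so it has some spare capacity; 210 pupils would give 105%.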

Ofsted publishes monthly management information containing the results of its inspections.

We have accumulated these datasets over a number of years and use them to identify the overall effectiveness judgement for each school (including any linked predecessor) as at 31 August 2018.
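Finding the judgement in force on a given date amounts to taking the most recent inspection on or before that date across a school and its linked predecessors. This sketch is illustrative only: the record layout is hypothetical, and it assumes the inspection date (rather than a publication date) decides which judgement applies.

```python
from datetime import date


def judgement_as_at(inspections, school_ids, cutoff=date(2018, 8, 31)):
    """Latest overall effectiveness judgement on or before `cutoff`.

    `inspections` is a hypothetical list of (school_id, inspection_date,
    judgement) records; `school_ids` contains the school and any linked
    predecessors. Returns None if no inspection qualifies.
    """
    relevant = [
        (inspected, judgement)
        for school_id, inspected, judgement in inspections
        if school_id in school_ids and inspected <= cutoff
    ]
    if not relevant:
        return None
    latest = max(relevant, key=lambda record: record[0])
    return latest[1]
```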

A number of three-year average figures are included for KS2 measures:

- *Percentage of pupils reaching the expected standard in reading, writing and maths, 2016-2018*: where a published three-year figure exists, it has been used. Where no such published figure exists, a weighted average has been produced from the individual published figures for each year, using as many years as are available for a given school, up to a maximum of three.
- *Reading progress measure, 2016-2018*/*writing progress measure, 2016-2018*/*maths progress measure, 2016-2018*: a weighted average has been produced from the individual published figures for each year, using as many years as are available for a given school, up to a maximum of three.
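The three-year figures can be sketched as a weighted average over whichever years are available. The text does not say what the weights are; this sketch assumes each year is weighted by its number of eligible pupils, and the input layout and function names are hypothetical.

```python
def three_year_average(figures, years=(2016, 2017, 2018)):
    """Pupil-weighted average of up to three years' published figures.

    `figures` maps year -> (value, pupils). Years that are absent, have no
    value (None) or have no pupils are skipped, so the average uses as many
    years as are available. Returns None if no year is usable.
    """
    usable = [
        figures[year]
        for year in years
        if year in figures and figures[year][0] is not None and figures[year][1] > 0
    ]
    if not usable:
        return None
    total_pupils = sum(pupils for _, pupils in usable)
    return sum(value * pupils for value, pupils in usable) / total_pupils


def rwm_three_year(published_figure, figures):
    """Use a published three-year figure where one exists, else the weighted average."""
    if published_figure is not None:
        return published_figure
    return three_year_average(figures)
```

The same `three_year_average` function would apply to the reading, writing and maths progress measures, which have no published three-year figure to fall back on.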

Data on income and expenditure for maintained schools and academies are reported separately.

Each data source contains a measure of total income and the number of full-time equivalent (FTE) pupils. Income is divided by the number of FTE pupils to give income per pupil.

Income figures are net of catering income and supply teacher insurance claims. Where income figures cover a period of less than 12 months, they are pro-rated to give a 12-month figure.
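The income-per-pupil steps (remove catering income and supply teacher insurance claims, annualise part-year figures, divide by FTE pupils) can be sketched as below. The parameter names are hypothetical, and pro-rating is assumed to be a simple linear scaling by 12 divided by the number of months covered.

```python
def income_per_pupil(total_income, catering_income, supply_insurance_claims,
                     fte_pupils, months_covered=12):
    """Annualised income per FTE pupil, net of catering and supply cover claims."""
    if fte_pupils <= 0:
        raise ValueError("FTE pupil number must be positive")
    if not 0 < months_covered <= 12:
        raise ValueError("months_covered must be greater than 0 and at most 12")
    net_income = total_income - catering_income - supply_insurance_claims
    # Pro-rate a part-year figure up to a 12-month equivalent.
    annual_income = net_income * 12 / months_covered
    return annual_income / fte_pupils
```

For example, £1,050,000 of income with £40,000 of catering income and £10,000 of insurance claims, over 250 FTE pupils, gives £4,000 per pupil.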

For schools within federations, figures reported are federation-level figures, which are published under the lead school of each federation.

For schools within multi-academy trusts, central services income is allocated in proportion to FTE pupil numbers and the number of months that schools have been with the trust within the relevant calendar year. Where FTE pupil numbers are not available for a given school, no central services income is allocated to the school.
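The central services allocation can be sketched as a proportional split. The text says income is allocated in proportion to FTE pupil numbers and months with the trust; this sketch assumes that means in proportion to their product (pupil-months). The input layout and names are hypothetical.

```python
def allocate_central_income(central_income, schools):
    """Split a trust's central services income across its academies.

    `schools` maps a hypothetical school id -> (fte_pupils, months_in_trust),
    where fte_pupils is None when not available. Each share is proportional
    to fte_pupils * months_in_trust; schools without FTE data get nothing.
    """
    shares = {
        school: fte * months
        for school, (fte, months) in schools.items()
        if fte is not None
    }
    total = sum(shares.values())
    allocation = {school: 0.0 for school in schools}
    if total > 0:
        for school, share in shares.items():
            allocation[school] = central_income * share / total
    return allocation
```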

*Schools Like Yours* uses a calculated "difference" to determine similarity between pairs of schools. The exact values, which can be displayed in the difference column, have no meaning on their own but can be used for comparison. Zero means "the same", according to the calculation; the larger the difference, the less similar a school is to the selected school.

Let's imagine there are two schools, Alpha School and Beta School, and we want to measure how similar they are in terms of prior attainment, the percentage of pupils who are disadvantaged, and the percentage of pupils with a first language other than English.

| Measure | Alpha | Beta | Difference (Alpha - Beta) |
|---|---|---|---|
| Prior attainment | 24.3 | 29.3 | -5 |
| Disadvantaged pupils (%) | 30 | 24 | +6 |
| First language other than English (%) | 9 | 10 | -1 |

For the purposes of this example, we use the values without rescaling. Both schools have data for all three measures; where values are missing, *Schools Like Yours* uses an average value as an approximation (or proxy).

As described below, the difference is calculated as the square root of the sum of the squared (weighted) differences in each included measure, so the difference between the two schools is $\text{difference} = \sqrt{(-5)^2 + (+6)^2 + (-1)^2} = \sqrt{25 + 36 + 1} = \sqrt{62} \approx 7.874$.

We might want to give greater weight to the percentage of disadvantaged pupils when calculating similarity, and so could increase its weight to 1.5. The difference would then be $\text{difference} = \sqrt{(-5)^2 + (1.5 \times (+6))^2 + (-1)^2} = \sqrt{25 + 81 + 1} = \sqrt{107} \approx 10.344$.
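The worked example can be reproduced in a few lines. This is a sketch of the calculation as described, not the site's code; as in the example, each difference is multiplied by its weight before it is squared, and values are used without rescaling.

```python
import math


def difference(a, b, weights=None):
    """Weighted Euclidean distance between two schools' measure values.

    Each difference is multiplied by its weight before squaring, matching the
    worked example; the default weight for every measure is 1.
    """
    if weights is None:
        weights = [1.0] * len(a)
    return math.sqrt(sum((w * (x - y)) ** 2 for x, y, w in zip(a, b, weights)))


# Prior attainment, disadvantaged pupils (%), first language other than English (%)
alpha = [24.3, 30, 9]
beta = [29.3, 24, 10]

print(round(difference(alpha, beta), 3))               # 7.874
print(round(difference(alpha, beta, [1, 1.5, 1]), 3))  # 10.344
```

Setting a weight to 0 removes that measure's contribution entirely: `difference(alpha, beta, [1, 0, 1])` equals the distance calculated on prior attainment and first language alone.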

By repeating this calculation for every school with sufficient data, we can then find the schools in England that are most similar to Alpha School.

In mathematical language, the difference is an n-dimensional weighted Euclidean distance: the square root of the sum of the squared (weighted) differences in each included measure. This is an abstract, extended form of the familiar Pythagorean calculation for the length of the hypotenuse of a right-angled triangle, $c = \sqrt{a^2 + b^2}$.
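Written as a formula consistent with the worked example above, where each weight multiplies the difference before it is squared, the difference between a selected school $s$ and another school $t$ is shown below. The symbols are introduced here for illustration: $x_{s,i}$ and $x_{t,i}$ are the two schools' (rescaled) values for measure $i$, $w_i$ is that measure's weight, and $n$ is the number of included measures.

$$
d(s, t) = \sqrt{\sum_{i=1}^{n} \bigl(w_i \, (x_{s,i} - x_{t,i})\bigr)^2}
$$

With $n = 2$ and both weights equal to 1, this reduces to $c = \sqrt{a^2 + b^2}$, where $a$ and $b$ are the two differences.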

Once the difference is calculated, the records returned are simply those with the smallest values.
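Returning the most similar schools is then a matter of computing the difference to every candidate and keeping the smallest. A minimal sketch using Python's `heapq.nsmallest`; the candidate mapping and school names are hypothetical, and the values are assumed to be rescaled already.

```python
import heapq
import math


def most_similar(selected, candidates, weights, k=5):
    """Return the ids of the k candidates with the smallest difference.

    `candidates` maps a hypothetical school id -> list of measure values in
    the same order as `selected` and `weights`.
    """
    def distance(values):
        return math.sqrt(sum((w * (x - y)) ** 2
                             for x, y, w in zip(selected, values, weights)))

    return heapq.nsmallest(k, candidates, key=lambda school: distance(candidates[school]))
```

If the selected school is itself among the candidates, it will come first with a difference of zero, so a caller would normally remove it beforehand.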

If all the measures are on the same scale, this sort of calculation is straightforward, but a measure with a greater range of values would tend to dominate the calculation, since the possible difference in that measure is greater. To combat this, the values are first rescaled so that the population standard deviation of each included measure is 1.

This rescaling applies only to the values used in the difference calculation, not to the measure values displayed in the table.
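Before the difference is calculated, missing values are replaced by an average and each measure is rescaled. This sketch makes three assumptions not stated in the text: the proxy for a missing value is the mean of the available values, missing values are filled in before the standard deviation is computed, and a measure with no spread is set to all zeros. Dividing by the population standard deviation (`statistics.pstdev`) is enough to give a standard deviation of 1; subtracting the mean would not change any differences, so it is omitted.

```python
import statistics


def impute_and_rescale(values):
    """Fill missing values (None) with the mean, then scale to population SD 1."""
    present = [v for v in values if v is not None]
    if not present:
        return [0.0] * len(values)
    mean = statistics.mean(present)
    filled = [mean if v is None else v for v in values]
    sd = statistics.pstdev(filled)
    if sd == 0:
        # No variation: the measure cannot distinguish between schools.
        return [0.0] * len(filled)
    return [v / sd for v in filled]
```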

To allow certain measures to be prioritised, each measure included in the calculation has a weight.

By default, all data items included in the similarity criteria are given a weight of 1; in other words, they are considered equally important. However, these weights can be changed by clicking on the pill within the similarity criteria box and entering a user-defined value in the weight box.

A weight of 0.5 means that the selected measure has only half the importance of the other measures, and a weight of 3 means it is three times as important as normal. It follows that a weight of zero effectively removes the measure from the calculation, since it has no importance at all.