Data classification is the process of sorting or arranging mapped features into groups or categories, and of representing the members of each group with the same symbol (usually defined in a legend). It is used in GIS, cartography, and remote sensing to generalize the complexity of geographic phenomena and geospatial data and to extract meaning from them.

Methods of Data Classification:

There are four primary methods for data classification:

  • Equal interval
  • Quantile
  • Standard deviation, and
  • Natural breaks

Equal Interval:

  • The equal interval method is self-explanatory. It breaks the range of the data into classes of equal width.
  • Equal intervals offer an unbiased yet simple way to break down the represented data. The method works best when the values are spread fairly evenly across the range.
  • This method emphasizes the amount of an attribute value relative to other values.

For instance, town population data can be broken into equal intervals of 1,000 people: class 1 covers 0 to 1,000, class 2 covers 1,001 to 2,000, and so on.
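The population example can be sketched in a few lines of Python. This is a minimal illustration, not how any particular GIS package does it; the helper names `equal_interval_breaks` and `classify` and the sample populations are made up for the example.

```python
def equal_interval_breaks(values, n_classes):
    """Return the upper bound of each class when the data range
    is cut into n_classes intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * i for i in range(1, n_classes + 1)]


def classify(value, breaks):
    """Return the 0-based index of the first class whose upper
    bound is greater than or equal to value."""
    for i, upper in enumerate(breaks):
        if value <= upper:
            return i
    return len(breaks) - 1  # guard against rounding at the top end


# Hypothetical town populations spanning 0 to 4,000 people.
populations = [0, 450, 1200, 2500, 3100, 4000]
breaks = equal_interval_breaks(populations, 4)
classes = [classify(p, breaks) for p in populations]
print(breaks)   # upper bounds, each 1,000 people apart
print(classes)  # class index assigned to each town
```

Note that the class width depends only on the minimum and maximum values, so a single outlier can stretch the intervals and leave most features in one class.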

Quantile Method:

  • The quantile method divides data into classes that each contain an equal number of features. For example, if a thematic map had 100 attribute features, we could use five data classes, with each representing 20 features.
  • A quantile classification is well suited to linearly distributed data. As quantile assigns the same number of data values to each class, there are no empty classes or classes with too few or too many values.
  • Data are ordered by rank in the classes, and as such, the quantile method proves excellent for maps with ranked values.
  • However, one potential disadvantage to the quantile classification is that it distorts the natural distribution of attribute data and may skew the overall attribute data analysis.

Unlike other data classification methods, quantile maps have a defined nomenclature. For instance, maps with three classes are called tertile maps, maps with four classes are called quartile maps, and maps with five classes are called quintile maps. Additionally, because each class holds the same number of features, quantile classification often produces visually pleasing maps.
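As a rough sketch of the idea, the following Python assigns classes by rank so that each class receives the same number of features. The function name `quantile_classes` and the 100 sample values are assumptions for illustration only.

```python
def quantile_classes(values, n_classes):
    """Assign each value a 0-based class index by rank, so that
    classes hold equal (or nearly equal) numbers of features."""
    n = len(values)
    # Indices of the features, ordered from smallest to largest value.
    order = sorted(range(n), key=lambda i: values[i])
    classes = [0] * n
    for rank, idx in enumerate(order):
        classes[idx] = rank * n_classes // n
    return classes


# 100 hypothetical attribute features split into quintiles.
values = list(range(1, 101))
classes = quantile_classes(values, 5)
counts = [classes.count(c) for c in range(5)]
print(counts)  # number of features in each class
```

Because this sketch splits strictly by rank, features with identical values can land in different classes. Real software may handle ties differently.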

Standard Deviation:

  • Standard deviation classification creates classes that represent the number of standard deviations from the mean attribute value.
  • Typically, the standard deviation method requires an even number of data classes so that there is an equal number of data classes above and below the mean attribute value.
  • This data classification offers simple comparisons between above-mean and below-mean data.
  • One disadvantage of this classification is that a level of statistical understanding is needed to fully interpret the resultant map.
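A minimal Python sketch of this scheme follows. It assumes class boundaries at whole standard deviations from the mean and uses the population standard deviation; the function name `std_dev_classes`, the `max_sd` parameter, and the sample values are hypothetical.

```python
import math
import statistics


def std_dev_classes(values, max_sd=1):
    """Label each value with a signed class index k, meaning the value
    lies between k and k + 1 standard deviations from the mean.
    Indices are clamped so that there are 2 * (max_sd + 1) classes,
    half below the mean and half at or above it.
    Assumes the data are not all identical (standard deviation > 0)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    classes = []
    for v in values:
        k = math.floor((v - mean) / sd)
        k = max(-max_sd - 1, min(max_sd, k))  # merge the far tails
        classes.append(k)
    return mean, sd, classes


# Hypothetical attribute values with mean 5 and standard deviation 2.
values = [2, 4, 4, 4, 5, 5, 7, 9]
mean, sd, classes = std_dev_classes(values)
print(mean, sd)
print(classes)  # negative indices are below the mean
```

With `max_sd=1` this yields four classes: more than one standard deviation below the mean, within one below, within one above, and more than one above.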

Natural Breaks:

  • The natural breaks method recognizes that the breaks in distribution may be directly related to the characteristics of the theme being mapped.
  • GIS mapping software can search for these natural breaks in the distribution (histogram) of the data using a statistical procedure known as the Jenks method, also called Jenks natural breaks optimization.
  • The natural breaks classification maintains uniformity throughout the entire thematic map and avoids any obviously saturated data classes.

For example, a choropleth map with 49 of the 50 U.S. states in one color offers no real analytic advantage. The natural breaks method would find small differences in the data and adjust the data classes accordingly, thus producing a more useful analytical map.
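To make the idea concrete, here is a small Python sketch that searches for breaks minimizing the total squared deviation of values from their class means, which is the objective usually associated with the Jenks method. It is a plain dynamic program written for clarity, not the routine any specific GIS package uses; the function name `natural_breaks` and the sample data are assumptions.

```python
def natural_breaks(values, n_classes):
    """Split the sorted values into n_classes contiguous groups that
    minimize the summed squared deviation from each group's mean.
    Requires len(values) >= n_classes."""
    data = sorted(values)
    n = len(data)

    # Prefix sums let us compute a group's squared deviation in O(1).
    s1, s2 = [0.0], [0.0]
    for v in data:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting data[:j] into k groups.
    # split[k][j]: start index of the last group in that best split.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    split[k][j] = i

    # Walk back through the recorded split points to recover the groups.
    groups, j = [], n
    for k in range(n_classes, 0, -1):
        i = split[k][j]
        groups.append(data[i:j])
        j = i
    return groups[::-1]


# Hypothetical values with three obvious clusters.
values = [1, 2, 3, 10, 11, 12, 50, 51, 52]
groups = natural_breaks(values, 3)
print(groups)
```

The triple loop takes time proportional to the number of classes times the square of the number of values, which is fine for small samples but slow for large layers.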