arcgis.geoanalytics.summarize_data module

The Summarize Data module contains functions that calculate total counts, lengths, areas, and basic descriptive statistics of features and their attributes within areas or near other features.

  • aggregate_points calculates statistics about points that fall within specified areas or bins.
  • join_features calculates statistics about features that share a spatial, temporal, or attribute relationship with other features.
  • reconstruct_tracks calculates statistics about points or polygons that belong to the same track and reconstructs inputs into tracks.
  • summarize_attributes calculates statistics about feature or tabular data that share attributes.
  • summarize_within calculates statistics for area features and attributes that overlap each other.

aggregate_points

summarize_data.aggregate_points(point_layer, bin_type: str = None, bin_size: float = None, bin_size_unit: str = None, polygon_layer=None, time_step_interval: int = None, time_step_interval_unit: str = None, time_step_repeat_interval: int = None, time_step_repeat_interval_unit: str = None, time_step_reference: datetime.datetime = None, summary_fields: str = None, output_name: str = None, gis=None)

Using a layer of point features and either a layer of area features or bins defined by a specified distance, this tool determines which points fall within each area or bin and calculates statistics about all the points within each area or bin. You may optionally apply time slicing with this tool.

For example

  • Given point locations of crime incidents, count the number of crimes per county or other administrative district.
  • Find the highest and lowest monthly revenues for franchise locations using 100 km bins.

This tool works with a layer of point features and a layer of area features. Input area features can come from a polygon layer, or they can be square or hexagonal bins calculated when the tool is run. The tool first determines which points fall within each specified area. After determining this point-in-area spatial relationship, statistics about all points in the area are calculated and assigned to the area. The most basic statistic is the count of the number of points within the area, but you can get other statistics as well.

For example, suppose you have point features of coffee shop locations and area features of counties, and you want to summarize coffee sales by county. Assuming the coffee shops have a TOTAL_SALES attribute, you can get the sum of all TOTAL_SALES within each county, the minimum or maximum TOTAL_SALES within each county, or other statistics such as the count, range, standard deviation, and variance.

This tool can also work with data that is time-enabled. If time is enabled on the input points, then the time slicing options are available. Time slicing allows you to calculate the point-in-area relationship while looking at a specific slice in time. For example, you could look at hourly intervals, which would result in outputs for each hour.

For an example with time, suppose you had point features of every transaction made at various coffee shop locations and no area layer. The data has been recorded over a year and each transaction has a location and a time stamp. Assuming each transaction has a TOTAL_SALES attribute, you can get the sum of all TOTAL_SALES within the space and time of interest. If these transactions are for a single city, we could generate areas that are 1-kilometer grids and look at weekly time slices to summarize the transactions in both time and space.
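The space-and-time summary described above can be pictured with a small pure-Python sketch. It is only a conceptual illustration under stated assumptions, not how the GeoAnalytics server implements the tool: the function name aggregate_sales, the sample transactions, and the reference date are hypothetical, and coordinates are assumed to be already projected to meters.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate_sales(transactions, bin_size_m, step, reference):
    """Count transactions and sum TOTAL_SALES per (square bin, time slice).

    transactions: iterable of (x_m, y_m, timestamp, total_sales) tuples.
    Coordinates are assumed to be in meters (an assumption of this sketch).
    """
    totals = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for x, y, ts, sales in transactions:
        # Square bin index: the bin_size_m-wide cell the point falls in.
        bin_key = (math.floor(x / bin_size_m), math.floor(y / bin_size_m))
        # Time slice index: whole steps elapsed since the reference time.
        slice_key = (ts - reference) // step
        cell = totals[(bin_key, slice_key)]
        cell["count"] += 1
        cell["sum"] += sales
    return dict(totals)


reference = datetime(2023, 1, 2)  # hypothetical start of the first weekly slice
transactions = [
    (120.0, 450.0, datetime(2023, 1, 3, 9, 15), 4.50),
    (880.0, 300.0, datetime(2023, 1, 5, 14, 0), 3.25),
    (1500.0, 200.0, datetime(2023, 1, 4, 8, 30), 5.00),
    (130.0, 470.0, datetime(2023, 1, 10, 9, 5), 4.75),
]

# 1-kilometer square bins and weekly time slices, as in the coffee shop example.
weekly = aggregate_sales(transactions, 1000.0, timedelta(weeks=1), reference)
for (bin_key, week), stats in sorted(weekly.items()):
    print(bin_key, week, stats)
```

Each output key pairs a bin with a time slice, which mirrors how a time-sliced run produces one result per area per slice.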

Argument Description
point_layer Required Input Points layer (features).
bin_type Optional string parameter. Required if polygon_layer is not defined. Choice list:[‘Square’, ‘Hexagon’]
bin_size Bin Size (float). Optional parameter.
bin_size_unit Bin Size Unit (str). Optional parameter. Choice list:[‘Feet’, ‘Yards’, ‘Miles’, ‘Meters’, ‘Kilometers’, ‘NauticalMiles’]
polygon_layer Optional Input Polygons layer (features). Required if bin_type and the bin properties are not defined.
time_step_interval Time Step Interval (int). Optional parameter.
time_step_interval_unit Time Step Interval Unit (str). Optional parameter. Choice list:[‘Years’, ‘Months’, ‘Weeks’, ‘Days’, ‘Hours’, ‘Minutes’, ‘Seconds’, ‘Milliseconds’]
time_step_repeat_interval Time Step Repeat Interval (int). Optional parameter.
time_step_repeat_interval_unit Time Step Repeat Interval Unit (str). Optional parameter. Choice list:[‘Years’, ‘Months’, ‘Weeks’, ‘Days’, ‘Hours’, ‘Minutes’, ‘Seconds’, ‘Milliseconds’]
time_step_reference Time Step Reference (datetime). Optional parameter.
summary_fields

Summary Statistics (str/list). Optional parameter.

The summary_fields string must enclose a Python list. Each list item must be a Python dictionary with two keys. See the Key:Value definitions below.

See URL 1 below for full details.

output_name Output Features Name (str). Optional parameter.
gis Optional, the GIS on which this tool runs. If not specified, the active GIS is used.

Key:Value Dictionary Options for Argument summary_fields

Key Value
statisticType

Required string. Indicates statistic to summarize. See URL 1 below for full explanation.

Choice list numeric fields:[‘Count’, ‘Sum’, ‘Mean’, ‘Min’, ‘Max’, ‘Range’, ‘Stddev’, ‘Var’]

Choice list for string fields:[‘Count’, ‘Any’]

onStatisticField

Required string. Provides the field name to summarize.

See https://developers.arcgis.com/python/guide/working-with-feature-layers-and-features/#Querying-feature-layers for instructions to query a feature layer for field names.

For detailed explanation see:

URL 1: http://resources.arcgis.com/en/help/arcgis-rest-api/index.html#/Aggregate_Points/02r3000002rr000000/

Returns: Output Features as Item

Example

# Usage Example: Using summary_fields on a layer.
# input_points_layer is assumed to be an existing point feature layer.

from arcgis.geoanalytics.summarize_data import aggregate_points

agg_pts_item = aggregate_points(input_points_layer,
                  bin_size=0.5,
                  bin_type='Hexagon',
                  bin_size_unit='Miles',
                  summary_fields=[{"statisticType": "Count", "onStatisticField": "fieldName1"},
                                  {"statisticType": "Any", "onStatisticField": "fieldName2"}]
                  )
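Because several arguments are constrained by choice lists and by each other, it can help to check them locally before submitting a job. The following is a minimal sketch only: check_aggregate_points_args is a hypothetical helper, not part of the arcgis package, and the rule that bin_size and bin_size_unit are needed whenever polygon_layer is absent is an assumption drawn from the argument table above.

```python
# Choice lists copied from the argument tables for aggregate_points.
BIN_TYPES = {"Square", "Hexagon"}
DISTANCE_UNITS = {"Feet", "Yards", "Miles", "Meters", "Kilometers", "NauticalMiles"}
NUMERIC_STATS = {"Count", "Sum", "Mean", "Min", "Max", "Range", "Stddev", "Var"}
STRING_STATS = {"Count", "Any"}
REQUIRED_KEYS = {"statisticType", "onStatisticField"}


def check_aggregate_points_args(bin_type=None, bin_size=None, bin_size_unit=None,
                                polygon_layer=None, summary_fields=None):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    if polygon_layer is None:
        # Without a polygon layer, the bins must be described (assumed rule).
        if bin_type not in BIN_TYPES:
            problems.append("bin_type must be 'Square' or 'Hexagon' when polygon_layer is not set")
        if bin_size is None:
            problems.append("bin_size is needed when polygon_layer is not set")
        if bin_size_unit not in DISTANCE_UNITS:
            problems.append("bin_size_unit is not in the documented choice list")
    for item in summary_fields or []:
        # Each entry must be a dictionary with exactly the two documented keys.
        if set(item) != REQUIRED_KEYS:
            problems.append(f"unexpected keys in {item!r}")
        elif item["statisticType"] not in NUMERIC_STATS | STRING_STATS:
            problems.append(f"unknown statisticType {item['statisticType']!r}")
    return problems


kwargs = dict(
    bin_type="Hexagon",
    bin_size=0.5,
    bin_size_unit="Miles",
    summary_fields=[
        {"statisticType": "Count", "onStatisticField": "fieldName1"},
        {"statisticType": "Any", "onStatisticField": "fieldName2"},
    ],
)
problems = check_aggregate_points_args(**kwargs)
print(problems)
```

If the returned list is empty, the same keyword arguments could then be passed to aggregate_points together with the point layer.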

build_multivariable_grid

summarize_data.build_multivariable_grid(input_layers, variable_calculations, bin_size, bin_unit='Meters', bin_type='Square', output_name=None, gis=None)

Only available at ArcGIS Enterprise 10.6.1 and later.

The Build Multi-Variable Grid task works with one or more layers of point, line, or polygon features. The task generates a grid of square or hexagonal bins and compiles information about each input layer into each bin. For each input layer, this information can include the following variables:

  • Distance to Nearest - The distance from each bin to the nearest feature.
  • Attribute of Nearest - An attribute value of the feature nearest to each bin.
  • Attribute Summary of Related - A statistical summary of all features within search distance of each bin.

Only variables you specify in variable_calculations will be included in the result layer. These variables can help you understand the proximity of your data throughout the extent of your analysis. The results can help you answer questions such as the following:

  • Given multiple layers of public transportation infrastructure, where in the city is least accessible by public transportation?
  • Given layers of lakes and rivers, what is the name of the water body closest to each location in the US?
  • Given a layer of household income, where in the US is the variation of income in the surrounding 50 miles the largest?

The result of Build Multi-Variable Grid can also be used in prediction and classification workflows. The task allows you to calculate and compile information from many different data sources into a single, spatially continuous layer in one step, reducing the amount of effort required to build prediction and classification models.
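To make the Distance to Nearest variable concrete, here is a conceptual sketch in plain Python. It is not the task's implementation: the helper names, the sample bus stop coordinates, and the use of planar distances measured from each bin center are all assumptions made for illustration.

```python
import math


def square_bin_centers(xmin, ymin, xmax, ymax, bin_size):
    """Yield the center of each square bin covering the extent."""
    cols = math.ceil((xmax - xmin) / bin_size)
    rows = math.ceil((ymax - ymin) / bin_size)
    for row in range(rows):
        for col in range(cols):
            yield (xmin + (col + 0.5) * bin_size, ymin + (row + 0.5) * bin_size)


def distance_to_nearest(center, features):
    """Planar distance from a bin center to the closest point feature."""
    return min(math.dist(center, feature) for feature in features)


bus_stops = [(0.5, 0.5), (3.5, 0.5)]  # hypothetical point layer

# One value per bin, measured from the bin center.
grid = {
    center: distance_to_nearest(center, bus_stops)
    for center in square_bin_centers(0, 0, 4, 2, 1.0)
}

# The bin farthest from any stop is the least accessible location.
least_accessible = max(grid, key=grid.get)
print(least_accessible, round(grid[least_accessible], 3))
```

The result is one spatially continuous layer of values, which is the property that makes the grid useful as input to prediction and classification workflows.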

Arguments Description
input_layers

Required list of FeatureLayers. A list of input layers that will be used in analysis. Each input layer follows the same formatting as described in the Feature Input topic. This can be one of the following:

  • A URL to a feature service layer with an optional filter to select specific features
  • A URL to a big data catalog service layer with an optional filter to select specific features
  • A feature collection
variable_calculations Required list of dictionaries. A JSON array containing objects that describe the input layers and the attributes that will be calculated for each layer.
bin_size Required float. The distance for the bins of type bin_type in the output polygon layer. Enrichment attributes will be calculated at the center of each bin. When generating bins, for Square, the number and units specified determine the height and length of the square. For Hexagon, the number and units specified determine the distance between parallel sides.
bin_unit

Optional string. The distance unit for the bins that will be used to calculate enrichment attributes.

Values: Meters (default), Kilometers, Feet, Miles, NauticalMiles, or Yards

bin_type

Optional string. The type of bin that will be generated. Bin options are the following:

  • Hexagon
  • Square (default)
output_name Optional string. The name of the output layer.
gis Optional GIS. The enterprise site that you want to connect to.
Returns: Feature Layer

join_features

summarize_data.join_features(target_layer, join_layer, join_operation: str = 'JoinOneToOne', join_fields: str = None, summary_fields: str = None, spatial_relationship: str = None, spatial_near_distance: float = None, spatial_near_distance_unit: str = None, temporal_relationship: str = None, temporal_near_distance: int = None, temporal_near_distance_unit: str = None, attribute_relationship: str = None, join_condition: str = None, output_name: str = None, gis=None)

Using either feature layers or tabular data, you can join features and records based on specific relationships between the input layers or tables. Joins will be determined by spatial, temporal, and attribute relationships, and summary statistics can be optionally calculated.

For example

  • Given point locations of crime incidents with a time, join the crime data to itself, specifying a spatial relationship of crimes within 1 kilometer of each other that occurred within 1 hour of each other, to determine whether there is a sequence of crimes close to each other in space and time.
  • Given a table of ZIP Codes with demographic information and area features representing residential buildings, join the demographic information to the residences so each residence now has the information.

The Join Features task works with two layers. Join Features joins attributes from one feature to another based on spatial, temporal, and attribute relationships or some combination of the three. The tool determines all input features that meet the specified join conditions and joins the second input layer to the first. You can optionally join all features to the matching features or summarize the matching features.

Join Features can be applied to points, lines, areas, and tables. A temporal join requires that your input data is time-enabled, and a spatial join requires that your data has a geometry.
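The combination of a spatial Near relationship and a temporal Near relationship can be sketched as follows. This is a conceptual illustration only: near_join, the record layout, and the sample crime records are hypothetical, distances are planar and in meters, and skipping self-pairs is a choice made here for readability rather than documented tool behavior.

```python
import math
from datetime import datetime, timedelta


def near_join(target, join, max_distance_m, max_time_gap):
    """Pair each target record with every join record near in space and time."""
    matches = []
    for t in target:
        for j in join:
            if t["id"] == j["id"]:
                continue  # skip self-pairs when a layer is joined to itself
            close_in_space = math.dist(t["xy"], j["xy"]) <= max_distance_m
            close_in_time = abs(t["time"] - j["time"]) <= max_time_gap
            if close_in_space and close_in_time:
                matches.append((t["id"], j["id"]))
    return matches


crimes = [
    {"id": "A", "xy": (0.0, 0.0), "time": datetime(2023, 5, 1, 22, 0)},
    {"id": "B", "xy": (600.0, 300.0), "time": datetime(2023, 5, 1, 22, 40)},
    {"id": "C", "xy": (700.0, 200.0), "time": datetime(2023, 5, 2, 3, 0)},
    {"id": "D", "xy": (5000.0, 0.0), "time": datetime(2023, 5, 1, 22, 10)},
]

# Crimes within 1 kilometer and 1 hour of each other.
pairs = near_join(crimes, crimes, 1000.0, timedelta(hours=1))
print(pairs)
```

Keeping every match corresponds to a one-to-many join; a one-to-one join would instead summarize the matches for each target feature.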

Parameters:

target_layer: Target Features (feature input). Required parameter.

join_layer: Join Features (feature input). Required parameter.

join_operation: Join Operation (str). Optional parameter. The default is ‘JoinOneToOne’.
Choice list:[‘JoinOneToOne’, ‘JoinOneToMany’]

join_fields: Join Fields (str). Optional parameter.

summary_fields: Summary Statistics (str). Optional parameter.

spatial_relationship: Spatial Relationship (str). Optional parameter.
Choice list:[‘Equals’, ‘Intersects’, ‘Contains’, ‘Within’, ‘Crosses’, ‘Touches’, ‘Overlaps’, ‘Near’]

spatial_near_distance: Near Spatial Distance (float). Optional parameter.

spatial_near_distance_unit: Near Spatial Distance Unit (str). Optional parameter.
Choice list:[‘Feet’, ‘Yards’, ‘Miles’, ‘Meters’, ‘Kilometers’, ‘NauticalMiles’]

temporal_relationship: Temporal Relationship (str). Optional parameter.
Choice list:[‘Equals’, ‘Intersects’, ‘During’, ‘Contains’, ‘Finishes’, ‘FinishedBy’, ‘Meets’, ‘MetBy’, ‘Overlaps’, ‘OverlappedBy’, ‘Starts’, ‘StartedBy’, ‘Near’]

temporal_near_distance: Near Temporal Distance (int). Optional parameter.

temporal_near_distance_unit: Near Temporal Distance Unit (str). Optional parameter.
Choice list:[‘Years’, ‘Months’, ‘Weeks’, ‘Days’, ‘Hours’, ‘Minutes’, ‘Seconds’, ‘Milliseconds’]

attribute_relationship: Attribute Relationships (str). Optional parameter.

join_condition: Join Condition (str). Optional parameter.

output_name: Output Features Name (str). Optional parameter.

gis: Optional, the GIS on which this tool runs. If not specified, the active GIS is used.

Returns:
output - Output Features as Feature Layer Collection Item

reconstruct_tracks

summarize_data.reconstruct_tracks(input_layer, track_fields: str, method: str = 'Planar', buffer_field: str = None, summary_fields: str = None, time_split: int = None, time_split_unit: str = None, distance_split=None, distance_split_unit=None, output_name: str = None, gis=None)

Using a time-enabled layer of point or polygon features that represent an instant in time, this tool determines which input features belong in a track and will order the inputs sequentially in time. Statistics are optionally calculated for the input features within each track.

For example

  • Given point locations and time of hurricane measurements, calculate the mean wind speed and max wind pressure of the hurricane.

This tool works with a time-enabled layer of either point or polygon features that represent an instant in time. It first determines which features belong to a track using an identifier. Using the time at each location, the tracks are ordered sequentially and transformed into a line or polygon representing the path of movement over time. Optionally, the input may be buffered by a field, which will create a polygon at each location. These buffered points, or the input features themselves if they are polygons, are then joined sequentially to create a track as a polygon where the width is representative of the attribute of interest. Resulting tracks have a start and end time, which represent temporally the first and last feature in a given track. When the tracks are created, statistics about the input features are calculated and assigned to the output track. The most basic statistic is the count of points within the track, but other statistics can be calculated as well.

Features in time-enabled layers can be represented in one of two ways:

  • Instant - A single moment in time
  • Interval - A start and end time

For example, suppose you have GPS measurements of hurricanes every 10 minutes. Each GPS measurement records the hurricane’s name, location, time of recording, and wind speed. With this information, you could create a track for each hurricane using the name for track identification. Additionally, you could calculate statistics such as the mean, maximum, and minimum wind speed of each hurricane, as well as the count of measurements within each track.

Using the same example, you could buffer your tracks by the wind speed. This would buffer each measurement by the wind speed field at that location, and join the buffered areas together, creating a polygon representative of the track path, as well as the changes in wind speed as the hurricanes progressed.
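The grouping, ordering, and statistics steps of the hurricane example can be sketched in plain Python. This is a conceptual illustration under stated assumptions, not the tool's implementation: the function reconstruct, the record layout, and the sample measurements are hypothetical, and buffering is left out.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def reconstruct(observations, track_field, stat_field):
    """Group observations by track identifier, order them in time, summarize."""
    groups = defaultdict(list)
    for obs in observations:
        groups[obs[track_field]].append(obs)

    tracks = {}
    for name, members in groups.items():
        members.sort(key=lambda o: o["time"])  # sequential order in time
        values = [o[stat_field] for o in members]
        tracks[name] = {
            "path": [o["xy"] for o in members],  # vertices of the track line
            "start": members[0]["time"],
            "end": members[-1]["time"],
            "count": len(members),
            "mean": mean(values),
            "min": min(values),
            "max": max(values),
        }
    return tracks


# Hypothetical measurements taken every 10 minutes, deliberately out of order.
measurements = [
    {"name": "Ana", "xy": (-60.2, 15.1), "time": datetime(2023, 8, 1, 0, 10), "wind": 80},
    {"name": "Bill", "xy": (-70.0, 20.0), "time": datetime(2023, 8, 1, 0, 0), "wind": 100},
    {"name": "Ana", "xy": (-60.4, 15.3), "time": datetime(2023, 8, 1, 0, 20), "wind": 90},
    {"name": "Ana", "xy": (-60.0, 15.0), "time": datetime(2023, 8, 1, 0, 0), "wind": 70},
    {"name": "Bill", "xy": (-70.1, 20.2), "time": datetime(2023, 8, 1, 0, 10), "wind": 110},
]

tracks = reconstruct(measurements, "name", "wind")
for name, track in tracks.items():
    print(name, track["count"], track["mean"], track["max"])
```

Each track carries a start and end time taken from its first and last measurement, matching the description of the resulting tracks above.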

Parameters:

input_layer: Input Features (feature input). Required parameter.

track_fields: Track Fields (str). Required parameter.

method: Method (str). Optional parameter. The default is ‘Planar’.
Choice list:[‘Geodesic’, ‘Planar’]

buffer_field: Buffer Distance Field (str). Optional parameter.

summary_fields: Summary Statistics (str/list). Optional parameter.

time_split: Duration Split Threshold (int). Optional parameter.

time_split_unit: Duration Split Threshold Unit (str). Optional parameter.
Choice list:[‘Years’, ‘Months’, ‘Weeks’, ‘Days’, ‘Hours’, ‘Minutes’, ‘Seconds’, ‘Milliseconds’]

distance_split: A distance used to split tracks. Any features in the input_layer that are in the same track and are greater than this distance apart will be split into a new track. The units of the distance values are supplied by the distance_split_unit parameter.

distance_split_unit: The distance unit to be used with the distance value specified in distance_split.
Values: Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards

output_name: Output Features Name (str). Optional parameter.

gis: Optional, the GIS on which this tool runs. If not specified, the active GIS is used.

Returns:
output - Output Features as a Feature Layer Collection Item

summarize_attributes

summarize_data.summarize_attributes(input_layer, fields: str = None, summary_fields: str = None, output_name: str = None, gis=None)

Using either feature or tabular data, this tool summarizes statistics for specified fields.

For example

  • Given locations of grocery stores with a field COMPANY_NAME, summarize the stores by the company name to determine statistics for each company.
  • Given a table of grocery stores with fields COMPANY_NAME and COUNTY, summarize the stores by the company name and county to determine statistics for each company within each county.

This tool summarizes all the matching values in one or more fields and calculates statistics on them. The most basic statistic is the count of features that have been summarized together, but you can calculate more advanced statistics as well.

For example, suppose you have point features of store locations with a field representing the DISTRICT_MANAGER_NAME and you want to summarize coffee sales by manager. You can specify the field DISTRICT_MANAGER_NAME as the field to dissolve on, and all rows of data representing individual managers will be summarized. This means all store locations that are managed by Manager1 will be summarized into one row with summary statistics calculated. In this instance, statistics like the count of the number of stores and the sum of TOTAL_SALES for all stores that Manager1 manages would be calculated as well as for any other manager listed in the DISTRICT_MANAGER_NAME field.
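The manager example amounts to a group-by with a count and a sum. A minimal sketch in plain Python follows; the function summarize and the sample rows are hypothetical and only illustrate the idea, not the tool's implementation.

```python
from collections import defaultdict


def summarize(rows, fields, sum_field):
    """Collapse rows that share the same values in `fields` into one summary row."""
    summary = defaultdict(lambda: {"COUNT": 0, "SUM": 0.0})
    for row in rows:
        key = tuple(row[field] for field in fields)  # one key per unique combination
        summary[key]["COUNT"] += 1
        summary[key]["SUM"] += row[sum_field]
    return dict(summary)


stores = [
    {"DISTRICT_MANAGER_NAME": "Manager1", "TOTAL_SALES": 1200.0},
    {"DISTRICT_MANAGER_NAME": "Manager2", "TOTAL_SALES": 800.0},
    {"DISTRICT_MANAGER_NAME": "Manager1", "TOTAL_SALES": 300.0},
]

by_manager = summarize(stores, ["DISTRICT_MANAGER_NAME"], "TOTAL_SALES")
for key, stats in by_manager.items():
    print(key, stats)
```

Passing more than one field name, such as a company name and a county, yields one summary row per unique combination of values.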

Parameters:

input_layer: Input Features (feature input). Required parameter.

fields: Summary Fields (str). Required parameter.

summary_fields: Summary Statistics (str/list). Optional parameter.

output_name: Output Features Name (str). Optional parameter.

gis: Optional, the GIS on which this tool runs. If not specified, the active GIS is used.

Returns:
output - Output Features as a _FeatureSet

summarize_within

summarize_data.summarize_within(summarized_layer, summary_polygons=None, bin_type: str = None, bin_size: float = None, bin_size_unit: str = None, standard_summary_fields: str = None, weighted_summary_fields: str = None, sum_shape: bool = True, shape_units: str = None, group_by_field: str = None, minority_majority: bool = False, percent_shape: bool = False, output_name: str = None, gis=None)

Finds areas (and portions of areas) that overlap between two layers and calculates statistics about the overlap.

For example

  • Given a layer of watershed areas and a layer of land-use areas by land-use type, calculate total acreage of land-use type for each watershed.
  • Given a layer of parcels in a county and a layer of city boundaries, summarize the average value of vacant parcels within each city.

Parameters:

summarized_layer: Layer To Summarize (feature input). Required parameter.

summary_polygons: Summary Polygons Layer (feature input). Optional parameter.

bin_type: Output Bin Type (str). Optional parameter.
Choice list:[‘Square’, ‘Hexagon’]

bin_size: Output Bin Size (float). Optional parameter.

bin_size_unit: Output Bin Size Unit (str). Optional parameter.
Choice list:[‘Feet’, ‘Yards’, ‘Miles’, ‘Meters’, ‘Kilometers’, ‘NauticalMiles’]

standard_summary_fields: Unweighted Summary Statistics (str). Optional parameter.

weighted_summary_fields: Proportional Summary Statistics (str). Optional parameter.

sum_shape: Summarize Shape (bool). Optional parameter.

shape_units: Shape Measure Output Unit (str). Optional parameter.
Choice list:[‘Meters’, ‘Kilometers’, ‘Feet’, ‘Yards’, ‘Miles’, ‘SquareMeters’, ‘SquareKilometers’, ‘Hectares’, ‘SquareFeet’, ‘SquareYards’, ‘SquareMiles’, ‘Acres’]

group_by_field: This is a field of the summarized_layer features that you can use to calculate statistics separately for each unique attribute value. For example, suppose summary_polygons contains city boundaries and the summarized_layer features are parcels. One of the fields of the parcels is Status, which contains two values: VACANT and OCCUPIED. To calculate the total area of vacant and occupied parcels within the boundaries of cities, use Status as the group_by_field. This parameter is available at ArcGIS Enterprise 10.6.1+.

minority_majority: This boolean parameter is applicable only when a group_by_field is specified. If true, the minority (least dominant) and the majority (most dominant) attribute values for each group field are calculated. Two new fields are added to the result layer, prefixed with Majority_ and Minority_. The default is false. This parameter is available at ArcGIS Enterprise 10.6.1+.

percent_shape: This boolean parameter is applicable only when a group_by_field is specified. If set to true, the percentage of each unique group_by_field value is calculated for each summary polygon. The default is false. This parameter is available at ArcGIS Enterprise 10.6.1+.

output_name: Output Features Name (str). Optional parameter.

gis: Optional, the GIS on which this tool runs. If not specified, the active GIS is used.

Returns:
output - Output Features as a layer Item
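The effect of group_by_field, minority_majority, and percent_shape can be illustrated with a small plain-Python sketch. It is a conceptual illustration under stated assumptions: the overlap areas are taken as already computed (the polygon intersection itself is not shown), group_summary and the sample data are hypothetical, and dominance is judged here by summed area, which may differ from the tool's own rule.

```python
from collections import defaultdict


def group_summary(overlaps, group_by_field):
    """Summarize precomputed overlap areas per summary polygon and group value."""
    areas = defaultdict(lambda: defaultdict(float))
    for piece in overlaps:
        areas[piece["polygon"]][piece[group_by_field]] += piece["area"]

    result = {}
    for polygon, by_group in areas.items():
        total = sum(by_group.values())
        result[polygon] = {
            "sum_area": total,
            "by_group": dict(by_group),
            # Most and least dominant group values (by area, an assumption).
            "majority": max(by_group, key=by_group.get),
            "minority": min(by_group, key=by_group.get),
            # Share of each group value within the summary polygon.
            "percent": {g: 100.0 * a / total for g, a in by_group.items()},
        }
    return result


# Hypothetical parcel pieces, each already clipped to a city boundary.
overlaps = [
    {"polygon": "CityA", "Status": "VACANT", "area": 10.0},
    {"polygon": "CityA", "Status": "OCCUPIED", "area": 30.0},
    {"polygon": "CityA", "Status": "VACANT", "area": 10.0},
    {"polygon": "CityB", "Status": "OCCUPIED", "area": 5.0},
]

summary = group_summary(overlaps, "Status")
for city, stats in summary.items():
    print(city, stats["by_group"], stats["majority"], stats["percent"])
```

Here the cities play the role of summary_polygons and the parcels play the role of summarized_layer, with Status as the grouping field.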