arcgis.geoanalytics.analyze_patterns module

These tools help you identify, quantify, and visualize spatial patterns in your data.

calculate_density takes known quantities of some phenomenon and spreads these quantities across the map. find_hot_spots identifies statistically significant clustering in the spatial pattern of your data.

calculate_density

arcgis.geoanalytics.analyze_patterns.calculate_density(input_layer, fields: str = None, weight: str = 'Uniform', bin_type: str = 'Square', bin_size: float = None, bin_size_unit: str = None, time_step_interval: int = None, time_step_interval_unit: str = None, time_step_repeat_interval: int = None, time_step_repeat_interval_unit: str = None, time_step_reference: datetime.datetime = None, radius: float = None, radius_unit: str = None, area_units: str = 'SquareKilometers', output_name: str = None, gis=None, context=None, future=False)
_images/calculate_density1.png

The calculate_density tool creates a density map from point features by spreading known quantities of some phenomenon (represented as attributes of the points) across the map. The result is a layer of areas classified from least dense to most dense.

For point input, each point should represent the location of some event or incident, and the result layer represents a count of the incident per unit area. A higher density value in a new location means that there are more points near that location. In many cases, the result layer can be interpreted as a risk surface for future events. For example, if the input points represent locations of lightning strikes, the result layer can be interpreted as a risk surface for future lightning strikes.

Other use cases of this tool include the following:

  • Creating crime density maps to help police departments properly allocate resources to high crime areas.

  • Calculating densities of hospitals within a county. The result layer will show areas with high and low accessibility to ospitals, and this information can be used to decide where new hospitals should be built.

  • Identifying areas that are at high risk of forest fires based on historical locations of forest fires.

  • Locating communities that are far from major highways in order to plan where new roads should be constructed.

Argument

Description

input_layer

Required point feature layer. The point layer on which the density will be calculated.

Analysis using Square or Hexagon bins requires a projected coordinate system. When aggregating layers into bins, the input layer or processing extent (processSR) must have a projected coordinate system. At 10.5.1, 10.6, and 10.6.1, if a projected coordinate system is not specified when running analysis, the World Cylindrical Equal Area (WKID 54034) projection will be used. At 10.7 or later, if a projected coordinate system is not specified when running analysis, a projection will be picked based on the extent of the data. See Feature Input.

fields

Optional string. Provides one or more field specifying the number of incidents at each location. You can calculate the density on multiple fields, and the count of points will always have the density calculated.

weight

Required string. The type of weighting applied to the density calculation. There are two options:

  • Uniform - Calculates a magnitude-per-area. This is the default.

  • Kernel - Applies a kernel function to fit a smooth tapered surface to each point.

The default value is “Uniform”.

bin_type

Required string. The type of bin used to calculate density.

Choice list: [‘Hexagon’, ‘Square’].

bin_size

Required float. The distance for the bins that the input_layer will be analyzed using. When generating bins, for Square, the number and units specified determine the height and length of the square. For Hexagon, the number and units specified determine the distance between parallel sides.

bin_size_unit

Required string. The distance unit for the bins for which the density will be calculated. The linear unit to be used with the value specified in bin_size.

The default is ‘Meters’.

time_step_interval

Optional integer. A numeric value that specifies duration of the time step interval. This option is only available if the input points are time-enabled and represent an instant in time.

The default value is ‘None’.

time_step_interval_unit

Optional string. A string that specifies units of the time step interval. This option is only available if the input points are time-enabled and represent an instant in time.

Choice list: [‘Milliseconds’, ‘Seconds’, ‘Minutes’, ‘Hours’, ‘Days’, ‘Weeks’, ‘Months’, ‘Years’]

The default value is ‘None’.

time_step_repeat_interval

Optional integer. A numeric value that specifies how often the time step repeat occurs. This option is only available if the input points are time-enabled and of time type instant.

time_step_repeat_interval_unit

Optional string. A string that specifies the temporal unit of the step repeat. This option is only available if the input points are time-enabled and of time type instant.

Choice list:[‘Years’, ‘Months’, ‘Weeks’, ‘Days’, ‘Hours’, ‘Minutes’, ‘Seconds’, ‘Milliseconds’]

The default value is ‘None’.

time_step_reference

Optional datetime. A date that specifies the reference time to align the time slices to, represented in milliseconds from epoch. If time_step_reference is set to ‘None’, time stepping will align to January 1st, 1970 (datetime(1970, 1, 1)). This option is only available if the input points are time-enabled and of time type instant.

The default value is ‘None’.

radius

Required integer. The size of the neighborhood within which to calculate the density. The radius size must be larger than the bin_size.

radius_unit

Required string. The distance unit for the radius defining the neighborhood for which the density will be calculated. The linear unit to be used with the value specified in bin_size.

Choice list:[‘Feet’, ‘Yards’, ‘Miles’, ‘Meters’, ‘Kilometers’, ‘NauticalMiles’]

The default value is ‘Meters’.

area_units

Optional string. The desired output units of the density values. If density values are very small, you can increase the size of the area units (for example, square meters to square kilometers) to return larger values. This value only scales the result. Possible area units are:

Choice list: [‘SquareMeters’, ‘SquareKilometers’, ‘Hectares’, ‘SquareFeet’, ‘SquareYards’, ‘SquareMiles’, ‘Acres’].

The default value is “SquareKilometers”.

output_name

Optional string. The method will create a feature service of the results. You define the name of the service.

gis

Optional, the GIS on which this tool runs. If not specified, the active GIS is used.

context

Optional dict. The context parameter contains additional settings that affect task execution. For this task, there are four settings:

  1. Extent (extent) - A bounding box that defines the analysis area. Only those features that intersect the bounding box will be analyzed.

  2. Processing spatial reference (processSR) - The features will be projected into this coordinate system for analysis.

  3. Output spatial reference (outSR) - The features will be projected into this coordinate system after the analysis to be saved. The output spatial reference for the spatiotemporal big data store is always WGS84.

  4. Data store (dataStore) - Results will be saved to the specified data store. The default is the spatiotemporal big data store.

future

Optional boolean. If True, a GPJob is returned instead of results. The GPJob can be queried on the status of the execution.

Returns

result_layer : Output Features as feature layer item.

# Usage Example: Aggregate the number of Hurricanes within 1 meter to calculate density of Hurricane damage.

cal_den_result = calculate_density(input_layer=hurricane_lyr,
                                   fields='Damage',
                                   weight='Uniform',
                                   bin_type='Square',
                                   bin_size=1,
                                   bin_size_unit="Meters",
                                   radius=2,
                                   radius_unit="Yards")

create_space_time_cube

arcgis.geoanalytics.analyze_patterns.create_space_time_cube(point_layer: arcgis.features.feature.FeatureSet, bin_size: float, bin_size_unit: str, time_step_interval: int, time_step_interval_unit: str, time_step_alignment: str = None, time_step_reference: datetime.datetime = None, summary_fields: str = None, output_name: str = None, context: str = None, gis=None, future: bool = False) → arcgis.geoprocessing._types.DataFile
_images/create_space_time_cube.png

create_space_time_cube works with a layer of point features that are time enabled. It aggregates the data into a three-dimensional cube of space-time bins. When determining the point in a space-time bin relationship, statistics about all points in the space-time bins are calculated and assigned to the bins. The most basic statistic is the number of points within the bins, but you can calculate other statistics as well. For example, suppose you have point features of crimes in a city, and you want to summarize the number of crimes in both space and time. You can calculate the space-time cube for the dataset, and use the cube to further analyze trends such as emerging hot and cold spots.

Argument

Description

point_layer

Required point feature layer. The point features that will be aggregated into the bins specified in geographical size by the bin_size and bin_size_unit parameters and temporal size by the time_step_interval and time_step_interval_unit parameters. See Feature Input. Analysis using bins requires a projected coordinate system. When aggregating layers into bins, the input layer or processing extent (processSR) must have a projected coordinate system. At 10.5.1, 10.6, and 10.6.1, if a projected coordinate system is not specified when running analysis, the World Cylindrical Equal Area (WKID 54034) projection will be used. At 10.7 or later, if a projected coordinate system is not specified when running analysis, a projection will be picked based on the extent of the data. .. Note: The input_layer must have a minimum of 60 features.

bin_size

Required float. The distance for the bins into which point_layer will be aggregated. .. Note: A create_space_time_cube must have at least 10 time slices.

bin_size_unit

Required string. The distance unit for the bins into which point_layer will be aggregated. Choice list: [‘Feet’, ‘Yards’, ‘Miles’, ‘Meters’, ‘Kilometers’, ‘NauticalMiles’]

time_step_interval

Required integer. A numeric value that specifies the duration of the time bin. .. Note:

time_step_interval_unit

Required string. A numeric value that specifies the duration unit of the time bin. Choice list:[‘Years’, ‘Months’, ‘Weeks’, ‘Days’, ‘Hours’, ‘Minutes’, ‘Seconds’, ‘Milliseconds’]

time_step_alignment

Optional string. Defines how aggregation will occur based on a given timeInterval. Options are as follows: Choice list: [‘EndTime’, ‘StartTime’, ‘ReferenceTime’]

  • StartTime - Time is aligned to the first feature in time

  • EndTime - Time is aligned to the last feature in time

  • ReferenceTime - Time is aligned a specified time

time_step_reference (Required if time_step_alignment is ReferenceTime)

Optional datetime. A date that specifies the reference time to align the time bins to if ReferenceTime is specified in time_step_alignment.

summary_fields

Optional string. A list of field names, statistical summary types, and the fill option for empty values that you want to calculate for all points within each space-time bin. Note that the count of points within each bin is always returned. By default, all statistics are returned. Example: [{“statisticType”: “statistic type”, “onStatisticField”: “field name”, “fillType”: “fill type”, “onStatisticField”: “fieldName2”}]

fieldName is the name of the fields in the input point layer. statisticType is one of the following for numeric fields: * Sum - Adds the total value of all the points in each polygon. * Mean - Calculates the average of all the points in each polygon. * Min - Finds the smallest value of all the points in each polygon. * Max - Finds the largest value of all the points in each polygon. * Stddev - Finds the standard deviation of all the points in each polygon. statisticType is the following for string fields: * Count - Totals the number of strings for all the points in each polygon.

fillType is one of the following: * zeros - Fills missing values with zeros. This is most appropriate for fields representing counts. * spatialNeighbors - Fills missing values by averaging the spatial neighbors. Neighbors are determined by a second degree queens contiguity. * spaceTimeNeighbors - Fills missing values by averaging the space-time neighbors. Neighbors are determined by a second degree queens contiguity in both space and time. * temporalTrend - Interpolates values using a univariate spline.

output_name

Required string. The task will create a space time cube (netCDF) of the results. You define the name of the space time cube.

context

Optional string. Context contains additional settings that affect task execution. For this task, there are three settings:

  1. Extent (extent) - A bounding box that defines the analysis area. Only those features that intersect the bounding box will be analyzed.

  2. Processing spatial reference (processSR) - The features will be projected into this coordinate system for analysis.

gis

Optional, the GIS on which this tool runs. If not specified, the active GIS is used.

future

Optional boolean. If True, a GPJob is returned instead of results. The GPJob can be queried on the status of the execution.

Returns

dict with url containing the path to Output Space Time Cube (netCDF) dataFile. When you browse to the output url, your netCDF will automatically download to your local machine.

# Usage Example: To aggregate Chicago homicides date layer into 3-dimensional cubes of 5 miles bin.
create_space_time_cube(point_layer=lyr,
                       bin_size=5,
                       bin_size_unit="Miles",
                       time_step_interval=1,
                       time_step_interval_unit="Days",
                       time_step_alignment='StartTime',
                       time_step_reference=datetime(1995, 10, 4),
                       summary_fields=[{"statisticType": "Mean", "onStatisticField" : "Beat", "fillType" : "temporalTrend" }],
                       output_name="spacecube")

find_hot_spots

arcgis.geoanalytics.analyze_patterns.find_hot_spots(point_layer, bin_size: float = 5, bin_size_unit: str = 'Miles', neighborhood_distance: float = 5, neighborhood_distance_unit: str = 'Miles', time_step_interval: int = None, time_step_interval_unit: str = None, time_step_alignment: str = None, time_step_reference: datetime.datetime = None, output_name: str = None, gis=None, context=None, future=False)
_images/geo_find_hot_spots.png

The find_hot_spots tool analyzes point data (such as crime incidents, traffic accidents, trees, and so on) or field values associated with points. It finds statistically significant spatial clusters of high incidents (hot spots) and low incidents (cold spots). Hot spots are locations with lots of points and cold spots are locations with very few points.

The result map layer shows hot spots in red and cold spots in blue. The darkest red features indicate the strongest clustering of point densities; you can be 99 percent confident that the clustering associated with these features could not be the result of random chance. Similarly, the darkest blue features are associated with the strongest spatial clustering of the lowest point densities. Features that are beige are not part of a statistically significant cluster; the spatial pattern associated with these features could very likely be the result of random processes and random chance.

Argument

Description

point_layer

Required feature layer. The point feature layer for which hot spots will be calculated. See Feature Input.

Note

Analysis using bins requires a projected coordinate system. When aggregating layers into bins, the input layer or processing extent (processSR) must have a projected coordinate system. At 10.5.1, 10.6, and 10.6.1, if a projected coordinate system is not specified when running analysis, the World Cylindrical Equal Area (WKID 54034) projection will be used. At 10.7 or later, if a projected coordinate system is not specified when running analysis, a projection will be picked based on the extent of the data.

bin_size

Optional float. The distance for the square bins the point_layer will be aggregated into.

bin_size_unit

Optional string. The distance unit for the bins with which hot spots will be calculated. The linear unit to be used with the value specified in bin_size. When generating bins the number and units specified determine the height and length of the square.

Choice list:[‘Feet’, ‘Yards’, ‘Miles’, ‘Meters’, ‘Kilometers’, ‘NauticalMiles’]

The default value is Miles.

neighborhood_distance

Optional float. The size of the neighborhood within which to calculate the hot spots. The radius size must be larger than bin_size.

neighborhood_distance_unit

Optional string. The distance unit for the radius defining the neighborhood where the hot spots will be calculated. The linear unit to be used with the value specified in bin_size.

Choice list:[‘Feet’, ‘Yards’, ‘Miles’, ‘Meters’, ‘Kilometers’, ‘NauticalMiles’]

The default value is ‘Miles’.

time_step_interval

Optional integer. A numeric value that specifies duration of the time step interval. This option is only available if the input points are time-enabled and represent an instant in time.

time_step_interval_unit

Optional string. A string that specifies units of the time step interval. This option is only available if the input points are time-enabled and represent an instant in time.

Choice list:[‘Years’, ‘Months’, ‘Weeks’, ‘Days’, ‘Hours’, ‘Minutes’, ‘Seconds’, ‘Milliseconds’]

time_step_alignment

Optional string. Defines how aggregation will occur based on a given time_step_interval. Options are as follows:

Choice list:[‘EndTime’, ‘StartTime’, ‘ReferenceTime’]

  • StartTime - Time is aligned to the first feature in time.

  • EndTime - Time is aligned to the last feature in time.

  • ReferenceTime - Time is aligned a specified time in time_step_reference.

time_step_reference (Required if time_step_alignment is ReferenceTime)

Optional datetime. A date that specifies the reference time to align the time slices to. This option is only available if the input points are time-enabled and of time type instant.

output_name

Optional string. The task will create a feature service of the results. You define the name of the service.

context

Optional string. Context contains additional settings that affect task execution. For this task, there are three settings:

  1. Extent (extent) - a bounding box that defines the analysis area. Only those features that intersect the bounding box will be analyzed.

  2. Processing spatial reference (processSR) The features will be projected into this coordinate system for analysis.

  3. Output Spatial Reference (outSR) - the features will be projected into this coordinate system after the analysis to be saved. The output spatial reference for the spatiotemporal big data store is always WGS84.

  4. Data store (dataStore) Results will be saved to the specified data store. The default is the spatiotemporal big data store.

gis

Optional, the GIS on which this tool runs. If not specified, the active GIS is used.

future

Optional boolean. If True, a GPJob is returned instead of results. The GPJob can be queried on the status of the execution.

Returns

Output Features as a feature layer collection item

# Usage Example: To find significantly hot or cold spots of fire incidents.
find_hot_spots(point_layer=fire,
               bin_size=5,
               bin_size_unit='Miles',
               neighborhood_distance=5,
               neighborhood_distance_unit='Miles',
               time_step_interval=1,
               time_step_interval_unit="Years",
               time_step_alignment="StartTime",
               time_step_reference=None,
               output_name='find hot spots')

find_point_clusters

arcgis.geoanalytics.analyze_patterns.find_point_clusters(input_layer, method, min_feature_clusters, search_distance=None, distance_unit=None, output_name=None, gis=None, context=None, future=False, time_method=None, search_duration=None, duration_unit=None)

This tool extracts clusters from your input point features and identifies any surrounding noise.

For example, a nongovernmental organization is studying a particular pest-borne disease. It has a point dataset representing households in a study area, some of which are infested, and some of which are not. By using the Find Point Clusters tool, an analyst can determine clusters of infested households to help pinpoint an area to begin treatment and extermination of pests.

Argument

Description

input_layer

required FeatureSet, The table, point, line or polygon features containing potential incidents.

method

required String. The algorithm used for cluster analysis. This parameter must be specified as DBSCAN or HDBSCAN.

min_feature_clusters

optional Integer. Minimum number of clusters to find in a dataset.

search_distance

optional Float. The distance to search between points to form a cluster. This is required for DBSCAN.

distance_unit

optional String. The search_distance units.

output_name

optional string, The task will create a feature service of the results. You define the name of the service.

gis

optional GIS, the GIS on which this tool runs. If not specified, the active GIS is used.

context

Optional dict. The context parameter contains additional settings that affect task execution. For this task, there are four settings:

  1. Extent (extent) - A bounding box that defines the analysis area. Only those features that intersect the bounding box will be analyzed.

  2. Processing spatial reference (processSR) - The features will be projected into this coordinate system for analysis.

  3. Output spatial reference (outSR) - The features will be projected into this coordinate system after the analysis to be saved. The output spatial reference for the spatiotemporal big data store is always WGS84.

  4. Data store (dataStore) - Results will be saved to the specified data store. The default is the spatiotemporal big data store.

future

Optional boolean. If True, a GPJob is returned instead of results. The GPJob can be queried on the status of the execution.

time_method

Optional String. When this parameter is set to Linear and method is DBSCAN, both space and time will be used to find point clusters. If method is HDBSCAN, this parameter will be ignored and clusters will be found in space only. This parameter can only be used if input_layer has time enabled and is of type instant. Temporal clustering is available at ArcGIS Enterprise 10.8.

search_duration

Optional String. When using DBSCAN with timeMethod set as Linear, this parameter is the time duration within which min_feature_clusters must be found. This parameter is not used when HDBSCAN is chosen as the clustering method or when time_method is not used.

duration_unit

Optional String. The units used for the search_duration parameter. This parameter is required when using DBSCAN but will not be used with HDBSCAN or space-only DBSCAN.

Returns

Output feature layer item

forest

arcgis.geoanalytics.analyze_patterns.forest(input_layer, var_prediction, var_explanatory, trees, max_tree_depth=None, random_vars=None, sample_size=100, min_leaf_size=None, prediction_type='train', features_to_predict=None, validation=10, importance_tbl=False, exp_var_matching=None, output_name=None, gis=None, context=None, future=False, return_tuple=False)
_images/forest.png

The ‘forest’ method is a forest-based classification and regression task that creates models and generates predictions using an adaptation of Leo Breiman’s random forest algorithm, which is a supervised machine learning method. Predictions can be performed for both categorical variables (classification) and continuous variables (regression). Explanatory variables can take the form of fields in the attribute table of the training features. In addition to validation of model performance based on the training data, predictions can be made to another feature dataset.

The following are examples:

  • Given data on occurrence of seagrass, as well as a number of environmental explanatory variables represented as both attributes which has been enriched using a multi-variable grid to calculate distances to factories upstream and major ports, future seagrass occurrence can be predicted based on future projections for those same environmental explanatory variables.

  • Suppose you have crop yield data at hundreds of farms across the country along with other attributes at each of those farms (number of employees, acreage, and so on). Using these pieces of data, you can provide a set of features representing farms where you don’t have crop yield (but you do have all of the other variables), and make a prediction about crop yield.

  • Housing values can be predicted based on the prices of houses that have been sold in the current year. The sale price of homes sold along with information about the number of bedrooms, distance to schools, proximity to major highways, average income, and crime counts can be used to predict sale prices of similar homes.

Note

Forest Based Classification and Regression is available at ArcGIS Enterprise 10.7.

Argument

Description

input_layer

Required layer. The features that will be used to train the dataset. This layer must include fields representing the variable to predict and the explanatory variables. See Feature Input.

var_prediction

Required dict. The variable from the input_layer parameter containing the values to be used to train the model, and a boolean denoting if it’s categorical. This field contains known (training) values of the variable that will be used to predict at unknown locations.

Syntax: {“fieldName”:”<field name>”, “categorical”:bool}

var_explanatory

Required list. A list of fields representing the explanatory variables and a boolean value denoting whether the fields are categorical. The explanatory variables help predict the value or category of the var_prediction parameter. Use the categorical parameter for any variables that represent classes or categories (such as land cover or presence or absence). Specify the variable as ‘True’ for any that represent classes or categories such as land cover or presence or absence and ‘False’ if the variable is continuous.

Syntax: [{“fieldName”:”<field name>”, “categorical”:bool},…]

trees

Required integer. The number of trees to create in the forest model. More trees will generally result in more accurate model prediction, but the model will take longer to calculate.

max_tree_depth

Optional integer. The maximum number of splits that will be made down a tree. Using a large maximum depth, more splits will be created, which may increase the chances of overfitting the model. The default is data driven and depends on the number of trees created and the number of variables included. The max_tree_depth must be positive and less than or equal to 30.

random_vars

Optional integer. Specifies the number of explanatory variables used to create each decision tree.Each of the decision trees in the forest is created using a random subset of the explanatory variables specified. Increasing the number of variables used in each decision tree will increase the chances of overfitting your model particularly if there is one or a couple dominant variables. A common practice is to use the square root of the total number of explanatory variables (fields, distances, and rasters combined) if your variablePredict is numeric or divide the total number of explanatory variables (fields, distances, and rasters combined) by 3 if var_prediction is categorical.

sample_size

Optional integer. Specifies the percentage of the input_layer used for each decision tree. Samples for each tree are taken randomly from two-thirds of the data specified.

The default is 100 percent of the data.

min_leaf_size

Optional integer. The minimum number of observations required to keep a leaf (that is the terminal node on a tree without further splits). For very large data, increasing these numbers will decrease the run time of the tool.

The default minimum for regression is 5 and the default for classification is 1.

prediction_type

Optional string. Specifies the operation mode of the tool. The tool can be run to train a model to only assess performance, or train a model and predict features. Prediction types are as follows:

  • Train - A model will be trained, but no predictions will be generated. Use this option to assess the accuracy of your model before generating predictions. This option will output model diagnostics in the messages window and a chart of variable importance.

  • TrainAndPredict - Predictions or classifications will be generated for features. Explanatory variables must be provided for both the training features and the features to be predicted. The output of this option will be a feature service, model diagnostics, and an optional table of variable importance.

The default value is ‘Train’.

features_to_predict (Required if using TrainAndPredict)

Optional layer. A feature layer representing locations where predictions will be made. This layer must include explanatory variable fields that correspond to fields used in input_layer. This parameter is only used when the prediction_type is TrainAndPredict and is required in that case. See Feature Input.

validation

Optional integer. Specifies the percentage (between 10 percent and 50 percent) of inFeatures to reserve as the test dataset for validation. The model will be trained without this random subset of data, and the observed values for those features will be compared to the predicted value.

The default value is 10 percent.

importance_tbl

Optional boolean. Specifies whether an output table will be generated that contains information describing the importance of each explanatory variable used in the model created.

exp_var_matching

Optional list of dicts. A list of fields representing the explanatory variables and a boolean values denoting if the fields are categorical. The explanatory variables help predict the value or category of the variable_predict. Use the categorical parameter for any variables that represent classes or categories (such as landcover or presence or absence). Specify the variable as ‘True’ for any that represent classes or categories such as landcover or presence or absence and ‘False’ if the variable is continuous.

Syntax: [{“fieldName”:”<explanatory field name>”, “categorical”:bool},

  • fieldname is the name of the field in the input_layer used to predict the var_prediction.

  • categorical is one of: ‘True’ or ‘False’. A string field should always be ‘True’, and a continue value should always be set as ‘False’.

output_name

Optional string. The task will create a feature service of the results. You define the name of the service.

gis

Optional GIS. The GIS on which this tool runs. If not specified, the active GIS is used.

context

Optional dict. The context parameter contains additional settings that affect task execution. For this task, there are four settings:

  1. Extent (extent) - A bounding box that defines the analysis area. Only those features that intersect the bounding box will be analyzed.

  2. Processing spatial reference (processSR) - The features will be projected into this coordinate system for analysis.

  3. Output spatial reference (outSR) - The features will be projected into this coordinate system after the analysis to be saved. The output spatial reference for the spatiotemporal big data store is always WGS84.

  4. Data store (dataStore) - Results will be saved to the specified data store. The default is the spatiotemporal big data store.

future

Optional boolean. If ‘True’, a GPJob is returned instead of results. The GPJob can be queried on the status of the execution.

The default value is ‘False’.

return_tuple

Optional boolean. If ‘True’, a named tuple with multiple output keys is returned.

The default value is ‘False’.

Returns

a named tuple with the following keys if return_tuple is set to ‘True’:

”output_trained” : featureLayer

”output_predicted” : featureLayer

”variable_of_importance” : Table

”process_info” : list

else returns a feature layer of the results.

# Usage Example: To predict the number of 911 calls in each block group.
predicted_result = forest(input_layer=call_lyr,
                          var_prediction={"fieldName":"Calls", "categorical":False},
                          var_explanatory=[{"fieldName":"Pop", "categorical":False},
                                            {"fieldName":"Unemployed", "categorical":False},
                                            {"fieldName":"AlcoholX", "categorical":False},
                                            {"fieldName":"UnEmpRate", "categorical":False},
                                            {"fieldName":"MedAge00", "categorical":False}],
                          trees=50,
                          max_tree_depth=10,
                          random_vars=3,
                          sample_size=100,
                          min_leaf_size=5,
                          prediction_type='TrainAndPredict',
                          validation=10,
                          importance_tbl=True,
                          output_name='train and predict number of 911 calls')

glr

arcgis.geoanalytics.analyze_patterns.glr(input_layer, var_dependent, var_explanatory, regression_family='Continuous', features_to_predict=None, gen_coeff_table=False, exp_var_matching=None, dep_mapping=None, output_name=None, gis=None, context=None, future=False, return_tuple=False)
_images/glr.png

This tool performs Generalized Linear Regression (glr) to generate predictions or to model a dependent variable’s relationship to a set of explanatory variables. This tool can be used to fit continuous (Gaussian/OLS), binary (logistic), and count (Poisson) models.

The following are examples of the tool’s utility:

  • What demographic characteristics contribute to high rates of public transportation usage?

  • Is there a positive relationship between vandalism and burglary?

  • Which variables effectively predict 911 call volume? Given future projections, what is the expected demand for emergency response resources?

  • What variables affect low birth rates?

Argument

Description

input_layer

Required layer. The layer containing the dependent and independent variables. See Feature Input.

var_dependent

Required string. The numeric field containing the observed values you want to model.

var_explanatory

Required list of strings. One or more fields representing independent explanatory variables in your regression model.

regression_family

Optional string. This field specifies the type of data you are modeling.

regression_family is one of the following:

  • Continuous - The dependent_variable is continuous. The model used is Gaussian, and the tool performs ordinary least squares regression.

  • Binary - The dependent_variable represents presence or absence. Values must be 0 (absence) or 1 (presence) values, or mapped to 0 and 1 values using the parameter.

  • Count - The dependent_variable is discrete and represents events, such as crime counts, disease incidents, or traffic accidents. The model used is Poisson regression.

The default value is ‘Continuous’.

features_to_predict

Optional layer. A layer containing features representing locations where estimates should be computed. Each feature in this dataset should contain values for all the explanatory variables specified. The dependent variable for these features will be estimated using the model calibrated for the input layer data. See Feature Input.

gen_coeff_table

Optional boolean. Determines if a table with coefficient values will be returned. By default, the coefficient table is not returned.

exp_var_matching

Optional list of dicts. A list of the var_explanatory specified from the input_layer and their corresponding fields from the features_to_predict. By default, if an var_explanatory variiables is not mapped, it will match to a field with the same name in the features_to_predict. This parameter is only used if there is a features_to_predict input. You do not need to use it if the names and types of the fields match between your two input datasets.

Syntax: [{“predictionLayerField”:”<field name>”,”trainingLayerField”: “<field name>”},…]

  • predictionLayerField is the name of a field specified in the var_explanatoryiables parameter.

  • trainingLayerField is the field that will match to the field in the var_explanatoryiables parameter.

dep_mapping

Optional list of dicts. A list representing the values used to map to 0 (absence) and 1 (presence) for binary regression.

Syntax: [{“value0”:”<false value>”},{“value1”:”<true value>”}]

  • value0 is the string that will be used to represent 0 (absence values).

  • value1 is the string that will be used to represent 1 (presence values).

output_name

Optional string. The task will create a feature service of the results. You define the name of the service.

gis

Optional GIS. The GIS on which this tool runs. If not specified, the active GIS is used.

context

Optional dict. The context parameter contains additional settings that affect task execution. For this task, there are four settings:

  1. Extent (extent) - A bounding box that defines the analysis area. Only those features that intersect the bounding box will be analyzed.

  2. Processing spatial reference (processSR) - The features will be projected into this coordinate system for analysis.

  3. Output spatial reference (outSR) - The features will be projected into this coordinate system after the analysis to be saved. The output spatial reference for the spatiotemporal big data store is always WGS84.

  4. Data store (dataStore) - Results will be saved to the specified data store. The default is the spatiotemporal big data store.

future

Optional boolean. If ‘True’, a GPJob is returned instead of results. The GPJob can be queried on the status of the execution.

The default value is ‘False’.

return_tuple

Optional boolean. If ‘True’, a named tuple with multiple output keys is returned.

The default value is ‘False’.

Returns

a named tuple with the following keys if return_tuple is set to ‘True’:

”output” : featureLayer

”output_predicted” : featureLayer

”coefficient_table” : Table

”process_info” : list

else returns a feature layer of the results.

# Usage Example: To train a model for predicting 911 calls.
result_predicted = glr(input_layer=911_calls_lyr,
                       var_dependent='Calls',
                       var_explanatory='Unemployed, AlcoholX, UnEmpRate, MedAge00',
                       regression_family='Count',
                       gen_coeff_table=True,
                       output_name="predicted calls")

gwr

arcgis.geoanalytics.analyze_patterns.gwr(input_layer, explanatory_variables, dependent_variable, model_type='Continuous', neighborhood_selection_method='UserDefined', neighborhood_type='NumberOfNeighbors', distance_band=None, distance_band_unit=None, number_of_neighbors=None, local_weighting_scheme='BiSquare', output_name=None, context=None, gis=None, future=False)

This tool performs GeographicallyWeightedRegression (GWR), which is a local form of linear regression used to model spatially varying relationships.

The following are examples of the types of questions you can answer using this tool:

  1. Is the relationship between educational attainment and income consistent across the study area?

  2. What are the key variables that explain high forest fire frequency?

  3. Where are the districts in which children are achieving high test scores? What characteristics seem to be associated? Where is each characteristic most important?

Argument

Description

input_layer

Required layer. The features that will be used to train the dataset. This layer must include fields representing the variable to predict and the explanatory variables. See Feature Input.

dependent_variable

Required list. The numeric field containing the observed values you want to model.

Syntax: [‘arrests’]

explanatory_variables

Required list. One or more fields representing independent explanatory variables in your regression model.

Syntax: [‘population’, ‘avg_income’, ‘avg_ed_lvl’]

model_type

Optional String. The default is ‘Continuous’. Specifies the type of data that will be modeled.

neighborhood_selection_method

Optional String. The default value is ‘NumberOfNeighbors’. Specifies how the neighborhood size will be determined.

  • UserDefined - The neighborhood size will be specified by either the

    number_of_neighbors or distance_band parameter.

neighborhood_type

Specifies whether the neighborhood used is constructed as a fixed distance or allowed to vary in spatial extent depending on the density of the features.

  • DistanceBand - The neighborhood size is a constant or fixed distance for each feature.

  • NumberOfNeighbors - The neighborhood size is a function of a specified number of neighbors included in calculations for each feature. Where features are dense, the spatial extent of the neighborhood is smaller; where features are sparse, the spatial extent of the neighborhood is larger.

distance_band

Optional Float. The distance for the spatial extent of the neighborhood.

distance_band_unit

Optional String. The unit of the distance for the spatial extent of the neighborhood.

Values: Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards

number_of_neighbors

Optional Integer. The closest number of neighbors to consider for each feature. The number should be an integer greater than or equal to 2.

local_weighting_scheme

Optional String. Specifies the kernel type that will be used to provide the spatial weighting in the model. The kernel defines how each feature is related to other features within its neighborhood.

  • BiSquare - A weight of 0 will be assigned to any feature outside the neighborhood specified. This is the default.

  • Gaussian - All features will receive weights, but weights become exponentially smaller the farther away from the target feature.

output_name

Optional string. The task will create a feature service of the results. You define the name of the service.

gis

Optional GIS. The GIS on which this tool runs. If not specified, the active GIS is used.

context

Optional dict. The context parameter contains additional settings that affect task execution. For this task, there are four settings:

  1. Extent (extent) - A bounding box that defines the analysis area. Only those features that intersect the bounding box will be analyzed.

  2. Processing spatial reference (processSR) - The features will be projected into this coordinate system for analysis.

  3. Output spatial reference (outSR) - The features will be projected into this coordinate system after the analysis to be saved. The output spatial reference for the spatiotemporal big data store is always WGS84.

  4. Data store (dataStore) - Results will be saved to the specified data store. The default is the spatiotemporal big data store.

future

Optional boolean. If ‘True’, a GPJob is returned instead of results. The GPJob can be queried on the status of the execution.

The default value is ‘False’.