arcgis.features module

The arcgis.features module contains types and functions for working with features and feature layers in the GIS.

Entities located in space with a geometrical representation (such as points, lines or polygons) and a set of properties can be represented as features. The arcgis.features module is used for working with feature data, feature layers and collections of feature layers in the GIS. It also contains the spatial analysis functions which operate against feature data.

In the GIS, entities located in space with a set of properties can be represented as features. Features are stored as feature classes, which represent a set of features located using a single spatial type (point, line, polygon) and a common set of properties. This is the geographic extension of the classic tabular or relational representation for entities - a set of entities is modelled as rows in a table. Tables represent entity classes with uniform properties. In addition to working with entities with location as features, the system can also work with non-spatial entities as rows in tables. The system can also model relationships between entities using properties which act as primary and foreign keys. A collection of feature classes and tables, with the associated relationships among the entities, is a feature layer collection. FeatureLayerCollections are one of the dataset types contained in a Datastore. Finally, features are not simply entities in a dataset. Features have a visual representation and user experience - on a map, in a 3D scene, as entities with a property sheet or popups.

arcgis.features.Feature

class arcgis.features.Feature(geometry=None, attributes=None)

Entities located in space with a set of properties can be represented as features.

# Obtain a feature from a feature layer:

feat_set = feature_layer.query(where="OBJECTID=1")
feat = feat_set.features[0]
as_dict
Returns:the feature as a dictionary
as_row
Returns:the feature as a tuple containing two lists:
List of      Description
row values   the specific attribute values and geometry for this feature
field names  the name for each attribute field
attributes
Returns:a dictionary of feature attribute values with field names as the key
fields
Returns:attribute field names for the feature as a list of strings
classmethod from_dict(feature)
Returns:a feature from a dict
classmethod from_json(json_str)
Returns:a feature from a JSON string
geometry
Returns:the feature geometry
geometry_type
Returns:the geometry type of the feature as a string
get_value(field_name)

Retrieves the value for a specified field name

Argument Description
field_name Required String. The name of the field to get the value for. Note: feature.fields returns a list of all field names.
Returns:value for the specified attribute field of the feature.
set_value(field_name, value)

Sets an attribute value for a given field name

Argument Description
field_name Required String. The name of the field to update.
value Required. Value to update the field with.
Returns:boolean indicating whether field_name value was updated.
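
A minimal sketch combining query, get_value, and set_value; the layer URL and the STATUS field are hypothetical:

# Read and update an attribute value on a feature:

from arcgis.gis import GIS
from arcgis.features import FeatureLayer

gis = GIS()  # anonymous connection to ArcGIS Online
fl = FeatureLayer("https://services.arcgis.com/.../FeatureServer/0", gis=gis)

feat = fl.query(where="OBJECTID = 1").features[0]
print(feat.fields)                    # all attribute field names
print(feat.get_value("STATUS"))       # read a single attribute (hypothetical field)
feat.set_value("STATUS", "reviewed")  # returns True if the value was updated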

arcgis.features.FeatureLayer

class arcgis.features.FeatureLayer(url, gis=None, container=None, dynamic_layer=None)

The feature layer is the primary concept for working with features in a GIS.

Users create, import, export, analyze, edit, and visualize features, i.e., entities in space, as feature layers.

Feature layers can be added to and visualized using maps. They act as inputs to and outputs from feature analysis tools.

Feature layers are created by publishing feature data to a GIS, and are exposed as a broader resource (Item) in the GIS. Feature layer objects can be obtained through the layers attribute on feature layer Items in the GIS.
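
For example, a FeatureLayer can be obtained from the layers attribute of a 'Feature Layer' item, or constructed directly from a feature service URL (the item ID, credentials, and URL below are placeholders):

from arcgis.gis import GIS
from arcgis.features import FeatureLayer

gis = GIS("https://www.arcgis.com", "username", "password")  # placeholder credentials

# from an existing Feature Layer item
item = gis.content.get("<feature-layer-item-id>")  # placeholder item ID
fl = item.layers[0]                                # first layer in the item

# or directly from a REST endpoint URL
fl = FeatureLayer("https://services.arcgis.com/.../FeatureServer/0", gis=gis)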

append(item_id=None, upload_format='featureCollection', source_table_name=None, field_mappings=None, edits=None, source_info=None, upsert=True, skip_updates=False, use_globalids=False, update_geometry=True, append_fields=None, rollback=False, skip_inserts=None, upsert_matching_field=None)

Only available in ArcGIS Online

Update an existing hosted feature layer using append.

Argument Description
source_table_name Optional string. Required only when the source data contains more than one table, e.g., for a file geodatabase. Example: source_table_name="Building"
item_id Optional string. The ID of the portal item that contains the source file. Used in conjunction with upload_format.
field_mappings Optional list. Used to map source data to a destination layer. Syntax: field_mappings=[{"name": "<targetName>", "sourceName": "<sourceName>"}, ...] Example: field_mappings=[{"name": "CountyID", "sourceName": "GEOID10"}]
edits Optional string. Only feature collection JSON is supported. Append supports all formats through the upload_id or item_id.
source_info Optional dictionary. Needed only when appending data from Excel or CSV. The appendSourceInfo can be the publishing parameter returned from analyzing the CSV or Excel file.
upsert Optional boolean. Specifies whether the edits are applied as updates when the feature already exists. The default is True.
skip_updates Optional boolean. Used only when upsert is True. If True, existing features that match are not updated.
use_globalids Optional boolean. Specifies whether upsert uses GlobalIds when matching features.
update_geometry Optional boolean. Used only when upsert is True. If False, skips updating the geometry and updates only the attributes for existing features that match source features by objectId or globalId (as specified by the use_globalids parameter).
append_fields Optional list. The list of destination fields to append to. This is supported whether upsert is True or False. Example: ["fieldName1", "fieldName2", ...]
upload_format Required string. The source append data format. The default is featureCollection. Values: sqlite | shapefile | filegdb | featureCollection | geojson | csv | excel
rollback Optional boolean. Specifies whether the upsert edits are rolled back in case of failure. The default is False.
skip_inserts Optional boolean. Used only when upsert is True. If True, inserts are skipped. The default is False.
upsert_matching_field Optional string. The layer field to be used when matching features with upsert. ObjectId, GlobalId, and any other field that has a unique index can be used. This parameter overrides use_globalids; a specified upsert_matching_field is used even if use_globalids=True. Example: upsert_matching_field="MyFieldWithUniqueIndex"
Returns:boolean
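
A sketch of appending rows from a CSV item already added to the portal; the item ID is a placeholder, and the source_info value is assumed to come from analyzing the CSV (e.g., via gis.content.analyze, if available in your API version):

csv_item = gis.content.get("<csv-item-id>")  # placeholder item ID

# assumption: gis.content.analyze returns publish parameters for the CSV
analyzed = gis.content.analyze(item=csv_item, file_type="csv")

success = fl.append(item_id=csv_item.id,
                    upload_format="csv",
                    source_info=analyzed["publishParameters"],
                    upsert=False)
print(success)  # True on success
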
calculate(where, calc_expression, sql_format='standard', version=None, sessionid=None, return_edit_moment=None)

The calculate operation is performed on a feature layer resource. It updates the values of one or more fields in an existing feature service layer based on SQL expressions or scalar values. The calculate operation can only be used if the supportsCalculate property of the layer is true. Neither the Shape field nor system fields can be updated using calculate. System fields include ObjectId and GlobalId. See Calculate a field for more information on supported expressions.

Inputs Description
where Required String. A where clause used to limit the updated records. Any legal SQL where clause operating on the fields in the layer is allowed.
calc_expression Required List/Dict. The array of field/value info objects containing the field or fields to update and their scalar values or SQL expression. Allowed types are dictionary and list; a list must be a list of dictionary objects. Calculation format: {"field": "<field name>", "value": "<value>"}
sql_format Optional String. The SQL format for the calc_expression. It can be either standard SQL92 (standard) or native SQL (native). The default is standard. Values: standard, native
version Optional String. The geodatabase version to apply the edits.
sessionid Optional String. A parameter set by a client during long transaction editing on a branch version. The sessionid is a GUID value that clients establish at the beginning and use throughout the edit session; it ensures isolation during the edit session. This parameter applies only if the isDataBranchVersioned property of the layer is true.
return_edit_moment Optional Boolean. Specifies whether the response will report the time edits were applied. If true, the server returns the time edits were applied in the response's edit moment key. This parameter applies only if the isDataBranchVersioned property of the layer is true.
# Usage Example 1:

print(fl.calculate(where="OBJECTID < 2",
                   calc_expression={"field": "ZONE", "value" : "R1"}))
# Usage Example 2:

print(fl.calculate(where="OBJECTID < 2001",
                   calc_expression={"field": "A",  "sqlExpression" : "B*3"}))

Output: dictionary with format {'updatedFeatureCount': 1, 'success': True}

container

The feature layer collection to which this layer belongs.

delete_features(deletes=None, where=None, geometry_filter=None, gdb_version=None, rollback_on_failure=True)

This operation deletes features in a feature layer or table

Argument Description
deletes Optional string. A comma-separated string of OIDs to remove from the service.
where Optional string. A where clause for the query filter. Any legal SQL where clause operating on the fields in the layer is allowed. Features conforming to the specified where clause will be deleted.
geometry_filter Optional SpatialFilter. A spatial filter from the arcgis.geometry.filters module to filter results by a spatial relationship with another geometry.
gdb_version Optional string. A geodatabase version to apply the edits.
rollback_on_failure Optional boolean. Specifies whether the edits should be applied only if all submitted edits succeed. If false, the server applies the edits that succeed even if some of the submitted edits fail. If true, the server applies the edits only if all edits succeed. The default is true.
Returns:dict
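
For example (the STATUS field is hypothetical; fl is the layer from the earlier example):

# delete by a comma-separated string of OIDs
result = fl.delete_features(deletes="101,102,103")

# or delete by attribute query
result = fl.delete_features(where="STATUS = 'retired'")
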
edit_features(adds=None, updates=None, deletes=None, gdb_version=None, use_global_ids=False, rollback_on_failure=True)

This operation adds, updates, and deletes features in the associated feature layer or table in a single call.

Inputs Description
adds Optional FeatureSet/List. The array of features to be added.
updates Optional FeatureSet/List. The array of features to be updated.
deletes Optional FeatureSet/List/String. A comma-separated string (or list) of OIDs to remove from the service.
use_global_ids Optional boolean. Instead of referencing the default ObjectID field, the service will look at a GUID field to track changes. This means GUIDs will be passed instead of OIDs for delete, update, or add features.
gdb_version Optional string. The geodatabase version to apply the edits.
rollback_on_failure Optional boolean. Specifies whether the edits should be applied only if all submitted edits succeed. If false, the server applies the edits that succeed even if some of the submitted edits fail. If true, the server applies the edits only if all edits succeed. The default is true.

Output: dictionary
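A typical round trip queries a feature, changes an attribute, and writes it back (the STATUS field is hypothetical):

feat_set = fl.query(where="OBJECTID = 1")
feat = feat_set.features[0]
feat.set_value("STATUS", "inspected")

result = fl.edit_features(updates=[feat])
print(result["updateResults"])  # per-feature success flags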

classmethod fromitem(item, layer_id=0)

Creates a feature layer from a GIS Item. The type of item should be a 'Feature Service' that represents a FeatureLayerCollection. The layer_id is the id of the layer in the feature layer collection (feature service).

generate_renderer(definition, where=None)

This operation groups data using the supplied definition (classification definition) and an optional where clause. The result is a renderer object. Use baseSymbol and colorRamp to define the symbols assigned to each class. If the operation is performed on a table, the result is a renderer object containing the data classes and no symbols.

Argument Description
definition Required dict. The classification definition used to generate the renderer. Use either a class breaks or a unique value classification definition. See: https://resources.arcgis.com/en/help/rest/apiref/ms_classification.html
where Optional string. A where clause for which the data needs to be classified. Any legal SQL where clause operating on the fields in the dynamic layer/table is allowed.
Returns:dictionary
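
A sketch using a class breaks classification definition; the classificationField is hypothetical:

definition = {"type": "classBreaksDef",
              "classificationField": "POP2000",  # hypothetical field
              "classificationMethod": "esriClassifyNaturalBreaks",
              "breakCount": 5}
renderer = fl.generate_renderer(definition, where="POP2000 > 0")
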
get_html_popup(oid)

The htmlPopup resource provides details about the HTML pop-up authored by the user using ArcGIS for Desktop.

Argument Description
oid Optional string. The object ID of the feature for which to get the HTML popup.
Returns:string
get_unique_values(attribute, query_string='1=1')

Return a list of unique values for a given attribute

Argument Description
attribute Required string. The feature layer attribute to query.
query_string Optional string. SQL Query that will be used to filter attributes before unique values are returned. ex. “name_2 like ‘%K%’”
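
For example (field names are hypothetical):

states = fl.get_unique_values("STATE_NAME")
large = fl.get_unique_values("STATE_NAME", query_string="POP2000 > 1000000")
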
manager

Helper object to manage the feature layer, update its definition, etc.

metadata

The metadata property allows for the setting and downloading of the Feature Layer's metadata. If metadata is disabled on the GIS or the layer does not support metadata, None is returned.

Argument Description
value Required String. (SET) Path to the metadata file.
Returns:String (GET)
properties

The properties of this object

query(where='1=1', out_fields='*', time_filter=None, geometry_filter=None, return_geometry=True, return_count_only=False, return_ids_only=False, return_distinct_values=False, return_extent_only=False, group_by_fields_for_statistics=None, statistic_filter=None, result_offset=None, result_record_count=None, object_ids=None, distance=None, units=None, max_allowable_offset=None, out_sr=None, geometry_precision=None, gdb_version=None, order_by_fields=None, out_statistics=None, return_z=False, return_m=False, multipatch_option=None, quantization_parameters=None, return_centroid=False, return_all_records=True, result_type=None, historic_moment=None, sql_format=None, return_true_curves=False, return_exceeded_limit_features=None, as_df=False, **kwargs)

Queries a feature layer based on a SQL statement.

Argument Description
where Optional string. The SQL where clause used to select records. The default is 1=1 (all records).
out_fields Optional list of field names to return. Field names can be specified either as a list of field names or as a comma-separated string. The default is "*", which returns all fields.
object_ids Optional string. The object IDs of this layer or table to be queried. The object ID values should be a comma-separated string.
distance Optional integer. The buffer distance for the input geometries. The distance unit is specified by units. For example, if the distance is 100, the query geometry is a point, and units is set to meters, then all points within 100 meters of the point are returned.
units Optional string. The unit for calculating the buffer distance. If unit is not specified, the unit is derived from the geometry spatial reference. If the geometry spatial reference is not specified, the unit is derived from the feature service data spatial reference. This parameter applies only if supportsQueryWithDistance is true. Values: esriSRUnit_Meter | esriSRUnit_StatuteMile | esriSRUnit_Foot | esriSRUnit_Kilometer | esriSRUnit_NauticalMile | esriSRUnit_USNauticalMile
time_filter Optional list. The format is [<startTime>, <endTime>], specified as datetime.date, datetime.datetime, or timestamp in milliseconds. Syntax: time_filter=[<startTime>, <endTime>]
geometry_filter Optional from arcgis.geometry.filter. Allows for the information to be filtered on spatial relationship with another geometry.
max_allowable_offset Optional float. This option can be used to specify the max_allowable_offset to be used for generalizing geometries returned by the query operation. The max_allowable_offset is in the units of out_sr. If out_sr is not specified, max_allowable_offset is assumed to be in the unit of the spatial reference of the layer.
out_sr Optional Integer. The WKID for the spatial reference of the returned geometry.
geometry_precision Optional Integer. This option can be used to specify the number of decimal places in the response geometries returned by the query operation. This applies to X and Y values only (not m or z-values).
gdb_version Optional string. The geodatabase version to query. This parameter applies only if the isDataVersioned property of the layer is true. If this is not specified, the query will apply to the published map’s version.
return_geometry Optional boolean. If true, geometry is returned with the query. Default is true.
return_distinct_values Optional boolean. If true, it returns distinct values based on the fields specified in out_fields. This parameter applies only if the supportsAdvancedQueries property of the layer is true.
return_ids_only Optional boolean. Default is False. If true, the response only includes an array of object IDs. Otherwise, the response is a feature set.
return_count_only Optional boolean. If true, the response only includes the count (number of features/records) that would be returned by a query. Otherwise, the response is a feature set. The default is false. This option supersedes the returnIdsOnly parameter. If returnCountOnly = true, the response will return both the count and the extent.
return_extent_only Optional boolean. If true, the response only includes the extent of the features that would be returned by the query. If returnCountOnly=true, the response will return both the count and the extent. The default is false. This parameter applies only if the supportsReturningQueryExtent property of the layer is true.
order_by_fields Optional string. One or more field names on which the features/records need to be ordered. Use ASC or DESC for ascending or descending, respectively, following every field to control the ordering. example: STATE_NAME ASC, RACE DESC, GENDER
group_by_fields_for_statistics Optional string. One or more field names on which the values need to be grouped for calculating the statistics. example: STATE_NAME, GENDER
out_statistics Optional string. The definitions for one or more field-based statistics to be calculated.

Syntax:

[
    {
        "statisticType": "<count | sum | min | max | avg | stddev | var>",
        "onStatisticField": "Field1",
        "outStatisticFieldName": "Out_Field_Name1"
    },
    {
        "statisticType": "<count | sum | min | max | avg | stddev | var>",
        "onStatisticField": "Field2",
        "outStatisticFieldName": "Out_Field_Name2"
    }
]

return_z Optional boolean. If true, Z values are included in the results if the features have Z values. Otherwise, Z values are not returned. The default is False.
return_m Optional boolean. If true, M values are included in the results if the features have M values. Otherwise, M values are not returned. The default is false.
multipatch_option Optional x/y footprint. This option dictates how the geometry of a multipatch feature will be returned.
result_offset Optional integer. This option can be used for fetching query results by skipping the specified number of records and starting from the next record (that is, resultOffset + 1th). This option is ignored if return_all_records is True (i.e. by default).
result_record_count Optional integer. This option can be used for fetching query results up to the result_record_count specified. When result_offset is specified but this parameter is not, the map service defaults it to max_record_count. The maximum value for this parameter is the value of the layer’s max_record_count property. This option is ignored if return_all_records is True (i.e. by default).
quantization_parameters Optional dict. Used to project the geometry onto a virtual grid, likely representing pixels on the screen.
return_centroid Optional boolean. Used to return the geometry centroid associated with each feature returned. If true, the result includes the geometry centroid. The default is false.
return_all_records Optional boolean. When True, the query operation will call the service until all records that satisfy the where_clause are returned. Note: result_offset and result_record_count will be ignored if return_all_records is True. Also, if return_count_only, return_ids_only, or return_extent_only are True, this parameter will be ignored.
result_type Optional string. The result_type parameter can be used to control the number of features returned by the query operation. Values: None | standard | tile
historic_moment

Optional integer. The historic moment to query. This parameter applies only if the layer is archiving enabled and the supportsQueryWithHistoricMoment property is set to true. This property is provided in the layer resource.

If historic_moment is not specified, the query will apply to the current features.

sql_format Optional string. The sql_format parameter can be either standard SQL92 (standard) or the native SQL of the underlying datastore (native). The default is none, which means the sql_format depends on the useStandardizedQuery parameter. Values: none | standard | native
return_true_curves Optional boolean. When set to true, returns true curves in output geometries. When set to false, curves are converted to densified polylines or polygons.
return_exceeded_limit_features Optional boolean. True by default. When set to true, features are returned even when the results include 'exceededTransferLimit': True. When set to false and querying with resultType = tile, features are not returned when the results include 'exceededTransferLimit': True. This allows a client to find the resolution at which the transfer limit is no longer exceeded without making multiple calls.
as_df Optional boolean. If True, the results are returned as a DataFrame instead of a FeatureSet.
kwargs Optional dict. Optional parameters that can be passed to the query function, allowing users to pass additional parameters not explicitly implemented on the function. A complete list of available parameters is documented in the Query REST API.
Returns:A FeatureSet containing the features matching the query unless another return type is specified, such as count
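
For example (field names are hypothetical):

# features matching a where clause, with a subset of fields
feat_set = fl.query(where="POP2000 > 100000", out_fields="NAME,POP2000")

# only the matching record count
n = fl.query(where="POP2000 > 100000", return_count_only=True)

# results as a pandas DataFrame
df = fl.query(where="1=1", as_df=True)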

query_related_records(object_ids, relationship_id, out_fields='*', definition_expression=None, return_geometry=True, max_allowable_offset=None, geometry_precision=None, out_wkid=None, gdb_version=None, return_z=False, return_m=False, historic_moment=None, return_true_curves=False)

The query_related_records operation is performed on a feature service layer resource. The result of this operation is a set of feature sets grouped by source layer/table object IDs. Each feature set contains Feature objects including the values for the fields requested by the user. For related layers, if you request geometry information, the geometry of each feature is also returned in the feature set. For related tables, the feature set does not include geometries.

Argument Description
object_ids Required string. The object IDs of the table/layer to be queried
relationship_id Required string. The ID of the relationship to be queried.
out_fields Required string. The list of fields from the related table/layer to be included in the returned feature set. This list is a comma-delimited list of field names. If you specify the shape field in the list of return fields, it is ignored. To request geometry, set return_geometry to true. You can also specify the wildcard "*" as the value of this parameter, in which case the results include all field values.
definition_expression Optional string. The definition expression to be applied to the related table/layer. From the list of objectIds, only those records that conform to this expression are queried for related records.
return_geometry Optional boolean. If true, the feature set includes the geometry associated with each feature. The default is true.
max_allowable_offset Optional float. This option can be used to specify the max_allowable_offset to be used for generalizing geometries returned by the query operation. The max_allowable_offset is in the units of out_wkid. If out_wkid is not specified, then max_allowable_offset is assumed to be in the unit of the spatial reference of the map.
geometry_precision Optional integer. This option can be used to specify the number of decimal places in the response geometries.
out_wkid Optional Integer. The spatial reference of the returned geometry.
gdb_version Optional string. The geodatabase version to query. This parameter applies only if the isDataVersioned property of the layer queried is true.
return_z Optional boolean. If true, Z values are included in the results if the features have Z values. Otherwise, Z values are not returned. The default is false.
return_m Optional boolean. If true, M values are included in the results if the features have M values. Otherwise, M values are not returned. The default is false.
historic_moment

Optional Integer/datetime. The historic moment to query. This parameter applies only if the supportsQueryWithHistoricMoment property of the layers being queried is set to true. This setting is provided in the layer resource.

If historic_moment is not specified, the query will apply to the current features.

Syntax: historic_moment=<Epoch time in milliseconds>

return_true_curves Optional boolean. Optional parameter that is false by default. When set to true, returns true curves in output geometries; otherwise, curves are converted to densified polylines or polygons.
Returns:dict
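
A sketch, assuming the query_related_records signature reconstructed above; the relationship ID and object IDs are placeholders:

related = fl.query_related_records(object_ids="1,2,3",
                                   relationship_id="0",
                                   out_fields="*",
                                   return_geometry=False)
for group in related.get("relatedRecordGroups", []):
    print(group["objectId"], len(group.get("relatedRecords", [])))
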
query_top_features(top_filter=None, where=None, objectids=None, start_time=None, end_time=None, geometry_filter=None, out_fields='*', return_geometry=True, return_centroid=False, max_allowable_offset=None, out_sr=None, geometry_precision=None, return_ids_only=False, return_extents_only=False, order_by_field=None, return_z=False, return_m=False, result_type=None, as_df=True)

The query_top_features operation is performed on a feature layer. This operation returns a feature set or spatially enabled DataFrame based on the top features by order within a group. For example, when querying counties in the United States, you may want to return the top five counties by population in each state. To do this, you can use query_top_features to group by state name, order by population in descending order, and return the first five rows from each group (state).

The top_filter parameter is used to set the group by, order by, and count criteria used in generating the result. The operation also has many of the same parameters (for example, where and geometry) as the layer query operation. However, unlike the layer query operation, query_top_features does not support parameters such as outStatistics and its related parameters, or returning distinct values. Consult the advancedQueryCapabilities layer property for more details.

If the feature layer collection supports the query_top_features operation, it will include "supportsTopFeaturesQuery": true in the advancedQueryCapabilities layer property.

Argument Description
top_filter Required dict. The top_filter defines the aggregation of the data.
  • groupByFields defines the field or fields used to aggregate your data.
  • topCount defines the number of features returned from the top features query and is a numeric value.
  • orderByFields defines the order in which the top features will be returned. orderByFields can be specified in either ascending (asc) or descending (desc) order, ascending being the default.
Example: {"groupByFields": "worker", "topCount": 1, "orderByFields": "employeeNumber"}
where Optional String. A WHERE clause for the query filter. SQL ‘92 WHERE clause syntax on the fields in the layer is supported for most data sources.
objectids Optional List. The object IDs of the layer or table to be queried.
start_time Optional Datetime. The starting time to query for.
end_time Optional Datetime. The end date to query for.
geometry_filter Optional from arcgis.geometry.filter. Allows for the information to be filtered on spatial relationship with another geometry.
out_fields Optional String. The list of fields to include in the return results.
return_geometry Optional Boolean. If False, the query will not return geometries. The default is True.
return_centroid Optional Boolean. If True, the centroid of the geometry will be added to the output.
max_allowable_offset Optional float. This option can be used to specify the max_allowable_offset to be used for generalizing geometries returned by the query operation. The max_allowable_offset is in the units of out_sr. If out_sr is not specified, max_allowable_offset is assumed to be in the unit of the spatial reference of the layer.
out_sr Optional Integer. The WKID for the spatial reference of the returned geometry.
geometry_precision Optional Integer. This option can be used to specify the number of decimal places in the response geometries returned by the query operation. This applies to X and Y values only (not m or z-values).
return_ids_only Optional boolean. Default is False. If true, the response only includes an array of object IDs. Otherwise, the response is a feature set.
return_extents_only Optional boolean. If true, the response only includes the extent of the features that would be returned by the query. The default is false. This parameter applies only if the supportsReturningQueryExtent property of the layer is true.
order_by_field Optional string. One or more field names on which the features/records need to be ordered. Use ASC or DESC for ascending or descending, respectively, following every field to control the ordering. Example: STATE_NAME ASC, RACE DESC, GENDER
return_z Optional boolean. If true, Z values are included in the results if the features have Z values. Otherwise, Z values are not returned. The default is False.
return_m Optional boolean. If true, M values are included in the results if the features have M values. Otherwise, M values are not returned. The default is false.
result_type Optional String. The result_type can be used to control the number of features returned by the query operation. Values: none | standard | tile
as_df Optional Boolean. If False, the result is returned as a FeatureSet. If True (default) the result is returned as a spatially enabled dataframe.
Returns:pd.DataFrame by default; a FeatureSet when as_df=False. If return_ids_only is True, a list of object IDs is returned.
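
For example, returning the five most populous counties in each state (field names are hypothetical):

top = fl.query_top_features(
    top_filter={"groupByFields": "STATE_NAME",
                "topCount": 5,
                "orderByFields": "POP2000 DESC"},
    out_fields="NAME,STATE_NAME,POP2000",
    as_df=True)
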
update_metadata(file_path)

Updates a Layer’s metadata from an xml file.

Argument Description
file_path Required String. The path to the .xml file that contains the metadata.
Returns:boolean
validate_sql(sql, sql_type='where')

The validate_sql operation validates an SQL-92 expression or WHERE clause. The validate_sql operation ensures that an SQL-92 expression, such as one written by a user through a user interface, is correct before performing another operation that uses the expression. For example, validateSQL can be used to validate information that is subsequently passed in as part of the where parameter of the calculate operation. validate_sql also prevents SQL injection. In addition, all table and field names used in the SQL expression or WHERE clause are validated to ensure they are valid tables and fields.

Argument Description
sql Required String. The SQL expression or WHERE clause to validate. Example: "Population > 300000"
sql_type
Optional String. Three SQL types are supported in validate_sql
  • where (default) - Represents the custom WHERE clause the user can compose when querying a layer or using calculate.
  • expression - Represents an SQL-92 expression. Currently, expression is used as a default value expression when adding a new field or using the calculate API.
  • statement - Represents the full SQL-92 statement that can be passed directly to the database. No current ArcGIS REST API resource or operation supports using the full SQL-92 SELECT statement directly. It has been added to validate_sql for completeness. Values: where | expression | statement
Returns:dict
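
For example (the field name is hypothetical):

check = fl.validate_sql("POP2000 > 100000", sql_type="where")
print(check)  # e.g. {'isValidSQL': True, ...}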

arcgis.features.Table

class arcgis.features.Table(url, gis=None, container=None, dynamic_layer=None)

Tables represent entity classes with uniform properties. In addition to working with “entities with location” as features, the GIS can also work with non-spatial entities as rows in tables.

Working with tables is similar to working with feature layers, except that the rows (features) in a table do not have a geometry, and tables ignore any geometry-related operation.
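
A Table is typically obtained from the tables attribute of a 'Feature Layer' item (the item ID is a placeholder):

item = gis.content.get("<feature-layer-item-id>")  # placeholder item ID
tbl = item.tables[0]
df = tbl.query(where="1=1", as_df=True)  # rows come back without geometry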

append(item_id=None, upload_format='featureCollection', source_table_name=None, field_mappings=None, edits=None, source_info=None, upsert=True, skip_updates=False, use_globalids=False, update_geometry=True, append_fields=None, rollback=False, skip_inserts=None, upsert_matching_field=None)

Only available in ArcGIS Online

Update an existing hosted feature layer using append.

Argument Description
source_table_name Optional string. Required only when the source data contains more than one table, e.g., for a file geodatabase. Example: source_table_name="Building"
item_id Optional string. The ID of the portal item that contains the source file. Used in conjunction with upload_format.
field_mappings Optional list. Used to map source data to a destination layer. Syntax: field_mappings=[{"name": "<targetName>", "sourceName": "<sourceName>"}, ...] Example: field_mappings=[{"name": "CountyID", "sourceName": "GEOID10"}]
edits Optional string. Only feature collection JSON is supported. Append supports all formats through the upload_id or item_id.
source_info Optional dictionary. Needed only when appending data from Excel or CSV. The appendSourceInfo can be the publishing parameter returned from analyzing the CSV or Excel file.
upsert Optional boolean. Specifies whether the edits are applied as updates when the feature already exists. The default is True.
skip_updates Optional boolean. Used only when upsert is True. If True, existing features that match are not updated.
use_globalids Optional boolean. Specifies whether upsert uses GlobalIds when matching features.
update_geometry Optional boolean. Used only when upsert is True. If False, skips updating the geometry and updates only the attributes for existing features that match source features by objectId or globalId (as specified by the use_globalids parameter).
append_fields Optional list. The list of destination fields to append to. This is supported whether upsert is True or False. Example: ["fieldName1", "fieldName2", ...]
upload_format Required string. The source append data format. The default is featureCollection. Values: sqlite | shapefile | filegdb | featureCollection | geojson | csv | excel
rollback Optional boolean. Specifies whether the upsert edits are rolled back in case of failure. The default is False.
skip_inserts Optional boolean. Used only when upsert is True. If True, inserts are skipped. The default is False.
upsert_matching_field Optional string. The layer field to be used when matching features with upsert. ObjectId, GlobalId, and any other field that has a unique index can be used. This parameter overrides use_globalids; a specified upsert_matching_field is used even if use_globalids=True. Example: upsert_matching_field="MyFieldWithUniqueIndex"
Returns:boolean
calculate(where, calc_expression, sql_format='standard', version=None, sessionid=None, return_edit_moment=None)

The calculate operation is performed on a feature layer resource. It updates the values of one or more fields in an existing feature service layer based on SQL expressions or scalar values. The calculate operation can only be used if the supportsCalculate property of the layer is true. Neither the Shape field nor system fields can be updated using calculate. System fields include ObjectId and GlobalId. See Calculate a field for more information on supported expressions.

Inputs Description
where Required String. A where clause used to limit the updated records. Any legal SQL where clause operating on the fields in the layer is allowed.
calc_expression Required List/Dict. The array of field/value info objects containing the field or fields to update and their scalar values or SQL expression. Allowed types are dictionary and list; a list must be a list of dictionary objects. Calculation format: {"field": "<field name>", "value": "<value>"}
sql_format Optional String. The SQL format for the calc_expression. It can be either standard SQL92 (standard) or native SQL (native). The default is standard. Values: standard, native
version Optional String. The geodatabase version to apply the edits.
sessionid Optional String. A parameter set by a client during long transaction editing on a branch version. The sessionid is a GUID value that clients establish at the beginning and use throughout the edit session; it ensures isolation during the edit session. This parameter applies only if the isDataBranchVersioned property of the layer is true.
return_edit_moment Optional Boolean. Specifies whether the response will report the time edits were applied. If true, the server returns the time edits were applied in the response's edit moment key. This parameter applies only if the isDataBranchVersioned property of the layer is true.
# Usage Example 1:

print(fl.calculate(where="OBJECTID < 2",
                   calc_expression={"field": "ZONE", "value" : "R1"}))
# Usage Example 2:

print(fl.calculate(where="OBJECTID < 2001",
                   calc_expression={"field": "A",  "sqlExpression" : "B*3"}))

Output: dictionary with format {'updatedFeatureCount': 1, 'success': True}

container

The feature layer collection to which this layer belongs.

delete_features(deletes=None, where=None, geometry_filter=None, gdb_version=None, rollback_on_failure=True)

This operation deletes features in a feature layer or table

Argument Description
deletes Optional string. A comma-separated string of OIDs to remove from the service.
where Optional string. A where clause for the query filter. Any legal SQL where clause operating on the fields in the layer is allowed. Features conforming to the specified where clause will be deleted.
geometry_filter Optional SpatialFilter. A spatial filter from the arcgis.geometry.filters module to filter results by a spatial relationship with another geometry.
gdb_version Optional string. A geodatabase version to apply the edits.
rollback_on_failure Optional boolean. Specifies whether the edits should be applied only if all submitted edits succeed. If false, the server applies the edits that succeed even if some of the submitted edits fail. If true, the server applies the edits only if all edits succeed. The default is true.
Returns:dict
edit_features(adds=None, updates=None, deletes=None, gdb_version=None, use_global_ids=False, rollback_on_failure=True)

This operation adds, updates, and deletes features in the associated feature layer or table in a single call.

Inputs Description
adds Optional FeatureSet/List. The array of features to be added.
updates Optional FeatureSet/List. The array of features to be updated.
deletes Optional FeatureSet/List/String. A comma-separated string (or list) of OIDs to remove from the service.
use_global_ids Optional boolean. Instead of referencing the default ObjectID field, the service will look at a GUID field to track changes. This means GUIDs will be passed instead of OIDs for delete, update, or add features.
gdb_version Optional string. The geodatabase version to apply the edits.
rollback_on_failure Optional boolean. Specifies whether the edits should be applied only if all submitted edits succeed. If false, the server applies the edits that succeed even if some of the submitted edits fail. If true, the server applies the edits only if all edits succeed. The default is true.

Output: dictionary
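
For a table, adds can be plain dictionaries containing only attributes, since rows have no geometry (field names are hypothetical; tbl is the table from the earlier example):

row = {"attributes": {"NAME": "New office", "CITY": "Redlands"}}
result = tbl.edit_features(adds=[row])
print(result["addResults"])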

classmethod fromitem(item, table_id=0)

Creates a Table from a GIS Item. The type of item should be a 'Feature Service' that represents a FeatureLayerCollection. The table_id is the id of the table in the feature layer collection (feature service).

generate_renderer(definition, where=None)

This operation groups data using the supplied definition (classification definition) and an optional where clause. The result is a renderer object. Use baseSymbol and colorRamp to define the symbols assigned to each class. If the operation is performed on a table, the result is a renderer object containing the data classes and no symbols.

Argument Description
definition Required dict. The classification definition used to generate the renderer. Use either a class breaks or a unique value classification definition. See: https://resources.arcgis.com/en/help/rest/apiref/ms_classification.html
where Optional string. A where clause for which the data needs to be classified. Any legal SQL where clause operating on the fields in the dynamic layer/table is allowed.
Returns:dictionary
get_html_popup(oid)

The htmlPopup resource provides details about the HTML pop-up authored by the user using ArcGIS for Desktop.

Argument Description
oid Optional string. The object ID of the feature for which to get the HTML popup.
Returns:string
get_unique_values(attribute, query_string='1=1')

Return a list of unique values for a given attribute

Argument Description
attribute Required string. The feature layer attribute to query.
query_string Optional string. SQL Query that will be used to filter attributes before unique values are returned. ex. “name_2 like ‘%K%’”
manager

Helper object to manage the feature layer, update its definition, etc.

metadata

The metadata property allows for the setting and downloading of the Feature Layer's metadata. If metadata is disabled on the GIS or the layer does not support metadata, None is returned.

Argument Description
value Required String. (SET) Path to the metadata file.
Returns:String (GET)
properties

The properties of this object

query(where='1=1', out_fields='*', time_filter=None, geometry_filter=None, return_geometry=True, return_count_only=False, return_ids_only=False, return_distinct_values=False, return_extent_only=False, group_by_fields_for_statistics=None, statistic_filter=None, result_offset=None, result_record_count=None, object_ids=None, distance=None, units=None, max_allowable_offset=None, out_sr=None, geometry_precision=None, gdb_version=None, order_by_fields=None, out_statistics=None, return_z=False, return_m=False, multipatch_option=None, quantization_parameters=None, return_centroid=False, return_all_records=True, result_type=None, historic_moment=None, sql_format=None, return_true_curves=False, return_exceeded_limit_features=None, as_df=False, **kwargs)

Queries a feature layer based on a SQL statement.

Argument Description
where Optional string. The SQL where clause used to select records. The default is 1=1 (all records).
out_fields Optional list of field names to return. Field names can be specified either as a list of field names or as a comma-separated string. The default is "*", which returns all fields.
object_ids Optional string. The object IDs of this layer or table to be queried. The object ID values should be a comma-separated string.
distance Optional integer. The buffer distance for the input geometries. The distance unit is specified by units. For example, if the distance is 100, the query geometry is a point, and units is set to meters, then all points within 100 meters of the point are returned.
units Optional string. The unit for calculating the buffer distance. If unit is not specified, the unit is derived from the geometry spatial reference. If the geometry spatial reference is not specified, the unit is derived from the feature service data spatial reference. This parameter applies only if supportsQueryWithDistance is true. Values: esriSRUnit_Meter | esriSRUnit_StatuteMile | esriSRUnit_Foot | esriSRUnit_Kilometer | esriSRUnit_NauticalMile | esriSRUnit_USNauticalMile
time_filter Optional list. The format is [<startTime>, <endTime>], specified as datetime.date, datetime.datetime, or timestamp in milliseconds. Syntax: time_filter=[<startTime>, <endTime>]
geometry_filter Optional from arcgis.geometry.filter. Allows for the information to be filtered on spatial relationship with another geometry.
max_allowable_offset Optional float. This option can be used to specify the max_allowable_offset to be used for generalizing geometries returned by the query operation. The max_allowable_offset is in the units of out_sr. If out_sr is not specified, max_allowable_offset is assumed to be in the unit of the spatial reference of the layer.
out_sr Optional Integer. The WKID for the spatial reference of the returned geometry.
geometry_precision Optional Integer. This option can be used to specify the number of decimal places in the response geometries returned by the query operation. This applies to X and Y values only (not m or z-values).
gdb_version Optional string. The geodatabase version to query. This parameter applies only if the isDataVersioned property of the layer is true. If this is not specified, the query will apply to the published map’s version.
return_geometry Optional boolean. If true, geometry is returned with the query. Default is true.
return_distinct_values Optional boolean. If true, it returns distinct values based on the fields specified in out_fields. This parameter applies only if the supportsAdvancedQueries property of the layer is true.
return_ids_only Optional boolean. Default is False. If true, the response only includes an array of object IDs. Otherwise, the response is a feature set.
return_count_only Optional boolean. If true, the response only includes the count (number of features/records) that would be returned by a query. Otherwise, the response is a feature set. The default is false. This option supersedes the returnIdsOnly parameter. If returnCountOnly = true, the response will return both the count and the extent.
return_extent_only Optional boolean. If true, the response only includes the extent of the features that would be returned by the query. If returnCountOnly=true, the response will return both the count and the extent. The default is false. This parameter applies only if the supportsReturningQueryExtent property of the layer is true.
order_by_fields Optional string. One or more field names on which the features/records need to be ordered. Use ASC or DESC for ascending or descending, respectively, following every field to control the ordering. example: STATE_NAME ASC, RACE DESC, GENDER
group_by_fields_for_statistics Optional string. One or more field names on which the values need to be grouped for calculating the statistics. example: STATE_NAME, GENDER
out_statistics Optional string. The definitions for one or more field-based statistics to be calculated.

Syntax:

[
    {
        "statisticType": "<count | sum | min | max | avg | stddev | var>",
        "onStatisticField": "Field1",
        "outStatisticFieldName": "Out_Field_Name1"
    },
    {
        "statisticType": "<count | sum | min | max | avg | stddev | var>",
        "onStatisticField": "Field2",
        "outStatisticFieldName": "Out_Field_Name2"
    }
]

return_z Optional boolean. If true, Z values are included in the results if the features have Z values. Otherwise, Z values are not returned. The default is False.
return_m Optional boolean. If true, M values are included in the results if the features have M values. Otherwise, M values are not returned. The default is false.
multipatch_option Optional x/y footprint. This option dictates how the geometry of a multipatch feature will be returned.
result_offset Optional integer. This option can be used for fetching query results by skipping the specified number of records and starting from the next record (that is, resultOffset + 1th). This option is ignored if return_all_records is True (i.e. by default).
result_record_count Optional integer. This option can be used for fetching query results up to the result_record_count specified. When result_offset is specified but this parameter is not, the map service defaults it to max_record_count. The maximum value for this parameter is the value of the layer’s max_record_count property. This option is ignored if return_all_records is True (i.e. by default).
quantization_parameters Optional dict. Used to project the geometry onto a virtual grid, likely representing pixels on the screen.
return_centroid Optional boolean. Used to return the geometry centroid associated with each feature returned. If true, the result includes the geometry centroid. The default is false.
return_all_records Optional boolean. When True, the query operation will call the service until all records that satisfy the where_clause are returned. Note: result_offset and result_record_count will be ignored if return_all_records is True. Also, if return_count_only, return_ids_only, or return_extent_only are True, this parameter will be ignored.
result_type Optional string. The result_type parameter can be used to control the number of features returned by the query operation. Values: None | standard | tile
historic_moment

Optional integer. The historic moment to query. This parameter applies only if the layer is archiving enabled and the supportsQueryWithHistoricMoment property is set to true. This property is provided in the layer resource.

If historic_moment is not specified, the query will apply to the current features.

sql_format Optional string. The sql_format parameter can be either standard SQL92 (standard) or the native SQL of the underlying datastore (native). The default is none, which means the sql_format depends on the useStandardizedQuery parameter. Values: none | standard | native
return_true_curves Optional boolean. When set to true, returns true curves in output geometries. When set to false, curves are converted to densified polylines or polygons.
return_exceeded_limit_features Optional boolean. True by default. When set to true, features are returned even when the results include 'exceededTransferLimit': True. When set to false and querying with resultType = tile, features are not returned when the results include 'exceededTransferLimit': True. This allows a client to find the resolution at which the transfer limit is no longer exceeded without making multiple calls.
as_df Optional boolean. If True, the results are returned as a DataFrame instead of a FeatureSet.
kwargs Optional dict. Optional parameters that can be passed to the query function, allowing users to pass additional parameters not explicitly implemented on the function. A complete list of available parameters is documented in the Query REST API.
Returns:A FeatureSet containing the features matching the query unless another return type is specified, such as count

query_related_records(object_ids, relationship_id, out_fields='*', definition_expression=None, return_geometry=True, max_allowable_offset=None, geometry_precision=None, out_wkid=None, gdb_version=None, return_z=False, return_m=False, historic_moment=None, return_true_curves=False)

The query_related_records operation is performed on a feature service layer resource. The result of this operation is a set of feature sets grouped by source layer/table object IDs. Each feature set contains Feature objects including the values for the fields requested by the user. For related layers, if you request geometry information, the geometry of each feature is also returned in the feature set. For related tables, the feature set does not include geometries.

Argument Description
object_ids Required string. The object IDs of the table/layer to be queried
relationship_id Required string. The ID of the relationship to be queried.
out_fields Required string. The list of fields from the related table/layer to be included in the returned feature set. This list is a comma-delimited list of field names. If you specify the shape field in the list of return fields, it is ignored. To request geometry, set return_geometry to true. You can also specify the wildcard "*" as the value of this parameter, in which case the results include all field values.
definition_expression Optional string. The definition expression to be applied to the related table/layer. From the list of objectIds, only those records that conform to this expression are queried for related records.
return_geometry Optional boolean. If true, the feature set includes the geometry associated with each feature. The default is true.
max_allowable_offset Optional float. This option can be used to specify the max_allowable_offset to be used for generalizing geometries returned by the query operation. The max_allowable_offset is in the units of out_wkid. If out_wkid is not specified, then max_allowable_offset is assumed to be in the unit of the spatial reference of the map.
geometry_precision Optional integer. This option can be used to specify the number of decimal places in the response geometries.
out_wkid Optional Integer. The spatial reference of the returned geometry.
gdb_version Optional string. The geodatabase version to query. This parameter applies only if the isDataVersioned property of the layer queried is true.
return_z Optional boolean. If true, Z values are included in the results if the features have Z values. Otherwise, Z values are not returned. The default is false.
return_m Optional boolean. If true, M values are included in the results if the features have M values. Otherwise, M values are not returned. The default is false.
historic_moment

Optional Integer/datetime. The historic moment to query. This parameter applies only if the supportsQueryWithHistoricMoment property of the layers being queried is set to true. This setting is provided in the layer resource.

If historic_moment is not specified, the query will apply to the current features.

Syntax: historic_moment=<Epoch time in milliseconds>

return_true_curves Optional boolean. Optional parameter that is false by default. When set to true, returns true curves in output geometries; otherwise, curves are converted to densified polylines or polygons.
Returns:dict
query_top_features(top_filter=None, where=None, objectids=None, start_time=None, end_time=None, geometry_filter=None, out_fields='*', return_geometry=True, return_centroid=False, max_allowable_offset=None, out_sr=None, geometry_precision=None, return_ids_only=False, return_extents_only=False, order_by_field=None, return_z=False, return_m=False, result_type=None, as_df=True)

The query_top_features operation is performed on a feature layer. This operation returns a feature set or spatially enabled DataFrame based on the top features by order within a group. For example, when querying counties in the United States, you may want to return the top five counties by population in each state. To do this, you can use query_top_features to group by state name, order by population in descending order, and return the first five rows from each group (state).

The top_filter parameter is used to set the group by, order by, and count criteria used in generating the result. The operation also has many of the same parameters (for example, where and geometry) as the layer query operation. However, unlike the layer query operation, query_top_features does not support parameters such as outStatistics and its related parameters, or returning distinct values. Consult the advancedQueryCapabilities layer property for more details.

If the feature layer collection supports the query_top_features operation, it will include "supportsTopFeaturesQuery": true in the advancedQueryCapabilities layer property.

Argument Description
top_filter Required dict. The top_filter defines the aggregation of the data.
  • groupByFields defines the field or fields used to aggregate your data.
  • topCount defines the number of features returned from the top features query and is a numeric value.
  • orderByFields defines the order in which the top features will be returned. orderByFields can be specified in either ascending (asc) or descending (desc) order, ascending being the default.
Example: {"groupByFields": "worker", "topCount": 1, "orderByFields": "employeeNumber"}
where Optional String. A WHERE clause for the query filter. SQL ‘92 WHERE clause syntax on the fields in the layer is supported for most data sources.
objectids Optional List. The object IDs of the layer or table to be queried.
start_time Optional Datetime. The starting time to query for.
end_time Optional Datetime. The end date to query for.
geometry_filter Optional from arcgis.geometry.filter. Allows for the information to be filtered on spatial relationship with another geometry.
out_fields Optional String. The list of fields to include in the return results.
return_geometry Optional Boolean. If False, the query will not return geometries. The default is True.
return_centroid Optional Boolean. If True, the centroid of the geometry will be added to the output.
max_allowable_offset Optional float. This option can be used to specify the max_allowable_offset to be used for generalizing geometries returned by the query operation. The max_allowable_offset is in the units of out_sr. If out_sr is not specified, max_allowable_offset is assumed to be in the unit of the spatial reference of the layer.
out_sr Optional Integer. The WKID for the spatial reference of the returned geometry.
geometry_precision Optional Integer. This option can be used to specify the number of decimal places in the response geometries returned by the query operation. This applies to X and Y values only (not m or z-values).
return_ids_only Optional boolean. Default is False. If true, the response only includes an array of object IDs. Otherwise, the response is a feature set.
return_extents_only Optional boolean. If True, the response only includes the extent of the features that would be returned by the query. If return_count_only=True, the response will return both the count and the extent. The default is False. This parameter applies only if the supportsReturningQueryExtent property of the layer is true.
order_by_field Optional string. One or more field names on which the features/records need to be ordered. Use ASC or DESC for ascending or descending, respectively, following every field to control the ordering. Example: STATE_NAME ASC, RACE DESC, GENDER
return_z Optional boolean. If true, Z values are included in the results if the features have Z values. Otherwise, Z values are not returned. The default is False.
return_m Optional boolean. If true, M values are included in the results if the features have M values. Otherwise, M values are not returned. The default is false.
result_type Optional String. The result_type can be used to control the number of features returned by the query operation. Values: none | standard | tile
as_df Optional Boolean. If False, the result is returned as a FeatureSet. If True (default) the result is returned as a spatially enabled dataframe.
Returns:Default - pd.DataFrame; when as_df=False, returns a FeatureSet. If return_count_only is True, the return type is Integer. If return_ids_only is True, a list of object IDs is returned.
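
For example, a minimal sketch of the counties example from the description above. The service URL and field names here are placeholders, not a real service:

# Hedged sketch: URL and field names below are illustrative assumptions.
from arcgis.gis import GIS
from arcgis.features import FeatureLayer

gis = GIS()  # anonymous connection
counties = FeatureLayer("https://example.com/arcgis/rest/services/USA_Counties/FeatureServer/0")

top5 = counties.query_top_features(
    top_filter={
        "groupByFields": "STATE_NAME",        # group rows by state
        "topCount": 5,                        # keep five rows per group
        "orderByFields": "POPULATION DESC",   # order within each group by population
    },
    out_fields="STATE_NAME,NAME,POPULATION",
    as_df=True,
)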
update_metadata(file_path)

Updates a Layer’s metadata from an xml file.

Argument Description
file_path Required String. The path to the .xml file that contains the metadata.
Returns:boolean
validate_sql(sql, sql_type='where')

The validate_sql operation validates an SQL-92 expression or WHERE clause. The validate_sql operation ensures that an SQL-92 expression, such as one written by a user through a user interface, is correct before performing another operation that uses the expression. For example, validateSQL can be used to validate information that is subsequently passed in as part of the where parameter of the calculate operation. validate_sql also prevents SQL injection. In addition, all table and field names used in the SQL expression or WHERE clause are validated to ensure they are valid tables and fields.

Argument Description
sql Required String. The SQL expression or WHERE clause to validate. Example: "Population > 300000"
sql_type
Optional String. Three SQL types are supported in validate_sql
  • where (default) - Represents the custom WHERE clause the user can compose when querying a layer or using calculate.
  • expression - Represents an SQL-92 expression. Currently, expression is used as a default value expression when adding a new field or using the calculate API.
  • statement - Represents the full SQL-92 statement that can be passed directly to the database. No current ArcGIS REST API resource or operation supports using the full SQL-92 SELECT statement directly. It has been added to validateSQL for completeness.
Values: where | expression | statement
Returns:dict
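
A minimal sketch of the validate-then-query pattern. The layer object and field name are assumptions, and the isValidSQL key mirrors the REST validateSQL response:

# Hedged sketch: 'feature_layer' is an existing FeatureLayer; the field name is illustrative.
clause = "POP2000 > 300000"
result = feature_layer.validate_sql(clause, sql_type="where")
if result.get("isValidSQL"):              # validity flag per the REST validateSQL response
    feat_set = feature_layer.query(where=clause)
else:
    print(result.get("validationErrors"))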

arcgis.features.FeatureLayerCollection

class arcgis.features.FeatureLayerCollection(url, gis=None)

A FeatureLayerCollection is a collection of feature layers and tables, with the associated relationships among the entities.

In a web GIS, a feature layer collection is exposed as a feature service with multiple feature layers.

Instances of FeatureLayerCollection can be obtained from feature service Items in the GIS using FeatureLayerCollection.fromitem(item), from feature service endpoints using the constructor, or by accessing the dataset attribute of feature layer objects.

FeatureLayerCollections can be configured and managed using their manager helper object.

If the dataset supports the sync operation, the replicas helper object allows management and synchronization of replicas for disconnected editing of the feature layer collection.

Note: You can use the layers and tables property to get to the individual layers and tables in this feature layer collection.

extract_changes(layers, servergen, queries=None, geometry=None, geometry_type=None, version=None, return_inserts=False, return_updates=False, return_deletes=False, return_ids_only=False, return_extent_only=False, return_attachments=False, attachments_by_url=False, data_format='json', change_extent_grid_cell=None)

Feature service change tracking is an efficient change tracking mechanism for applications. Applications can use change tracking to query changes that have been made to the layers and tables in the service. For enterprise geodatabase based feature services published from ArcGIS Pro 2.2 or higher, the ChangeTracking capability requires all layers and tables to be either archive enabled or branch versioned and have globalid columns. Change tracking can also be enabled for ArcGIS Online hosted feature services. If all layers and tables in the service have the ChangeTracking capability, the extract_changes operation can be used to get changes.
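
A hedged sketch of pulling changes since a known state. The changeTrackingInfo property and its layerServerGens entries are assumptions modeled on the REST change-tracking resources:

# Sketch: 'flc' is a FeatureLayerCollection for a change-tracking-enabled service.
gens = flc.properties.changeTrackingInfo.layerServerGens   # assumed property layout
changes = flc.extract_changes(
    layers=[0],
    servergen=[{"id": 0, "serverGen": gens[0]["serverGen"]}],
    return_inserts=True,
    return_updates=True,
    return_deletes=True,
    data_format="json",
)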

fromitem(item)
manager

Helper object to manage the feature layer collection, update its definition, etc.

properties

The properties of this object

query(layer_defs_filter=None, geometry_filter=None, time_filter=None, return_geometry=True, return_ids_only=False, return_count_only=False, return_z=False, return_m=False, out_sr=None)

queries the feature layer collection

query_domains(layers)

The query_domains operation returns full domain information for the domains referenced by the layers in the feature layer collection. This operation is performed on a feature layer collection. The operation takes an array of layer IDs and returns the set of domains referenced by those layers.

Argument Description
layers Required List. An array of layers. The set of domains to return is based on the domains referenced by these layers. Example: [1,2,3,4]
Returns:list of dictionaries
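
For instance (assuming flc is an existing FeatureLayerCollection):

# Fetch every domain referenced by layers 0 and 1, then inspect them.
domains = flc.query_domains(layers=[0, 1])
for domain in domains:
    print(domain.get("name"), domain.get("type"))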

query_related_records(object_ids, relationship_id, out_fields='*', definition_expression=None, return_geometry=True, max_allowable_offset=None, geometry_precision=None, out_wkid=None, gdb_version=None, return_z=False, return_m=False)

The query related records operation is performed on a feature service layer resource. The result of this operation are feature sets grouped by source layer/table object IDs. Each feature set contains Feature objects including the values for the fields requested by the user. For related layers, if you request geometry information, the geometry of each feature is also returned in the feature set. For related tables, the feature set does not include geometries.

Argument Description
object_ids Optional string. The object IDs of the table/layer to be queried.
relationship_id Optional string. The ID of the relationship to be queried.
out_fields Optional string. The list of fields from the related table/layer to be included in the returned feature set. This list is a comma-delimited list of field names. If you specify the shape field in the list of return fields, it is ignored. To request geometry, set return_geometry to true. You can also specify the wildcard "*" as the value of this parameter. In this case, the results will include all the field values.
definition_expression Optional string. The definition expression to be applied to the related table/layer. From the list of objectIds, only those records that conform to this expression are queried for related records.
return_geometry Optional boolean. If true, the feature set includes the geometry associated with each feature. The default is true.
max_allowable_offset Optional float. This option can be used to specify the max_allowable_offset to be used for generalizing geometries returned by the query operation. The max_allowable_offset is in the units of the outSR. If outSR is not specified, then max_allowable_offset is assumed to be in the unit of the spatial reference of the map.
geometry_precision Optional integer. This option can be used to specify the number of decimal places in the response geometries.
out_wkid Optional integer. The spatial reference of the returned geometry.
gdb_version Optional string. The geodatabase version to query. This parameter applies only if the isDataVersioned property of the layer queried is true.
return_z Optional boolean. If true, Z values are included in the results if the features have Z values. Otherwise, Z values are not returned. The default is false.
return_m Optional boolean. If true, M values are included in the results if the features have M values. Otherwise, M values are not returned. The default is false.
Returns:dict
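
A minimal sketch, with illustrative object and relationship IDs:

# 'flc' is a feature layer collection whose service defines relationship 0.
related = flc.query_related_records(
    object_ids="1,2",          # parent features to find related records for
    relationship_id="0",
    out_fields="*",
    return_geometry=False,     # related tables carry no geometry anyway
)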
relationships

The relationships property provides relationship information for the layers and tables in the feature layer collection.

The relationships resource includes information about relationship rules from the back-end relationship classes, in addition to the relationship information already found in the individual layers and tables.

Feature layer collections that support the relationships resource will have the “supportsRelationshipsResource”: true property on their properties.

Returns:List of Dictionaries
upload(path, description=None)

Uploads a new item to the server. Once the operation is completed successfully, the JSON structure of the uploaded item is returned.

Argument Description
path Required string. Filepath of the file to upload.
description Optional string. Descriptive text for the uploaded item.
Returns:boolean
versions

Returns a VersionManager to create, update and use versions on a FeatureLayerCollection. If versioning is not enabled on the service, None is returned.

arcgis.features.FeatureSet

class arcgis.features.FeatureSet(features, fields=None, has_z=False, has_m=False, geometry_type=None, spatial_reference=None, display_field_name=None, object_id_field_name=None, global_id_field_name=None)

A set of features with information about their fields, field aliases, geometry type, spatial reference etc.

FeatureSets are commonly used as input/output with several Geoprocessing Tools, and can be obtained through the query() methods of feature layers. A FeatureSet can be combined with a layer definition to compose a FeatureCollection.

FeatureSet contains Feature objects, including the values for the fields requested by the user. For layers, if you request geometry information, the geometry of each feature is also returned in the FeatureSet. For tables, the FeatureSet does not include geometries.

If a Spatial Reference is not specified at the FeatureSet level, the FeatureSet will assume the SpatialReference of its first feature. If the SpatialReference of the first feature is also not specified, the spatial reference will be UnknownCoordinateSystem.
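
A small sketch constructing a FeatureSet by hand and round-tripping it; the geometry and attribute values are illustrative:

from arcgis.features import Feature, FeatureSet

feat = Feature(
    geometry={"x": -118.15, "y": 33.80, "spatialReference": {"wkid": 4326}},
    attributes={"OBJECTID": 1, "name": "sample point"},
)
fset = FeatureSet(
    features=[feat],
    geometry_type="esriGeometryPoint",
    spatial_reference={"wkid": 4326},
)
geojson = fset.to_geojson   # GeoJSON string
sdf = fset.sdf              # spatially enabled dataframe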

df

Deprecated in v1.5.0; please use sdf.

Converts the FeatureSet to a Pandas DataFrame. Requires pandas.

display_field_name

gets/sets the displayFieldName

features

gets the features in the FeatureSet

fields

gets the fields in the FeatureSet

static from_dataframe(df)

Returns a FeatureSet from a Pandas DataFrame or SpatialDataFrame.

static from_dict(featureset_dict)

returns a featureset from a dict

static from_geojson(geojson)

Converts a GeoJSON Feature Collection into a FeatureSet

static from_json(json_str)

returns a featureset from a JSON string

geometry_type

gets/sets the geometry Type

global_id_field_name

gets/sets the globalIdFieldName

has_m

gets/set the M-property

has_z

gets/sets the Z-property

object_id_field_name

gets/sets the object id field

save(save_location, out_name, encoding=None)

Saves a featureset object to a feature class

Argument Description
save_location Required string. Path to export the FeatureSet to.
out_name Required string. Name of the saved table.
encoding Optional string. The character encoding used to represent the text. The default is None.
Returns:string
sdf

Converts the FeatureSet to a Spatially Enabled Pandas dataframe

spatial_reference

gets the featureset’s spatial reference

to_dict()

converts the object to Python dictionary

to_geojson

converts the object to GeoJSON

to_json

converts the object to JSON

value

returns object as dictionary

arcgis.features.FeatureCollection

class arcgis.features.FeatureCollection(dictdata)

FeatureCollection is an object with a layer definition and a feature set.

It is an in-memory collection of features with rendering information.

Feature Collections can be stored as Items in the GIS, added as layers to a map or scene, passed as inputs to feature analysis tools, and returned as results from feature analysis tools if an output name for a feature layer is not specified when calling the tool.

static from_featureset(fset, symbol=None, name=None)

Create a FeatureCollection object from a FeatureSet object.

Returns:A FeatureCollection object.
query()

Returns the data in this feature collection as a FeatureSet. Filtering by where clause is not supported for feature collections.
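
A sketch of the FeatureSet-to-FeatureCollection round trip, where fset is a FeatureSet from a previous query:

from arcgis.features import FeatureCollection

fc = FeatureCollection.from_featureset(fset)
round_trip = fc.query()   # all features back as a FeatureSet; no where filtering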

arcgis.features.GeoAccessor

class arcgis.features.GeoAccessor(obj)

The DataFrame Accessor is a namespace that performs dataset operations. This includes visualization, spatial indexing, IO and dataset level properties.

area

Returns the total area of the dataframe

Returns:float
>>> df.spatial.area
143.23427
bbox

Returns the bounding box as a Polygon encompassing all geometries in the dataframe

Returns:Polygon
>>> df.spatial.bbox
{'rings': [[[1,2], [2,3], [3,3], ...]], 'spatialReference': {'wkid': 4326}}
centroid

Returns the centroid of the dataframe

Returns:Geometry
>>> df.spatial.centroid
(-14.23427, 39)
distance_matrix(leaf_size=16, rebuild=False)

Creates a k-d tree for solving nearest-neighbor problems.

requires scipy

Argument Description
leaf_size Optional Integer. The number of points at which the algorithm switches over to brute force. Default: 16.
rebuild Optional Boolean. If True, the current KD-tree is erased and rebuilt. If False, any existing KD-tree is returned as-is.
Returns:scipy’s KDTree class
static from_df(df, address_column='address', geocoder=None, sr=None)

Returns a SpatialDataFrame from a dataframe with an address column.

Argument Description
df Required Pandas DataFrame. Source dataset
address_column Optional String. The default is “address”. This is the name of a column in the specified dataframe that contains addresses (as strings). The addresses are batch geocoded using the GIS’s first configured geocoder and their locations used as the geometry of the spatial dataframe. Ignored if the ‘geometry’ parameter is also specified.
geocoder Optional Geocoder. The geocoder to be used. If not specified, the active GIS’s first geocoder is used.
sr Optional integer. The WKID of the spatial reference.
Returns:DataFrame

NOTE: Credits will be consumed for batch geocoding, from the GIS to which the geocoder belongs.
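
A hedged sketch of batch geocoding an address column. The address is illustrative, and an authenticated GIS with a configured geocoder is assumed (credits will be consumed):

import pandas as pd
from arcgis.gis import GIS
from arcgis.features import GeoAccessor

gis = GIS("home")   # assumption: an authenticated connection with a geocoder
addresses = pd.DataFrame({"address": ["380 New York St, Redlands, CA"]})
sdf = GeoAccessor.from_df(addresses, address_column="address")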

static from_featureclass(location, **kwargs)

Returns a Spatially enabled pandas.DataFrame from a feature class.

Argument Description
location Required string. Full path to the feature class

Optional parameters when ArcPy library is available in the current environment:

Key Value
sql_clause sql clause to parse data down. To learn more see ArcPy Search Cursor
where_clause where statement. To learn more see ArcPy SQL reference
fields list of strings specifying the field names.
Returns:pandas.core.frame.DataFrame
static from_layer(layer)

Imports a FeatureLayer to a Spatially Enabled DataFrame

This operation converts a FeatureLayer or TableLayer to a Pandas’ DataFrame

Argument Description
layer Required FeatureLayer or TableLayer. The service to convert to a Spatially enabled DataFrame.

Usage:

>>> import pandas as pd
>>> from arcgis.features import FeatureLayer
>>> mylayer = FeatureLayer(("https://sampleserver6.arcgisonline.com/arcgis/rest"
                    "/services/CommercialDamageAssessment/FeatureServer/0"))
>>> df = pd.DataFrame.spatial.from_layer(mylayer)
>>> print(df.head())
Returns:Pandas’ DataFrame
static from_table(filename, **kwargs)

Allows a user to read from a non-spatial table

Note: ArcPy is Required for this method

Argument Description
filename Required string. The path to the table.

Keyword Arguments

Argument Description
fields

Optional List/Tuple. A list (or tuple) of field names. For a single field, you can use a string instead of a list of strings.

Use an asterisk (*) instead of a list of fields if you want to access all fields from the input table (raster and BLOB fields are excluded). However, for faster performance and reliable field order, it is recommended that the list of fields be narrowed to only those that are actually needed.

Geometry, raster, and BLOB fields are not supported.

where Optional String. An optional expression that limits the records returned.
skip_nulls Optional Boolean. This controls whether records containing null values are skipped.
null_value Optional String/Integer/Float. Replaces null values from the input with a new value.
Returns:pd.DataFrame
static from_xy(df, x_column, y_column, sr=4326)

Converts a Pandas DataFrame into a Spatial DataFrame by providing the X/Y columns.

Argument Description
df Required Pandas DataFrame. Source dataset
x_column Required string. The name of the X-coordinate series
y_column Required string. The name of the Y-coordinate series
sr Optional int. The wkid number of the spatial reference. 4326 is the default value.
Returns:DataFrame
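
For example:

import pandas as pd
from arcgis.features import GeoAccessor

df = pd.DataFrame({"x": [-117.19, -116.54], "y": [34.05, 33.83]})
sdf = GeoAccessor.from_xy(df, x_column="x", y_column="y", sr=4326)  # WGS84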
full_extent

Returns the extent of the dataframe

Returns:tuple
>>> df.spatial.full_extent
(-118, 32, -97, 33)
geometry_type

Returns a list Geometry Types for the DataFrame

join(right_df, how='inner', op='intersects', left_tag='left', right_tag='right')

Joins the current DataFrame to another spatially enabled DataFrame based on spatial location.

Note

Requires both spatially enabled DataFrames to be in the same coordinate system.

Argument Description
right_df Required pd.DataFrame. Spatially enabled dataframe to join.
how

Required string. The type of join:

  • left - use keys from current dataframe and retains only current geometry column
  • right - use keys from right_df; retain only right_df geometry column
  • inner - use intersection of keys from both dfs and retain only current geometry column
op

Required string. The operation to use to perform the join. The default is intersects.

Supported operations: intersects, within, and contains.

left_tag Optional String. If the same column is in the left and right dataframe, this will append that string value to the field.
right_tag Optional String. If the same column is in the left and right dataframe, this will append that string value to the field.
Returns:Spatially enabled Pandas’ DataFrame
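
A sketch of a spatial inner join; the two dataframes are assumed to exist and to share a spatial reference:

# Attach the attributes of the containing polygon to each point.
joined = points_sdf.spatial.join(polygons_sdf, how="inner", op="within")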
length

Returns the total length of the dataframe

Returns:float
>>> df.spatial.length
1.23427
name

returns the name of the geometry column

overlay(sdf, op='union')

Performs spatial overlay operations on two spatially enabled dataframes.

requires ArcPy or Shapely

Argument Description
sdf Required Spatially Enabled DataFrame. The geometry to perform the operation from.
op Optional String. The spatial operation to perform. The allowed values are: union, erase, identity, intersection. union is the default operation.
Returns:Spatially enabled DataFrame (pd.DataFrame)
plot(map_widget=None, **kwargs)

Plot draws the data on a web map. The user can describe, in simple terms, how to render spatial data using symbols. To simplify the process, a palette from which colors are drawn can be used instead of explicit colors.

Explicit Argument Description
map_widget optional WebMap object. This is the map to display the data on.
palette optional string/dict. Color mapping. For simple renderer, just provide a string. For more robust renderers like unique renderer, a dictionary can be given.
renderer_type

optional string. Determines the type of renderer to use for the provided dataset. The default is ‘s’ which is for simple renderers.

Allowed values:

  • ‘s’ - is a simple renderer that uses one symbol only.
  • ‘u’ - unique renderer symbolizes features based on one
    or more matching string attributes.
  • ‘c’ - A class breaks renderer symbolizes based on the
    value of some numeric attribute.
  • ‘h’ - heatmap renders point data into a raster
    visualization that emphasizes areas of higher density or weighted values.
symbol_type optional string. This is the type of symbol the user needs to create. Valid inputs are: simple, picture, text, or carto. The default is simple.
symbol_style

optional string. This is the symbology used by the geometry. For example, 's' for a Line geometry is a solid line, and '-' is a dashed line.

Allowed symbol types based on geometries:

Point Symbols

  • ‘o’ - Circle (default)
  • ‘+’ - Cross
  • ‘D’ - Diamond
  • ‘s’ - Square
  • ‘x’ - X

Polyline Symbols

  • ‘s’ - Solid (default)
  • ‘-‘ - Dash
  • ‘-.’ - Dash Dot
  • ‘-..’ - Dash Dot Dot
  • ‘.’ - Dot
  • ‘–’ - Long Dash
  • ‘–.’ - Long Dash Dot
  • ‘n’ - Null
  • ‘s-‘ - Short Dash
  • ‘s-.’ - Short Dash Dot
  • ‘s-..’ - Short Dash Dot Dot
  • ‘s.’ - Short Dot

Polygon Symbols

  • ‘s’ - Solid Fill (default)
  • '\' - Backward Diagonal
  • ‘/’ - Forward Diagonal
  • ‘|’ - Vertical Bar
  • ‘-‘ - Horizontal Bar
  • ‘x’ - Diagonal Cross
  • ‘+’ - Cross
col optional string/list. Field or fields used for heatmap, class breaks, or unique renderers.
palette optional string. The color map to draw from in order to visualize the data. The default palette is 'jet'. To get a visual representation of the allowed color maps, use the display_colormaps method.
alpha optional float. This is a value between 0 and 1, with 1 being the default value. The alpha sets the transparency of the renderer when applicable.

Render Syntax

The render syntax allows for users to fully customize symbolizing the data.

Simple Renderer

A simple renderer is a renderer that uses one symbol only.

Optional Argument Description
symbol_type optional string. This is the type of symbol the user needs to create. Valid inputs are: simple, picture, text, or carto. The default is simple.
symbol_style

optional string. This is the symbology used by the geometry. For example, 's' for a Line geometry is a solid line, and '-' is a dashed line.

Point Symbols

  • ‘o’ - Circle (default)
  • ‘+’ - Cross
  • ‘D’ - Diamond
  • ‘s’ - Square
  • ‘x’ - X

Polyline Symbols

  • ‘s’ - Solid (default)
  • ‘-‘ - Dash
  • ‘-.’ - Dash Dot
  • ‘-..’ - Dash Dot Dot
  • ‘.’ - Dot
  • ‘–’ - Long Dash
  • ‘–.’ - Long Dash Dot
  • ‘n’ - Null
  • ‘s-‘ - Short Dash
  • ‘s-.’ - Short Dash Dot
  • ‘s-..’ - Short Dash Dot Dot
  • ‘s.’ - Short Dot

Polygon Symbols

  • ‘s’ - Solid Fill (default)
  • '\' - Backward Diagonal
  • ‘/’ - Forward Diagonal
  • ‘|’ - Vertical Bar
  • ‘-‘ - Horizontal Bar
  • ‘x’ - Diagonal Cross
  • ‘+’ - Cross
description Description of the renderer.
rotation_expression A constant value or an expression that derives the angle of rotation based on a feature attribute value. When an attribute name is specified, it’s enclosed in square brackets.
rotation_type

String value which controls the origin and direction of rotation on point features. If the rotationType is defined as arithmetic, the symbol is rotated from East in a counter-clockwise direction where East is the 0 degree axis. If the rotationType is defined as geographic, the symbol is rotated from North in a clockwise direction where North is the 0 degree axis.

Must be one of the following values:

  • arithmetic
  • geographic
visual_variables An array of objects used to set rendering properties.

Heatmap Renderer

The HeatmapRenderer renders point data into a raster visualization that emphasizes areas of higher density or weighted values.

Optional Argument Description
blur_radius The radius (in pixels) of the circle over which the majority of each point’s value is spread.
field This is optional as this renderer can be created if no field is specified. Each feature gets the same value/importance/weight or with a field where each feature is weighted by the field’s value.
max_intensity The pixel intensity value which is assigned the final color in the color ramp.
min_intensity The pixel intensity value which is assigned the initial color in the color ramp.
ratio A number between 0-1. Describes what portion along the gradient the colorStop is added.

Unique Renderer

This renderer symbolizes features based on one or more matching string attributes.

Optional Argument Description
background_fill_symbol A symbol used for polygon features as a background if the renderer uses point symbols, e.g. for bivariate types & size rendering. Only applicable to polygon layers. PictureFillSymbols can also be used outside of the Map Viewer for Size and Predominance and Size renderers.
default_label Default label for the default symbol used to draw unspecified values.
default_symbol Symbol used when a value cannot be matched.
field1, field2, field3 Attribute field renderer uses to match values.
field_delimiter String inserted between the values if multiple attribute fields are specified.
rotation_expression A constant value or an expression that derives the angle of rotation based on a feature attribute value. When an attribute name is specified, it’s enclosed in square brackets. Rotation is set using a visual variable of type rotation info with a specified field or value expression property.
rotation_type

String property which controls the origin and direction of rotation. If the rotation type is defined as arithmetic the symbol is rotated from East in a counter-clockwise direction where East is the 0 degree axis. If the rotation type is defined as geographic, the symbol is rotated from North in a clockwise direction where North is the 0 degree axis. Must be one of the following values:

  • arithmetic
  • geographic
arcade_expression An Arcade expression evaluating to either a string or a number.
arcade_title The title identifying and describing the associated Arcade expression as defined in the valueExpression property.
visual_variables An array of objects used to set rendering properties.

Class Breaks Renderer

A class breaks renderer symbolizes based on the value of some numeric attribute.

Optional Argument Description
background_fill_symbol A symbol used for polygon features as a background if the renderer uses point symbols, e.g. for bivariate types & size rendering. Only applicable to polygon layers. PictureFillSymbols can also be used outside of the Map Viewer for Size and Predominance and Size renderers.
default_label Default label for the default symbol used to draw unspecified values.
default_symbol Symbol used when a value cannot be matched.
method

Determines the classification method that was used to generate class breaks.

Must be one of the following values:

  • esriClassifyDefinedInterval
  • esriClassifyEqualInterval
  • esriClassifyGeometricalInterval
  • esriClassifyNaturalBreaks
  • esriClassifyQuantile
  • esriClassifyStandardDeviation
  • esriClassifyManual
field Attribute field used for renderer.
min_value The minimum numeric data value needed to begin class breaks.
normalization_field Used when normalizationType is field. The string value indicating the attribute field by which the data value is normalized.
normalization_total Used when normalizationType is percent-of-total, this number property contains the total of all data values.
normalization_type

Determines how the data was normalized.

Must be one of the following values:

  • esriNormalizeByField
  • esriNormalizeByLog
  • esriNormalizeByPercentOfTotal
rotation_expression A constant value or an expression that derives the angle of rotation based on a feature attribute value. When an attribute name is specified, it’s enclosed in square brackets.
rotation_type

A string property which controls the origin and direction of rotation. If the rotation_type is defined as arithmetic, the symbol is rotated from East in a counter-clockwise direction where East is the 0 degree axis. If the rotation_type is defined as geographic, the symbol is rotated from North in a clockwise direction where North is the 0 degree axis.

Must be one of the following values:

  • arithmetic
  • geographic
arcade_expression An Arcade expression evaluating to a number.
arcade_title The title identifying and describing the associated Arcade expression as defined in the arcade_expression property.
visual_variables An object used to set rendering options.

Symbol Syntax

Optional Argument Description
symbol_type optional string. This is the type of symbol the user needs to create. Valid inputs are: simple, picture, text, or carto. The default is simple.
symbol_style

optional string. This is the symbology used by the geometry. For example, 's' for a Line geometry is a solid line, and '-' is a dashed line.

Point Symbols

  • ‘o’ - Circle (default)
  • ‘+’ - Cross
  • ‘D’ - Diamond
  • ‘s’ - Square
  • ‘x’ - X

Polyline Symbols

  • ‘s’ - Solid (default)
  • ‘-‘ - Dash
  • ‘-.’ - Dash Dot
  • ‘-..’ - Dash Dot Dot
  • ‘.’ - Dot
  • ‘–’ - Long Dash
  • ‘–.’ - Long Dash Dot
  • ‘n’ - Null
  • ‘s-‘ - Short Dash
  • ‘s-.’ - Short Dash Dot
  • ‘s-..’ - Short Dash Dot Dot
  • ‘s.’ - Short Dot

Polygon Symbols

  • ‘s’ - Solid Fill (default)
  • '\' - Backward Diagonal
  • ‘/’ - Forward Diagonal
  • ‘|’ - Vertical Bar
  • ‘-‘ - Horizontal Bar
  • ‘x’ - Diagonal Cross
  • ‘+’ - Cross
cmap optional string or list. This is the color scheme a user can provide if the exact color is not needed, or a user can provide a list with the color defined as: [red, green, blue, alpha]. The values red, green, and blue range from 0-255, and alpha is a float value from 0 to 1. The default value is the 'jet' color scheme.
cstep optional integer. If provided, it is the color location on the color scheme.

Simple Symbols

This is a list of optional parameters that can be given for point, line or polygon geometries.

Argument Description
marker_size optional float. Numeric size of the symbol given in points.
marker_angle optional float. Numeric value used to rotate the symbol. The symbol is rotated counter-clockwise. For example, angle=-30 creates a symbol rotated -30 degrees counter-clockwise, that is, 30 degrees clockwise.
marker_xoffset Numeric value indicating the offset on the x-axis in points.
marker_yoffset Numeric value indicating the offset on the y-axis in points.
line_width optional float. Numeric value indicating the width of the line in points
outline_style

Optional string. For polygon, point, and line geometries, a customized outline style can be provided.

Allowed Styles:

  • ‘s’ - Solid (default)
  • ‘-‘ - Dash
  • ‘-.’ - Dash Dot
  • ‘-..’ - Dash Dot Dot
  • ‘.’ - Dot
  • ‘–’ - Long Dash
  • ‘–.’ - Long Dash Dot
  • ‘n’ - Null
  • ‘s-‘ - Short Dash
  • ‘s-.’ - Short Dash Dot
  • ‘s-..’ - Short Dash Dot Dot
  • ‘s.’ - Short Dot
outline_color optional string or list. This is the same color as the cmap property, but specifically applies to the outline_color.

Picture Symbol

This type of symbol only applies to Points, MultiPoints and Polygons.

Argument Description
marker_angle Numeric value that defines the number of degrees ranging from 0-360, that a marker symbol is rotated. The rotation is from East in a counter-clockwise direction where East is the 0 axis.
marker_xoffset Numeric value indicating the offset on the x-axis in points.
marker_yoffset Numeric value indicating the offset on the y-axis in points.
height Numeric value used if needing to resize the symbol. Specify a value in points. If images are to be displayed in their original size, leave this blank.
width Numeric value used if needing to resize the symbol. Specify a value in points. If images are to be displayed in their original size, leave this blank.
url String value indicating the URL of the image. The URL should be relative if working with static layers. A full URL should be used for map service dynamic layers. A relative URL can be dereferenced by accessing the map layer image resource or the feature layer image resource.
image_data String value indicating the base64 encoded data.
xscale Numeric value indicating the scale factor in x direction.
yscale Numeric value indicating the scale factor in y direction.
outline_color optional string or list. This is the same color as the cmap property, but specifically applies to the outline_color.
outline_style

Optional string. For polygon, point, and line geometries, a customized outline style can be provided.

Allowed Styles:

  • ‘s’ - Solid (default)
  • ‘-‘ - Dash
  • ‘-.’ - Dash Dot
  • ‘-..’ - Dash Dot Dot
  • ‘.’ - Dot
  • ‘–’ - Long Dash
  • ‘–.’ - Long Dash Dot
  • ‘n’ - Null
  • ‘s-‘ - Short Dash
  • ‘s-.’ - Short Dash Dot
  • ‘s-..’ - Short Dash Dot Dot
  • ‘s.’ - Short Dot
line_width optional float. Numeric value indicating the width of the line in points

Text Symbol

This type of symbol only applies to Points, MultiPoints and Polygons.

Argument Description
font_decoration The text decoration. Must be one of the following values: line-through, underline, none.
font_family Optional string. The font family.
font_size Optional float. The font size in points.
font_style Optional string. The text style. Must be one of the following values: italic, normal, oblique.
font_weight Optional string. The text weight. Must be one of the following values: bold, bolder, lighter, normal.
background_color optional string/list. Background color is represented as a four-element array or string of a color map.
halo_color Optional string/list. Color of the halo around the text. The default is None.
halo_size Optional integer/float. The point size of a halo around the text symbol.
horizontal_alignment optional string. The horizontal alignment of the text. Must be one of the following values: left, right, center, justify.
kerning optional boolean. Boolean value indicating whether to adjust the spacing between characters in the text string.
line_color optional string/list. Outline color is represented as a four-element array or string of a color map.
line_width optional integer/float. Outline size.
marker_angle optional int. A numeric value that defines the number of degrees (0 to 360) that a text symbol is rotated. The rotation is from East in a counter-clockwise direction where East is the 0 axis.
marker_xoffset optional int/float. Numeric value indicating the offset on the x-axis in points.
marker_yoffset optional int/float. Numeric value indicating the offset on the y-axis in points.
right_to_left optional boolean. Set to true if using Hebrew or Arabic fonts.
rotated optional boolean. Boolean value indicating whether every character in the text string is rotated.
text Required string. Text Value to display next to geometry.
vertical_alignment Optional string. The vertical alignment of the text. Must be one of the following values: top, bottom, middle, baseline.

Cartographic Symbol

This type of symbol only applies to line geometries.

Argument Description
line_width optional float. Numeric value indicating the width of the line in points
cap Optional string. The cap style.
join Optional string. The join style.
miter_limit Optional string. Size threshold for showing mitered line joins.

The kwargs parameter accepts all parameters of the create_symbol method and the create_renderer method.
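
Putting the pieces together, a hedged sketch of a class-breaks rendering. The field name and basemap location are illustrative, and a Jupyter environment with an authenticated GIS is assumed:

m = gis.map("USA")
sdf.spatial.plot(
    map_widget=m,
    renderer_type="c",                    # 'c' - class breaks renderer
    col="POPULATION",                     # assumed numeric field
    method="esriClassifyNaturalBreaks",   # classification method from the table above
    cmap="jet",
    alpha=0.7,
)
m   # display the widget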

project(spatial_reference, transformation_name=None)

Reprojects the whole dataset into a new spatial reference. This is an in-place operation, meaning that it will update the geometry column defined by set_geometry.

Argument Description
spatial_reference Required SpatialReference. The new spatial reference. This can be a SpatialReference object or the coordinate system name.
transformation_name Optional String. The geotransformation name.
Returns:boolean
relationship(other, op, relation=None)

This method allows for dataframe-to-dataframe comparison using spatial relationships. The return is a pd.DataFrame of the rows that meet the operation's requirements.

Argument Description
other Required Spatially Enabled DataFrame. The geometry to perform the operation against.
op

Required String. The spatial operation to perform. The allowed values are: contains, crosses, disjoint, equals, overlaps, touches, or within.

  • contains - Indicates if the base geometry contains the comparison geometry.
  • crosses - Indicates if the two geometries intersect in a geometry of a lesser shape type.
  • disjoint - Indicates if the base and comparison geometries share no points in common.
  • equals - Indicates if the base and comparison geometries are of the same shape type and define the same set of points in the plane. This is a 2D comparison only; M and Z values are ignored.
  • overlaps - Indicates if the intersection of the two geometries has the same shape type as one of the input geometries and is not equivalent to either of the input geometries.
  • touches - Indicates if the boundaries of the geometries intersect.
  • within - Indicates if the base geometry is within the comparison geometry.
relation

Optional String. The spatial relationship type. The allowed values are: BOUNDARY, CLEMENTINI, and PROPER.

  • BOUNDARY - Relationship has no restrictions for interiors or boundaries.
  • CLEMENTINI - Interiors of geometries must intersect. This is the default.
  • PROPER - Boundaries of geometries must not intersect.

This parameter applies only to the contains operation.

Returns:Spatially enabled DataFrame (pd.DataFrame)
select(other)

This operation performs a dataset wide selection by geometric intersection. A geometry or another Spatially enabled DataFrame can be given and select will return all rows that intersect that input geometry. The select operation uses a spatial index to complete the task, so if it is not built before the first run, the function will build a quadtree index on the fly.

requires ArcPy or Shapely

Returns:pd.DataFrame (spatially enabled)
set_geometry(col, sr=None)

Assigns the Geometry Column by Name or by List

sindex(stype='quadtree', reset=False, **kwargs)

Creates a spatial index for the given dataset.

By default the spatial index is a QuadTree spatial index.

R-tree indexes should be used for large datasets; they allow users to create very large out-of-memory indexes. To use r-tree indexes, the rtree library must be installed, for example via conda: conda install -c conda-forge rtree
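
A short sketch, where area_of_interest stands in for any arcgis geometry:

# Build the default quadtree index once; later selections can reuse it.
sdf.spatial.sindex(stype="quadtree")
hits = sdf.spatial.select(area_of_interest)   # rows intersecting the geometry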

sr

gets/sets the spatial reference of the dataframe

to_feature_collection(name=None, drawing_info=None, extent=None, global_id_field=None)

Converts a spatially enabled pd.DataFrame to a Feature Collection

optional argument Description
name optional string. Name of the Feature Collection
drawing_info Optional dictionary. This is the rendering information for a Feature Collection. Rendering information is a dictionary with the symbology, labelling and other properties defined. See: http://resources.arcgis.com/en/help/arcgis-rest-api/index.html#/Renderer_objects/02r30000019t000000/
extent Optional dictionary. If desired, a custom extent can be provided to set where the map starts up when showing the data. The default is the full extent of the dataset in the Spatial DataFrame.
global_id_field Optional string. The Global ID field of the dataset.
Returns:FeatureCollection object
to_featureclass(location, overwrite=True)

Exports a geo-enabled dataframe to a feature class.

to_featurelayer(title, gis=None, tags=None)

Publishes a spatial dataframe to a new feature layer.

Argument Description
title Required string. The name of the service
gis Optional GIS. The GIS connection object
tags Optional list of strings, or a comma-separated string, of descriptive words for the service.
Returns:FeatureLayer
to_featureset()

Converts a spatial dataframe to a feature set object

to_table(location, overwrite=True)

Exports a geo enabled dataframe to a table.

Argument Description
location Required string. The output location of the table.
overwrite Optional Boolean. If True (the default) and the table exists, it will be deleted and overwritten. If False and the table exists, an exception will be raised.
Returns:String
true_centroid

Returns the true centroid of the dataframe

Returns:Geometry
>>> df.spatial.true_centroid
(1.23427, 34)
validate(strict=False)

Determines whether the GeoAccessor is valid, i.e. whether all values contain geometries.

voronoi()

Generates a Voronoi diagram on the whole dataset. If the geometry is not a Point, the centroid is used. The result is a polygon GeoArray Series that matches the original dataset 1:1.

requires scipy

Returns:pd.Series

arcgis.features.GeoSeriesAccessor

class arcgis.features.GeoSeriesAccessor(obj)
JSON

Returns JSON string of Geometry

Returns:Series of strings
WKB

Returns the Geometry as WKB

Returns:Series of Bytes
WKT

Returns the Geometry as WKT

Returns:Series of String
angle_distance_to(second_geometry, method='GEODESIC')

Returns a tuple of angle and distance to another point using a measurement type.

Argument Description
second_geometry Required Geometry. A arcgis.Geometry object.
method Optional String. PLANAR measurements reflect the projection of geographic data onto the 2D surface (in other words, they will not take into account the curvature of the earth). GEODESIC, GREAT_ELLIPTIC, LOXODROME, and PRESERVE_SHAPE measurement types may be chosen as an alternative, if desired.
Returns:a tuple of angle and distance to another point using a measurement type.
area

Returns the feature's area

Returns:float in a series
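
For example, assuming the accessor is registered on the geometry column as the geom namespace, and sdf is a spatially enabled dataframe with a SHAPE column:

areas = sdf["SHAPE"].geom.area   # Series of float, one area per feature
wkt = sdf["SHAPE"].geom.WKT      # Series of WKT strings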
as_arcpy

Returns the features as ArcPy Geometry

Returns:arcpy.Geometry in a series
as_shapely

Returns the features as Shapely Geometry

Returns:shapely.Geometry in a series
boundary()

Constructs the boundary of the geometry.

Returns:arcgis.geometry.Polyline
buffer(distance)

Constructs a polygon at a specified distance from the geometry.

Argument Description
distance Required float. The buffer distance. The buffer distance is in the same units as the geometry that is being buffered. A negative distance can only be specified against a polygon geometry.
Returns:arcgis.geometry.Polygon
centroid

Returns the feature’s centroid

Returns:tuple (x,y) in series
clip(envelope)

Constructs the intersection of the geometry and the specified extent.

Argument Description
envelope required tuple. The tuple must have (XMin, YMin, XMax, YMax) each value represents the lower left bound and upper right bound of the extent.
Returns:output geometry clipped to extent
contains(second_geometry, relation=None)

Indicates if the base geometry contains the comparison geometry.

Argument Description
second_geometry Required arcgis.geometry.Geometry. A second geometry
relation

Optional string. The spatial relationship type.

  • BOUNDARY - Relationship has no restrictions for interiors or boundaries.
  • CLEMENTINI - Interiors of geometries must intersect. Specifying CLEMENTINI is equivalent to specifying None. This is the default.
  • PROPER - Boundaries of geometries must not intersect.
Returns:boolean
convex_hull()

Constructs the geometry that is the minimal bounding polygon such that all outer angles are convex.

crosses(second_geometry)

Indicates if the two geometries intersect in a geometry of a lesser shape type.

Argument Description
second_geometry Required arcgis.geometry.Geometry. A second geometry
Returns:boolean
cut(cutter)

Splits this geometry into a part left of the cutting polyline, and a part right of it.

Argument Description
cutter Required Polyline. The cutting polyline geometry.
Returns:a list of two geometries
densify(method, distance, deviation)

Creates a new geometry with added vertices

Argument Description
method Required String. The type of densification, DISTANCE, ANGLE, or GEODESIC
distance Required float. The maximum distance between vertices. The actual distance between vertices will usually be less than the maximum distance as new vertices will be evenly distributed along the original segment. If using a type of DISTANCE or ANGLE, the distance is measured in the units of the geometry’s spatial reference. If using a type of GEODESIC, the distance is measured in meters.
deviation Required float. Densify uses straight lines to approximate curves. You use deviation to control the accuracy of this approximation. The deviation is the maximum distance between the new segment and the original curve. The smaller its value, the more segments will be required to approximate the curve.
Returns:arcgis.geometry.Geometry
difference(second_geometry)

Constructs the geometry that is composed only of the region unique to the base geometry and not part of the other geometry.

Argument Description
second_geometry Required arcgis.geometry.Geometry. A second geometry
Returns:arcgis.geometry.Geometry
disjoint(second_geometry)

Indicates if the base and comparison geometries share no points in common.

Argument Description
second_geometry Required arcgis.geometry.Geometry. A second geometry
Returns:boolean
distance_to(second_geometry)

Returns the minimum distance between two geometries. If the geometries intersect, the minimum distance is 0. Both geometries must have the same projection.

Argument Description
second_geometry Required arcgis.geometry.Geometry. A second geometry
Returns:float
equals(second_geometry)

Indicates if the base and comparison geometries are of the same shape type and define the same set of points in the plane. This is a 2D comparison only; M and Z values are ignored.

Argument Description
second_geometry Required arcgis.geometry.Geometry. A second geometry
Returns:boolean
extent

Returns the feature’s extent

Returns:tuple (xmin,ymin,xmax,ymax) in series
first_point

Returns the feature’s first point

Returns:Geometry
generalize(max_offset)

Creates a new simplified geometry using a specified maximum offset tolerance.

Argument Description
max_offset Required float. The maximum offset tolerance.
Returns:arcgis.geometry.Geometry
geoextent

Returns the geometry's extent.

Returns:Series of Floats
geometry_type

returns the geometry types

Returns:Series of strings
get_area(method, units=None)

Returns the area of the feature using a measurement type.

Argument Description
method Required String. PLANAR measurements reflect the projection of geographic data onto the 2D surface (in other words, they will not take into account the curvature of the earth). GEODESIC, GREAT_ELLIPTIC, LOXODROME, and PRESERVE_SHAPE measurement types may be chosen as an alternative, if desired.
units Optional String. Areal unit of measure keywords: ACRES | ARES | HECTARES | SQUARECENTIMETERS | SQUAREDECIMETERS | SQUAREINCHES | SQUAREFEET | SQUAREKILOMETERS | SQUAREMETERS | SQUAREMILES | SQUAREMILLIMETERS | SQUAREYARDS
Returns:float
get_length(method, units)

Returns the length of the feature using a measurement type.

Argument Description
method Required String. PLANAR measurements reflect the projection of geographic data onto the 2D surface (in other words, they will not take into account the curvature of the earth). GEODESIC, GREAT_ELLIPTIC, LOXODROME, and PRESERVE_SHAPE measurement types may be chosen as an alternative, if desired.
units Required String. Linear unit of measure keywords: CENTIMETERS | DECIMETERS | FEET | INCHES | KILOMETERS | METERS | MILES | MILLIMETERS | NAUTICALMILES | YARDS
Returns:float
get_part(index=None)

Returns an array of point objects for a particular part of geometry or an array containing a number of arrays, one for each part.

requires arcpy

Argument Description
index Required Integer. The index position of the geometry.
Returns:arcpy.Array
hull_rectangle

A space-delimited string of the coordinate pairs of the convex hull

Returns:Series of strings
intersect(second_geometry, dimension=1)

Constructs a geometry that is the geometric intersection of the two input geometries. Different dimension values can be used to create different shape types. The intersection of two geometries of the same shape type is a geometry containing only the regions of overlap between the original geometries.

Argument Description
second_geometry Required arcgis.geometry.Geometry. A second geometry
dimension

Required Integer. The topological dimension (shape type) of the resulting geometry.

  • 1 - A zero-dimensional geometry (point or multipoint).
  • 2 - A one-dimensional geometry (polyline).
  • 4 - A two-dimensional geometry (polygon).
Returns:arcgis.geometry.Geometry
is_empty

Returns True/False if feature is empty

Returns:Series of Booleans
is_multipart

Returns True/False if the feature has multiple parts

Returns:Series of Booleans
is_valid

Returns True/False if the feature's geometry is valid

Returns:Series of Booleans
label_point

Returns the geometry point for the optimal label location

Returns:Series of Geometries
last_point

Returns the Geometry of the last point in a feature.

Returns:Series of Geometry
length

Returns the length of the features

Returns:Series of float
length3D

Returns the 3D length of the features

Returns:Series of float
measure_on_line(second_geometry, as_percentage=False)

Returns a measure from the start point of this line to the specified point (second_geometry).

Argument Description
second_geometry Required arcgis.geometry.Geometry. A second geometry
as_percentage Optional Boolean. If False, the measure will be returned as a distance; if True, the measure will be returned as a percentage.
Returns:float
overlaps(second_geometry)

Indicates if the intersection of the two geometries has the same shape type as one of the input geometries and is not equivalent to either of the input geometries.

Argument Description
second_geometry Required arcgis.geometry.Geometry. A second geometry
Returns:boolean
part_count

Returns the number of parts in a feature’s geometry

Returns:Series of Integer
point_count

Returns the number of points in a feature’s geometry

Returns:Series of Integer
point_from_angle_and_distance(angle, distance, method='GEODESCIC')

Returns a point at a given angle and distance in degrees and meters using the specified measurement type.

Argument Description
angle Required Float. The angle in degrees to the returned point.
distance Required Float. The distance in meters to the returned point.
method Optional String. PLANAR measurements reflect the projection of geographic data onto the 2D surface (in other words, they will not take into account the curvature of the earth). GEODESIC, GREAT_ELLIPTIC, LOXODROME, and PRESERVE_SHAPE measurement types may be chosen as an alternative, if desired.
Returns:arcgis.geometry.Geometry
position_along_line(value, use_percentage=False)

Returns a point on a line at a specified distance from the beginning of the line.

Argument Description
value Required Float. The distance along the line.
use_percentage Optional Boolean. The distance may be specified as a fixed unit of measure or a ratio of the length of the line. If True, value is used as a percentage; if False, value is used as a distance. For percentages, the value should be expressed as a double from 0.0 (0%) to 1.0 (100%).
Returns:Geometry
project_as(spatial_reference, transformation_name=None)

Projects a geometry and optionally applies a geotransformation.

Argument Description
spatial_reference Required SpatialReference. The new spatial reference. This can be a SpatialReference object or the coordinate system name.
transformation_name Optional String. The geotransformation name.
Returns:arcgis.geometry.Geometry
query_point_and_distance(second_geometry, use_percentage=False)

Finds the point on the polyline nearest to the in_point and the distance between those points. Also returns information about the side of the line the in_point is on as well as the distance along the line where the nearest point occurs.

Argument Description
second_geometry Required arcgis.geometry.Geometry. A second geometry
use_percentage Optional boolean. If False, the measure will be returned as a distance; if True, the measure will be returned as a percentage.
Returns:tuple
segment_along_line(start_measure, end_measure, use_percentage=False)

Returns a Polyline between start and end measures. Similar to Polyline.positionAlongLine but will return a polyline segment between two points on the polyline instead of a single point.

Argument Description
start_measure Required Float. The starting distance from the beginning of the line.
end_measure Required Float. The ending distance from the beginning of the line.
use_percentage Optional Boolean. The start and end measures may be specified as fixed units or as a ratio. If True, start_measure and end_measure are used as a percentage; if False, start_measure and end_measure are used as a distance. For percentages, the measures should be expressed as a double from 0.0 (0 percent) to 1.0 (100 percent).
Returns:Geometry
snap_to_line(second_geometry)

Returns a new point based on second_geometry snapped to this geometry.

Argument Description
second_geometry Required arcgis.geometry.Geometry. A second geometry
Returns:arcgis.geometry.Geometry
spatial_reference

Returns the Spatial Reference of the Geometry

Returns:Series of SpatialReference
symmetric_difference(second_geometry)

Constructs the geometry that is the union of two geometries minus the intersection of those geometries.

The two input geometries must be the same shape type.

Argument Description
second_geometry Required arcgis.geometry.Geometry. A second geometry
Returns:arcgis.geometry.Geometry
touches(second_geometry)

Indicates if the boundaries of the geometries intersect.

Argument Description
second_geometry Required arcgis.geometry.Geometry. A second geometry
Returns:boolean
true_centroid

Returns the true centroid of the Geometry

Returns:Series of Points
union(second_geometry)

Constructs the geometry that is the set-theoretic union of the input geometries.

Argument Description
second_geometry Required arcgis.geometry.Geometry. A second geometry
Returns:arcgis.geometry.Geometry
within(second_geometry, relation=None)

Indicates if the base geometry is within the comparison geometry.

Argument Description
second_geometry Required arcgis.geometry.Geometry. A second geometry
relation

Optional String. The spatial relationship type.

  • BOUNDARY - Relationship has no restrictions for interiors or boundaries.
  • CLEMENTINI - Interiors of geometries must intersect. Specifying CLEMENTINI is equivalent to specifying None. This is the default.
  • PROPER - Boundaries of geometries must not intersect.
Returns:boolean

arcgis.features.SpatialDataFrame

class arcgis.features.SpatialDataFrame(*args, **kwargs)

A Spatial Dataframe is an object to manipulate, manage and translate data into new forms of information for users.

Functionality of the Spatial DataFrame is determined by the Geometry Engine available to the object at creation. It will first leverage the arcpy geometry engine, then shapely, then it will create the geometry objects without any engine.

Scenarios

Engine Type Functionality
ArcPy Users will have the full functionality provided by the API.
Shapely

Users get a sub-set of operations, and all properties.

Valid Properties:
 
  • JSON
  • WKT
  • WKB
  • area
  • centroid
  • extent
  • first_point
  • hull_rectangle
  • is_multipart
  • label_point
  • last_point
  • length
  • length3D
  • part_count
  • point_count
  • true_centroid
Valid Functions:
 
  • boundary
  • buffer
  • contains
  • convex_hull
  • crosses
  • difference
  • disjoint
  • distance_to
  • equals
  • generalize
  • intersect
  • overlaps
  • symmetric_difference
  • touches
  • union
  • within

Everything else will return None

No Engine Values will return None by default
Required Parameters:
None
Optional:
param data: pandas DataFrame containing attribute information
param geometry: list/array/geoseries of arcgis.geometry objects
param sr: spatial reference of the dataframe. This can be the factory code, WKT string, arcpy.SpatialReference object, or arcgis.SpatialReference object.
param gis: passing a gis.GIS object set to Pro will ensure arcpy is installed and the full suite of functionality is available to the end user.
JSON

Returns an Esri JSON representation of the geometry as a string.

T

Transpose index and columns.

Reflect the DataFrame over its main diagonal by writing rows as columns and vice-versa. The property T is an accessor to the method transpose().

Parameters:
copy : bool, default False
If True, the underlying data is copied. Otherwise (default), no copy is made if possible.
*args, **kwargs
Additional keywords have no effect but might be accepted for compatibility with numpy.

Returns:
DataFrame
The transposed DataFrame.

See also: numpy.transpose : Permute the dimensions of a given array.

Transposing a DataFrame with mixed dtypes will result in a homogeneous DataFrame with the object dtype. In such a case, a copy of the data is always made.

Square DataFrame with homogeneous dtype

>>> d1 = {'col1': [1, 2], 'col2': [3, 4]}
>>> df1 = pd.DataFrame(data=d1)
>>> df1
   col1  col2
0     1     3
1     2     4
>>> df1_transposed = df1.T # or df1.transpose()
>>> df1_transposed
      0  1
col1  1  2
col2  3  4

When the dtype is homogeneous in the original DataFrame, we get a transposed DataFrame with the same dtype:

>>> df1.dtypes
col1    int64
col2    int64
dtype: object
>>> df1_transposed.dtypes
0    int64
1    int64
dtype: object

Non-square DataFrame with mixed dtypes

>>> d2 = {'name': ['Alice', 'Bob'],
...       'score': [9.5, 8],
...       'employed': [False, True],
...       'kids': [0, 0]}
>>> df2 = pd.DataFrame(data=d2)
>>> df2
    name  score  employed  kids
0  Alice    9.5     False     0
1    Bob    8.0      True     0
>>> df2_transposed = df2.T # or df2.transpose()
>>> df2_transposed
              0     1
name      Alice   Bob
score       9.5     8
employed  False  True
kids          0     0

When the DataFrame has mixed dtypes, we get a transposed DataFrame with the object dtype:

>>> df2.dtypes
name         object
score       float64
employed       bool
kids          int64
dtype: object
>>> df2_transposed.dtypes
0    object
1    object
dtype: object
WKB

Returns the well-known binary (WKB) representation for OGC geometry. It provides a portable representation of a geometry value as a contiguous stream of bytes.

WKT

Returns the well-known text (WKT) representation for OGC geometry. It provides a portable representation of a geometry value as a text string.
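
A usage sketch for the three representation properties above, assuming sdf is the SpatialDataFrame sketched earlier (the exact return shape depends on the active geometry engine):

>>> esri_json = sdf.JSON   # Esri JSON representation
>>> wkt = sdf.WKT          # well-known text representation
>>> wkb = sdf.WKB          # well-known binary representation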

abs()

Return a Series/DataFrame with absolute numeric value of each element.

This function only applies to elements that are all numeric.

abs
Series/DataFrame containing the absolute value of each element.

numpy.absolute : Calculate the absolute value element-wise.

For complex inputs, e.g. 1.2 + 1j, the absolute value is √(a² + b²).

Absolute numeric values in a Series.

>>> s = pd.Series([-1.10, 2, -3.33, 4])
>>> s.abs()
0    1.10
1    2.00
2    3.33
3    4.00
dtype: float64

Absolute numeric values in a Series with complex numbers.

>>> s = pd.Series([1.2 + 1j])
>>> s.abs()
0    1.56205
dtype: float64

Absolute numeric values in a Series with a Timedelta element.

>>> s = pd.Series([pd.Timedelta('1 days')])
>>> s.abs()
0   1 days
dtype: timedelta64[ns]

Select rows with data closest to a certain value using argsort (from StackOverflow).

>>> df = pd.DataFrame({
...     'a': [4, 5, 6, 7],
...     'b': [10, 20, 30, 40],
...     'c': [100, 50, -30, -50]
... })
>>> df
     a    b    c
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50
>>> df.loc[(df.c - 43).abs().argsort()]
     a    b    c
1    5   20   50
0    4   10  100
2    6   30  -30
3    7   40  -50
add(other, axis='columns', level=None, fill_value=None)

Addition of dataframe and other, element-wise (binary operator add).

Equivalent to dataframe + other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, radd.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
DataFrame
Result of the arithmetic operation.

DataFrame.add : Add DataFrames. DataFrame.sub : Subtract DataFrames. DataFrame.mul : Multiply DataFrames. DataFrame.div : Divide DataFrames (float division). DataFrame.truediv : Divide DataFrames (float division). DataFrame.floordiv : Divide DataFrames (integer division). DataFrame.mod : Calculate modulo (remainder after division). DataFrame.pow : Calculate exponential power.

Mismatched indices will be unioned together.

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with the operator version, which returns the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
add_prefix(prefix)

Prefix labels with string prefix.

For Series, the row labels are prefixed. For DataFrame, the column labels are prefixed.

prefix : str
The string to add before each label.
Series or DataFrame
New Series or DataFrame with updated labels.

Series.add_suffix: Suffix row labels with string suffix. DataFrame.add_suffix: Suffix column labels with string suffix.

>>> s = pd.Series([1, 2, 3, 4])
>>> s
0    1
1    2
2    3
3    4
dtype: int64
>>> s.add_prefix('item_')
item_0    1
item_1    2
item_2    3
item_3    4
dtype: int64
>>> df = pd.DataFrame({'A': [1, 2, 3, 4],  'B': [3, 4, 5, 6]})
>>> df
   A  B
0  1  3
1  2  4
2  3  5
3  4  6
>>> df.add_prefix('col_')
     col_A  col_B
0       1       3
1       2       4
2       3       5
3       4       6
add_suffix(suffix)

Suffix labels with string suffix.

For Series, the row labels are suffixed. For DataFrame, the column labels are suffixed.

suffix : str
The string to add after each label.
Series or DataFrame
New Series or DataFrame with updated labels.

Series.add_prefix: Prefix row labels with string prefix. DataFrame.add_prefix: Prefix column labels with string prefix.

>>> s = pd.Series([1, 2, 3, 4])
>>> s
0    1
1    2
2    3
3    4
dtype: int64
>>> s.add_suffix('_item')
0_item    1
1_item    2
2_item    3
3_item    4
dtype: int64
>>> df = pd.DataFrame({'A': [1, 2, 3, 4],  'B': [3, 4, 5, 6]})
>>> df
   A  B
0  1  3
1  2  4
2  3  5
3  4  6
>>> df.add_suffix('_col')
     A_col  B_col
0       1       3
1       2       4
2       3       5
3       4       6
agg(func, axis=0, *args, **kwargs)

Aggregate using one or more operations over the specified axis.

New in version 0.20.0.

func : function, str, list or dict

Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.

Accepted combinations are:

  • function
  • string function name
  • list of functions and/or function names, e.g. [np.sum, 'mean']
  • dict of axis labels -> functions, function names or list of such.
axis : {0 or ‘index’, 1 or ‘columns’}, default 0
If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.
*args
Positional arguments to pass to func.
**kwargs
Keyword arguments to pass to func.
DataFrame, Series or scalar
  • if DataFrame.agg is called with a single function, returns a Series
  • if DataFrame.agg is called with several functions, returns a DataFrame
  • if Series.agg is called with a single function, returns a scalar
  • if Series.agg is called with several functions, returns a Series

The aggregation operations are always performed over an axis, either the index (default) or the column axis. This behavior is different from numpy aggregation functions (mean, median, prod, sum, std, var), where the default is to compute the aggregation of the flattened array, e.g., numpy.mean(arr_2d) as opposed to numpy.mean(arr_2d, axis=0).

agg is an alias for aggregate. Use the alias.

DataFrame.apply : Perform any type of operations. DataFrame.transform : Perform transformation type operations. pandas.core.groupby.GroupBy : Perform operations over groups. pandas.core.resample.Resampler : Perform operations over resampled bins. pandas.core.window.Rolling : Perform operations over rolling window. pandas.core.window.Expanding : Perform operations over expanding window. pandas.core.window.EWM : Perform operation over exponential weighted window.


A passed user-defined-function will be passed a Series for evaluation.

>>> df = pd.DataFrame([[1, 2, 3],
...                    [4, 5, 6],
...                    [7, 8, 9],
...                    [np.nan, np.nan, np.nan]],
...                   columns=['A', 'B', 'C'])

Aggregate these functions over the rows.

>>> df.agg(['sum', 'min'])
        A     B     C
sum  12.0  15.0  18.0
min   1.0   2.0   3.0

Different aggregations per column.

>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
        A    B
max   NaN  8.0
min   1.0  2.0
sum  12.0  NaN

Aggregate over the columns.

>>> df.agg("mean", axis="columns")
0    2.0
1    5.0
2    8.0
3    NaN
dtype: float64
aggregate(func, axis=0, *args, **kwargs)

Aggregate using one or more operations over the specified axis.

New in version 0.20.0.

func : function, str, list or dict

Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.

Accepted combinations are:

  • function
  • string function name
  • list of functions and/or function names, e.g. [np.sum, 'mean']
  • dict of axis labels -> functions, function names or list of such.
axis : {0 or ‘index’, 1 or ‘columns’}, default 0
If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.
*args
Positional arguments to pass to func.
**kwargs
Keyword arguments to pass to func.
DataFrame, Series or scalar
  • if DataFrame.agg is called with a single function, returns a Series
  • if DataFrame.agg is called with several functions, returns a DataFrame
  • if Series.agg is called with a single function, returns a scalar
  • if Series.agg is called with several functions, returns a Series

The aggregation operations are always performed over an axis, either the index (default) or the column axis. This behavior is different from numpy aggregation functions (mean, median, prod, sum, std, var), where the default is to compute the aggregation of the flattened array, e.g., numpy.mean(arr_2d) as opposed to numpy.mean(arr_2d, axis=0).

agg is an alias for aggregate. Use the alias.

DataFrame.apply : Perform any type of operations. DataFrame.transform : Perform transformation type operations. pandas.core.groupby.GroupBy : Perform operations over groups. pandas.core.resample.Resampler : Perform operations over resampled bins. pandas.core.window.Rolling : Perform operations over rolling window. pandas.core.window.Expanding : Perform operations over expanding window. pandas.core.window.EWM : Perform operation over exponential weighted window.


A passed user-defined-function will be passed a Series for evaluation.

>>> df = pd.DataFrame([[1, 2, 3],
...                    [4, 5, 6],
...                    [7, 8, 9],
...                    [np.nan, np.nan, np.nan]],
...                   columns=['A', 'B', 'C'])

Aggregate these functions over the rows.

>>> df.agg(['sum', 'min'])
        A     B     C
sum  12.0  15.0  18.0
min   1.0   2.0   3.0

Different aggregations per column.

>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
        A    B
max   NaN  8.0
min   1.0  2.0
sum  12.0  NaN

Aggregate over the columns.

>>> df.agg("mean", axis="columns")
0    2.0
1    5.0
2    8.0
3    NaN
dtype: float64
align(other, join='outer', axis=None, level=None, copy=True, fill_value=None, method=None, limit=None, fill_axis=0, broadcast_axis=None)

Align two objects on their axes with the specified join method for each axis Index.

other : DataFrame or Series
join : {‘outer’, ‘inner’, ‘left’, ‘right’}, default ‘outer’
axis : allowed axis of the other object, default None
Align on index (0), columns (1), or both (None)
level : int or level name, default None
Broadcast across a level, matching Index values on the passed MultiIndex level
copy : boolean, default True
Always returns new objects. If copy=False and no reindexing is required then original objects are returned.
fill_value : scalar, default np.NaN
Value to use for missing values. Defaults to NaN, but can be any “compatible” value
method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed Series pad / ffill: propagate last valid observation forward to next valid backfill / bfill: use NEXT valid observation to fill gap
limit : int, default None
If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.
fill_axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Filling axis, method and limit
broadcast_axis : {0 or ‘index’, 1 or ‘columns’}, default None
Broadcast values along this axis, if aligning two objects of different dimensions
(left, right) : (DataFrame, type of other)
Aligned objects
all(axis=0, bool_only=None, skipna=True, level=None, **kwargs)

Return whether all elements are True, potentially over an axis.

Returns True unless there is at least one element within a series or along a Dataframe axis that is False or equivalent (e.g. zero or empty).

axis : {0 or ‘index’, 1 or ‘columns’, None}, default 0

Indicate which axis or axes should be reduced.

  • 0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.
  • 1 / ‘columns’ : reduce the columns, return a Series whose index is the original index.
  • None : reduce all axes, return a scalar.
bool_only : bool, default None
Include only boolean columns. If None, will attempt to use everything, then use only boolean data. Not implemented for Series.
skipna : bool, default True
Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be True, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
**kwargs : any, default None
Additional keywords have no effect but might be accepted for compatibility with NumPy.
Series or DataFrame
If level is specified, then, DataFrame is returned; otherwise, Series is returned.

Series.all : Return True if all elements are True. DataFrame.any : Return True if one (or more) elements are True.

Series

>>> pd.Series([True, True]).all()
True
>>> pd.Series([True, False]).all()
False
>>> pd.Series([]).all()
True
>>> pd.Series([np.nan]).all()
True
>>> pd.Series([np.nan]).all(skipna=False)
True

DataFrames

Create a dataframe from a dictionary.

>>> df = pd.DataFrame({'col1': [True, True], 'col2': [True, False]})
>>> df
   col1   col2
0  True   True
1  True  False

Default behaviour checks if column-wise values all return True.

>>> df.all()
col1     True
col2    False
dtype: bool

Specify axis='columns' to check if row-wise values all return True.

>>> df.all(axis='columns')
0     True
1    False
dtype: bool

Or axis=None for whether every value is True.

>>> df.all(axis=None)
False
angle_distance_to(second_geometry, method='GEODESIC')

Returns a tuple of angle and distance to another point using a measurement type.

Parameters:
second_geometry:
  • a second geometry
method:
  • PLANAR measurements reflect the projection of geographic data onto the 2D surface (in other words, they will not take into account the curvature of the earth). GEODESIC, GREAT_ELLIPTIC, LOXODROME, and PRESERVE_SHAPE measurement types may be chosen as an alternative, if desired.
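
A sketch of the call shape, using two hypothetical points (geodesic measurement requires the arcpy engine):

>>> from arcgis.geometry import Geometry
>>> start = Geometry({'x': 0.0, 'y': 0.0, 'spatialReference': {'wkid': 4326}})
>>> end = Geometry({'x': 1.0, 'y': 1.0, 'spatialReference': {'wkid': 4326}})
>>> angle, distance = start.angle_distance_to(end, method='GEODESIC')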

any(axis=0, bool_only=None, skipna=True, level=None, **kwargs)

Return whether any element is True, potentially over an axis.

Returns False unless there is at least one element within a series or along a Dataframe axis that is True or equivalent (e.g. non-zero or non-empty).

axis : {0 or ‘index’, 1 or ‘columns’, None}, default 0

Indicate which axis or axes should be reduced.

  • 0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.
  • 1 / ‘columns’ : reduce the columns, return a Series whose index is the original index.
  • None : reduce all axes, return a scalar.
bool_only : bool, default None
Include only boolean columns. If None, will attempt to use everything, then use only boolean data. Not implemented for Series.
skipna : bool, default True
Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be False, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
**kwargs : any, default None
Additional keywords have no effect but might be accepted for compatibility with NumPy.
Series or DataFrame
If level is specified, then, DataFrame is returned; otherwise, Series is returned.

numpy.any : Numpy version of this method. Series.any : Return whether any element is True. Series.all : Return whether all elements are True. DataFrame.any : Return whether any element is True over requested axis. DataFrame.all : Return whether all elements are True over requested axis.

Series

For Series input, the output is a scalar indicating whether any element is True.

>>> pd.Series([False, False]).any()
False
>>> pd.Series([True, False]).any()
True
>>> pd.Series([]).any()
False
>>> pd.Series([np.nan]).any()
False
>>> pd.Series([np.nan]).any(skipna=False)
True

DataFrame

Whether each column contains at least one True element (the default).

>>> df = pd.DataFrame({"A": [1, 2], "B": [0, 2], "C": [0, 0]})
>>> df
   A  B  C
0  1  0  0
1  2  2  0
>>> df.any()
A     True
B     True
C    False
dtype: bool

Aggregating over the columns.

>>> df = pd.DataFrame({"A": [True, False], "B": [1, 2]})
>>> df
       A  B
0   True  1
1  False  2
>>> df.any(axis='columns')
0    True
1    True
dtype: bool
>>> df = pd.DataFrame({"A": [True, False], "B": [1, 0]})
>>> df
       A  B
0   True  1
1  False  0
>>> df.any(axis='columns')
0    True
1    False
dtype: bool

Aggregating over the entire DataFrame with axis=None.

>>> df.any(axis=None)
True

any for an empty DataFrame is an empty Series.

>>> pd.DataFrame([]).any()
Series([], dtype: bool)
append(other, ignore_index=False, verify_integrity=False, sort=None)

Append rows of other to the end of caller, returning a new object.

Columns in other that are not in the caller are added as new columns.

other : DataFrame or Series/dict-like object, or list of these
The data to append.
ignore_index : boolean, default False
If True, do not use the index labels.
verify_integrity : boolean, default False
If True, raise ValueError on creating index with duplicates.
sort : boolean, default None

Sort columns if the columns of self and other are not aligned. The default sorting is deprecated and will change to not-sorting in a future version of pandas. Explicitly pass sort=True to silence the warning and sort. Explicitly pass sort=False to silence the warning and not sort.

New in version 0.23.0.

appended : DataFrame

pandas.concat : General function to concatenate DataFrame, Series or Panel objects.

If a list of dict/series is passed and the keys are all contained in the DataFrame’s index, the order of the columns in the resulting DataFrame will be unchanged.

Iteratively appending rows to a DataFrame can be more computationally intensive than a single concatenate. A better solution is to append those rows to a list and then concatenate the list with the original DataFrame all at once.

>>> df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
>>> df
   A  B
0  1  2
1  3  4
>>> df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
>>> df.append(df2)
   A  B
0  1  2
1  3  4
0  5  6
1  7  8

With ignore_index set to True:

>>> df.append(df2, ignore_index=True)
   A  B
0  1  2
1  3  4
2  5  6
3  7  8

The following examples, while not recommended methods for generating DataFrames, show two ways to generate a DataFrame from multiple data sources.

Less efficient:

>>> df = pd.DataFrame(columns=['A'])
>>> for i in range(5):
...     df = df.append({'A': i}, ignore_index=True)
>>> df
   A
0  0
1  1
2  2
3  3
4  4

More efficient:

>>> pd.concat([pd.DataFrame([i], columns=['A']) for i in range(5)],
...           ignore_index=True)
   A
0  0
1  1
2  2
3  3
4  4
apply(func, axis=0, broadcast=None, raw=False, reduce=None, result_type=None, args=(), **kwds)

Apply a function along an axis of the DataFrame.

Objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument.

func : function
Function to apply to each column or row.
axis : {0 or ‘index’, 1 or ‘columns’}, default 0

Axis along which the function is applied:

  • 0 or ‘index’: apply function to each column.
  • 1 or ‘columns’: apply function to each row.
broadcast : bool, optional

Only relevant for aggregation functions:

  • False or None : returns a Series whose length is the length of the index or the number of columns (based on the axis parameter)
  • True : results will be broadcast to the original shape of the frame, the original index and columns will be retained.

Deprecated since version 0.23.0: This argument will be removed in a future version, replaced by result_type=’broadcast’.

raw : bool, default False
  • False : passes each row or column as a Series to the function.
  • True : the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance.
reduce : bool or None, default None

Try to apply reduction procedures. If the DataFrame is empty, apply will use reduce to determine whether the result should be a Series or a DataFrame. If reduce=None (the default), apply’s return value will be guessed by calling func on an empty Series (note: while guessing, exceptions raised by func will be ignored). If reduce=True a Series will always be returned, and if reduce=False a DataFrame will always be returned.

Deprecated since version 0.23.0: This argument will be removed in a future version, replaced by result_type='reduce'.

result_type : {‘expand’, ‘reduce’, ‘broadcast’, None}, default None

These only act when axis=1 (columns):

  • ‘expand’ : list-like results will be turned into columns.
  • ‘reduce’ : returns a Series if possible rather than expanding list-like results. This is the opposite of ‘expand’.
  • ‘broadcast’ : results will be broadcast to the original shape of the DataFrame, the original index and columns will be retained.

The default behaviour (None) depends on the return value of the applied function: list-like results will be returned as a Series of those. However if the apply function returns a Series these are expanded to columns.

New in version 0.23.0.

args : tuple
Positional arguments to pass to func in addition to the array/series.
**kwds
Additional keyword arguments to pass as keywords arguments to func.

applied : Series or DataFrame

DataFrame.applymap: For elementwise operations. DataFrame.aggregate: Only perform aggregating type operations. DataFrame.transform: Only perform transforming type operations.

In the current implementation apply calls func twice on the first column/row to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first column/row.

>>> df = pd.DataFrame([[4, 9],] * 3, columns=['A', 'B'])
>>> df
   A  B
0  4  9
1  4  9
2  4  9

Using a numpy universal function (in this case the same as np.sqrt(df)):

>>> df.apply(np.sqrt)
     A    B
0  2.0  3.0
1  2.0  3.0
2  2.0  3.0

Using a reducing function on either axis

>>> df.apply(np.sum, axis=0)
A    12
B    27
dtype: int64
>>> df.apply(np.sum, axis=1)
0    13
1    13
2    13
dtype: int64

Returning a list-like will result in a Series

>>> df.apply(lambda x: [1, 2], axis=1)
0    [1, 2]
1    [1, 2]
2    [1, 2]
dtype: object

Passing result_type=’expand’ will expand list-like results to columns of a Dataframe

>>> df.apply(lambda x: [1, 2], axis=1, result_type='expand')
   0  1
0  1  2
1  1  2
2  1  2

Returning a Series inside the function is similar to passing result_type='expand'. The resulting column names will be the Series index.

>>> df.apply(lambda x: pd.Series([1, 2], index=['foo', 'bar']), axis=1)
   foo  bar
0    1    2
1    1    2
2    1    2

Passing result_type='broadcast' will ensure the same shape result, whether list-like or scalar is returned by the function, and broadcast it along the axis. The resulting column names will be the originals.

>>> df.apply(lambda x: [1, 2], axis=1, result_type='broadcast')
   A  B
0  1  2
1  1  2
2  1  2
applymap(func)

Apply a function to a Dataframe elementwise.

This method applies a function that accepts and returns a scalar to every element of a DataFrame.

func : callable
Python function, returns a single value from a single value.
DataFrame
Transformed DataFrame.

DataFrame.apply : Apply a function along input axis of DataFrame.

In the current implementation applymap calls func twice on the first column/row to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first column/row.

>>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])
>>> df
       0      1
0  1.000  2.120
1  3.356  4.567
>>> df.applymap(lambda x: len(str(x)))
   0  1
0  3  4
1  5  5

Note that a vectorized version of func often exists, which will be much faster. You could square each number elementwise.

>>> df.applymap(lambda x: x**2)
           0          1
0   1.000000   4.494400
1  11.262736  20.857489

But it’s better to avoid applymap in that case.

>>> df ** 2
           0          1
0   1.000000   4.494400
1  11.262736  20.857489
area

The area of a polygon feature. Empty for all other feature types.

as_arcpy

Returns Esri ArcPy geometry objects in a Series

as_blocks(copy=True)

Convert the frame to a dict of dtype -> Constructor Types that each has a homogeneous dtype.

Deprecated since version 0.21.0.

NOTE: the dtypes of the blocks WILL BE PRESERVED HERE (unlike in as_matrix)

copy : boolean, default True

values : a dict of dtype -> Constructor Types

as_matrix(columns=None)

Convert the frame to its Numpy-array representation.

Deprecated since version 0.23.0: Use DataFrame.values instead.

columns : list, optional, default:None
If None, return all columns, otherwise, returns specified columns.
values : ndarray
If the caller is heterogeneous and contains booleans or objects, the result will be of dtype=object. See Notes.

DataFrame.values

Return is NOT a Numpy-matrix, rather, a Numpy-array.

The dtype will be a lower-common-denominator dtype (implicit upcasting); that is to say if the dtypes (even of numeric types) are mixed, the one that accommodates all will be chosen. Use this with care if you are not dealing with the blocks.

e.g. If the dtypes are float16 and float32, dtype will be upcast to float32. If dtypes are int32 and uint8, dtype will be upcast to int32. By numpy.find_common_type convention, mixing int64 and uint64 will result in a float64 dtype.

This method is provided for backwards compatibility. Generally, it is recommended to use ‘.values’.

as_shapely

Returns Shapely geometry objects in a Series
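
As an illustration, assuming sdf is the SpatialDataFrame sketched earlier and the corresponding engine is installed:

>>> arcpy_geoms = sdf.as_arcpy       # Series of arcpy geometry objects (requires arcpy)
>>> shapely_geoms = sdf.as_shapely   # Series of shapely geometry objects (requires shapely)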

asfreq(freq, method=None, how=None, normalize=False, fill_value=None)

Convert TimeSeries to specified frequency.

Optionally provide filling method to pad/backfill missing values.

Returns the original data conformed to a new index with the specified frequency. resample is more appropriate if an operation, such as summarization, is necessary to represent the data at the new frequency.

freq : DateOffset object, or string
method : {‘backfill’/’bfill’, ‘pad’/’ffill’}, default None

Method to use for filling holes in reindexed Series (note this does not fill NaNs that already were present):

  • ‘pad’ / ‘ffill’: propagate last valid observation forward to next valid
  • ‘backfill’ / ‘bfill’: use NEXT valid observation to fill
how : {‘start’, ‘end’}, default end
For PeriodIndex only, see PeriodIndex.asfreq
normalize : bool, default False
Whether to reset output index to midnight
fill_value : scalar, optional

Value to use for missing values, applied during upsampling (note this does not fill NaNs that already were present).

New in version 0.20.0.

converted : same type as caller

reindex

To learn more about the frequency strings, please see the pandas documentation on frequency aliases.

Start by creating a series with 4 one minute timestamps.

>>> index = pd.date_range('1/1/2000', periods=4, freq='T')
>>> series = pd.Series([0.0, None, 2.0, 3.0], index=index)
>>> df = pd.DataFrame({'s':series})
>>> df
                       s
2000-01-01 00:00:00    0.0
2000-01-01 00:01:00    NaN
2000-01-01 00:02:00    2.0
2000-01-01 00:03:00    3.0

Upsample the series into 30 second bins.

>>> df.asfreq(freq='30S')
                       s
2000-01-01 00:00:00    0.0
2000-01-01 00:00:30    NaN
2000-01-01 00:01:00    NaN
2000-01-01 00:01:30    NaN
2000-01-01 00:02:00    2.0
2000-01-01 00:02:30    NaN
2000-01-01 00:03:00    3.0

Upsample again, providing a fill value.

>>> df.asfreq(freq='30S', fill_value=9.0)
                       s
2000-01-01 00:00:00    0.0
2000-01-01 00:00:30    9.0
2000-01-01 00:01:00    NaN
2000-01-01 00:01:30    9.0
2000-01-01 00:02:00    2.0
2000-01-01 00:02:30    9.0
2000-01-01 00:03:00    3.0

Upsample again, providing a method.

>>> df.asfreq(freq='30S', method='bfill')
                       s
2000-01-01 00:00:00    0.0
2000-01-01 00:00:30    NaN
2000-01-01 00:01:00    NaN
2000-01-01 00:01:30    2.0
2000-01-01 00:02:00    2.0
2000-01-01 00:02:30    3.0
2000-01-01 00:03:00    3.0
asof(where, subset=None)

Return the last row(s) without any NaNs before where.

The last row (for each element in where, if list) without any NaN is taken. In case of a DataFrame, the last row without NaN considering only the subset of columns (if not None)

New in version 0.19.0: For DataFrame

If there is no good value, NaN is returned for a Series or a Series of NaN values for a DataFrame

where : date or array-like of dates
Date(s) before which the last row(s) are returned.
subset : str or array-like of str, default None
For DataFrame, if not None, only use these columns to check for NaNs.

scalar, Series, or DataFrame

  • scalar : when self is a Series and where is a scalar
  • Series: when self is a Series and where is an array-like, or when self is a DataFrame and where is a scalar
  • DataFrame : when self is a DataFrame and where is an array-like

merge_asof : Perform an asof merge. Similar to left join.

Dates are assumed to be sorted. Raises if this is not the case.

A Series and a scalar where.

>>> s = pd.Series([1, 2, np.nan, 4], index=[10, 20, 30, 40])
>>> s
10    1.0
20    2.0
30    NaN
40    4.0
dtype: float64
>>> s.asof(20)
2.0

For a sequence where, a Series is returned. The first value is NaN, because the first element of where is before the first index value.

>>> s.asof([5, 20])
5     NaN
20    2.0
dtype: float64

Missing values are not considered. The following is 2.0, not NaN, even though NaN is at the index location for 30.

>>> s.asof(30)
2.0

Take all columns into consideration

>>> df = pd.DataFrame({'a': [10, 20, 30, 40, 50],
...                    'b': [None, None, None, None, 500]},
...                   index=pd.DatetimeIndex(['2018-02-27 09:01:00',
...                                           '2018-02-27 09:02:00',
...                                           '2018-02-27 09:03:00',
...                                           '2018-02-27 09:04:00',
...                                           '2018-02-27 09:05:00']))
>>> df.asof(pd.DatetimeIndex(['2018-02-27 09:03:30',
...                           '2018-02-27 09:04:30']))
                      a   b
2018-02-27 09:03:30 NaN NaN
2018-02-27 09:04:30 NaN NaN

Take a single column into consideration

>>> df.asof(pd.DatetimeIndex(['2018-02-27 09:03:30',
...                           '2018-02-27 09:04:30']),
...         subset=['a'])
                         a   b
2018-02-27 09:03:30   30.0 NaN
2018-02-27 09:04:30   40.0 NaN
assign(**kwargs)

Assign new columns to a DataFrame.

Returns a new object with all original columns in addition to new ones. Existing columns that are re-assigned will be overwritten.

**kwargs : dict of {str: callable or Series}
The column names are keywords. If the values are callable, they are computed on the DataFrame and assigned to the new columns. The callable must not change input DataFrame (though pandas doesn’t check it). If the values are not callable, (e.g. a Series, scalar, or array), they are simply assigned.
DataFrame
A new DataFrame with the new columns in addition to all the existing columns.

Assigning multiple columns within the same assign is possible. For Python 3.6 and above, later items in ‘**kwargs’ may refer to newly created or modified columns in ‘df’; items are computed and assigned into ‘df’ in order. For Python 3.5 and below, the order of keyword arguments is not specified; you cannot refer to newly created or modified columns. All items are computed first, and then assigned in alphabetical order.

Changed in version 0.23.0: Keyword argument order is maintained for Python 3.6 and later.

>>> df = pd.DataFrame({'temp_c': [17.0, 25.0]},
...                   index=['Portland', 'Berkeley'])
>>> df
          temp_c
Portland    17.0
Berkeley    25.0

Where the value is a callable, evaluated on df:

>>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0

Alternatively, the same behavior can be achieved by directly referencing an existing Series or sequence:

>>> df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0

In Python 3.6+, you can create multiple columns within the same assign where one of the columns depends on another one defined within the same assign:

>>> df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
...           temp_k=lambda x: (x['temp_f'] +  459.67) * 5 / 9)
          temp_c  temp_f  temp_k
Portland    17.0    62.6  290.15
Berkeley    25.0    77.0  298.15
astype(dtype, copy=True, errors='raise', **kwargs)

Cast a pandas object to a specified dtype dtype.

dtype : data type, or dict of column name -> data type
Use a numpy.dtype or Python type to cast entire pandas object to the same type. Alternatively, use {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type to cast one or more of the DataFrame’s columns to column-specific types.
copy : bool, default True
Return a copy when copy=True (be very careful setting copy=False as changes to values then may propagate to other pandas objects).
errors : {‘raise’, ‘ignore’}, default ‘raise’

Control raising of exceptions on invalid data for provided dtype.

  • raise : allow exceptions to be raised
  • ignore : suppress exceptions. On error return original object

New in version 0.20.0.

kwargs : keyword arguments to pass on to the constructor

casted : same type as caller

to_datetime : Convert argument to datetime. to_timedelta : Convert argument to timedelta. to_numeric : Convert argument to a numeric type. numpy.ndarray.astype : Cast a numpy array to a specified type.

>>> ser = pd.Series([1, 2], dtype='int32')
>>> ser
0    1
1    2
dtype: int32
>>> ser.astype('int64')
0    1
1    2
dtype: int64

Convert to categorical type:

>>> ser.astype('category')
0    1
1    2
dtype: category
Categories (2, int64): [1, 2]

Convert to ordered categorical type with custom ordering:

>>> cat_dtype = pd.api.types.CategoricalDtype(
...                     categories=[2, 1], ordered=True)
>>> ser.astype(cat_dtype)
0    1
1    2
dtype: category
Categories (2, int64): [2 < 1]

Note that using copy=False and changing data on a new pandas object may propagate changes:

>>> s1 = pd.Series([1,2])
>>> s2 = s1.astype('int64', copy=False)
>>> s2[0] = 10
>>> s1  # note that s1[0] has changed too
0    10
1     2
dtype: int64
at

Access a single value for a row/column label pair.

Similar to loc, in that both provide label-based lookups. Use at if you only need to get or set a single value in a DataFrame or Series.

KeyError
When label does not exist in DataFrame
DataFrame.iat : Access a single value for a row/column pair by integer position. DataFrame.loc : Access a group of rows and columns by label(s). Series.at : Access a single value using a label.

>>> df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
...                   index=[4, 5, 6], columns=['A', 'B', 'C'])
>>> df
    A   B   C
4   0   2   3
5   0   4   1
6  10  20  30

Get value at specified row/column pair

>>> df.at[4, 'B']
2

Set value at specified row/column pair

>>> df.at[4, 'B'] = 10
>>> df.at[4, 'B']
10

Get value within a Series

>>> df.loc[5].at['B']
4
at_time(time, asof=False, axis=None)

Select values at particular time of day (e.g. 9:30AM).

time : datetime.time or string
axis : {0 or ‘index’, 1 or ‘columns’}, default 0

New in version 0.24.0.

values_at_time : same type as caller

TypeError
If the index is not a DatetimeIndex

between_time : Select values between particular times of the day. first : Select initial periods of time series based on a date offset. last : Select final periods of time series based on a date offset. DatetimeIndex.indexer_at_time : Get just the index locations for values at particular time of the day.
>>> i = pd.date_range('2018-04-09', periods=4, freq='12H')
>>> ts = pd.DataFrame({'A': [1,2,3,4]}, index=i)
>>> ts
                     A
2018-04-09 00:00:00  1
2018-04-09 12:00:00  2
2018-04-10 00:00:00  3
2018-04-10 12:00:00  4
>>> ts.at_time('12:00')
                     A
2018-04-09 12:00:00  2
2018-04-10 12:00:00  4
axes

Return a list representing the axes of the DataFrame.

It has the row axis labels and column axis labels as the only members. They are returned in that order.

>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.axes
[RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'],
dtype='object')]
between_time(start_time, end_time, include_start=True, include_end=True, axis=None)

Select values between particular times of the day (e.g., 9:00-9:30 AM).

By setting start_time to be later than end_time, you can get the times that are not between the two times.

start_time : datetime.time or string
end_time : datetime.time or string
include_start : boolean, default True
include_end : boolean, default True
axis : {0 or ‘index’, 1 or ‘columns’}, default 0

New in version 0.24.0.

values_between_time : same type as caller

TypeError
If the index is not a DatetimeIndex

at_time : Select values at a particular time of the day. first : Select initial periods of time series based on a date offset. last : Select final periods of time series based on a date offset. DatetimeIndex.indexer_between_time : Get just the index locations for values between particular times of the day.
>>> i = pd.date_range('2018-04-09', periods=4, freq='1D20min')
>>> ts = pd.DataFrame({'A': [1,2,3,4]}, index=i)
>>> ts
                     A
2018-04-09 00:00:00  1
2018-04-10 00:20:00  2
2018-04-11 00:40:00  3
2018-04-12 01:00:00  4
>>> ts.between_time('0:15', '0:45')
                     A
2018-04-10 00:20:00  2
2018-04-11 00:40:00  3

You get the times that are not between two times by setting start_time later than end_time:

>>> ts.between_time('0:45', '0:15')
                     A
2018-04-09 00:00:00  1
2018-04-12 01:00:00  4
bfill(axis=None, inplace=False, limit=None, downcast=None)

Synonym for DataFrame.fillna() with method='bfill'.

blocks

Internal property, property synonym for as_blocks().

Deprecated since version 0.21.0.

bool()

Return the bool of a single element PandasObject.

This must be a boolean scalar value, either True or False. Raise a ValueError if the PandasObject does not have exactly 1 element, or if that element is not boolean.

boundary()

Constructs the boundary of the geometry.

bounds

Return a DataFrame of minx, miny, maxx, maxy values of geometry objects
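
For example, assuming sdf holds the point geometries sketched earlier:

>>> sdf.bounds   # DataFrame with one row per geometry: minx, miny, maxx, maxy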

boxplot(column=None, by=None, ax=None, fontsize=None, rot=0, grid=True, figsize=None, layout=None, return_type=None, **kwds)

Make a box plot from DataFrame columns.

Make a box-and-whisker plot from DataFrame columns, optionally grouped by some other columns. A box plot is a method for graphically depicting groups of numerical data through their quartiles. The box extends from the Q1 to Q3 quartile values of the data, with a line at the median (Q2). The whiskers extend from the edges of box to show the range of the data. The position of the whiskers is set by default to 1.5 * IQR (IQR = Q3 - Q1) from the edges of the box. Outlier points are those past the end of the whiskers.

For further details see Wikipedia’s entry for boxplot.

column : str or list of str, optional
Column name or list of names, or vector. Can be any valid input to pandas.DataFrame.groupby().
by : str or array-like, optional
Column in the DataFrame to pandas.DataFrame.groupby(). One box-plot will be done per value of columns in by.
ax : object of class matplotlib.axes.Axes, optional
The matplotlib axes to be used by boxplot.
fontsize : float or str
Tick label font size in points or as a string (e.g., large).
rot : int or float, default 0
The rotation angle of labels (in degrees) with respect to the screen coordinate system.
grid : boolean, default True
Setting this to True will show the grid.
figsize : A tuple (width, height) in inches
The size of the figure to create in matplotlib.
layout : tuple (rows, columns), optional
For example, (3, 5) will display the subplots using 3 columns and 5 rows, starting from the top-left.
return_type : {‘axes’, ‘dict’, ‘both’} or None, default ‘axes’

The kind of object to return. The default is axes.

  • ‘axes’ returns the matplotlib axes the boxplot is drawn on.

  • ‘dict’ returns a dictionary whose values are the matplotlib Lines of the boxplot.

  • ‘both’ returns a namedtuple with the axes and dict.

  • when grouping with by, a Series mapping columns to return_type is returned.

    If return_type is None, a NumPy array of axes with the same shape as layout is returned.

**kwds
All other plotting keyword arguments to be passed to matplotlib.pyplot.boxplot().

result :

The return type depends on the return_type parameter:

  • ‘axes’ : object of class matplotlib.axes.Axes
  • ‘dict’ : dict of matplotlib.lines.Line2D objects
  • ‘both’ : a namedtuple with structure (ax, lines)

For data grouped with by:

  • Series
  • array (for return_type = None)

Series.plot.hist: Make a histogram. matplotlib.pyplot.boxplot : Matplotlib equivalent plot.

Use return_type='dict' when you want to tweak the appearance of the lines after plotting. In this case a dict containing the Lines making up the boxes, caps, fliers, medians, and whiskers is returned.

Boxplots can be created for every column in the dataframe by df.boxplot() or by indicating the columns to be used.

Boxplots of variable distributions grouped by the values of a third variable can be created using the option by.

A list of strings (i.e. ['X', 'Y']) can be passed to boxplot in order to group the data by combination of the variables in the x-axis.

The layout of boxplot can be adjusted giving a tuple to layout.

Additional formatting can be done to the boxplot, like suppressing the grid (grid=False), rotating the labels in the x-axis (i.e. rot=45) or changing the fontsize (i.e. fontsize=15).

The parameter return_type can be used to select the type of element returned by boxplot. When return_type='axes' is selected, the matplotlib axes on which the boxplot is drawn are returned:

>>> boxplot = df.boxplot(column=['Col1','Col2'], return_type='axes')
>>> type(boxplot)
<class 'matplotlib.axes._subplots.AxesSubplot'>

When grouping with by, a Series mapping columns to return_type is returned:

>>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X',
...                      return_type='axes')
>>> type(boxplot)
<class 'pandas.core.series.Series'>

If return_type is None, a NumPy array of axes with the same shape as layout is returned:

>>> boxplot =  df.boxplot(column=['Col1', 'Col2'], by='X',
...                       return_type=None)
>>> type(boxplot)
<class 'numpy.ndarray'>
buffer(distance)

Constructs a polygon at a specified distance from the geometry.

Parameters:
distance:
  • length in the current projection. Only polygons accept negative values.
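
A minimal sketch, assuming sdf and an available geometry engine; the distance is in the units of the frame's spatial reference:

>>> buffered = sdf.buffer(0.5)   # polygon(s) constructed 0.5 units around each geometry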

centroid

The true centroid if it is within or on the feature; otherwise, the label point is returned. Returns a point object.
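
For instance, with sdf from the earlier sketch:

>>> centers = sdf.centroid   # true centroid per geometry, or label point if the true centroid falls outside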

clip(envelope)

Constructs the intersection of the geometry and the specified extent.

Parameters:
envelope:
  • arcpy.Extent object
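
A sketch of the call, with a hypothetical extent (requires arcpy):

>>> import arcpy
>>> extent = arcpy.Extent(-123.0, 33.0, -117.0, 38.0)   # XMin, YMin, XMax, YMax
>>> clipped = sdf.clip(extent)
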
clip_lower(threshold, axis=None, inplace=False)

Trim values below a given threshold.

Deprecated since version 0.24.0: Use clip(lower=threshold) instead.

Elements below the threshold will be changed to match the threshold value(s). Threshold can be a single value or an array, in the latter case it performs the truncation element-wise.

threshold : numeric or array-like

Minimum value allowed. All values below threshold will be set to this value.

  • float : every value is compared to threshold.
  • array-like : The shape of threshold should match the object it’s compared to. When self is a Series, threshold should be the length. When self is a DataFrame, threshold should be 2-D and the same shape as self for axis=None, or 1-D and the same length as the axis being compared.
axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Align self with threshold along the given axis.
inplace : boolean, default False

Whether to perform the operation in place on the data.

New in version 0.21.0.

Series or DataFrame
Original data with values trimmed.
Series.clip : General purpose method to trim Series values to given
threshold(s).
DataFrame.clip : General purpose method to trim DataFrame values to
given threshold(s).

Series single threshold clipping:

>>> s = pd.Series([5, 6, 7, 8, 9])
>>> s.clip(lower=8)
0    8
1    8
2    8
3    8
4    9
dtype: int64

Series clipping element-wise using an array of thresholds. threshold should be the same length as the Series.

>>> elemwise_thresholds = [4, 8, 7, 2, 5]
>>> s.clip(lower=elemwise_thresholds)
0    5
1    8
2    7
3    8
4    9
dtype: int64

DataFrames can be compared to a scalar.

>>> df = pd.DataFrame({"A": [1, 3, 5], "B": [2, 4, 6]})
>>> df
   A  B
0  1  2
1  3  4
2  5  6
>>> df.clip(lower=3)
   A  B
0  3  3
1  3  4
2  5  6

Or to an array of values. By default, threshold should be the same shape as the DataFrame.

>>> df.clip(lower=np.array([[3, 4], [2, 2], [6, 2]]))
   A  B
0  3  4
1  3  4
2  6  6

Control how threshold is broadcast with axis. In this case threshold should be the same length as the axis specified by axis.

>>> df.clip(lower=[3, 3, 5], axis='index')
   A  B
0  3  3
1  3  4
2  5  6
>>> df.clip(lower=[4, 5], axis='columns')
   A  B
0  4  5
1  4  5
2  5  6
clip_upper(threshold, axis=None, inplace=False)

Trim values above a given threshold.

Deprecated since version 0.24.0: Use clip(upper=threshold) instead.

Elements above the threshold will be changed to match the threshold value(s). Threshold can be a single value or an array, in the latter case it performs the truncation element-wise.

threshold : numeric or array-like

Maximum value allowed. All values above threshold will be set to this value.

  • float : every value is compared to threshold.
  • array-like : The shape of threshold should match the object it’s compared to. When self is a Series, threshold should be the length. When self is a DataFrame, threshold should be 2-D and the same shape as self for axis=None, or 1-D and the same length as the axis being compared.
axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Align object with threshold along the given axis.
inplace : boolean, default False

Whether to perform the operation in place on the data.

New in version 0.21.0.

Series or DataFrame
Original data with values trimmed.
Series.clip : General purpose method to trim Series values to given
threshold(s).
DataFrame.clip : General purpose method to trim DataFrame values to
given threshold(s).
>>> s = pd.Series([1, 2, 3, 4, 5])
>>> s
0    1
1    2
2    3
3    4
4    5
dtype: int64
>>> s.clip(upper=3)
0    1
1    2
2    3
3    3
4    3
dtype: int64
>>> elemwise_thresholds = [5, 4, 3, 2, 1]
>>> elemwise_thresholds
[5, 4, 3, 2, 1]
>>> s.clip(upper=elemwise_thresholds)
0    1
1    2
2    3
3    2
4    1
dtype: int64
columns

The column labels of the DataFrame.

combine(other, func, fill_value=None, overwrite=True)

Perform column-wise combine with another DataFrame based on a passed function.

Combines a DataFrame with other DataFrame using func to element-wise combine columns. The row and column indexes of the resulting DataFrame will be the union of the two.

other : DataFrame
The DataFrame to merge column-wise.
func : function
Function that takes two series as inputs and returns a Series or a scalar. Used to merge the two dataframes column by column.
fill_value : scalar value, default None
The value to fill NaNs with prior to passing any column to the merge func.
overwrite : boolean, default True
If True, columns in self that do not exist in other will be overwritten with NaNs.

result : DataFrame

DataFrame.combine_first : Combine two DataFrame objects and default to
non-null values in frame calling the method.

Combine using a simple function that chooses the smaller column.

>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2
>>> df1.combine(df2, take_smaller)
   A  B
0  0  3
1  0  3

Example using a true element-wise combine function.

>>> df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, np.minimum)
   A  B
0  1  2
1  0  3

Using fill_value fills Nones prior to passing the column to the merge function.

>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
   A    B
0  0 -5.0
1  0  4.0

However, if the same element in both dataframes is None, that None is preserved

>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [None, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
   A    B
0  0  NaN
1  0  3.0

Example that demonstrates the use of overwrite and behavior when the axis differ between the dataframes.

>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [-10, 1],}, index=[1, 2])
>>> df1.combine(df2, take_smaller)
     A    B     C
0  NaN  NaN   NaN
1  NaN  3.0 -10.0
2  NaN  3.0   1.0
>>> df1.combine(df2, take_smaller, overwrite=False)
     A    B     C
0  0.0  NaN   NaN
1  0.0  3.0 -10.0
2  NaN  3.0   1.0

Demonstrating the preference of the passed in dataframe.

>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1],}, index=[1, 2])
>>> df2.combine(df1, take_smaller)
   A    B   C
0  0.0  NaN NaN
1  0.0  3.0 NaN
2  NaN  3.0 NaN
>>> df2.combine(df1, take_smaller, overwrite=False)
     A    B   C
0  0.0  NaN NaN
1  0.0  3.0 1.0
2  NaN  3.0 1.0
combine_first(other)

Update null elements with value in the same location in other.

Combine two DataFrame objects by filling null values in one DataFrame with non-null values from other DataFrame. The row and column indexes of the resulting DataFrame will be the union of the two.

other : DataFrame
Provided DataFrame to use to fill null values.

combined : DataFrame

DataFrame.combine : Perform series-wise operation on two DataFrames
using a given function.
>>> df1 = pd.DataFrame({'A': [None, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine_first(df2)
     A    B
0  1.0  3.0
1  0.0  4.0

Null values still persist if the location of that null value does not exist in other

>>> df1 = pd.DataFrame({'A': [None, 0], 'B': [4, None]})
>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1]}, index=[1, 2])
>>> df1.combine_first(df2)
     A    B    C
0  NaN  4.0  NaN
1  0.0  3.0  1.0
2  NaN  3.0  1.0
compound(axis=None, skipna=None, level=None)

Return the compound percentage of the values for the requested axis.

axis : {index (0), columns (1)}
Axis for the function to be applied on.
skipna : bool, default True
Exclude NA/null values when computing the result.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
**kwargs
Additional keyword arguments to be passed to the function.

compounded : Series or DataFrame (if level specified)

contains(second_geometry, relation=None)

Indicates if the base geometry contains the comparison geometry.

Parameters:
second_geometry:
 
  • a second geometry
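
For example, testing each geometry against a hypothetical point (a geometry engine is required):

>>> from arcgis.geometry import Geometry
>>> pt = Geometry({'x': -118.24, 'y': 34.05, 'spatialReference': {'wkid': 4326}})
>>> inside = sdf.contains(pt)   # indicates containment per geometry
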
convert_objects(convert_dates=True, convert_numeric=False, convert_timedeltas=True, copy=True)

Attempt to infer better dtype for object columns.

Deprecated since version 0.21.0.

convert_dates : boolean, default True
If True, convert to date where possible. If ‘coerce’, force conversion, with unconvertible values becoming NaT.
convert_numeric : boolean, default False
If True, attempt to coerce to numbers (including strings), with unconvertible values becoming NaN.
convert_timedeltas : boolean, default True
If True, convert to timedelta where possible. If ‘coerce’, force conversion, with unconvertible values becoming NaT.
copy : boolean, default True
If True, return a copy even if no copy is necessary (e.g. no conversion was done). Note: This is meant for internal use, and should not be confused with inplace.

converted : same as input object

to_datetime : Convert argument to datetime. to_timedelta : Convert argument to timedelta. to_numeric : Convert argument to numeric type.

convex_hull()

Constructs the geometry that is the minimal bounding polygon such that all outer angles are convex.

coordinates()

Returns the point coordinates of the geometry as a np.array object.

copy(deep=True)

Make a copy of this SpatialDataFrame object.

Parameters:
deep:
  • boolean, default True. Make a deep copy, i.e. also copy data.
Returns:
  a copy of the SpatialDataFrame
corr(method='pearson', min_periods=1)

Compute pairwise correlation of columns, excluding NA/null values.

method : {‘pearson’, ‘kendall’, ‘spearman’} or callable
  • pearson : standard correlation coefficient
  • kendall : Kendall Tau correlation coefficient
  • spearman : Spearman rank correlation
  • callable: callable with input two 1d ndarrays
    and returning a float .. versionadded:: 0.24.0
min_periods : int, optional
Minimum number of observations required per pair of columns to have a valid result. Currently only available for pearson and spearman correlation

y : DataFrame

DataFrame.corrwith Series.corr

>>> histogram_intersection = lambda a, b: np.minimum(a, b).sum().round(decimals=1)
>>> df = pd.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)],
...                   columns=['dogs', 'cats'])
>>> df.corr(method=histogram_intersection)
      dogs cats
dogs   1.0  0.3
cats   0.3  1.0
corrwith(other, axis=0, drop=False, method='pearson')

Compute pairwise correlation between rows or columns of DataFrame with rows or columns of Series or DataFrame. DataFrames are first aligned along both axes before computing the correlations.

other : DataFrame, Series axis : {0 or ‘index’, 1 or ‘columns’}, default 0

0 or ‘index’ to compute column-wise, 1 or ‘columns’ for row-wise
drop : boolean, default False
Drop missing indices from result
method : {‘pearson’, ‘kendall’, ‘spearman’} or callable
  • pearson : standard correlation coefficient
  • kendall : Kendall Tau correlation coefficient
  • spearman : Spearman rank correlation
  • callable: callable with input two 1d ndarrays
    and returning a float

New in version 0.24.0.

correls : Series

DataFrame.corr
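
A short example (not in the original docstring): column ‘a’ is perfectly positively correlated between the two frames, column ‘b’ perfectly negatively.

>>> df1 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [4, 3, 2, 1]})
>>> df2 = pd.DataFrame({'a': [2, 4, 6, 8], 'b': [1, 2, 3, 4]})
>>> df1.corrwith(df2)
a    1.0
b   -1.0
dtype: float64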

count(axis=0, level=None, numeric_only=False)

Count non-NA cells for each column or row.

The values None, NaN, NaT, and optionally numpy.inf (depending on pandas.options.mode.use_inf_as_na) are considered NA.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
If 0 or ‘index’ counts are generated for each column. If 1 or ‘columns’ counts are generated for each row.
level : int or str, optional
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame. A str specifies the level name.
numeric_only : boolean, default False
Include only float, int or boolean data.
Series or DataFrame
For each column/row the number of non-NA/null entries. If level is specified returns a DataFrame.

Series.count : Number of non-NA elements in a Series.
DataFrame.shape : Number of DataFrame rows and columns (including NA elements).
DataFrame.isna : Boolean same-sized DataFrame showing places of NA elements.

Constructing DataFrame from a dictionary:

>>> df = pd.DataFrame({"Person":
...                    ["John", "Myla", "Lewis", "John", "Myla"],
...                    "Age": [24., np.nan, 21., 33, 26],
...                    "Single": [False, True, True, True, False]})
>>> df
   Person   Age  Single
0    John  24.0   False
1    Myla   NaN    True
2   Lewis  21.0    True
3    John  33.0    True
4    Myla  26.0   False

Notice the uncounted NA values:

>>> df.count()
Person    5
Age       4
Single    5
dtype: int64

Counts for each row:

>>> df.count(axis='columns')
0    3
1    2
2    3
3    3
4    3
dtype: int64

Counts for one level of a MultiIndex:

>>> df.set_index(["Person", "Single"]).count(level="Person")
        Age
Person
John      2
Lewis     1
Myla      1
cov(min_periods=None)

Compute pairwise covariance of columns, excluding NA/null values.

Compute the pairwise covariance among the series of a DataFrame. The returned data frame is the covariance matrix of the columns of the DataFrame.

Both NA and null values are automatically excluded from the calculation. (See the note below about bias from missing values.) A threshold can be set for the minimum number of observations for each value created. Comparisons with observations below this threshold will be returned as NaN.

This method is generally used for the analysis of time series data to understand the relationship between different measures across time.

min_periods : int, optional
Minimum number of observations required per pair of columns to have a valid result.
DataFrame
The covariance matrix of the series of the DataFrame.

pandas.Series.cov : Compute covariance with another Series. pandas.core.window.EWM.cov: Exponential weighted sample covariance. pandas.core.window.Expanding.cov : Expanding sample covariance. pandas.core.window.Rolling.cov : Rolling sample covariance.

Returns the covariance matrix of the DataFrame’s time series. The covariance is normalized by N-1.

For DataFrames that have Series that are missing data (assuming that data is missing at random) the returned covariance matrix will be an unbiased estimate of the variance and covariance between the member Series.

However, for many applications this estimate may not be acceptable because the estimate covariance matrix is not guaranteed to be positive semi-definite. This could lead to estimate correlations having absolute values which are greater than one, and/or a non-invertible covariance matrix. See Estimation of covariance matrices for more details.

>>> df = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)],
...                   columns=['dogs', 'cats'])
>>> df.cov()
          dogs      cats
dogs  0.666667 -1.000000
cats -1.000000  1.666667
>>> np.random.seed(42)
>>> df = pd.DataFrame(np.random.randn(1000, 5),
...                   columns=['a', 'b', 'c', 'd', 'e'])
>>> df.cov()
          a         b         c         d         e
a  0.998438 -0.020161  0.059277 -0.008943  0.014144
b -0.020161  1.059352 -0.008543 -0.024738  0.009826
c  0.059277 -0.008543  1.010670 -0.001486 -0.000271
d -0.008943 -0.024738 -0.001486  0.921297 -0.013692
e  0.014144  0.009826 -0.000271 -0.013692  0.977795

Minimum number of periods

This method also supports an optional min_periods keyword that specifies the required minimum number of non-NA observations for each column pair in order to have a valid result:

>>> np.random.seed(42)
>>> df = pd.DataFrame(np.random.randn(20, 3),
...                   columns=['a', 'b', 'c'])
>>> df.loc[df.index[:5], 'a'] = np.nan
>>> df.loc[df.index[5:10], 'b'] = np.nan
>>> df.cov(min_periods=12)
          a         b         c
a  0.316741       NaN -0.150812
b       NaN  1.248003  0.191417
c -0.150812  0.191417  0.895202
crosses(second_geometry)

Indicates if the two geometries intersect in a geometry of a lesser shape type.

Parameters:
second_geometry:
  • a second geometry
cummax(axis=None, skipna=True, *args, **kwargs)

Return cumulative maximum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative maximum.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
The index or the name of the axis. 0 is equivalent to None or ‘index’.
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
*args, **kwargs :
Additional keywords have no effect but might be accepted for compatibility with NumPy.

cummax : Series or DataFrame

core.window.Expanding.max : Similar functionality but ignores NaN values.
DataFrame.max : Return the maximum over DataFrame axis.
DataFrame.cummax : Return cumulative maximum over DataFrame axis.
DataFrame.cummin : Return cumulative minimum over DataFrame axis.
DataFrame.cumsum : Return cumulative sum over DataFrame axis.
DataFrame.cumprod : Return cumulative product over DataFrame axis.

Series

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cummax()
0    2.0
1    NaN
2    5.0
3    5.0
4    5.0
dtype: float64

To include NA values in the operation, use skipna=False

>>> s.cummax(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                    columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the maximum in each column. This is equivalent to axis=None or axis='index'.

>>> df.cummax()
     A    B
0  2.0  1.0
1  3.0  NaN
2  3.0  1.0

To iterate over columns and find the maximum in each row, use axis=1

>>> df.cummax(axis=1)
     A    B
0  2.0  2.0
1  3.0  NaN
2  1.0  1.0
cummin(axis=None, skipna=True, *args, **kwargs)

Return cumulative minimum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative minimum.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
The index or the name of the axis. 0 is equivalent to None or ‘index’.
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
*args, **kwargs :
Additional keywords have no effect but might be accepted for compatibility with NumPy.

cummin : Series or DataFrame

core.window.Expanding.min : Similar functionality but ignores NaN values.
DataFrame.min : Return the minimum over DataFrame axis.
DataFrame.cummax : Return cumulative maximum over DataFrame axis.
DataFrame.cummin : Return cumulative minimum over DataFrame axis.
DataFrame.cumsum : Return cumulative sum over DataFrame axis.
DataFrame.cumprod : Return cumulative product over DataFrame axis.

Series

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cummin()
0    2.0
1    NaN
2    2.0
3   -1.0
4   -1.0
dtype: float64

To include NA values in the operation, use skipna=False

>>> s.cummin(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                    columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the minimum in each column. This is equivalent to axis=None or axis='index'.

>>> df.cummin()
     A    B
0  2.0  1.0
1  2.0  NaN
2  1.0  0.0

To iterate over columns and find the minimum in each row, use axis=1

>>> df.cummin(axis=1)
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0
cumprod(axis=None, skipna=True, *args, **kwargs)

Return cumulative product over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative product.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
The index or the name of the axis. 0 is equivalent to None or ‘index’.
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
*args, **kwargs :
Additional keywords have no effect but might be accepted for compatibility with NumPy.

cumprod : Series or DataFrame

core.window.Expanding.prod : Similar functionality but ignores NaN values.
DataFrame.prod : Return the product over DataFrame axis.
DataFrame.cummax : Return cumulative maximum over DataFrame axis.
DataFrame.cummin : Return cumulative minimum over DataFrame axis.
DataFrame.cumsum : Return cumulative sum over DataFrame axis.
DataFrame.cumprod : Return cumulative product over DataFrame axis.

Series

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cumprod()
0     2.0
1     NaN
2    10.0
3   -10.0
4    -0.0
dtype: float64

To include NA values in the operation, use skipna=False

>>> s.cumprod(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                    columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the product in each column. This is equivalent to axis=None or axis='index'.

>>> df.cumprod()
     A    B
0  2.0  1.0
1  6.0  NaN
2  6.0  0.0

To iterate over columns and find the product in each row, use axis=1

>>> df.cumprod(axis=1)
     A    B
0  2.0  2.0
1  3.0  NaN
2  1.0  0.0
cumsum(axis=None, skipna=True, *args, **kwargs)

Return cumulative sum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative sum.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
The index or the name of the axis. 0 is equivalent to None or ‘index’.
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
*args, **kwargs :
Additional keywords have no effect but might be accepted for compatibility with NumPy.

cumsum : Series or DataFrame

core.window.Expanding.sum : Similar functionality but ignores NaN values.
DataFrame.sum : Return the sum over DataFrame axis.
DataFrame.cummax : Return cumulative maximum over DataFrame axis.
DataFrame.cummin : Return cumulative minimum over DataFrame axis.
DataFrame.cumsum : Return cumulative sum over DataFrame axis.
DataFrame.cumprod : Return cumulative product over DataFrame axis.

Series

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cumsum()
0    2.0
1    NaN
2    7.0
3    6.0
4    6.0
dtype: float64

To include NA values in the operation, use skipna=False

>>> s.cumsum(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                    columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the sum in each column. This is equivalent to axis=None or axis='index'.

>>> df.cumsum()
     A    B
0  2.0  1.0
1  5.0  NaN
2  6.0  1.0

To iterate over columns and find the sum in each row, use axis=1

>>> df.cumsum(axis=1)
     A    B
0  2.0  3.0
1  3.0  NaN
2  1.0  1.0
cut(cutter)

Splits this geometry into a part left of the cutting polyline, and a part right of it.

Parameters:
cutter:
  • The cutting polyline geometry.
densify(method, distance, deviation)

Creates a new geometry with added vertices

Parameters:
method:
  • The type of densification, DISTANCE, ANGLE, or GEODESIC
distance:
  • The maximum distance between vertices. The actual distance between vertices will usually be less than the maximum distance as new vertices will be evenly distributed along the original segment. If using a type of DISTANCE or ANGLE, the distance is measured in the units of the geometry’s spatial reference. If using a type of GEODESIC, the distance is measured in meters.
deviation:
  • Densify uses straight lines to approximate curves. You use deviation to control the accuracy of this approximation. The deviation is the maximum distance between the new segment and the original curve. The smaller its value, the more segments will be required to approximate the curve.

describe(percentiles=None, include=None, exclude=None)

Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.

Analyzes both numeric and object series, as well as DataFrame column sets of mixed data types. The output will vary depending on what is provided. Refer to the notes below for more detail.

percentiles : list-like of numbers, optional
The percentiles to include in the output. All should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.
include : ‘all’, list-like of dtypes or None (default), optional

A white list of data types to include in the result. Ignored for Series. Here are the options:

  • ‘all’ : All columns of the input will be included in the output.
  • A list-like of dtypes : Limits the results to the provided data types. To limit the result to numeric types submit numpy.number. To limit it instead to object columns submit the numpy.object data type. Strings can also be used in the style of select_dtypes (e.g. df.describe(include=['O'])). To select pandas categorical columns, use 'category'
  • None (default) : The result will include all numeric columns.
exclude : list-like of dtypes or None (default), optional,

A black list of data types to omit from the result. Ignored for Series. Here are the options:

  • A list-like of dtypes : Excludes the provided data types from the result. To exclude numeric types submit numpy.number. To exclude object columns submit the data type numpy.object. Strings can also be used in the style of select_dtypes (e.g. df.describe(exclude=['O'])). To exclude pandas categorical columns, use 'category'
  • None (default) : The result will exclude nothing.
Series or DataFrame
Summary statistics of the Series or Dataframe provided.

DataFrame.count : Count number of non-NA/null observations.
DataFrame.max : Maximum of the values in the object.
DataFrame.min : Minimum of the values in the object.
DataFrame.mean : Mean of the values.
DataFrame.std : Standard deviation of the observations.
DataFrame.select_dtypes : Subset of a DataFrame including/excluding columns based on their dtype.

For numeric data, the result’s index will include count, mean, std, min, max as well as lower, 50 and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50 percentile is the same as the median.

For object data (e.g. strings or timestamps), the result’s index will include count, unique, top, and freq. The top is the most common value. The freq is the most common value’s frequency. Timestamps also include the first and last items.

If multiple object values have the highest count, then the count and top results will be arbitrarily chosen from among those with the highest count.

For mixed data types provided via a DataFrame, the default is to return only an analysis of numeric columns. If the dataframe consists only of object and categorical data without any numeric columns, the default is to return an analysis of both the object and categorical columns. If include='all' is provided as an option, the result will include a union of attributes of each type.

The include and exclude parameters can be used to limit which columns in a DataFrame are analyzed for the output. The parameters are ignored when analyzing a Series.

Describing a numeric Series.

>>> s = pd.Series([1, 2, 3])
>>> s.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
dtype: float64

Describing a categorical Series.

>>> s = pd.Series(['a', 'a', 'b', 'c'])
>>> s.describe()
count     4
unique    3
top       a
freq      2
dtype: object

Describing a timestamp Series.

>>> s = pd.Series([
...   np.datetime64("2000-01-01"),
...   np.datetime64("2010-01-01"),
...   np.datetime64("2010-01-01")
... ])
>>> s.describe()
count                       3
unique                      2
top       2010-01-01 00:00:00
freq                        2
first     2000-01-01 00:00:00
last      2010-01-01 00:00:00
dtype: object

Describing a DataFrame. By default only numeric fields are returned.

>>> df = pd.DataFrame({'categorical': pd.Categorical(['d','e','f']),
...                    'numeric': [1, 2, 3],
...                    'object': ['a', 'b', 'c']
...                   })
>>> df.describe()
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Describing all columns of a DataFrame regardless of data type.

>>> df.describe(include='all')
        categorical  numeric object
count            3      3.0      3
unique           3      NaN      3
top              f      NaN      c
freq             1      NaN      1
mean           NaN      2.0    NaN
std            NaN      1.0    NaN
min            NaN      1.0    NaN
25%            NaN      1.5    NaN
50%            NaN      2.0    NaN
75%            NaN      2.5    NaN
max            NaN      3.0    NaN

Describing a column from a DataFrame by accessing it as an attribute.

>>> df.numeric.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
Name: numeric, dtype: float64

Including only numeric columns in a DataFrame description.

>>> df.describe(include=[np.number])
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Including only string columns in a DataFrame description.

>>> df.describe(include=[np.object])
       object
count       3
unique      3
top         c
freq        1

Including only categorical columns from a DataFrame description.

>>> df.describe(include=['category'])
       categorical
count            3
unique           3
top              f
freq             1

Excluding numeric columns from a DataFrame description.

>>> df.describe(exclude=[np.number])
       categorical object
count            3      3
unique           3      3
top              f      c
freq             1      1

Excluding object columns from a DataFrame description.

>>> df.describe(exclude=[np.object])
       categorical  numeric
count            3      3.0
unique           3      NaN
top              f      NaN
freq             1      NaN
mean           NaN      2.0
std            NaN      1.0
min            NaN      1.0
25%            NaN      1.5
50%            NaN      2.0
75%            NaN      2.5
max            NaN      3.0
diff(periods=1, axis=0)

First discrete difference of element.

Calculates the difference of a DataFrame element compared with another element in the DataFrame (default is the element in the same column of the previous row).

periods : int, default 1
Periods to shift for calculating difference, accepts negative values.
axis : {0 or ‘index’, 1 or ‘columns’}, default 0

Take difference over rows (0) or columns (1).

New in version 0.16.1.

diffed : DataFrame

Series.diff : First discrete difference for a Series.
DataFrame.pct_change : Percent change over given number of periods.
DataFrame.shift : Shift index by desired number of periods with an optional time freq.

Difference with previous row

>>> df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
...                    'b': [1, 1, 2, 3, 5, 8],
...                    'c': [1, 4, 9, 16, 25, 36]})
>>> df
   a  b   c
0  1  1   1
1  2  1   4
2  3  2   9
3  4  3  16
4  5  5  25
5  6  8  36
>>> df.diff()
     a    b     c
0  NaN  NaN   NaN
1  1.0  0.0   3.0
2  1.0  1.0   5.0
3  1.0  1.0   7.0
4  1.0  2.0   9.0
5  1.0  3.0  11.0

Difference with previous column

>>> df.diff(axis=1)
    a    b     c
0 NaN  0.0   0.0
1 NaN -1.0   3.0
2 NaN -1.0   7.0
3 NaN -1.0  13.0
4 NaN  0.0  20.0
5 NaN  2.0  28.0

Difference with 3rd previous row

>>> df.diff(periods=3)
     a    b     c
0  NaN  NaN   NaN
1  NaN  NaN   NaN
2  NaN  NaN   NaN
3  3.0  2.0  15.0
4  3.0  4.0  21.0
5  3.0  6.0  27.0

Difference with following row

>>> df.diff(periods=-1)
     a    b     c
0 -1.0  0.0  -3.0
1 -1.0 -1.0  -5.0
2 -1.0 -1.0  -7.0
3 -1.0 -2.0  -9.0
4 -1.0 -3.0 -11.0
5  NaN  NaN   NaN
difference(second_geometry)

Constructs the geometry that is composed only of the region unique to the base geometry but not part of the other geometry. The following illustration shows the results when the red polygon is the source geometry.

Parameters:
second_geometry:
  • a second geometry
disjoint(second_geometry)

Indicates if the base and comparison geometries share no points in common.

Parameters:
second_geometry:
  • a second geometry
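
An illustrative sketch (not part of the original reference), under the same geometry-engine assumptions as the contains example above:

>>> from arcgis.geometry import Point
>>> # two distinct points share no points in common
>>> a = Point({'x': 0, 'y': 0, 'spatialReference': {'wkid': 4326}})
>>> b = Point({'x': 1, 'y': 1, 'spatialReference': {'wkid': 4326}})
>>> a.disjoint(b)
True
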
distance_to(second_geometry)

Returns the minimum distance between two geometries. If the geometries intersect, the minimum distance is 0. Both geometries must have the same projection.

Parameters:
second_geometry:
  • a second geometry
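
An illustrative sketch (not part of the original reference); the two hypothetical points form a 3-4-5 triangle, so the planar distance is 5 units:

>>> from arcgis.geometry import Point
>>> a = Point({'x': 0, 'y': 0, 'spatialReference': {'wkid': 3857}})
>>> b = Point({'x': 3, 'y': 4, 'spatialReference': {'wkid': 3857}})
>>> a.distance_to(b)
5.0
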
div(other, axis='columns', level=None, fill_value=None)

Floating division of dataframe and other, element-wise (binary operator truediv).

Equivalent to dataframe / other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rtruediv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
DataFrame
Result of the arithmetic operation.

DataFrame.add : Add DataFrames. DataFrame.sub : Subtract DataFrames. DataFrame.mul : Multiply DataFrames. DataFrame.div : Divide DataFrames (float division). DataFrame.truediv : Divide DataFrames (float division). DataFrame.floordiv : Divide DataFrames (integer division). DataFrame.mod : Calculate modulo (remainder after division). DataFrame.pow : Calculate exponential power.

Mismatched indices will be unioned together.

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with the operator version, which returns the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
divide(other, axis='columns', level=None, fill_value=None)

Floating division of dataframe and other, element-wise (binary operator truediv).

Equivalent to dataframe / other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rtruediv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
DataFrame
Result of the arithmetic operation.

DataFrame.add : Add DataFrames. DataFrame.sub : Subtract DataFrames. DataFrame.mul : Multiply DataFrames. DataFrame.div : Divide DataFrames (float division). DataFrame.truediv : Divide DataFrames (float division). DataFrame.floordiv : Divide DataFrames (integer division). DataFrame.mod : Calculate modulo (remainder after division). DataFrame.pow : Calculate exponential power.

Mismatched indices will be unioned together.

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with the operator version, which returns the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
dot(other)

Compute the matrix multiplication between the DataFrame and other.

This method computes the matrix product between the DataFrame and the values of an other Series, DataFrame or a numpy array.

It can also be called using self @ other in Python >= 3.5.

other : Series, DataFrame or array-like
The other object to compute the matrix product with.
Series or DataFrame
If other is a Series, return the matrix product between self and other as a Series. If other is a DataFrame or a numpy.array, return the matrix product of self and other in a DataFrame.

Series.dot: Similar method for Series.

The dimensions of DataFrame and other must be compatible in order to compute the matrix multiplication.

The dot method for Series computes the inner product, instead of the matrix product here.

Here we multiply a DataFrame with a Series.

>>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])
>>> s = pd.Series([1, 1, 2, 1])
>>> df.dot(s)
0    -4
1     5
dtype: int64

Here we multiply a DataFrame with another DataFrame.

>>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])
>>> df.dot(other)
    0   1
0   1   4
1   2   2

Note that the dot method gives the same result as @

>>> df @ other
    0   1
0   1   4
1   2   2

The dot method also works if other is a np.array.

>>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])
>>> df.dot(arr)
    0   1
0   1   4
1   2   2
drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

Drop specified labels from rows or columns.

Remove rows or columns by specifying label names and corresponding axis, or by specifying directly index or column names. When using a multi-index, labels on different levels can be removed by specifying the level.

labels : single label or list-like
Index or column labels to drop.
axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Whether to drop labels from the index (0 or ‘index’) or columns (1 or ‘columns’).
index, columns : single label or list-like

Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).

New in version 0.21.0.

level : int or level name, optional
For MultiIndex, level from which the labels will be removed.
inplace : bool, default False
If True, do operation inplace and return None.
errors : {‘ignore’, ‘raise’}, default ‘raise’
If ‘ignore’, suppress error and only existing labels are dropped.

dropped : pandas.DataFrame

KeyError
If none of the labels are found in the selected axis

DataFrame.loc : Label-location based indexer for selection by label.
DataFrame.dropna : Return DataFrame with labels on given axis omitted where (all or any) data are missing.
DataFrame.drop_duplicates : Return DataFrame with duplicate rows removed, optionally only considering certain columns.
Series.drop : Return Series with specified index labels removed.

>>> df = pd.DataFrame(np.arange(12).reshape(3,4),
...                   columns=['A', 'B', 'C', 'D'])
>>> df
   A  B   C   D
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11

Drop columns

>>> df.drop(['B', 'C'], axis=1)
   A   D
0  0   3
1  4   7
2  8  11
>>> df.drop(columns=['B', 'C'])
   A   D
0  0   3
1  4   7
2  8  11

Drop a row by index

>>> df.drop([0, 1])
   A  B   C   D
2  8  9  10  11

Drop columns and/or rows of MultiIndex DataFrame

>>> midx = pd.MultiIndex(levels=[['lama', 'cow', 'falcon'],
...                              ['speed', 'weight', 'length']],
...                      codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
...                             [0, 1, 2, 0, 1, 2, 0, 1, 2]])
>>> df = pd.DataFrame(index=midx, columns=['big', 'small'],
...                   data=[[45, 30], [200, 100], [1.5, 1], [30, 20],
...                         [250, 150], [1.5, 0.8], [320, 250],
...                         [1, 0.8], [0.3,0.2]])
>>> df
                big     small
lama    speed   45.0    30.0
        weight  200.0   100.0
        length  1.5     1.0
cow     speed   30.0    20.0
        weight  250.0   150.0
        length  1.5     0.8
falcon  speed   320.0   250.0
        weight  1.0     0.8
        length  0.3     0.2
>>> df.drop(index='cow', columns='small')
                big
lama    speed   45.0
        weight  200.0
        length  1.5
falcon  speed   320.0
        weight  1.0
        length  0.3
>>> df.drop(index='length', level=1)
                big     small
lama    speed   45.0    30.0
        weight  200.0   100.0
cow     speed   30.0    20.0
        weight  250.0   150.0
falcon  speed   320.0   250.0
        weight  1.0     0.8
drop_duplicates(subset=None, keep='first', inplace=False)

Return DataFrame with duplicate rows removed, optionally only considering certain columns.

subset : column label or sequence of labels, optional
Only consider certain columns for identifying duplicates, by default use all of the columns
keep : {‘first’, ‘last’, False}, default ‘first’
  • first : Drop duplicates except for the first occurrence.
  • last : Drop duplicates except for the last occurrence.
  • False : Drop all duplicates.
inplace : boolean, default False
Whether to drop duplicates in place or to return a copy

deduplicated : DataFrame
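
A short example (not in the original docstring); the repeated row at index 1 is dropped:

>>> df = pd.DataFrame({'A': [1, 1, 2], 'B': ['x', 'x', 'y']})
>>> df.drop_duplicates()
   A  B
0  1  x
2  2  y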

droplevel(level, axis=0)

Return DataFrame with requested index / column level(s) removed.

New in version 0.24.0.

level : int, str, or list-like
If a string is given, it must be the name of a level. If list-like, elements must be names or positional indexes of levels.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0

DataFrame
DataFrame with requested index / column level(s) removed.

>>> df = pd.DataFrame([
...     [1, 2, 3, 4],
...     [5, 6, 7, 8],
...     [9, 10, 11, 12]
... ]).set_index([0, 1]).rename_axis(['a', 'b'])
>>> df.columns = pd.MultiIndex.from_tuples([
...    ('c', 'e'), ('d', 'f')
... ], names=['level_1', 'level_2'])
>>> df
level_1   c   d
level_2   e   f
a b
1 2      3   4
5 6      7   8
9 10    11  12
>>> df.droplevel('a')
level_1   c   d
level_2   e   f
b
2        3   4
6        7   8
10      11  12
>>> df.droplevel('level_2', axis=1)
level_1   c   d
a b
1 2      3   4
5 6      7   8
9 10    11  12
dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Remove missing values.

See the User Guide for more on which values are considered missing, and how to work with missing data.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0

Determine if rows or columns which contain missing values are removed.

  • 0, or ‘index’ : Drop rows which contain missing values.
  • 1, or ‘columns’ : Drop columns which contain missing value.

Deprecated since version 0.23.0: Pass tuple or list to drop on multiple axes. Only a single axis is allowed.

how : {‘any’, ‘all’}, default ‘any’

Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.

  • ‘any’ : If any NA values are present, drop that row or column.
  • ‘all’ : If all values are NA, drop that row or column.
thresh : int, optional
Require that many non-NA values.
subset : array-like, optional
Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include.
inplace : bool, default False
If True, do operation inplace and return None.
DataFrame
DataFrame with NA entries dropped from it.

DataFrame.isna: Indicate missing values. DataFrame.notna : Indicate existing (non-missing) values. DataFrame.fillna : Replace missing values. Series.dropna : Drop missing values. Index.dropna : Drop missing indices.

>>> df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
...                    "toy": [np.nan, 'Batmobile', 'Bullwhip'],
...                    "born": [pd.NaT, pd.Timestamp("1940-04-25"),
...                             pd.NaT]})
>>> df
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Drop the rows where at least one element is missing.

>>> df.dropna()
     name        toy       born
1  Batman  Batmobile 1940-04-25

Drop the columns where at least one element is missing.

>>> df.dropna(axis='columns')
       name
0    Alfred
1    Batman
2  Catwoman

Drop the rows where all elements are missing.

>>> df.dropna(how='all')
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Keep only the rows with at least 2 non-NA values.

>>> df.dropna(thresh=2)
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Define in which columns to look for missing values.

>>> df.dropna(subset=['name', 'born'])
       name        toy       born
1    Batman  Batmobile 1940-04-25

Keep the DataFrame with valid entries in the same variable.

>>> df.dropna(inplace=True)
>>> df
     name        toy       born
1  Batman  Batmobile 1940-04-25
dtypes

Return the dtypes in the DataFrame.

This returns a Series with the data type of each column. The result’s index is the original DataFrame’s columns. Columns with mixed types are stored with the object dtype. See the User Guide for more.

pandas.Series
The data type of each column.

pandas.DataFrame.ftypes : Dtype and sparsity information.

>>> df = pd.DataFrame({'float': [1.0],
...                    'int': [1],
...                    'datetime': [pd.Timestamp('20180310')],
...                    'string': ['foo']})
>>> df.dtypes
float              float64
int                  int64
datetime    datetime64[ns]
string              object
dtype: object
duplicated(subset=None, keep='first')

Return boolean Series denoting duplicate rows, optionally only considering certain columns.

subset : column label or sequence of labels, optional
Only consider certain columns for identifying duplicates, by default use all of the columns
keep : {‘first’, ‘last’, False}, default ‘first’
  • first : Mark duplicates as True except for the first occurrence.
  • last : Mark duplicates as True except for the last occurrence.
  • False : Mark all duplicates as True.

duplicated : Series
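
A short example (not in the original docstring); only the second occurrence of the repeated row is marked:

>>> df = pd.DataFrame({'A': [1, 1, 2], 'B': ['x', 'x', 'y']})
>>> df.duplicated()
0    False
1     True
2    False
dtype: bool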

empty

Indicator whether DataFrame is empty.

True if DataFrame is entirely empty (no items), meaning any of the axes are of length 0.

bool
If DataFrame is empty, return True, if not return False.

pandas.Series.dropna pandas.DataFrame.dropna

If DataFrame contains only NaNs, it is still not considered empty. See the example below.

An example of an actual empty DataFrame. Notice the index is empty:

>>> df_empty = pd.DataFrame({'A' : []})
>>> df_empty
Empty DataFrame
Columns: [A]
Index: []
>>> df_empty.empty
True

If we only have NaNs in our DataFrame, it is not considered empty! We will need to drop the NaNs to make the DataFrame empty:

>>> df = pd.DataFrame({'A' : [np.nan]})
>>> df
    A
0 NaN
>>> df.empty
False
>>> df.dropna().empty
True
eq(other, axis='columns', level=None)

Equal to of dataframe and other, element-wise (binary operator eq).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or ‘index’, 1 or ‘columns’}, default ‘columns’
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
level : int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
DataFrame of bool
Result of the comparison.

DataFrame.eq : Compare DataFrames for equality elementwise.
DataFrame.ne : Compare DataFrames for inequality elementwise.
DataFrame.le : Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt : Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge : Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt : Compare DataFrames for strictly greater than inequality elementwise.

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number of elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
equals(second_geometry)

Indicates if the base and comparison geometries are of the same shape type and define the same set of points in the plane. This is a 2D comparison only; M and Z values are ignored.

Parameters:
second_geometry:
  • a second geometry
erase(other, inplace=False)

Erases the specified geometry from each geometry in the SpatialDataFrame.

Argument Description
other Required Geometry. A geometry object to erase from other geometries.
inplace Optional boolean. Default False. Modify the SpatialDataFrame in place (do not create a new object)
Returns:SpatialDataFrame
eval(expr, inplace=False, **kwargs)

Evaluate a string describing operations on DataFrame columns.

Operates on columns only, not specific rows or elements. This allows eval to run arbitrary code, which can make you vulnerable to code injection if you pass user input to this function.

expr : str
The expression string to evaluate.
inplace : bool, default False

If the expression contains an assignment, whether to perform the operation inplace and mutate the existing DataFrame. Otherwise, a new DataFrame is returned.

New in version 0.18.0.

kwargs : dict
See the documentation for eval() for complete details on the keyword arguments accepted by query().
ndarray, scalar, or pandas object
The result of the evaluation.
DataFrame.query : Evaluates a boolean expression to query the columns of a frame.
DataFrame.assign : Can evaluate an expression or function to create new values for a column.
pandas.eval : Evaluate a Python expression as a string using various backends.

For more details see the API documentation for eval(). For detailed examples see enhancing performance with eval.

>>> df = pd.DataFrame({'A': range(1, 6), 'B': range(10, 0, -2)})
>>> df
   A   B
0  1  10
1  2   8
2  3   6
3  4   4
4  5   2
>>> df.eval('A + B')
0    11
1    10
2     9
3     8
4     7
dtype: int64

Assignment is allowed though by default the original DataFrame is not modified.

>>> df.eval('C = A + B')
   A   B   C
0  1  10  11
1  2   8  10
2  3   6   9
3  4   4   8
4  5   2   7
>>> df
   A   B
0  1  10
1  2   8
2  3   6
3  4   4
4  5   2

Use inplace=True to modify the original DataFrame.

>>> df.eval('C = A + B', inplace=True)
>>> df
   A   B   C
0  1  10  11
1  2   8  10
2  3   6   9
3  4   4   8
4  5   2   7
ewm(com=None, span=None, halflife=None, alpha=None, min_periods=0, adjust=True, ignore_na=False, axis=0)

Provides exponential weighted functions.

New in version 0.18.0.

com : float, optional
Specify decay in terms of center of mass, alpha = 1 / (1 + com), for com >= 0.
span : float, optional
Specify decay in terms of span, alpha = 2 / (span + 1), for span >= 1.
halflife : float, optional
Specify decay in terms of half-life, alpha = 1 - exp(log(0.5) / halflife), for halflife > 0.
alpha : float, optional

Specify smoothing factor alpha directly, 0 < alpha <= 1.

New in version 0.18.0.

min_periods : int, default 0
Minimum number of observations in window required to have a value (otherwise result is NA).
adjust : bool, default True
Divide by decaying adjustment factor in beginning periods to account for imbalance in relative weightings (viewing EWMA as a moving average)
ignore_na : bool, default False
Ignore missing values when calculating weights; specify True to reproduce pre-0.15.0 behavior

a Window sub-classed for the particular operation

rolling : Provides rolling window calculations. expanding : Provides expanding transformations.

Exactly one of center of mass, span, half-life, and alpha must be provided. Allowed values and relationship between the parameters are specified in the parameter descriptions above; see the link at the end of this section for a detailed explanation.

When adjust is True (default), weighted averages are calculated using weights (1-alpha)**(n-1), (1-alpha)**(n-2), …, 1-alpha, 1.

When adjust is False, weighted averages are calculated recursively as:
weighted_average[0] = arg[0]; weighted_average[i] = (1-alpha)*weighted_average[i-1] + alpha*arg[i].

When ignore_na is False (default), weights are based on absolute positions. For example, the weights of x and y used in calculating the final weighted average of [x, None, y] are (1-alpha)**2 and 1 (if adjust is True), and (1-alpha)**2 and alpha (if adjust is False).

When ignore_na is True (reproducing pre-0.15.0 behavior), weights are based on relative positions. For example, the weights of x and y used in calculating the final weighted average of [x, None, y] are 1-alpha and 1 (if adjust is True), and 1-alpha and alpha (if adjust is False).

More details can be found at http://pandas.pydata.org/pandas-docs/stable/computation.html#exponentially-weighted-windows

>>> df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
>>> df
     B
0  0.0
1  1.0
2  2.0
3  NaN
4  4.0
>>> df.ewm(com=0.5).mean()
          B
0  0.000000
1  0.750000
2  1.615385
3  1.615385
4  3.670213
expanding(min_periods=1, center=False, axis=0)

Provides expanding transformations.

New in version 0.18.0.

min_periods : int, default 1
Minimum number of observations in window required to have a value (otherwise result is NA).
center : bool, default False
Set the labels at the center of the window.

axis : int or str, default 0

a Window sub-classed for the particular operation

rolling : Provides rolling window calculations. ewm : Provides exponential weighted functions.

By default, the result is set to the right edge of the window. This can be changed to the center of the window by setting center=True.

>>> df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
>>> df
     B
0  0.0
1  1.0
2  2.0
3  NaN
4  4.0
>>> df.expanding(2).sum()
     B
0  NaN
1  1.0
2  3.0
3  3.0
4  7.0
extent

The extent of the geometry.

ffill(axis=None, inplace=False, limit=None, downcast=None)

Synonym for DataFrame.fillna() with method='ffill'.
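
A minimal example (not in the original docstring), with pd and np referring to pandas and numpy as in the other examples:

>>> df = pd.DataFrame({'A': [1, np.nan, 3]})
>>> df.ffill()
     A
0  1.0
1  1.0
2  3.0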

fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None, **kwargs)

Fill NA/NaN values using the specified method.

value : scalar, dict, Series, or DataFrame
Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series/DataFrame will not be filled). This value cannot be a list.
method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed Series. pad / ffill: propagate last valid observation forward to next valid. backfill / bfill: use NEXT valid observation to fill gap.

axis : {0 or ‘index’, 1 or ‘columns’}
inplace : boolean, default False

If True, fill in place. Note: this will modify any other views on this object, (e.g. a no-copy slice for a column in a DataFrame).
limit : int, default None
If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.
downcast : dict, default is None
a dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible)

filled : DataFrame

interpolate : Fill NaN values using interpolation. reindex, asfreq

>>> df = pd.DataFrame([[np.nan, 2, np.nan, 0],
...                    [3, 4, np.nan, 1],
...                    [np.nan, np.nan, np.nan, 5],
...                    [np.nan, 3, np.nan, 4]],
...                    columns=list('ABCD'))
>>> df
     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  NaN  NaN NaN  5
3  NaN  3.0 NaN  4

Replace all NaN elements with 0s.

>>> df.fillna(0)
    A   B   C   D
0   0.0 2.0 0.0 0
1   3.0 4.0 0.0 1
2   0.0 0.0 0.0 5
3   0.0 3.0 0.0 4

We can also propagate non-null values forward or backward.

>>> df.fillna(method='ffill')
    A   B   C   D
0   NaN 2.0 NaN 0
1   3.0 4.0 NaN 1
2   3.0 4.0 NaN 5
3   3.0 3.0 NaN 4

Replace all NaN elements in column ‘A’, ‘B’, ‘C’, and ‘D’, with 0, 1, 2, and 3 respectively.

>>> values = {'A': 0, 'B': 1, 'C': 2, 'D': 3}
>>> df.fillna(value=values)
    A   B   C   D
0   0.0 2.0 2.0 0
1   3.0 4.0 2.0 1
2   0.0 1.0 2.0 5
3   0.0 3.0 2.0 4

Only replace the first NaN element.

>>> df.fillna(value=values, limit=1)
    A   B   C   D
0   0.0 2.0 2.0 0
1   3.0 4.0 NaN 1
2   NaN 1.0 NaN 5
3   NaN 3.0 NaN 4
filter(items=None, like=None, regex=None, axis=None)

Subset rows or columns of dataframe according to labels in the specified index.

Note that this routine does not filter a dataframe on its contents. The filter is applied to the labels of the index.

items : list-like
List of axis labels to restrict to (the labels need not all be present).
like : string
Keep axis where “arg in col == True”.
regex : string (regular expression)
Keep axis with re.search(regex, col) == True.
axis : int or string axis name
The axis to filter on. By default this is the info axis, ‘index’ for Series, ‘columns’ for DataFrame.

same type as input object

DataFrame.loc

The items, like, and regex parameters are enforced to be mutually exclusive.

axis defaults to the info axis that is used when indexing with [].

>>> df = pd.DataFrame(np.array(([1,2,3], [4,5,6])),
...                   index=['mouse', 'rabbit'],
...                   columns=['one', 'two', 'three'])
>>> # select columns by name
>>> df.filter(items=['one', 'three'])
         one  three
mouse     1      3
rabbit    4      6
>>> # select columns by regular expression
>>> df.filter(regex='e$', axis=1)
         one  three
mouse     1      3
rabbit    4      6
>>> # select rows containing 'bbi'
>>> df.filter(like='bbi', axis=0)
         one  two  three
rabbit    4    5      6
first(offset)

Convenience method for subsetting initial periods of time series data based on a date offset.

offset : string, DateOffset, dateutil.relativedelta

subset : same type as caller

TypeError
If the index is not a DatetimeIndex

last : Select final periods of time series based on a date offset.
at_time : Select values at a particular time of the day.
between_time : Select values between particular times of the day.

>>> i = pd.date_range('2018-04-09', periods=4, freq='2D')
>>> ts = pd.DataFrame({'A': [1,2,3,4]}, index=i)
>>> ts
            A
2018-04-09  1
2018-04-11  2
2018-04-13  3
2018-04-15  4

Get the rows for the first 3 days:

>>> ts.first('3D')
            A
2018-04-09  1
2018-04-11  2

Notice that data for the first 3 calendar days was returned, not the first 3 days observed in the dataset, and therefore data for 2018-04-13 was not returned.

first_point

The first coordinate point of the geometry.

first_valid_index()

Return index for first non-NA/null value.

scalar : type of index

If all elements are NA/null, returns None. Also returns None for an empty NDFrame.
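
For example, on a Series whose first two entries are missing:

>>> s = pd.Series([None, None, 3, 4])
>>> s.first_valid_index()
2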

floordiv(other, axis='columns', level=None, fill_value=None)

Integer division of dataframe and other, element-wise (binary operator floordiv).

Equivalent to dataframe // other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rfloordiv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
DataFrame
Result of the arithmetic operation.

DataFrame.add : Add DataFrames.
DataFrame.sub : Subtract DataFrames.
DataFrame.mul : Multiply DataFrames.
DataFrame.div : Divide DataFrames (float division).
DataFrame.truediv : Divide DataFrames (float division).
DataFrame.floordiv : Divide DataFrames (integer division).
DataFrame.mod : Calculate modulo (remainder after division).
DataFrame.pow : Calculate exponential power.

Mismatched indices will be unioned together.

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar using the operator version, which returns the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
from_csv(path, header=0, sep=',', index_col=0, parse_dates=True, encoding=None, tupleize_cols=None, infer_datetime_format=False)

Read CSV file.

Deprecated since version 0.21.0: Use pandas.read_csv() instead.

It is preferable to use the more powerful pandas.read_csv() for most general purposes, but from_csv makes for an easy roundtrip to and from a file (the exact counterpart of to_csv), especially with a DataFrame of time series data.

This method only differs from the preferred pandas.read_csv() in some defaults:

  • index_col is 0 instead of None (take first column as index by default)
  • parse_dates is True instead of False (try parsing the index as datetime by default)

So a pd.DataFrame.from_csv(path) can be replaced by pd.read_csv(path, index_col=0, parse_dates=True).

path : string file path or file handle / StringIO
header : int, default 0
Row to use as header (skip prior rows)
sep : string, default ‘,’
Field delimiter
index_col : int or sequence, default 0
Column to use for index. If a sequence is given, a MultiIndex is used. Different default from read_table
parse_dates : boolean, default True
Parse dates. Different default from read_table
tupleize_cols : boolean, default False
Write multi-index columns as a list of tuples (if True) or in the new, expanded format (if False).
infer_datetime_format : boolean, default False
If True and parse_dates is True for a column, try to infer the datetime format based on the first datetime string. If the format can be inferred, there often will be a large parsing speed-up.

y : DataFrame

pandas.read_csv

static from_df(df, address_column='address', geocoder=None)

Returns a SpatialDataFrame from a dataframe with an address column.

Argument Description
df Required Pandas DataFrame. Source dataset
address_column Optional String. The default is “address”. This is the name of a column in the specified dataframe that contains addresses (as strings). The addresses are batch geocoded using the GIS’s first configured geocoder and their locations used as the geometry of the spatial dataframe. Ignored if the ‘geometry’ parameter is also specified.
geocoder Optional Geocoder. The geocoder to be used. If not specified, the active GIS’s first geocoder is used.
Returns:SpatialDataFrame

NOTE: Credits will be consumed for batch_geocoding, from the GIS to which the geocoder belongs.
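
A minimal sketch of typical usage (the CSV file name here is hypothetical, and the GIS must have a configured geocoder):

>>> import pandas as pd
>>> df = pd.read_csv('employees.csv')   # hypothetical file with an 'address' column
>>> sdf = SpatialDataFrame.from_df(df, address_column='address')
>>> sdf.geometry.head()   # point geometries produced by batch geocoding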

from_dict(data, orient='columns', dtype=None, columns=None)

Construct DataFrame from dict of array-like or dicts.

Creates DataFrame object from dictionary by columns or by index allowing dtype specification.

data : dict
Of the form {field : array-like} or {field : dict}.
orient : {‘columns’, ‘index’}, default ‘columns’
The “orientation” of the data. If the keys of the passed dict should be the columns of the resulting DataFrame, pass ‘columns’ (default). Otherwise if the keys should be rows, pass ‘index’.
dtype : dtype, default None
Data type to force, otherwise infer.
columns : list, default None

Column labels to use when orient='index'. Raises a ValueError if used with orient='columns'.

New in version 0.23.0.

pandas.DataFrame

DataFrame.from_records : DataFrame from ndarray (structured dtype), list of tuples, dict, or DataFrame.
DataFrame : DataFrame object creation using constructor.

By default the keys of the dict become the DataFrame columns:

>>> data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
>>> pd.DataFrame.from_dict(data)
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d

Specify orient='index' to create the DataFrame using dictionary keys as rows:

>>> data = {'row_1': [3, 2, 1, 0], 'row_2': ['a', 'b', 'c', 'd']}
>>> pd.DataFrame.from_dict(data, orient='index')
       0  1  2  3
row_1  3  2  1  0
row_2  a  b  c  d

When using the ‘index’ orientation, the column names can be specified manually:

>>> pd.DataFrame.from_dict(data, orient='index',
...                        columns=['A', 'B', 'C', 'D'])
       A  B  C  D
row_1  3  2  1  0
row_2  a  b  c  d
static from_featureclass(filename, **kwargs)

Returns a SpatialDataFrame from a feature class.

Argument Description
filename Required string. The full path to the feature class
sql_clause Optional string. The sql clause to parse data down
where_clause Optional string. A where statement
sr Optional SpatialReference. A spatial reference object
Returns:SpatialDataFrame
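
A minimal sketch (the geodatabase path is hypothetical):

>>> sdf = SpatialDataFrame.from_featureclass(r'C:\data\city.gdb\parcels')
>>> sdf.head()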
static from_hdf(path_or_buf, key=None, **kwargs)

read from the store, close it if we opened it

Retrieve pandas object stored in file, optionally based on where criteria

path_or_buf : path (string), buffer, or path object (pathlib.Path or py._path.local.LocalPath) to read from

New in version 0.19.0: support for pathlib, py.path.

key : group identifier in the store. Can be omitted if the HDF file contains a single pandas object.
where : list of Term (or convertible) objects, optional
start : optional, integer (defaults to None), row number to start selection
stop : optional, integer (defaults to None), row number to stop selection
columns : optional, a list of columns that, if not None, will limit the return columns
iterator : optional, boolean, return an iterator, default False
chunksize : optional, nrows to include in iteration, return an iterator

The selected object
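
A minimal round-trip sketch (the file name is illustrative; HDF5 support requires the PyTables package):

>>> df = pd.DataFrame({'A': [1, 2, 3]})
>>> df.to_hdf('data.h5', key='df')
>>> restored = SpatialDataFrame.from_hdf('data.h5', key='df')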

from_items(items, columns=None, orient='columns')

Construct a DataFrame from a list of tuples.

Deprecated since version 0.23.0: from_items is deprecated and will be removed in a future version. Use DataFrame.from_dict(dict(items)) instead. DataFrame.from_dict(OrderedDict(items)) may be used to preserve the key order.

Convert (key, value) pairs to DataFrame. The keys will be the axis index (usually the columns, but depends on the specified orientation). The values should be arrays or Series.

items : sequence of (key, value) pairs
Values should be arrays or Series.
columns : sequence of column labels, optional
Must be passed if orient=’index’.
orient : {‘columns’, ‘index’}, default ‘columns’
The “orientation” of the data. If the keys of the input correspond to column labels, pass ‘columns’ (default). Otherwise if the keys correspond to the index, pass ‘index’.

frame : DataFrame
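
Following the deprecation note above, a short sketch of the recommended replacement:

>>> from collections import OrderedDict
>>> items = [('A', [1, 2, 3]), ('B', [4, 5, 6])]
>>> pd.DataFrame.from_dict(OrderedDict(items))
   A  B
0  1  4
1  2  5
2  3  6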

static from_layer(layer, **kwargs)

Returns a SpatialDataFrame/Pandas’ Dataframe from a FeatureLayer or Table object.

Arguments Description
layer required FeatureLayer/Table. This is the service endpoint object.
Returns:SpatialDataFrame for feature layers with geometry and Panda’s Dataframe for tables
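
A minimal sketch (the service URL is hypothetical):

>>> from arcgis.features import FeatureLayer
>>> layer = FeatureLayer('https://services.myserver.com/arcgis/rest/services/Parcels/FeatureServer/0')  # hypothetical URL
>>> sdf = SpatialDataFrame.from_layer(layer)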
from_records(data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)

Convert structured or record ndarray to DataFrame.

data : ndarray (structured dtype), list of tuples, dict, or DataFrame
index : string, list of fields, array-like
Field of array to use as the index, alternately a specific set of input labels to use
exclude : sequence, default None
Columns or fields to exclude
columns : sequence, default None
Column names to use. If the passed data do not have names associated with them, this argument provides names for the columns. Otherwise this argument indicates the order of the columns in the result (any names not found in the data will become all-NA columns)
coerce_float : boolean, default False
Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets
nrows : int, default None
Number of rows to read if data is an iterator

df : DataFrame
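
For example, converting a structured NumPy array:

>>> import numpy as np
>>> data = np.array([(1, 2.0, 'Hello'), (2, 3.0, 'World')],
...                 dtype=[('col_1', 'i4'), ('col_2', 'f4'), ('col_3', 'U10')])
>>> pd.DataFrame.from_records(data)
   col_1  col_2  col_3
0      1    2.0  Hello
1      2    3.0  World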

static from_xy(df, x_column, y_column, sr=4326)

Converts a Pandas DataFrame into a Spatial DataFrame by providing the X/Y columns.

Argument Description
df Required Pandas DataFrame. Source dataset
x_column Required string. The name of the X-coordinate series
y_column Required string. The name of the Y-coordinate series
sr Optional int. The wkid number of the spatial reference.
Returns:SpatialDataFrame
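
A minimal sketch (column names are illustrative):

>>> df = pd.DataFrame({'LONG': [-118.24, -122.42], 'LAT': [34.05, 37.77]})
>>> sdf = SpatialDataFrame.from_xy(df, x_column='LONG', y_column='LAT', sr=4326)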
ftypes

Return the ftypes (indication of sparse/dense and dtype) in DataFrame.

This returns a Series with the data type of each column. The result’s index is the original DataFrame’s columns. Columns with mixed types are stored with the object dtype. See the User Guide for more.

pandas.Series
The data type and indication of sparse/dense of each column.

pandas.DataFrame.dtypes : Series with just dtype information.
pandas.SparseDataFrame : Container for sparse tabular data.

Sparse data should have the same dtypes as its dense representation.

>>> arr = np.random.RandomState(0).randn(100, 4)
>>> arr[arr < .8] = np.nan
>>> pd.DataFrame(arr).ftypes
0    float64:dense
1    float64:dense
2    float64:dense
3    float64:dense
dtype: object
>>> pd.SparseDataFrame(arr).ftypes
0    float64:sparse
1    float64:sparse
2    float64:sparse
3    float64:sparse
dtype: object
ge(other, axis='columns', level=None)

Greater than or equal to of dataframe and other, element-wise (binary operator ge).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or ‘index’, 1 or ‘columns’}, default ‘columns’
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
level : int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
DataFrame of bool
Result of the comparison.

DataFrame.eq : Compare DataFrames for equality elementwise.
DataFrame.ne : Compare DataFrames for inequality elementwise.
DataFrame.le : Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt : Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge : Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt : Compare DataFrames for strictly greater than inequality elementwise.

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number of elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
generalize(max_offset)

Creates a new simplified geometry using a specified maximum offset tolerance.

Parameters:
max_offset:
  • The maximum offset tolerance.
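
A minimal sketch against a single geometry (the tolerance is expressed in the units of the geometry’s spatial reference):

>>> geom = sdf.geometry[0]            # one geometry from the dataframe (sketch)
>>> simplified = geom.generalize(max_offset=10)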
geoextent

Returns the extent of the spatial dataframe.

geometry

Get/Set the geometry data for SpatialDataFrame

geometry_type

The geometry type: polygon, polyline, point, multipoint, multipatch, dimension, or annotation

get(key, default=None)

Get item from object for given key (DataFrame column, Panel slice, etc.). Returns default value if not found.

key : object

value : same type as items contained in object
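
For example:

>>> df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
>>> df.get('A')
0    1
1    2
Name: A, dtype: int64
>>> df.get('Z', default='missing')
'missing'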

get_area(method, units=None)

Returns the area of the feature using a measurement type.

Parameters:
method:
  • PLANAR measurements reflect the projection of geographic data onto the 2D surface (in other words, they will not take into account the curvature of the earth). GEODESIC, GREAT_ELLIPTIC, LOXODROME, and PRESERVE_SHAPE measurement types may be chosen as an alternative, if desired.
units:
  • Areal unit of measure keywords: ACRES | ARES | HECTARES | SQUARECENTIMETERS | SQUAREDECIMETERS | SQUAREINCHES | SQUAREFEET | SQUAREKILOMETERS | SQUAREMETERS | SQUAREMILES | SQUAREMILLIMETERS | SQUAREYARDS
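
A minimal sketch against a single polygon geometry, using keywords from the list above:

>>> poly = sdf.geometry[0]            # a polygon geometry (sketch)
>>> area_km2 = poly.get_area('GEODESIC', 'SQUAREKILOMETERS')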

get_dtype_counts()

Return counts of unique dtypes in this object.

dtype : Series
Series with the count of columns with each dtype.

dtypes : Return the dtypes in this object.

>>> a = [['a', 1, 1.0], ['b', 2, 2.0], ['c', 3, 3.0]]
>>> df = pd.DataFrame(a, columns=['str', 'int', 'float'])
>>> df
  str  int  float
0   a    1    1.0
1   b    2    2.0
2   c    3    3.0
>>> df.get_dtype_counts()
float64    1
int64      1
object     1
dtype: int64
get_ftype_counts()

Return counts of unique ftypes in this object.

Deprecated since version 0.23.0.

This is useful for SparseDataFrame or for DataFrames containing sparse arrays.

dtype : Series
Series with the count of columns with each type and sparsity (dense/sparse)
ftypes : Return ftypes (indication of sparse/dense and dtype) in this object.
>>> a = [['a', 1, 1.0], ['b', 2, 2.0], ['c', 3, 3.0]]
>>> df = pd.DataFrame(a, columns=['str', 'int', 'float'])
>>> df
  str  int  float
0   a    1    1.0
1   b    2    2.0
2   c    3    3.0
>>> df.get_ftype_counts()  
float64:dense    1
int64:dense      1
object:dense     1
dtype: int64
get_length(method, units)

Returns the length of the feature using a measurement type.

Parameters:
method:
  • PLANAR measurements reflect the projection of geographic data onto the 2D surface (in other words, they will not take into account the curvature of the earth). GEODESIC, GREAT_ELLIPTIC, LOXODROME, and PRESERVE_SHAPE measurement types may be chosen as an alternative, if desired.
units:
  • Linear unit of measure keywords: CENTIMETERS | DECIMETERS | FEET | INCHES | KILOMETERS | METERS | MILES | MILLIMETERS | NAUTICALMILES | YARDS
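
A minimal sketch, analogous to get_area above:

>>> line = sdf.geometry[0]            # a polyline geometry (sketch)
>>> length_m = line.get_length('GEODESIC', 'METERS')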

get_part(index=None)

Returns an array of point objects for a particular part of geometry or an array containing a number of arrays, one for each part.

Parameters:
index:
  • The index position of the geometry.
get_value(index, col, takeable=False)

Quickly retrieve single value at passed column and index.

Deprecated since version 0.21.0: Use .at[] or .iat[] accessors instead.

index : row label
col : column label
takeable : interpret the index/col as indexers, default False

value : scalar value
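
Per the deprecation note, prefer the .at accessor:

>>> df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
>>> df.at[0, 'B']        # instead of df.get_value(0, 'B')
2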

get_values()

Return an ndarray after converting sparse values to dense.

This is the same as .values for non-sparse data. For sparse data contained in a pandas.SparseArray, the data are first converted to a dense representation.

numpy.ndarray
Numpy representation of DataFrame

values : Numpy representation of DataFrame.
pandas.SparseArray : Container for sparse data.

>>> df = pd.DataFrame({'a': [1, 2], 'b': [True, False],
...                    'c': [1.0, 2.0]})
>>> df
   a      b    c
0  1   True  1.0
1  2  False  2.0
>>> df.get_values()
array([[1, True, 1.0], [2, False, 2.0]], dtype=object)
>>> df = pd.DataFrame({"a": pd.SparseArray([1, None, None]),
...                    "c": [1.0, 2.0, 3.0]})
>>> df
     a    c
0  1.0  1.0
1  NaN  2.0
2  NaN  3.0
>>> df.get_values()
array([[ 1.,  1.],
       [nan,  2.],
       [nan,  3.]])
groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, observed=False, **kwargs)

Group DataFrame or Series using a mapper or by a Series of columns.

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.

by : mapping, function, label, or list of labels
Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series’ values are first aligned; see .align() method). If an ndarray is passed, the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.
axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Split along rows (0) or columns (1).
level : int, level name, or sequence of such, default None
If the axis is a MultiIndex (hierarchical), group by a particular level or levels.
as_index : bool, default True
For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output.
sort : bool, default True
Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.
group_keys : bool, default True
When calling apply, add group keys to index to identify pieces.
squeeze : bool, default False
Reduce the dimensionality of the return type if possible, otherwise return a consistent type.
observed : bool, default False

This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.

New in version 0.23.0.

**kwargs
Optional, only accepts keyword argument ‘mutated’ and is passed to groupby.
DataFrameGroupBy or SeriesGroupBy
Depends on the calling object and returns groupby object that contains information about the groups.
resample : Convenience method for frequency conversion and resampling of time series.

See the user guide for more.

>>> df = pd.DataFrame({'Animal' : ['Falcon', 'Falcon',
...                                'Parrot', 'Parrot'],
...                    'Max Speed' : [380., 370., 24., 26.]})
>>> df
   Animal  Max Speed
0  Falcon      380.0
1  Falcon      370.0
2  Parrot       24.0
3  Parrot       26.0
>>> df.groupby(['Animal']).mean()
        Max Speed
Animal
Falcon      375.0
Parrot       25.0

Hierarchical Indexes

We can groupby different levels of a hierarchical index using the level parameter:

>>> arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'],
...           ['Captive', 'Wild', 'Captive', 'Wild']]
>>> index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type'))
>>> df = pd.DataFrame({'Max Speed' : [390., 350., 30., 20.]},
...                    index=index)
>>> df
                Max Speed
Animal Type
Falcon Captive      390.0
       Wild         350.0
Parrot Captive       30.0
       Wild          20.0
>>> df.groupby(level=0).mean()
        Max Speed
Animal
Falcon      370.0
Parrot       25.0
>>> df.groupby(level=1).mean()
         Max Speed
Type
Captive      210.0
Wild         185.0
gt(other, axis='columns', level=None)

Greater than of dataframe and other, element-wise (binary operator gt).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or ‘index’, 1 or ‘columns’}, default ‘columns’
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
level : int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
DataFrame of bool
Result of the comparison.

DataFrame.eq : Compare DataFrames for equality elementwise.
DataFrame.ne : Compare DataFrames for inequality elementwise.
DataFrame.le : Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt : Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge : Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt : Compare DataFrames for strictly greater than inequality elementwise.

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number of elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
head(n=5)

Return the first n rows.

This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.

n : int, default 5
Number of rows to select.
obj_head : same type as caller
The first n rows of the caller object.

DataFrame.tail: Returns the last n rows.

>>> df = pd.DataFrame({'animal':['alligator', 'bee', 'falcon', 'lion',
...                    'monkey', 'parrot', 'shark', 'whale', 'zebra']})
>>> df
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot
6      shark
7      whale
8      zebra

Viewing the first 5 lines

>>> df.head()
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey

Viewing the first n lines (three in this case)

>>> df.head(3)
      animal
0  alligator
1        bee
2     falcon
hist(data, column=None, by=None, grid=True, xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False, sharey=False, figsize=None, layout=None, bins=10, **kwds)

Make a histogram of the DataFrame’s columns.

A histogram is a representation of the distribution of data. This function calls matplotlib.pyplot.hist(), on each series in the DataFrame, resulting in one histogram per column.

data : DataFrame
The pandas object holding the data.
column : string or sequence
If passed, will be used to limit data to a subset of columns.
by : object, optional
If passed, then used to form histograms for separate groups.
grid : boolean, default True
Whether to show axis grid lines.
xlabelsize : int, default None
If specified changes the x-axis label size.
xrot : float, default None
Rotation of x axis labels. For example, a value of 90 displays the x labels rotated 90 degrees clockwise.
ylabelsize : int, default None
If specified changes the y-axis label size.
yrot : float, default None
Rotation of y axis labels. For example, a value of 90 displays the y labels rotated 90 degrees clockwise.
ax : Matplotlib axes object, default None
The axes to plot the histogram on.
sharex : boolean, default True if ax is None else False
In case subplots=True, share x axis and set some x axis labels to invisible; defaults to True if ax is None otherwise False if an ax is passed in. Note that passing in both an ax and sharex=True will alter all x axis labels for all subplots in a figure.
sharey : boolean, default False
In case subplots=True, share y axis and set some y axis labels to invisible.
figsize : tuple
The size in inches of the figure to create. Uses the value in matplotlib.rcParams by default.
layout : tuple, optional
Tuple of (rows, columns) for the layout of the histograms.
bins : integer or sequence, default 10
Number of histogram bins to be used. If an integer is given, bins + 1 bin edges are calculated and returned. If bins is a sequence, gives bin edges, including left edge of first bin and right edge of last bin. In this case, bins is returned unmodified.
**kwds
All other plotting keyword arguments to be passed to matplotlib.pyplot.hist().

axes : matplotlib.AxesSubplot or numpy.ndarray of them

matplotlib.pyplot.hist : Plot a histogram using matplotlib.
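
A short example of the typical usage pattern (requires matplotlib; column names are illustrative):

>>> df = pd.DataFrame({'length': [1.5, 0.5, 1.2, 0.9, 3.0],
...                    'width': [0.7, 0.2, 0.15, 0.2, 1.1]})
>>> axes = df.hist(bins=3)   # one histogram per numeric column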

hull_rectangle

A space-delimited string of the coordinate pairs of the convex hull rectangle.

iat

Access a single value for a row/column pair by integer position.

Similar to iloc, in that both provide integer-based lookups. Use iat if you only need to get or set a single value in a DataFrame or Series.

IndexError
When integer position is out of bounds

DataFrame.at : Access a single value for a row/column label pair.
DataFrame.loc : Access a group of rows and columns by label(s).
DataFrame.iloc : Access a group of rows and columns by integer position(s).

>>> df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
...                   columns=['A', 'B', 'C'])
>>> df
    A   B   C
0   0   2   3
1   0   4   1
2  10  20  30

Get value at specified row/column pair

>>> df.iat[1, 2]
1

Set value at specified row/column pair

>>> df.iat[1, 2] = 10
>>> df.iat[1, 2]
10

Get value within a series

>>> df.loc[0].iat[1]
2
idxmax(axis=0, skipna=True)

Return index of first occurrence of maximum over requested axis. NA/null values are excluded.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.

idxmax : Series

ValueError
  • If the row/column is empty

Series.idxmax

This method is the DataFrame version of ndarray.argmax.
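
For example:

>>> df = pd.DataFrame({'A': [1, 5, 3], 'B': [4, 2, 6]})
>>> df.idxmax()
A    1
B    2
dtype: int64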

idxmin(axis=0, skipna=True)

Return index of first occurrence of minimum over requested axis. NA/null values are excluded.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.

idxmin : Series

ValueError
  • If the row/column is empty

Series.idxmin

This method is the DataFrame version of ndarray.argmin.
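
Continuing the example from idxmax above:

>>> df.idxmin()
A    0
B    1
dtype: int64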

iloc

Purely integer-location based indexing for selection by position.

.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.

Allowed inputs are:

  • An integer, e.g. 5.
  • A list or array of integers, e.g. [4, 3, 0].
  • A slice object with ints, e.g. 1:7.
  • A boolean array.
  • A callable function with one argument (the calling Series, DataFrame or Panel) and that returns valid output for indexing (one of the above). This is useful in method chains, when you don’t have a reference to the calling object, but would like to base your selection on some value.

.iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers which allow out-of-bounds indexing (this conforms with python/numpy slice semantics).

See more at Selection by Position.

DataFrame.iat : Fast integer location scalar accessor.
DataFrame.loc : Purely label-location based indexer for selection by label.
Series.iloc : Purely integer-location based indexing for selection by position.
>>> mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
...           {'a': 100, 'b': 200, 'c': 300, 'd': 400},
...           {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000 }]
>>> df = pd.DataFrame(mydict)
>>> df
      a     b     c     d
0     1     2     3     4
1   100   200   300   400
2  1000  2000  3000  4000

Indexing just the rows

With a scalar integer.

>>> type(df.iloc[0])
<class 'pandas.core.series.Series'>
>>> df.iloc[0]
a    1
b    2
c    3
d    4
Name: 0, dtype: int64

With a list of integers.

>>> df.iloc[[0]]
   a  b  c  d
0  1  2  3  4
>>> type(df.iloc[[0]])
<class 'pandas.core.frame.DataFrame'>
>>> df.iloc[[0, 1]]
     a    b    c    d
0    1    2    3    4
1  100  200  300  400

With a slice object.

>>> df.iloc[:3]
      a     b     c     d
0     1     2     3     4
1   100   200   300   400
2  1000  2000  3000  4000

With a boolean mask the same length as the index.

>>> df.iloc[[True, False, True]]
      a     b     c     d
0     1     2     3     4
2  1000  2000  3000  4000

With a callable, useful in method chains. The x passed to the lambda is the DataFrame being sliced. This selects the rows whose index label is even.

>>> df.iloc[lambda x: x.index % 2 == 0]
      a     b     c     d
0     1     2     3     4
2  1000  2000  3000  4000

Indexing both axes

You can mix the indexer types for the index and columns. Use : to select the entire axis.

With scalar integers.

>>> df.iloc[0, 1]
2

With lists of integers.

>>> df.iloc[[0, 2], [1, 3]]
      b     d
0     2     4
2  2000  4000

With slice objects.

>>> df.iloc[1:3, 0:3]
      a     b     c
1   100   200   300
2  1000  2000  3000

With a boolean array whose length matches the columns.

>>> df.iloc[:, [True, False, True, False]]
      a     c
0     1     3
1   100   300
2  1000  3000

With a callable function that expects the Series or DataFrame.

>>> df.iloc[:, lambda df: [0, 2]]
      a     c
0     1     3
1   100   300
2  1000  3000
index

The index (row labels) of the DataFrame.

infer_objects()

Attempt to infer better dtypes for object columns.

Attempts soft conversion of object-dtyped columns, leaving non-object and unconvertible columns unchanged. The inference rules are the same as during normal Series/DataFrame construction.

New in version 0.21.0.

converted : same type as input object

to_datetime : Convert argument to datetime.
to_timedelta : Convert argument to timedelta.
to_numeric : Convert argument to numeric type.

>>> df = pd.DataFrame({"A": ["a", 1, 2, 3]})
>>> df = df.iloc[1:]
>>> df
   A
1  1
2  2
3  3
>>> df.dtypes
A    object
dtype: object
>>> df.infer_objects().dtypes
A    int64
dtype: object
info(verbose=None, buf=None, max_cols=None, memory_usage=None, null_counts=None)

Concise summary of a DataFrame.

verbose : {None, True, False}, optional
Whether to print the full summary. None follows the display.max_info_columns setting. True or False overrides the display.max_info_columns setting.

buf : writable buffer, defaults to sys.stdout
max_cols : int, default None
Determines whether full summary or short summary is printed. None follows the display.max_info_columns setting.
memory_usage : boolean/string, default None
Specifies whether total memory usage of the DataFrame elements (including index) should be displayed. None follows the display.memory_usage setting. True or False overrides the display.memory_usage setting. A value of ‘deep’ is equivalent of True, with deep introspection. Memory usage is shown in human-readable units (base-2 representation).
null_counts : boolean, default None

Whether to show the non-null counts

  • If None, then only show if the frame is smaller than max_info_rows and max_info_columns.
  • If True, always show counts.
  • If False, never show counts.
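
For example:

>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
>>> df.info()   # prints column names, non-null counts, dtypes and memory usage to stdout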
insert(loc, column, value, allow_duplicates=False)

Insert column into DataFrame at specified location.

Raises a ValueError if column is already contained in the DataFrame, unless allow_duplicates is set to True.

loc : int
Insertion index. Must verify 0 <= loc <= len(columns)
column : string, number, or hashable object
label of the inserted column

value : int, Series, or array-like
allow_duplicates : bool, optional
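
For example:

>>> df = pd.DataFrame({'A': [1, 2], 'C': [5, 6]})
>>> df.insert(1, 'B', [3, 4])
>>> df
   A  B  C
0  1  3  5
1  2  4  6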

interpolate(method='linear', axis=0, limit=None, inplace=False, limit_direction='forward', limit_area=None, downcast=None, **kwargs)

Interpolate values according to different methods.

Please note that only method='linear' is supported for DataFrame/Series with a MultiIndex.

method : str, default ‘linear’

Interpolation technique to use. One of:

  • ‘linear’: Ignore the index and treat the values as equally spaced. This is the only method supported on MultiIndexes.
  • ‘time’: Works on daily and higher resolution data to interpolate given length of interval.
  • ‘index’, ‘values’: use the actual numerical values of the index.
  • ‘pad’: Fill in NaNs using existing values.
  • ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘spline’, ‘barycentric’, ‘polynomial’: Passed to scipy.interpolate.interp1d. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=4). These use the numerical values of the index.
  • ‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’, ‘akima’: Wrappers around the SciPy interpolation methods of similar names. See Notes.
  • ‘from_derivatives’: Refers to scipy.interpolate.BPoly.from_derivatives which replaces ‘piecewise_polynomial’ interpolation method in scipy 0.18.

New in version 0.18.1: Added support for the ‘akima’ method. Added interpolate method ‘from_derivatives’ which replaces ‘piecewise_polynomial’ in SciPy 0.18; backwards-compatible with SciPy < 0.18

axis : {0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to interpolate along.
limit : int, optional
Maximum number of consecutive NaNs to fill. Must be greater than 0.
inplace : bool, default False
Update the data in place if possible.
limit_direction : {‘forward’, ‘backward’, ‘both’}, default ‘forward’
If limit is specified, consecutive NaNs will be filled in this direction.
limit_area : {None, ‘inside’, ‘outside’}, default None

If limit is specified, consecutive NaNs will be filled with this restriction.

  • None: No fill restriction.
  • ‘inside’: Only fill NaNs surrounded by valid values (interpolate).
  • ‘outside’: Only fill NaNs outside valid values (extrapolate).

New in version 0.21.0.

downcast : optional, ‘infer’ or None, defaults to None
Downcast dtypes if possible.
**kwargs
Keyword arguments to pass on to the interpolating function.
Series or DataFrame
Returns the same object type as the caller, interpolated at some or all NaN values

fillna : Fill missing values using different methods.
scipy.interpolate.Akima1DInterpolator : Piecewise cubic polynomials (Akima interpolator).
scipy.interpolate.BPoly.from_derivatives : Piecewise polynomial in the Bernstein basis.
scipy.interpolate.interp1d : Interpolate a 1-D function.
scipy.interpolate.KroghInterpolator : Interpolate polynomial (Krogh interpolator).
scipy.interpolate.PchipInterpolator : PCHIP 1-d monotonic cubic interpolation.
scipy.interpolate.CubicSpline : Cubic spline data interpolator.

The ‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’ and ‘akima’ methods are wrappers around the respective SciPy implementations of similar names. These use the actual numerical values of the index. For more information on their behavior, see the SciPy documentation and SciPy tutorial.

Filling in NaN in a Series via linear interpolation.

>>> s = pd.Series([0, 1, np.nan, 3])
>>> s
0    0.0
1    1.0
2    NaN
3    3.0
dtype: float64
>>> s.interpolate()
0    0.0
1    1.0
2    2.0
3    3.0
dtype: float64

Filling in NaN in a Series by padding, but filling at most two consecutive NaN at a time.

>>> s = pd.Series([np.nan, "single_one", np.nan,
...                "fill_two_more", np.nan, np.nan, np.nan,
...                4.71, np.nan])
>>> s
0              NaN
1       single_one
2              NaN
3    fill_two_more
4              NaN
5              NaN
6              NaN
7             4.71
8              NaN
dtype: object
>>> s.interpolate(method='pad', limit=2)
0              NaN
1       single_one
2       single_one
3    fill_two_more
4    fill_two_more
5    fill_two_more
6              NaN
7             4.71
8             4.71
dtype: object

Filling in NaN in a Series via polynomial interpolation or splines: Both ‘polynomial’ and ‘spline’ methods require that you also specify an order (int).

>>> s = pd.Series([0, 2, np.nan, 8])
>>> s.interpolate(method='polynomial', order=2)
0    0.000000
1    2.000000
2    4.666667
3    8.000000
dtype: float64

Fill the DataFrame forward (that is, going down) along each column using linear interpolation.

Note how the last entry in column ‘a’ is interpolated differently, because there is no entry after it to use for interpolation. Note how the first entry in column ‘b’ remains NaN, because there is no entry before it to use for interpolation.

>>> df = pd.DataFrame([(0.0,  np.nan, -1.0, 1.0),
...                    (np.nan, 2.0, np.nan, np.nan),
...                    (2.0, 3.0, np.nan, 9.0),
...                    (np.nan, 4.0, -4.0, 16.0)],
...                   columns=list('abcd'))
>>> df
     a    b    c     d
0  0.0  NaN -1.0   1.0
1  NaN  2.0  NaN   NaN
2  2.0  3.0  NaN   9.0
3  NaN  4.0 -4.0  16.0
>>> df.interpolate(method='linear', limit_direction='forward', axis=0)
     a    b    c     d
0  0.0  NaN -1.0   1.0
1  1.0  2.0 -2.0   5.0
2  2.0  3.0 -3.0   9.0
3  2.0  4.0 -4.0  16.0

Using polynomial interpolation.

>>> df['d'].interpolate(method='polynomial', order=2)
0     1.0
1     4.0
2     9.0
3    16.0
Name: d, dtype: float64
intersect(second_geometry, dimension)

Constructs a geometry that is the geometric intersection of the two input geometries. Different dimension values can be used to create different shape types. The intersection of two geometries of the same shape type is a geometry containing only the regions of overlap between the original geometries.

Parameters:
second_geometry:
  • a second geometry
dimension:
  • The topological dimension (shape type) of the resulting geometry:
    1 - A zero-dimensional geometry (point or multipoint).
    2 - A one-dimensional geometry (polyline).
    4 - A two-dimensional geometry (polygon).
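
A minimal sketch with two overlapping polygon geometries (the variables are hypothetical):

>>> overlap = poly1.intersect(poly2, dimension=4)   # 4 requests a polygon result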

is_copy

Return the copy.

is_empty

Return True for each empty geometry, False for non-empty

is_multipart

True, if the number of parts for the geometry is more than 1

isin(values)

Whether each element in the DataFrame is contained in values.

values : iterable, Series, DataFrame or dict
The result will only be true at a location if all the labels match. If values is a Series, that’s the index. If values is a dict, the keys must be the column names, which must match. If values is a DataFrame, then both the index and column labels must match.
DataFrame
DataFrame of booleans showing whether each element in the DataFrame is contained in values.

DataFrame.eq : Equality test for DataFrame.
Series.isin : Equivalent method on Series.
Series.str.contains : Test if pattern or regex is contained within a string of a Series or Index.
>>> df = pd.DataFrame({'num_legs': [2, 4], 'num_wings': [2, 0]},
...                   index=['falcon', 'dog'])
>>> df
        num_legs  num_wings
falcon         2          2
dog            4          0

When values is a list, check whether every value in the DataFrame is present in the list (which animals have 0 or 2 legs or wings):

>>> df.isin([0, 2])
        num_legs  num_wings
falcon      True       True
dog        False       True

When values is a dict, we can pass values to check for each column separately:

>>> df.isin({'num_wings': [0, 3]})
        num_legs  num_wings
falcon     False      False
dog        False       True

When values is a Series or DataFrame the index and column must match. Note that ‘falcon’ does not match based on the number of legs in df2.

>>> other = pd.DataFrame({'num_legs': [8, 2],'num_wings': [0, 2]},
...                      index=['spider', 'falcon'])
>>> df.isin(other)
        num_legs  num_wings
falcon      True       True
dog        False      False
isna()

Detect missing values.

Return a boolean same-sized object indicating whether the values are NA. NA values, such as None or numpy.NaN, get mapped to True. Everything else gets mapped to False. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).

DataFrame
Mask of bool values for each element in DataFrame that indicates whether an element is an NA value.

DataFrame.isnull : Alias of isna.
DataFrame.notna : Boolean inverse of isna.
DataFrame.dropna : Omit axes labels with missing values.
isna : Top-level isna.

Show which entries in a DataFrame are NA.

>>> df = pd.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = pd.Series([5, 6, np.NaN])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64
>>> ser.isna()
0    False
1    False
2     True
dtype: bool
isnull()

Detect missing values.

Return a boolean same-sized object indicating whether the values are NA. NA values, such as None or numpy.NaN, get mapped to True. Everything else gets mapped to False. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).

DataFrame
Mask of bool values for each element in DataFrame that indicates whether an element is an NA value.

DataFrame.isnull : Alias of isna.
DataFrame.notna : Boolean inverse of isna.
DataFrame.dropna : Omit axes labels with missing values.
isna : Top-level isna.

Show which entries in a DataFrame are NA.

>>> df = pd.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = pd.Series([5, 6, np.NaN])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64
>>> ser.isna()
0    False
1    False
2     True
dtype: bool
items()

Iterator over (column name, Series) pairs.

Iterates over the DataFrame columns, returning a tuple with the column name and the content as a Series.

label : object
The column names for the DataFrame being iterated over.
content : Series
The column entries belonging to each label, as a Series.
DataFrame.iterrows : Iterate over DataFrame rows as (index, Series) pairs.
DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.
>>> df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
...                   'population': [1864, 22000, 80000]},
...                   index=['panda', 'polar', 'koala'])
>>> df
        species   population
panda   bear      1864
polar   bear      22000
koala   marsupial 80000
>>> for label, content in df.iteritems():
...     print('label:', label)
...     print('content:', content, sep='\n')
...
label: species
content:
panda         bear
polar         bear
koala    marsupial
Name: species, dtype: object
label: population
content:
panda     1864
polar    22000
koala    80000
Name: population, dtype: int64
iteritems()

Iterator over (column name, Series) pairs.

Iterates over the DataFrame columns, returning a tuple with the column name and the content as a Series.

label : object
The column names for the DataFrame being iterated over.
content : Series
The column entries belonging to each label, as a Series.
DataFrame.iterrows : Iterate over DataFrame rows as (index, Series) pairs.
DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.
>>> df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
...                   'population': [1864, 22000, 80000]},
...                   index=['panda', 'polar', 'koala'])
>>> df
        species   population
panda   bear      1864
polar   bear      22000
koala   marsupial 80000
>>> for label, content in df.iteritems():
...     print('label:', label)
...     print('content:', content, sep='\n')
...
label: species
content:
panda         bear
polar         bear
koala    marsupial
Name: species, dtype: object
label: population
content:
panda     1864
polar    22000
koala    80000
Name: population, dtype: int64
iterrows()

Iterate over DataFrame rows as (index, Series) pairs.

index : label or tuple of label
The index of the row. A tuple for a MultiIndex.
data : Series
The data of the row as a Series.
it : generator
A generator that iterates over the rows of the frame.

itertuples : Iterate over DataFrame rows as namedtuples of the values.
iteritems : Iterate over (column name, Series) pairs.

  1. Because iterrows returns a Series for each row, it does not preserve dtypes across the rows (dtypes are preserved across columns for DataFrames). For example,

    >>> df = pd.DataFrame([[1, 1.5]], columns=['int', 'float'])
    >>> row = next(df.iterrows())[1]
    >>> row
    int      1.0
    float    1.5
    Name: 0, dtype: float64
    >>> print(row['int'].dtype)
    float64
    >>> print(df['int'].dtype)
    int64
    

    To preserve dtypes while iterating over the rows, it is better to use itertuples() which returns namedtuples of the values and which is generally faster than iterrows.

  2. You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.

itertuples(index=True, name='Pandas')

Iterate over DataFrame rows as namedtuples.

index : bool, default True
If True, return the index as the first element of the tuple.
name : str or None, default “Pandas”
The name of the returned namedtuples or None to return regular tuples.
collections.namedtuple
Yields a namedtuple for each row in the DataFrame with the first field possibly being the index and following fields being the column values.
DataFrame.iterrows : Iterate over DataFrame rows as (index, Series) pairs.
DataFrame.iteritems : Iterate over (column name, Series) pairs.

The column names will be renamed to positional names if they are invalid Python identifiers, repeated, or start with an underscore. With a large number of columns (>255), regular tuples are returned.

>>> df = pd.DataFrame({'num_legs': [4, 2], 'num_wings': [0, 2]},
...                   index=['dog', 'hawk'])
>>> df
      num_legs  num_wings
dog          4          0
hawk         2          2
>>> for row in df.itertuples():
...     print(row)
...
Pandas(Index='dog', num_legs=4, num_wings=0)
Pandas(Index='hawk', num_legs=2, num_wings=2)

By setting the index parameter to False we can remove the index as the first element of the tuple:

>>> for row in df.itertuples(index=False):
...     print(row)
...
Pandas(num_legs=4, num_wings=0)
Pandas(num_legs=2, num_wings=2)

With the name parameter set we set a custom name for the yielded namedtuples:

>>> for row in df.itertuples(name='Animal'):
...     print(row)
...
Animal(Index='dog', num_legs=4, num_wings=0)
Animal(Index='hawk', num_legs=2, num_wings=2)
ix

A primarily label-location based indexer, with integer position fallback.

Warning: Starting in 0.20.0, the .ix indexer is deprecated, in favor of the more strict .iloc and .loc indexers.

.ix[] supports mixed integer and label based access. It is primarily label based, but will fall back to integer positional access unless the corresponding axis is of integer type.

.ix is the most general indexer and will support any of the inputs in .loc and .iloc. .ix also supports floating point label schemes. .ix is exceptionally useful when dealing with mixed positional and label based hierarchical indexes.

However, when an axis is integer based, ONLY label based access and not positional access is supported. Thus, in such cases, it’s usually better to be explicit and use .iloc or .loc.

See more at Advanced Indexing.
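
Per the deprecation warning, a short sketch of the stricter replacements:

>>> df = pd.DataFrame({'A': [1, 2, 3]}, index=['a', 'b', 'c'])
>>> df.loc['b', 'A']     # label-based, instead of df.ix['b', 'A']
2
>>> df.iloc[1, 0]        # position-based, instead of df.ix[1, 0]
2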

join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)

Join columns of another DataFrame.

Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.

other : DataFrame, Series, or list of DataFrame
Index should be similar to one of the columns in this one. If a Series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataFrame.
on : str, list of str, or array-like, optional
Column or index level name(s) in the caller to join on the index in other, otherwise joins index-on-index. If multiple values given, the other DataFrame must have a MultiIndex. Can pass an array as the join key if it is not already contained in the calling DataFrame. Like an Excel VLOOKUP operation.
how : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’

How to handle the operation of the two objects.

  • left: use calling frame’s index (or column if on is specified)
  • right: use other’s index.
  • outer: form union of calling frame’s index (or column if on is specified) with other’s index, and sort it lexicographically.
  • inner: form intersection of calling frame’s index (or column if on is specified) with other’s index, preserving the order of the calling’s one.
lsuffix : str, default ‘’
Suffix to use from left frame’s overlapping columns.
rsuffix : str, default ‘’
Suffix to use from right frame’s overlapping columns.
sort : bool, default False
Order result DataFrame lexicographically by the join key. If False, the order of the join key depends on the join type (how keyword).
DataFrame
A dataframe containing columns from both the caller and other.

DataFrame.merge : For column(s)-on-column(s) operations.

Parameters on, lsuffix, and rsuffix are not supported when passing a list of DataFrame objects.

Support for specifying index levels as the on parameter was added in version 0.23.0.

>>> df = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
...                    'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
>>> df
  key   A
0  K0  A0
1  K1  A1
2  K2  A2
3  K3  A3
4  K4  A4
5  K5  A5
>>> other = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
...                       'B': ['B0', 'B1', 'B2']})
>>> other
  key   B
0  K0  B0
1  K1  B1
2  K2  B2

Join DataFrames using their indexes.

>>> df.join(other, lsuffix='_caller', rsuffix='_other')
  key_caller   A key_other    B
0         K0  A0        K0   B0
1         K1  A1        K1   B1
2         K2  A2        K2   B2
3         K3  A3       NaN  NaN
4         K4  A4       NaN  NaN
5         K5  A5       NaN  NaN

If we want to join using the key columns, we need to set key to be the index in both df and other. The joined DataFrame will have key as its index.

>>> df.set_index('key').join(other.set_index('key'))
      A    B
key
K0   A0   B0
K1   A1   B1
K2   A2   B2
K3   A3  NaN
K4   A4  NaN
K5   A5  NaN

Another option to join using the key columns is to use the on parameter. DataFrame.join always uses other’s index but we can use any column in df. This method preserves the original DataFrame’s index in the result.

>>> df.join(other.set_index('key'), on='key')
  key   A    B
0  K0  A0   B0
1  K1  A1   B1
2  K2  A2   B2
3  K3  A3  NaN
4  K4  A4  NaN
5  K5  A5  NaN
keys()

Get the ‘info axis’ (see Indexing for more)

This is index for Series, columns for DataFrame and major_axis for Panel.
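
For illustration, a minimal example:

>>> df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
>>> df.keys()
Index(['A', 'B'], dtype='object')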

kurt(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased kurtosis over requested axis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

axis : {index (0), columns (1)}
Axis for the function to be applied on.
skipna : bool, default True
Exclude NA/null values when computing the result.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
**kwargs
Additional keyword arguments to be passed to the function.

kurt : Series or DataFrame (if level specified)
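
For illustration, a minimal example; with Fisher’s definition, a flat sample such as this yields a negative kurtosis:

>>> df = pd.DataFrame({'a': [1, 2, 3, 4]})
>>> df.kurt()
a   -1.2
dtype: float64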

kurtosis(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased kurtosis over requested axis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

axis : {index (0), columns (1)}
Axis for the function to be applied on.
skipna : bool, default True
Exclude NA/null values when computing the result.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
**kwargs
Additional keyword arguments to be passed to the function.

kurt : Series or DataFrame (if level specified)

label_point

The point at which the label is located. The labelPoint is always located within or on a feature.
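
A minimal sketch, assuming sdf is a polygon SpatialDataFrame obtained elsewhere and a geometry engine (such as arcpy) is available; the names here are illustrative:

>>> pts = sdf.label_point  # label location(s) for the feature geometry
>>> pts                    # each point falls within or on its feature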

last(offset)

Convenience method for subsetting final periods of time series data based on a date offset.

offset : string, DateOffset, dateutil.relativedelta

subset : same type as caller

TypeError
If the index is not a DatetimeIndex

first : Select initial periods of time series based on a date offset.
at_time : Select values at a particular time of the day.
between_time : Select values between particular times of the day.

>>> i = pd.date_range('2018-04-09', periods=4, freq='2D')
>>> ts = pd.DataFrame({'A': [1,2,3,4]}, index=i)
>>> ts
            A
2018-04-09  1
2018-04-11  2
2018-04-13  3
2018-04-15  4

Get the rows for the last 3 days:

>>> ts.last('3D')
            A
2018-04-13  3
2018-04-15  4

Notice the data for the last 3 calendar days was returned, not the last 3 observed days in the dataset, and therefore data for 2018-04-11 was not returned.

last_point

The last coordinate of the feature.

last_valid_index()

Return index for last non-NA/null value.

scalar : type of index

If all elements are non-NA/null, returns None. Also returns None for empty NDFrame.
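
For illustration, a minimal example:

>>> s = pd.Series([1, 2, None, None])
>>> s.last_valid_index()
1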

le(other, axis='columns', level=None)

Less than or equal to of dataframe and other, element-wise (binary operator le).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or ‘index’, 1 or ‘columns’}, default ‘columns’
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
level : int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
DataFrame of bool
Result of the comparison.

DataFrame.eq : Compare DataFrames for equality elementwise.
DataFrame.ne : Compare DataFrames for inequality elementwise.
DataFrame.le : Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt : Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge : Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt : Compare DataFrames for strictly greater than inequality elementwise.

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number of elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
length

The length of the linear feature. Zero for point and multipoint feature types.

length3D

The 3D length of the linear feature. Zero for point and multipoint feature types.

loc

Access a group of rows and columns by label(s) or a boolean array.

.loc[] is primarily label based, but may also be used with a boolean array.

Allowed inputs are:

  • A single label, e.g. 5 or 'a', (note that 5 is interpreted as a label of the index, and never as an integer position along the index).

  • A list or array of labels, e.g. ['a', 'b', 'c'].

  • A slice object with labels, e.g. 'a':'f'.

    Warning

    Note that contrary to usual python slices, both the start and the stop are included

  • A boolean array of the same length as the axis being sliced, e.g. [True, False, True].

  • A callable function with one argument (the calling Series, DataFrame or Panel) and that returns valid output for indexing (one of the above)

See more at Selection by Label

KeyError:
when any items are not found

DataFrame.at : Access a single value for a row/column label pair.
DataFrame.iloc : Access group of rows and columns by integer position(s).
DataFrame.xs : Returns a cross-section (row(s) or column(s)) from the Series/DataFrame.
Series.loc : Access group of values using labels.

Getting values

>>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
...      index=['cobra', 'viper', 'sidewinder'],
...      columns=['max_speed', 'shield'])
>>> df
            max_speed  shield
cobra               1       2
viper               4       5
sidewinder          7       8

Single label. Note this returns the row as a Series.

>>> df.loc['viper']
max_speed    4
shield       5
Name: viper, dtype: int64

List of labels. Note using [[]] returns a DataFrame.

>>> df.loc[['viper', 'sidewinder']]
            max_speed  shield
viper               4       5
sidewinder          7       8

Single label for row and column

>>> df.loc['cobra', 'shield']
2

Slice with labels for row and single label for column. As mentioned above, note that both the start and stop of the slice are included.

>>> df.loc['cobra':'viper', 'max_speed']
cobra    1
viper    4
Name: max_speed, dtype: int64

Boolean list with the same length as the row axis

>>> df.loc[[False, False, True]]
            max_speed  shield
sidewinder          7       8

Conditional that returns a boolean Series

>>> df.loc[df['shield'] > 6]
            max_speed  shield
sidewinder          7       8

Conditional that returns a boolean Series with column labels specified

>>> df.loc[df['shield'] > 6, ['max_speed']]
            max_speed
sidewinder          7

Callable that returns a boolean Series

>>> df.loc[lambda df: df['shield'] == 8]
            max_speed  shield
sidewinder          7       8

Setting values

Set value for all items matching the list of labels

>>> df.loc[['viper', 'sidewinder'], ['shield']] = 50
>>> df
            max_speed  shield
cobra               1       2
viper               4      50
sidewinder          7      50

Set value for an entire row

>>> df.loc['cobra'] = 10
>>> df
            max_speed  shield
cobra              10      10
viper               4      50
sidewinder          7      50

Set value for an entire column

>>> df.loc[:, 'max_speed'] = 30
>>> df
            max_speed  shield
cobra              30      10
viper              30      50
sidewinder         30      50

Set value for rows matching callable condition

>>> df.loc[df['shield'] > 35] = 0
>>> df
            max_speed  shield
cobra              30      10
viper               0       0
sidewinder          0       0

Getting values on a DataFrame with an index that has integer labels

Another example using integers for the index

>>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
...      index=[7, 8, 9], columns=['max_speed', 'shield'])
>>> df
   max_speed  shield
7          1       2
8          4       5
9          7       8

Slice with integer labels for rows. As mentioned above, note that both the start and stop of the slice are included.

>>> df.loc[7:9]
   max_speed  shield
7          1       2
8          4       5
9          7       8

Getting values with a MultiIndex

A number of examples using a DataFrame with a MultiIndex

>>> tuples = [
...    ('cobra', 'mark i'), ('cobra', 'mark ii'),
...    ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
...    ('viper', 'mark ii'), ('viper', 'mark iii')
... ]
>>> index = pd.MultiIndex.from_tuples(tuples)
>>> values = [[12, 2], [0, 4], [10, 20],
...         [1, 4], [7, 1], [16, 36]]
>>> df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)
>>> df
                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1
           mark iii         16      36

Single label. Note this returns a DataFrame with a single index.

>>> df.loc['cobra']
         max_speed  shield
mark i          12       2
mark ii          0       4

Single index tuple. Note this returns a Series.

>>> df.loc[('cobra', 'mark ii')]
max_speed    0
shield       4
Name: (cobra, mark ii), dtype: int64

Single label for row and column. Similar to passing in a tuple, this returns a Series.

>>> df.loc['cobra', 'mark i']
max_speed    12
shield        2
Name: (cobra, mark i), dtype: int64

Single tuple. Note using [[]] returns a DataFrame.

>>> df.loc[[('cobra', 'mark ii')]]
               max_speed  shield
cobra mark ii          0       4

Single tuple for the index with a single label for the column

>>> df.loc[('cobra', 'mark i'), 'shield']
2

Slice from index tuple to single label

>>> df.loc[('cobra', 'mark i'):'viper']
                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1
           mark iii         16      36

Slice from index tuple to index tuple

>>> df.loc[('cobra', 'mark i'):('viper', 'mark ii')]
                    max_speed  shield
cobra      mark i          12       2
           mark ii          0       4
sidewinder mark i          10      20
           mark ii          1       4
viper      mark ii          7       1
lookup(row_labels, col_labels)

Label-based “fancy indexing” function for DataFrame.

Given equal-length arrays of row and column labels, return an array of the values corresponding to each (row, col) pair.

row_labels : sequence
The row labels to use for lookup
col_labels : sequence
The column labels to use for lookup

Akin to:

result = [df.get_value(row, col)
          for row, col in zip(row_labels, col_labels)]
values : ndarray
The found values
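
For illustration, a minimal example:

>>> df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['x', 'y'])
>>> df.lookup(['x', 'y'], ['A', 'B'])
array([1, 4])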
lt(other, axis='columns', level=None)

Less than of dataframe and other, element-wise (binary operator lt).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or ‘index’, 1 or ‘columns’}, default ‘columns’
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
level : int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
DataFrame of bool
Result of the comparison.

DataFrame.eq : Compare DataFrames for equality elementwise.
DataFrame.ne : Compare DataFrames for inequality elementwise.
DataFrame.le : Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt : Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge : Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt : Compare DataFrames for strictly greater than inequality elementwise.

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number of elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
mad(axis=None, skipna=None, level=None)

Return the mean absolute deviation of the values for the requested axis.

axis : {index (0), columns (1)}
Axis for the function to be applied on.
skipna : bool, default True
Exclude NA/null values when computing the result.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
**kwargs
Additional keyword arguments to be passed to the function.

mad : Series or DataFrame (if level specified)
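
For illustration, a minimal example; the values 1 through 4 deviate from their mean of 2.5 by 1.5, 0.5, 0.5 and 1.5, which averages to 1.0:

>>> df = pd.DataFrame({'a': [1, 2, 3, 4]})
>>> df.mad()
a    1.0
dtype: float64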

mask(cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False, raise_on_error=None)

Replace values where the condition is True.

cond : boolean NDFrame, array-like, or callable

Where cond is False, keep the original value. Where True, replace with corresponding value from other. If cond is callable, it is computed on the NDFrame and should return boolean NDFrame or array. The callable must not change input NDFrame (though pandas doesn’t check it).

New in version 0.18.1: A callable can be used as cond.

other : scalar, NDFrame, or callable

Entries where cond is True are replaced with corresponding value from other. If other is callable, it is computed on the NDFrame and should return scalar or NDFrame. The callable must not change input NDFrame (though pandas doesn’t check it).

New in version 0.18.1: A callable can be used as other.

inplace : boolean, default False
Whether to perform the operation in place on the data.
axis : int, default None
Alignment axis if needed.
level : int, default None
Alignment level if needed.
errors : str, {‘raise’, ‘ignore’}, default raise

Note that currently this parameter won’t affect the results and will always coerce to a suitable dtype.

  • raise : allow exceptions to be raised.
  • ignore : suppress exceptions. On error return original object.
try_cast : boolean, default False
Try to cast the result back to the input type (if possible).
raise_on_error : boolean, default True

Whether to raise on invalid data types (e.g. trying to where on strings).

Deprecated since version 0.21.0: Use errors.

wh : same type as caller

DataFrame.where() : Return an object of same shape as self.

The mask method is an application of the if-then idiom. For each element in the calling DataFrame, if cond is False the element is used; otherwise the corresponding element from the DataFrame other is used.

The signature for DataFrame.where() differs from numpy.where(). Roughly df1.where(m, df2) is equivalent to np.where(m, df1, df2).

For further details and examples see the mask documentation in indexing.

>>> s = pd.Series(range(5))
>>> s.where(s > 0)
0    NaN
1    1.0
2    2.0
3    3.0
4    4.0
dtype: float64
>>> s.mask(s > 0)
0    0.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64
>>> s.where(s > 1, 10)
0    10
1    10
2    2
3    3
4    4
dtype: int64
>>> df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
>>> m = df % 3 == 0
>>> df.where(m, -df)
   A  B
0  0 -1
1 -2  3
2 -4 -5
3  6 -7
4 -8  9
>>> df.where(m, -df) == np.where(m, df, -df)
      A     B
0  True  True
1  True  True
2  True  True
3  True  True
4  True  True
>>> df.where(m, -df) == df.mask(~m, -df)
      A     B
0  True  True
1  True  True
2  True  True
3  True  True
4  True  True
max(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the maximum of the values for the requested axis.

If you want the index of the maximum, use idxmax. This is the equivalent of the numpy.ndarray method argmax.
axis : {index (0), columns (1)}
Axis for the function to be applied on.
skipna : bool, default True
Exclude NA/null values when computing the result.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
**kwargs
Additional keyword arguments to be passed to the function.

max : Series or DataFrame (if level specified)

Series.sum : Return the sum.
Series.min : Return the minimum.
Series.max : Return the maximum.
Series.idxmin : Return the index of the minimum.
Series.idxmax : Return the index of the maximum.
DataFrame.sum : Return the sum over the requested axis.
DataFrame.min : Return the minimum over the requested axis.
DataFrame.max : Return the maximum over the requested axis.
DataFrame.idxmin : Return the index of the minimum over the requested axis.
DataFrame.idxmax : Return the index of the maximum over the requested axis.

>>> idx = pd.MultiIndex.from_arrays([
...     ['warm', 'warm', 'cold', 'cold'],
...     ['dog', 'falcon', 'fish', 'spider']],
...     names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded  animal
warm     dog       4
         falcon    2
cold     fish      0
         spider    8
Name: legs, dtype: int64
>>> s.max()
8

Max using level names, as well as indices.

>>> s.max(level='blooded')
blooded
warm    4
cold    8
Name: legs, dtype: int64
>>> s.max(level=0)
blooded
warm    4
cold    8
Name: legs, dtype: int64
mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the mean of the values for the requested axis.

axis : {index (0), columns (1)}
Axis for the function to be applied on.
skipna : bool, default True
Exclude NA/null values when computing the result.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
**kwargs
Additional keyword arguments to be passed to the function.

mean : Series or DataFrame (if level specified)
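
For illustration, a minimal example:

>>> df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [1, 1, 1, 1]})
>>> df.mean()
a    2.5
b    1.0
dtype: float64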

measure_on_line(second_geometry, as_percentage=False)

Returns a measure from the start point of this line to the position along the line nearest to second_geometry.

Parameters:
second_geometry:
  • a second geometry
as_percentage:
  • If False, the measure will be returned as a distance; if True, the measure will be returned as a percentage.
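
A minimal sketch, assuming a geometry engine (such as arcpy or shapely) is available; the line and point below are hypothetical:

>>> from arcgis.geometry import Geometry
>>> line = Geometry({'paths': [[[0, 0], [10, 0]]],
...                  'spatialReference': {'wkid': 4326}})
>>> pt = Geometry({'x': 5, 'y': 0, 'spatialReference': {'wkid': 4326}})
>>> line.measure_on_line(pt)                      # measure as a distance
>>> line.measure_on_line(pt, as_percentage=True)  # measure as a fraction of the length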

median(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the median of the values for the requested axis.

axis : {index (0), columns (1)}
Axis for the function to be applied on.
skipna : bool, default True
Exclude NA/null values when computing the result.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
**kwargs
Additional keyword arguments to be passed to the function.

median : Series or DataFrame (if level specified)
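
For illustration, a minimal example; with an even number of values, the median is the mean of the two middle values:

>>> df = pd.DataFrame({'a': [1, 2, 3, 4]})
>>> df.median()
a    2.5
dtype: float64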

melt(id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None)

Unpivots a DataFrame from wide format to long format, optionally leaving identifier variables set.

This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars), are “unpivoted” to the row axis, leaving just two non-identifier columns, ‘variable’ and ‘value’.

New in version 0.20.0.

id_vars : tuple, list, or ndarray, optional
Column(s) to use as identifier variables.
value_vars : tuple, list, or ndarray, optional
Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars.
var_name : scalar
Name to use for the ‘variable’ column. If None it uses frame.columns.name or ‘variable’.
value_name : scalar, default ‘value’
Name to use for the ‘value’ column.
col_level : int or string, optional
If columns are a MultiIndex then use this level to melt.

melt, pivot_table, DataFrame.pivot

>>> df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
...                    'B': {0: 1, 1: 3, 2: 5},
...                    'C': {0: 2, 1: 4, 2: 6}})
>>> df
   A  B  C
0  a  1  2
1  b  3  4
2  c  5  6
>>> df.melt(id_vars=['A'], value_vars=['B'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
>>> df.melt(id_vars=['A'], value_vars=['B', 'C'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
3  a        C      2
4  b        C      4
5  c        C      6

The names of ‘variable’ and ‘value’ columns can be customized:

>>> df.melt(id_vars=['A'], value_vars=['B'],
...         var_name='myVarname', value_name='myValname')
   A myVarname  myValname
0  a         B          1
1  b         B          3
2  c         B          5

If you have multi-index columns:

>>> df.columns = [list('ABC'), list('DEF')]
>>> df
   A  B  C
   D  E  F
0  a  1  2
1  b  3  4
2  c  5  6
>>> df.melt(col_level=0, id_vars=['A'], value_vars=['B'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
>>> df.melt(id_vars=[('A', 'D')], value_vars=[('B', 'E')])
  (A, D) variable_0 variable_1  value
0      a          B          E      1
1      b          B          E      3
2      c          B          E      5
memory_usage(index=True, deep=False)

Return the memory usage of each column in bytes.

The memory usage can optionally include the contribution of the index and elements of object dtype.

This value is displayed in DataFrame.info by default. This can be suppressed by setting pandas.options.display.memory_usage to False.

index : bool, default True
Specifies whether to include the memory usage of the DataFrame’s index in returned Series. If index=True, the memory usage of the index is the first item in the output.
deep : bool, default False
If True, introspect the data deeply by interrogating object dtypes for system-level memory consumption, and include it in the returned values.
sizes : Series
A Series whose index is the original column names and whose values are the memory usage of each column in bytes.
numpy.ndarray.nbytes : Total bytes consumed by the elements of an ndarray.
Series.memory_usage : Bytes consumed by a Series.
pandas.Categorical : Memory-efficient array for string values with many repeated values.
DataFrame.info : Concise summary of a DataFrame.

>>> dtypes = ['int64', 'float64', 'complex128', 'object', 'bool']
>>> data = dict([(t, np.ones(shape=5000).astype(t))
...              for t in dtypes])
>>> df = pd.DataFrame(data)
>>> df.head()
   int64  float64  complex128 object  bool
0      1      1.0      (1+0j)      1  True
1      1      1.0      (1+0j)      1  True
2      1      1.0      (1+0j)      1  True
3      1      1.0      (1+0j)      1  True
4      1      1.0      (1+0j)      1  True
>>> df.memory_usage()
Index            80
int64         40000
float64       40000
complex128    80000
object        40000
bool           5000
dtype: int64
>>> df.memory_usage(index=False)
int64         40000
float64       40000
complex128    80000
object        40000
bool           5000
dtype: int64

The memory footprint of object dtype columns is ignored by default:

>>> df.memory_usage(deep=True)
Index             80
int64          40000
float64        40000
complex128     80000
object        160000
bool            5000
dtype: int64

Use a Categorical for efficient storage of an object-dtype column with many repeated values.

>>> df['object'].astype('category').memory_usage(deep=True)
5168
merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)

Merge DataFrame or named Series objects with a database-style join.

The join is done on columns or indexes. If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on.

right : DataFrame or named Series
Object to merge with.
how : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘inner’

Type of merge to be performed.

  • left: use only keys from left frame, similar to a SQL left outer join; preserve key order.
  • right: use only keys from right frame, similar to a SQL right outer join; preserve key order.
  • outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.
  • inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys.
on : label or list
Column or index level names to join on. These must be found in both DataFrames. If on is None and not merging on indexes then this defaults to the intersection of the columns in both DataFrames.
left_on : label or list, or array-like
Column or index level names to join on in the left DataFrame. Can also be an array or list of arrays of the length of the left DataFrame. These arrays are treated as if they are columns.
right_on : label or list, or array-like
Column or index level names to join on in the right DataFrame. Can also be an array or list of arrays of the length of the right DataFrame. These arrays are treated as if they are columns.
left_index : bool, default False
Use the index from the left DataFrame as the join key(s). If it is a MultiIndex, the number of keys in the other DataFrame (either the index or a number of columns) must match the number of levels.
right_index : bool, default False
Use the index from the right DataFrame as the join key. Same caveats as left_index.
sort : bool, default False
Sort the join keys lexicographically in the result DataFrame. If False, the order of the join keys depends on the join type (how keyword).
suffixes : tuple of (str, str), default (‘_x’, ‘_y’)
Suffix to apply to overlapping column names in the left and right side, respectively. To raise an exception on overlapping columns use (False, False).
copy : bool, default True
If False, avoid copy if possible.
indicator : bool or str, default False
If True, adds a column to output DataFrame called “_merge” with information on the source of each row. If string, column with information on source of each row will be added to output DataFrame, and column will be named value of string. Information column is Categorical-type and takes on a value of “left_only” for observations whose merge key only appears in ‘left’ DataFrame, “right_only” for observations whose merge key only appears in ‘right’ DataFrame, and “both” if the observation’s merge key is found in both.
validate : str, optional

If specified, checks if merge is of specified type.

  • “one_to_one” or “1:1”: check if merge keys are unique in both left and right datasets.
  • “one_to_many” or “1:m”: check if merge keys are unique in left dataset.
  • “many_to_one” or “m:1”: check if merge keys are unique in right dataset.
  • “many_to_many” or “m:m”: allowed, but does not result in checks.

New in version 0.21.0.

DataFrame
A DataFrame of the two merged objects.

merge_ordered : Merge with optional filling/interpolation.
merge_asof : Merge on nearest keys.
DataFrame.join : Similar method using indices.

Support for specifying index levels as the on, left_on, and right_on parameters was added in version 0.23.0. Support for merging named Series objects was added in version 0.24.0.

>>> df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
...                     'value': [1, 2, 3, 5]})
>>> df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
...                     'value': [5, 6, 7, 8]})
>>> df1
    lkey value
0   foo      1
1   bar      2
2   baz      3
3   foo      5
>>> df2
    rkey value
0   foo      5
1   bar      6
2   baz      7
3   foo      8

Merge df1 and df2 on the lkey and rkey columns. The value columns have the default suffixes, _x and _y, appended.

>>> df1.merge(df2, left_on='lkey', right_on='rkey')
  lkey  value_x rkey  value_y
0  foo        1  foo        5
1  foo        1  foo        8
2  foo        5  foo        5
3  foo        5  foo        8
4  bar        2  bar        6
5  baz        3  baz        7

Merge DataFrames df1 and df2 with specified left and right suffixes appended to any overlapping columns.

>>> df1.merge(df2, left_on='lkey', right_on='rkey',
...           suffixes=('_left', '_right'))
  lkey  value_left rkey  value_right
0  foo           1  foo            5
1  foo           1  foo            8
2  foo           5  foo            5
3  foo           5  foo            8
4  bar           2  bar            6
5  baz           3  baz            7

Merge DataFrames df1 and df2, but raise an exception if the DataFrames have any overlapping columns.

>>> df1.merge(df2, left_on='lkey', right_on='rkey', suffixes=(False, False))
Traceback (most recent call last):
...
ValueError: columns overlap but no suffix specified:
    Index(['value'], dtype='object')
merge_datasets(other)

This operation combines two dataframes into one new DataFrame. If the operation is combining two SpatialDataFrames, the geometry_type must match.

Argument Description
other Required SpatialDataFrame. Another SpatialDataFrame to combine.
Returns:SpatialDataFrame
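
A minimal sketch; sdf1 and sdf2 are hypothetical SpatialDataFrames with matching geometry types:

>>> combined = sdf1.merge_datasets(sdf2)  # one SpatialDataFrame with the rows of both
>>> combined.head()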
min(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the minimum of the values for the requested axis.

If you want the index of the minimum, use idxmin. This is the equivalent of the numpy.ndarray method argmin.
axis : {index (0), columns (1)}
Axis for the function to be applied on.
skipna : bool, default True
Exclude NA/null values when computing the result.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
**kwargs
Additional keyword arguments to be passed to the function.

min : Series or DataFrame (if level specified)

Series.sum : Return the sum.
Series.min : Return the minimum.
Series.max : Return the maximum.
Series.idxmin : Return the index of the minimum.
Series.idxmax : Return the index of the maximum.
DataFrame.sum : Return the sum over the requested axis.
DataFrame.min : Return the minimum over the requested axis.
DataFrame.max : Return the maximum over the requested axis.
DataFrame.idxmin : Return the index of the minimum over the requested axis.
DataFrame.idxmax : Return the index of the maximum over the requested axis.

>>> idx = pd.MultiIndex.from_arrays([
...     ['warm', 'warm', 'cold', 'cold'],
...     ['dog', 'falcon', 'fish', 'spider']],
...     names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded  animal
warm     dog       4
         falcon    2
cold     fish      0
         spider    8
Name: legs, dtype: int64
>>> s.min()
0

Min using level names, as well as indices.

>>> s.min(level='blooded')
blooded
warm    2
cold    0
Name: legs, dtype: int64
>>> s.min(level=0)
blooded
warm    2
cold    0
Name: legs, dtype: int64
mod(other, axis='columns', level=None, fill_value=None)

Modulo of dataframe and other, element-wise (binary operator mod).

Equivalent to dataframe % other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmod.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
DataFrame
Result of the arithmetic operation.

DataFrame.add : Add DataFrames.
DataFrame.sub : Subtract DataFrames.
DataFrame.mul : Multiply DataFrames.
DataFrame.div : Divide DataFrames (float division).
DataFrame.truediv : Divide DataFrames (float division).
DataFrame.floordiv : Divide DataFrames (integer division).
DataFrame.mod : Calculate modulo (remainder after division).
DataFrame.pow : Calculate exponential power.

Mismatched indices will be unioned together.

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with the operator version, which returns the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
mode(axis=0, numeric_only=False, dropna=True)

Get the mode(s) of each element along the selected axis.

The mode of a set of values is the value that appears most often. It can be multiple values.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0

The axis to iterate over while searching for the mode:

  • 0 or ‘index’ : get mode of each column
  • 1 or ‘columns’ : get mode of each row
numeric_only : bool, default False
If True, only apply to numeric columns.
dropna : bool, default True

Don’t consider counts of NaN/NaT.

New in version 0.24.0.

DataFrame
The modes of each column or row.

Series.mode : Return the highest frequency value in a Series.
Series.value_counts : Return the counts of values in a Series.

>>> df = pd.DataFrame([('bird', 2, 2),
...                    ('mammal', 4, np.nan),
...                    ('arthropod', 8, 0),
...                    ('bird', 2, np.nan)],
...                   index=('falcon', 'horse', 'spider', 'ostrich'),
...                   columns=('species', 'legs', 'wings'))
>>> df
           species  legs  wings
falcon        bird     2    2.0
horse       mammal     4    NaN
spider   arthropod     8    0.0
ostrich       bird     2    NaN

By default, missing values are not considered, and the modes of wings are 0.0 and 2.0. Because species and legs each have only one mode but the resulting DataFrame has two rows, the second row of species and legs contains NaN.

>>> df.mode()
  species  legs  wings
0    bird   2.0    0.0
1     NaN   NaN    2.0

With dropna=False, NaN values are considered and they can be the mode (like for wings).

>>> df.mode(dropna=False)
  species  legs  wings
0    bird     2    NaN

Setting numeric_only=True, only the mode of numeric columns is computed, and columns of other types are ignored.

>>> df.mode(numeric_only=True)
   legs  wings
0   2.0    0.0
1   NaN    2.0

To compute the mode over columns and not rows, use the axis parameter:

>>> df.mode(axis='columns', numeric_only=True)
           0    1
falcon   2.0  NaN
horse    4.0  NaN
spider   0.0  8.0
ostrich  2.0  NaN
mul(other, axis='columns', level=None, fill_value=None)

Multiplication of dataframe and other, element-wise (binary operator mul).

Equivalent to dataframe * other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmul.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
DataFrame
Result of the arithmetic operation.

DataFrame.add : Add DataFrames.
DataFrame.sub : Subtract DataFrames.
DataFrame.mul : Multiply DataFrames.
DataFrame.div : Divide DataFrames (float division).
DataFrame.truediv : Divide DataFrames (float division).
DataFrame.floordiv : Divide DataFrames (integer division).
DataFrame.mod : Calculate modulo (remainder after division).
DataFrame.pow : Calculate exponential power.

Mismatched indices will be unioned together.

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with the operator version, which returns the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
multiply(other, axis='columns', level=None, fill_value=None)

Multiplication of dataframe and other, element-wise (binary operator mul).

Equivalent to dataframe * other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmul.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
DataFrame
Result of the arithmetic operation.

DataFrame.add : Add DataFrames.
DataFrame.sub : Subtract DataFrames.
DataFrame.mul : Multiply DataFrames.
DataFrame.div : Divide DataFrames (float division).
DataFrame.truediv : Divide DataFrames (float division).
DataFrame.floordiv : Divide DataFrames (integer division).
DataFrame.mod : Calculate modulo (remainder after division).
DataFrame.pow : Calculate exponential power.

Mismatched indices will be unioned together.

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with the operator version, which returns the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
ndim

Return an int representing the number of axes / array dimensions.

Return 1 if Series. Otherwise return 2 if DataFrame.

ndarray.ndim : Number of array dimensions.

>>> s = pd.Series({'a': 1, 'b': 2, 'c': 3})
>>> s.ndim
1
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.ndim
2
ne(other, axis='columns', level=None)

Not equal to of dataframe and other, element-wise (binary operator ne).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or ‘index’, 1 or ‘columns’}, default ‘columns’
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
level : int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
DataFrame of bool
Result of the comparison.

DataFrame.eq : Compare DataFrames for equality elementwise.
DataFrame.ne : Compare DataFrames for inequality elementwise.
DataFrame.le : Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt : Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge : Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt : Compare DataFrames for strictly greater than inequality elementwise.

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number of elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
nlargest(n, columns, keep='first')

Return the first n rows ordered by columns in descending order.

Return the first n rows with the largest values in columns, in descending order. The columns that are not specified are returned as well, but not used for ordering.

This method is equivalent to df.sort_values(columns, ascending=False).head(n), but more performant.

n : int
Number of rows to return.
columns : label or list of labels
Column label(s) to order by.
keep : {‘first’, ‘last’, ‘all’}, default ‘first’

Where there are duplicate values:

  • first : prioritize the first occurrence(s)
  • last : prioritize the last occurrence(s)
  • all : do not drop any duplicates, even if it means
    selecting more than n items.

New in version 0.24.0.

Returns: DataFrame
The first n rows ordered by the given columns in descending order.

DataFrame.nsmallest : Return the first n rows ordered by columns in ascending order.
DataFrame.sort_values : Sort DataFrame by the values.
DataFrame.head : Return the first n rows without re-ordering.

This function cannot be used with all column types. For example, when specifying columns with object or category dtypes, TypeError is raised.

>>> df = pd.DataFrame({'population': [59000000, 65000000, 434000,
...                                   434000, 434000, 337000, 11300,
...                                   11300, 11300],
...                    'GDP': [1937894, 2583560 , 12011, 4520, 12128,
...                            17036, 182, 38, 311],
...                    'alpha-2': ["IT", "FR", "MT", "MV", "BN",
...                                "IS", "NR", "TV", "AI"]},
...                   index=["Italy", "France", "Malta",
...                          "Maldives", "Brunei", "Iceland",
...                          "Nauru", "Tuvalu", "Anguilla"])
>>> df
          population      GDP alpha-2
Italy       59000000  1937894      IT
France      65000000  2583560      FR
Malta         434000    12011      MT
Maldives      434000     4520      MV
Brunei        434000    12128      BN
Iceland       337000    17036      IS
Nauru          11300      182      NR
Tuvalu         11300       38      TV
Anguilla       11300      311      AI

In the following example, we will use nlargest to select the three rows having the largest values in column “population”.

>>> df.nlargest(3, 'population')
        population      GDP alpha-2
France    65000000  2583560      FR
Italy     59000000  1937894      IT
Malta       434000    12011      MT

When using keep='last', ties are resolved in reverse order:

>>> df.nlargest(3, 'population', keep='last')
        population      GDP alpha-2
France    65000000  2583560      FR
Italy     59000000  1937894      IT
Brunei      434000    12128      BN

When using keep='all', all duplicate items are maintained:

>>> df.nlargest(3, 'population', keep='all')
          population      GDP alpha-2
France      65000000  2583560      FR
Italy       59000000  1937894      IT
Malta         434000    12011      MT
Maldives      434000     4520      MV
Brunei        434000    12128      BN

To order by the largest values in column “population” and then “GDP”, we can specify multiple columns like in the next example.

>>> df.nlargest(3, ['population', 'GDP'])
        population      GDP alpha-2
France    65000000  2583560      FR
Italy     59000000  1937894      IT
Brunei      434000    12128      BN
notna()

Detect existing (non-missing) values.

Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True). NA values, such as None or numpy.NaN, get mapped to False values.

Returns: DataFrame
Mask of bool values for each element in DataFrame that indicates whether an element is not an NA value.

DataFrame.notnull : Alias of notna.
DataFrame.isna : Boolean inverse of notna.
DataFrame.dropna : Omit axes labels with missing values.
notna : Top-level notna.

Show which entries in a DataFrame are not NA.

>>> df = pd.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.notna()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = pd.Series([5, 6, np.NaN])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64
>>> ser.notna()
0     True
1     True
2    False
dtype: bool
notnull()

Detect existing (non-missing) values. notnull is an alias of notna; see notna() above for the full description and examples.
nsmallest(n, columns, keep='first')

Return the first n rows ordered by columns in ascending order.

Return the first n rows with the smallest values in columns, in ascending order. The columns that are not specified are returned as well, but not used for ordering.

This method is equivalent to df.sort_values(columns, ascending=True).head(n), but more performant.

n : int
Number of items to retrieve.
columns : list or str
Column name or names to order by.
keep : {‘first’, ‘last’, ‘all’}, default ‘first’

Where there are duplicate values:

  • first : take the first occurrence.
  • last : take the last occurrence.
  • all : do not drop any duplicates, even if it means selecting more than n items.

New in version 0.24.0.

Returns: DataFrame
The first n rows ordered by the given columns in ascending order.

DataFrame.nlargest : Return the first n rows ordered by columns in descending order.
DataFrame.sort_values : Sort DataFrame by the values.
DataFrame.head : Return the first n rows without re-ordering.

>>> df = pd.DataFrame({'population': [59000000, 65000000, 434000,
...                                   434000, 434000, 337000, 11300,
...                                   11300, 11300],
...                    'GDP': [1937894, 2583560 , 12011, 4520, 12128,
...                            17036, 182, 38, 311],
...                    'alpha-2': ["IT", "FR", "MT", "MV", "BN",
...                                "IS", "NR", "TV", "AI"]},
...                   index=["Italy", "France", "Malta",
...                          "Maldives", "Brunei", "Iceland",
...                          "Nauru", "Tuvalu", "Anguilla"])
>>> df
          population      GDP alpha-2
Italy       59000000  1937894      IT
France      65000000  2583560      FR
Malta         434000    12011      MT
Maldives      434000     4520      MV
Brunei        434000    12128      BN
Iceland       337000    17036      IS
Nauru          11300      182      NR
Tuvalu         11300       38      TV
Anguilla       11300      311      AI

In the following example, we will use nsmallest to select the three rows having the smallest values in column “population”.

>>> df.nsmallest(3, 'population')
          population  GDP alpha-2
Nauru          11300  182      NR
Tuvalu         11300   38      TV
Anguilla       11300  311      AI

When using keep='last', ties are resolved in reverse order:

>>> df.nsmallest(3, 'population', keep='last')
          population  GDP alpha-2
Anguilla       11300  311      AI
Tuvalu         11300   38      TV
Nauru          11300  182      NR

When using keep='all', all duplicate items are maintained:

>>> df.nsmallest(3, 'population', keep='all')
          population  GDP alpha-2
Nauru          11300  182      NR
Tuvalu         11300   38      TV
Anguilla       11300  311      AI

To order by the smallest values in column “population” and then “GDP”, we can specify multiple columns like in the next example.

>>> df.nsmallest(3, ['population', 'GDP'])
          population  GDP alpha-2
Tuvalu         11300   38      TV
Nauru          11300  182      NR
Anguilla       11300  311      AI
nunique(axis=0, dropna=True)

Count distinct observations over requested axis.

Return Series with number of distinct observations. Can ignore NaN values.

New in version 0.20.0.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
The axis to use. 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.
dropna : bool, default True
Don’t include NaN in the counts.

Returns: Series

Series.nunique : Method nunique for Series.
DataFrame.count : Count non-NA cells for each column or row.

>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 1, 1]})
>>> df.nunique()
A    3
B    1
dtype: int64
>>> df.nunique(axis=1)
0    1
1    2
2    2
dtype: int64
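
The dropna parameter is not exercised above; a minimal sketch of its effect (the column values here are illustrative):

>>> df = pd.DataFrame({'A': [1, 2, None]})
>>> df.nunique()
A    2
dtype: int64
>>> df.nunique(dropna=False)
A    3
dtype: int64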
overlaps(second_geometry)

Indicates if the intersection of the two geometries has the same shape type as one of the input geometries and is not equivalent to either of the input geometries.

Parameters:
second_geometry : Required Geometry. A second geometry to compare against.
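
A minimal sketch, assuming two polygon Geometry objects and an available geometry engine (arcpy or shapely); the coordinates are illustrative:

>>> from arcgis.geometry import Geometry
>>> poly1 = Geometry({'rings': [[[0, 0], [0, 10], [10, 10], [10, 0], [0, 0]]],
...                   'spatialReference': {'wkid': 4326}})
>>> poly2 = Geometry({'rings': [[[5, 5], [5, 15], [15, 15], [15, 5], [5, 5]]],
...                   'spatialReference': {'wkid': 4326}})
>>> poly1.overlaps(poly2)
True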
part_count

The number of geometry parts for the feature.
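
For example, a hypothetical two-part geometry (again assuming a geometry engine is available):

>>> geom = Geometry({'rings': [[[0, 0], [0, 1], [1, 1], [0, 0]],
...                            [[5, 5], [5, 6], [6, 6], [5, 5]]],
...                  'spatialReference': {'wkid': 4326}})
>>> geom.part_count
2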

pct_change(periods=1, fill_method='pad', limit=None, freq=None, **kwargs)

Percentage change between the current and a prior element.

Computes the percentage change from the immediately previous row by default. This is useful in comparing the percentage of change in a time series of elements.

periods : int, default 1
Periods to shift for forming percent change.
fill_method : str, default ‘pad’
How to handle NAs before computing percent changes.
limit : int, default None
The number of consecutive NAs to fill before stopping.
freq : DateOffset, timedelta, or offset alias string, optional
Increment to use from time series API (e.g. ‘M’ or BDay()).
**kwargs
Additional keyword arguments are passed into DataFrame.shift or Series.shift.
Returns: Series or DataFrame
The same type as the calling object.

Series.diff : Compute the difference of two elements in a Series.
DataFrame.diff : Compute the difference of two elements in a DataFrame.
Series.shift : Shift the index by some number of periods.
DataFrame.shift : Shift the index by some number of periods.

Series

>>> s = pd.Series([90, 91, 85])
>>> s
0    90
1    91
2    85
dtype: int64
>>> s.pct_change()
0         NaN
1    0.011111
2   -0.065934
dtype: float64
>>> s.pct_change(periods=2)
0         NaN
1         NaN
2   -0.055556
dtype: float64

See the percentage change in a Series where NAs are filled by carrying the last valid observation forward to the next valid one.

>>> s = pd.Series([90, 91, None, 85])
>>> s
0    90.0
1    91.0
2     NaN
3    85.0
dtype: float64
>>> s.pct_change(fill_method='ffill')
0         NaN
1    0.011111
2    0.000000
3   -0.065934
dtype: float64

DataFrame

Percentage change in French franc, Deutsche Mark, and Italian lira from 1980-01-01 to 1980-03-01.

>>> df = pd.DataFrame({
...     'FR': [4.0405, 4.0963, 4.3149],
...     'GR': [1.7246, 1.7482, 1.8519],
...     'IT': [804.74, 810.01, 860.13]},
...     index=['1980-01-01', '1980-02-01', '1980-03-01'])
>>> df
                FR      GR      IT
1980-01-01  4.0405  1.7246  804.74
1980-02-01  4.0963  1.7482  810.01
1980-03-01  4.3149  1.8519  860.13
>>> df.pct_change()
                  FR        GR        IT
1980-01-01       NaN       NaN       NaN
1980-02-01  0.013810  0.013684  0.006549
1980-03-01  0.053365  0.059318  0.061876

Percentage change in GOOG and APPL stock volume. This example shows computing the percentage change between columns.

>>> df = pd.DataFrame({
...     '2016': [1769950, 30586265],
...     '2015': [1500923, 40912316],
...     '2014': [1371819, 41403351]},
...     index=['GOOG', 'APPL'])
>>> df
          2016      2015      2014
GOOG   1769950   1500923   1371819
APPL  30586265  40912316  41403351
>>> df.pct_change(axis='columns')
      2016      2015      2014
GOOG   NaN -0.151997 -0.086016
APPL   NaN  0.337604  0.012002
pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

func : function
function to apply to the NDFrame. args, and kwargs are passed into func. Alternatively a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the NDFrame.
args : iterable, optional
positional arguments passed into func.
kwargs : mapping, optional
a dictionary of keyword arguments passed into func.

Returns: object
The return type of func.

DataFrame.apply
DataFrame.applymap
Series.map

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. Instead of writing

>>> f(g(h(df), arg1=a), arg2=b, arg3=c)

You can write

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe(f, arg2=b, arg3=c)
... )

If you have a function that takes the data as (say) the second argument, pass a tuple indicating which keyword expects the data. For example, suppose f takes its data as arg2:

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe((f, 'arg2'), arg1=a, arg3=c)
...  )
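
As a concrete, runnable sketch (add_const is a hypothetical helper defined only for this example):

>>> import pandas as pd
>>> def add_const(frame, const):
...     return frame + const
>>> df = pd.DataFrame({'a': [1, 2, 3]})
>>> df.pipe(add_const, const=10)
    a
0  11
1  12
2  13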
pivot(index=None, columns=None, values=None)

Return reshaped DataFrame organized by given index / column values.

Reshape data (produce a “pivot” table) based on column values. Uses unique values from specified index / columns to form axes of the resulting DataFrame. This function does not support data aggregation, multiple values will result in a MultiIndex in the columns. See the User Guide for more on reshaping.

index : string or object, optional
Column to use to make new frame’s index. If None, uses existing index.
columns : string or object
Column to use to make new frame’s columns.
values : string, object or a list of the previous, optional

Column(s) to use for populating new frame’s values. If not specified, all remaining columns will be used and the result will have hierarchically indexed columns.

Changed in version 0.23.0: Also accept list of column names.

Returns: DataFrame
Returns reshaped DataFrame.

Raises: ValueError
When there are any index, columns combinations with multiple values. Use DataFrame.pivot_table when you need to aggregate.

DataFrame.pivot_table : Generalization of pivot that can handle duplicate values for one index/column pair.
DataFrame.unstack : Pivot based on the index values instead of a column.

For finer-tuned control, see hierarchical indexing documentation along with the related stack/unstack methods.

>>> df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two',
...                            'two'],
...                    'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
...                    'baz': [1, 2, 3, 4, 5, 6],
...                    'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
>>> df
    foo   bar  baz  zoo
0   one   A    1    x
1   one   B    2    y
2   one   C    3    z
3   two   A    4    q
4   two   B    5    w
5   two   C    6    t
>>> df.pivot(index='foo', columns='bar', values='baz')
bar  A   B   C
foo
one  1   2   3
two  4   5   6
>>> df.pivot(index='foo', columns='bar')['baz']
bar  A   B   C
foo
one  1   2   3
two  4   5   6
>>> df.pivot(index='foo', columns='bar', values=['baz', 'zoo'])
      baz       zoo
bar   A  B  C   A  B  C
foo
one   1  2  3   x  y  z
two   4  5  6   q  w  t

A ValueError is raised if there are any duplicates.

>>> df = pd.DataFrame({"foo": ['one', 'one', 'two', 'two'],
...                    "bar": ['A', 'A', 'B', 'C'],
...                    "baz": [1, 2, 3, 4]})
>>> df
   foo bar  baz
0  one   A    1
1  one   A    2
2  two   B    3
3  two   C    4

Notice that the first two rows are the same for our index and columns arguments.

>>> df.pivot(index='foo', columns='bar', values='baz')
Traceback (most recent call last):
   ...
ValueError: Index contains duplicate entries, cannot reshape
pivot_table(values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All')

Create a spreadsheet-style pivot table as a DataFrame. The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame.

values : column to aggregate, optional
index : column, Grouper, array, or list of the previous
If an array is passed, it must be the same length as the data. The list can contain any of the other types (except list). Keys to group by on the pivot table index. If an array is passed, it is used in the same manner as column values.
columns : column, Grouper, array, or list of the previous
If an array is passed, it must be the same length as the data. The list can contain any of the other types (except list). Keys to group by on the pivot table column. If an array is passed, it is used in the same manner as column values.
aggfunc : function, list of functions, dict, default numpy.mean
If a list of functions is passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves). If a dict is passed, the key is the column to aggregate and the value is a function or list of functions.
fill_value : scalar, default None
Value to replace missing values with.
margins : boolean, default False
Add row/column margins (e.g. for subtotals / grand totals).
dropna : boolean, default True
Do not include columns whose entries are all NaN.
margins_name : string, default ‘All’
Name of the row / column that will contain the totals when margins is True.

Returns: DataFrame
The pivot table.

DataFrame.pivot : Pivot without aggregation that can handle non-numeric data.

>>> df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
...                          "bar", "bar", "bar", "bar"],
...                    "B": <