arcgis.features module

The arcgis.features module contains types and functions for working with features and feature layers in the GIS.

Entities located in space with a geometrical representation (such as points, lines or polygons) and a set of properties can be represented as features. The arcgis.features module is used for working with feature data, feature layers and collections of feature layers in the GIS. It also contains the spatial analysis functions which operate against feature data.

In the GIS, entities located in space with a set of properties can be represented as features. Features are stored as feature classes, which represent a set of features located using a single spatial type (point, line, polygon) and a common set of properties. This is the geographic extension of the classic tabular or relational representation for entities - a set of entities is modelled as rows in a table. Tables represent entity classes with uniform properties. In addition to working with entities with location as features, the system can also work with non-spatial entities as rows in tables. The system can also model relationships between entities using properties which act as primary and foreign keys. A collection of feature classes and tables, with the associated relationships among the entities, is a feature layer collection. FeatureLayerCollections are one of the dataset types contained in a Datastore. Finally, features are not simply entities in a dataset. Features have a visual representation and user experience - on a map, in a 3D scene, as entities with a property sheet or popups.

arcgis.features.Feature

class arcgis.features.Feature(geometry=None, attributes=None)

Entities located in space with a set of properties can be represented as features.

# Obtain a feature from a feature layer:

feat_set = feature_layer.query(where="OBJECTID=1")
feat = feat_set[0]
property as_dict
Returns

the feature as a dictionary

property as_row
Returns

the feature as a tuple containing two lists:

List of:

Description

row values

the specific attribute values and geometry for this feature

field names

the name for each attribute field

property attributes
Returns

a dictionary of feature attribute values with field names as the key

property fields
Returns

attribute field names for the feature as a list of strings

classmethod from_dict(feature, sr=None)
Returns

a feature from a dict

classmethod from_json(json_str)
Returns

a feature from a JSON string

property geometry
Returns

the feature geometry

property geometry_type
Returns

the geometry type of the feature as a string

get_value(field_name)

Retrieves the value for a specified field name

Argument

Description

field name

Required String. The name of the field to get the value for.

feature.fields will return a list of all field names.
Returns

value for the specified attribute field of the feature.

set_value(field_name, value)

Sets an attribute value for a given field name

Argument

Description

field_name

Required String. The name of the field to update.

value

Required. Value to update the field with.

Returns

boolean indicating whether field_name value was updated.

arcgis.features.FeatureLayer

class arcgis.features.FeatureLayer(url, gis=None, container=None, dynamic_layer=None)

The feature layer is the primary concept for working with features in a GIS.

Users create, import, export, analyze, edit, and visualize features, i.e. entities in space as feature layers.

Feature layers can be added to and visualized using maps. They act as inputs to and outputs from feature analysis tools.

Feature layers are created by publishing feature data to a GIS, and are exposed as a broader resource (Item) in the GIS. Feature layer objects can be obtained through the layers attribute on feature layer Items in the GIS.

append(item_id=None, upload_format='featureCollection', source_table_name=None, field_mappings=None, edits=None, source_info=None, upsert=True, skip_updates=False, use_globalids=False, update_geometry=True, append_fields=None, rollback=False, skip_inserts=None, upsert_matching_field=None)

Only available in ArcGIS Online

Update an existing hosted feature layer using append.

Argument

Description

source_table_name

optional string. Required only when the source data contains more than one tables, e.g., for file geodatabase. Example: source_table_name= “Building”

item_id

optional string. The ID for the Portal item that contains the source file. Used in conjunction with editsUploadFormat.

field_mappings

optional list. Used to map source data to a destination layer. Syntax: fieldMappings=[{“name” : <”targerName”>,

“sourceName” : < “sourceName”>}, …]

Examples: fieldMappings=[{“name”“CountyID”,

“sourceName” : “GEOID10”}]

edits

optional string. Only feature collection json is supported. Append supports all format through the upload_id or item_id.

source_info

optional dictionary. This is only needed when appending data from excel or csv. The appendSourceInfo can be the publishing parameter returned from analyze the csv or excel file.

upsert

optional boolean. Optional parameter specifying whether the edits needs to be applied as updates if the feature already exists. Default is false.

skip_updates

Optional boolean. Parameter is used only when upsert is true.

use_globalids

Optional boolean. Specifying whether upsert needs to use GlobalId when matching features.

update_geometry

Optional boolean. The parameter is used only when upsert is true. Skip updating the geometry and update only the attributes for existing features if they match source features by objectId or globalId.(as specified by useGlobalIds parameter).

append_fields

Optional list. The list of destination fields to append to. This is supported when upsert=true or false. Values: [“fieldName1”, “fieldName2”,….]

upload_format

required string. The source append data format. The default is featureCollection format. Values: sqlite | shapefile | filegdb | featureCollection | geojson | csv | excel

rollback

Optional boolean. Optional parameter specifying whether the upsert edits needs to be rolled back in case of failure. Default is false.

skip_inserts

Used only when upsert is true. Used to skip inserts if the value is true. The default value is false.

upsert_matching_field

Optional string. The layer field to be used when matching features with upsert. ObjectId, GlobalId, and any other field that has a unique index can be used with upsert. This parameter overrides use_globalids; e.g., specifying upsert_matching_field will be used even if you specify use_globalids = True. Example: upsert_matching_field=”MyfieldWithUniqueIndex”

Returns

boolean

calculate(where, calc_expression, sql_format='standard', version=None, sessionid=None, return_edit_moment=None, future=False)

The calculate operation is performed on a feature layer resource. It updates the values of one or more fields in an existing feature service layer based on SQL expressions or scalar values. The calculate operation can only be used if the supportsCalculate property of the layer is true. Neither the Shape field nor system fields can be updated using calculate. System fields include ObjectId and GlobalId. See Calculate a field for more information on supported expressions

Inputs

Description

where

Required String. A where clause can be used to limit the updated records. Any legal SQL where clause operating on the fields in the layer is allowed.

calc_expression

Required List. The array of field/value info objects that contain the field or fields to update and their scalar values or SQL expression. Allowed types are dictionary and list. List must be a list of dictionary objects.

Calculation Format is as follows:

{“field” : “<field name>”, “value” : “<value>”}

sql_format

Optional String. The SQL format for the calc_expression. It can be either standard SQL92 (standard) or native SQL (native). The default is standard.

Values: standard, native

version

Optional String. The geodatabase version to apply the edits.

sessionid

Optional String. A parameter which is set by a client during long transaction editing on a branch version. The sessionid is a GUID value that clients establish at the beginning and use throughout the edit session. The sessonid ensures isolation during the edit session. This parameter applies only if the isDataBranchVersioned property of the layer is true.

return_edit_moment

Optional Boolean. This parameter specifies whether the response will report the time edits were applied. If true, the server will return the time edits were applied in the response’s edit moment key. This parameter applies only if the isDataBranchVersioned property of the layer is true.

future

Optional Boolean. If True, the result is returned as a future object and the results are obtained in an asynchronous fashion. False is the default.

This applies to 10.8+ only

# Usage Example 1:

print(fl.calculate(where="OBJECTID < 2",
                   calc_expression={"field": "ZONE", "value" : "R1"}))
# Usage Example 2:

print(fl.calculate(where="OBJECTID < 2001",
                   calc_expression={"field": "A",  "sqlExpression" : "B*3"}))

Output: dictionary with format {‘updatedFeatureCount’: 1, ‘success’: True}

property container

The feature layer collection to which this layer belongs.

delete_features(deletes=None, where=None, geometry_filter=None, gdb_version=None, rollback_on_failure=True, return_delete_results=True, future=False)

This operation deletes features in a feature layer or table

Argument

Description

deletes

Optional string. A comma seperated string of OIDs to remove from the service.

where

Optional string. A where clause for the query filter. Any legal SQL where clause operating on the fields in the layer is allowed. Features conforming to the specified where clause will be deleted.

geometry_filter

Optional SpatialFilter. A spatial filter from arcgis.geometry.filters module to filter results by a spatial relationship with another geometry.

gdb_version

Optional string. A Geodatabase version to apply the edits.

rollback_on_failure

Optional boolean. Optional parameter to specify if the edits should be applied only if all submitted edits succeed. If false, the server will apply the edits that succeed even if some of the submitted edits fail. If true, the server will apply the edits only if all edits succeed. The default value is true.

return_delete_results

Optional Boolean. Optional parameter that indicates whether a result is returned per deleted row when the deleteFeatures operation is run. The default is true.

future

Optional Boolean. If future=True, then the operation will occur asynchronously else the operation will occur synchronously. False is the default.

Returns

Dict if future=False (default), else a concurrent.Future class.

edit_features(adds=None, updates=None, deletes=None, gdb_version=None, use_global_ids=False, rollback_on_failure=True, return_edit_moment=False, attachments=None, true_curve_client=False, session_id=None, use_previous_moment=False, datum_transformation=None)

This operation adds, updates, and deletes features to the associated feature layer or table in a single call.

When making large number (250+ records at once) of edits, append should be used over edit_features to improve performance and ensure service stability.

Inputs

Description

adds

Optional FeatureSet/List. The array of features to be added.

updates

Optional FeatureSet/List. The array of features to be updated.

deletes

Optional FeatureSet/List. string of OIDs to remove from service

use_global_ids

Optional boolean. Instead of referencing the default Object ID field, the service will look at a GUID field to track changes. This means the GUIDs will be passed instead of OIDs for delete, update or add features.

gdb_version

Optional boolean. Geodatabase version to apply the edits.

rollback_on_failure

Optional boolean. Optional parameter to specify if the edits should be applied only if all submitted edits succeed. If false, the server will apply the edits that succeed even if some of the submitted edits fail. If true, the server will apply the edits only if all edits succeed. The default value is true.

return_edit_moment

Optional boolean. Introduced at 10.5, only applicable with ArcGIS Server services only. Specifies whether the response will report the time edits were applied. If set to true, the server will return the time in the response’s editMoment key. The default value is false.

attachments

Optional Dict. This parameter adds, updates, or deletes attachments. It applies only when the use_global_ids parameter is set to true. For adds, the globalIds of the attachments provided by the client are preserved. When useGlobalIds is true, updates and deletes are identified by each feature or attachment globalId, rather than their objectId or attachmentId. This parameter requires the layer’s supportsApplyEditsWithGlobalIds property to be true.

Attachments to be added or updated can use either pre-uploaded data or base 64 encoded data.

Inputs

Inputs

Description

adds

List of attachments to add.

updates

List of attachements to update

deletes

List of attachments to delete

Additional attachment information here.

true_curve_client

Optional boolean. Introduced at 10.5. Indicates to the server whether the client is true curve capable. When set to true, this indicates to the server that true curve geometries should be downloaded and that geometries containing true curves should be consumed by the map service without densifying it. When set to false, this indicates to the server that the client is not true curves capable. The default value is false.

session_id

Optional String. Introduced at 10.6. The session_id is a GUID value that clients establish at the beginning and use throughout the edit session. The sessonID ensures isolation during the edit session. The session_id parameter is set by a client during long transaction editing on a branch version.

use_previous_moment

Optional Boolean. Introduced at 10.6. The use_previous_moment parameter is used to apply the edits with the same edit moment as the previous set of edits. This allows an editor to apply single block of edits partially, complete another task and then complete the block of edits. This parameter is set by a client during long transaction editing on a branch version.

When set to true, the edits are applied with the same edit moment as the previous set of edits. When set to false or not set (default) the edits are applied with a new edit moment.

datum_transformation

Optional Integer/Dictionary. This parameter applies a datum transformation while projecting geometries in the results when out_sr is different than the layer’s spatial reference. When specifying transformations, you need to think about which datum transformation best projects the layer (not the feature service) to the outSR and sourceSpatialReference property in the layer properties. For a list of valid datum transformation ID values ad well-known text strings, see Coordinate systems and transformations. For more information on datum transformations, please see the transformation parameter in the Project operation.

Examples

Inputs

Description

WKID

Integer. Ex: datum_transformation=4326

WKT

Dict. Ex: datum_transformation={“wkt”: “<WKT>”}

Composite

Dict. Ex: datum_transformation=```{‘geoTransforms’:[{‘wkid’:<id>,’forward’:<true|false>},{‘wkt’:’<WKT>’,’forward’:<True|False>}]}```

Output: dictionary

export_attachments(output_folder, label_field=None)

Exports attachments from the feature layer in Imagenet format using the output_label_field.

Argument

Description

output_folder

Required. Output folder where the attachments will be stored.

label_field

Optional. Field which contains the label/category of each feature. If None, a default folder is created.

classmethod fromitem(item, layer_id=0)

Creates a feature layer from a GIS Item. The type of item should be a ‘Feature Service’ that represents a FeatureLayerCollection. The layer_id is the id of the layer in feature layer collection (feature service).

generate_renderer(definition, where=None)

This operation groups data using the supplied definition (classification definition) and an optional where clause. The result is a renderer object. Use baseSymbol and colorRamp to define the symbols assigned to each class. If the operation is performed on a table, the result is a renderer object containing the data classes and no symbols.

Argument

Description

definition

required dict. The definition using the renderer that is generated. Use either class breaks or unique value classification definitions. See: https://resources.arcgis.com/en/help/rest/apiref/ms_classification.html

where

optional string. A where clause for which the data needs to be classified. Any legal SQL where clause operating on the fields in the dynamic layer/table is allowed.

Returns

dictionary

get_html_popup(oid)

The htmlPopup resource provides details about the HTML pop-up authored by the user using ArcGIS for Desktop.

Argument

Description

oid

Optional string. Object id of the feature to get the HTML popup.

Returns

string

get_unique_values(attribute, query_string='1=1')

Return a list of unique values for a given attribute

Argument

Description

attribute

Required string. The feature layer attribute to query.

query_string

Optional string. SQL Query that will be used to filter attributes before unique values are returned. ex. “name_2 like ‘%K%’”

property manager

Helper object to manage the feature layer, update it’s definition, etc

property metadata

The metadata property allows for the setting and downloading of the Feature Layer’s metadata. If metadata is disabled on the GIS or the layer does not support metdata, None value will be returned.

Argument

Description

value

Required String. (SET) Path to the metadata file.

Returns

String (GET)

property properties

The properties of this object

query(where='1=1', out_fields='*', time_filter=None, geometry_filter=None, return_geometry=True, return_count_only=False, return_ids_only=False, return_distinct_values=False, return_extent_only=False, group_by_fields_for_statistics=None, statistic_filter=None, result_offset=None, result_record_count=None, object_ids=None, distance=None, units=None, max_allowable_offset=None, out_sr=None, geometry_precision=None, gdb_version=None, order_by_fields=None, out_statistics=None, return_z=False, return_m=False, multipatch_option=None, quantization_parameters=None, return_centroid=False, return_all_records=True, result_type=None, historic_moment=None, sql_format=None, return_true_curves=False, return_exceeded_limit_features=None, as_df=False, datum_transformation=None, **kwargs)

Queries a feature layer based on a sql statement

Argument

Description

where

Optional string. The default is 1=1. The selection sql statement.

out_fields

Optional List of field names to return. Field names can be specified either as a List of field names or as a comma separated string. The default is “*”, which returns all the fields.

object_ids

Optional string. The object IDs of this layer or table to be queried. The object ID values should be a comma-separated string.

distance

Optional integer. The buffer distance for the input geometries. The distance unit is specified by units. For example, if the distance is 100, the query geometry is a point, units is set to meters, and all points within 100 meters of the point are returned.

units

Optional string. The unit for calculating the buffer distance. If unit is not specified, the unit is derived from the geometry spatial reference. If the geometry spatial reference is not specified, the unit is derived from the feature service data spatial reference. This parameter only applies if supportsQueryWithDistance is true. Values: esriSRUnit_Meter | esriSRUnit_StatuteMile |

esriSRUnit_Foot | esriSRUnit_Kilometer | esriSRUnit_NauticalMile | esriSRUnit_USNauticalMile

time_filter

Optional list. The format is of [<startTime>, <endTime>] using datetime.date, datetime.datetime or timestamp in milliseconds. Syntax: time_filter=[<startTime>, <endTime>] ; specified as

datetime.date, datetime.datetime or timestamp in milliseconds

geometry_filter

Optional from arcgis.geometry.filter. Allows for the information to be filtered on spatial relationship with another geometry.

max_allowable_offset

Optional float. This option can be used to specify the max_allowable_offset to be used for generalizing geometries returned by the query operation. The max_allowable_offset is in the units of out_sr. If out_sr is not specified, max_allowable_offset is assumed to be in the unit of the spatial reference of the layer.

out_sr

Optional Integer. The WKID for the spatial reference of the returned geometry.

geometry_precision

Optional Integer. This option can be used to specify the number of decimal places in the response geometries returned by the query operation. This applies to X and Y values only (not m or z-values).

gdb_version

Optional string. The geodatabase version to query. This parameter applies only if the isDataVersioned property of the layer is true. If this is not specified, the query will apply to the published map’s version.

return_geometry

Optional boolean. If true, geometry is returned with the query. Default is true.

return_distinct_values

Optional boolean. If true, it returns distinct values based on the fields specified in out_fields. This parameter applies only if the supportsAdvancedQueries property of the layer is true.

return_ids_only

Optional boolean. Default is False. If true, the response only includes an array of object IDs. Otherwise, the response is a feature set.

return_count_only

Optional boolean. If true, the response only includes the count (number of features/records) that would be returned by a query. Otherwise, the response is a feature set. The default is false. This option supersedes the returnIdsOnly parameter. If returnCountOnly = true, the response will return both the count and the extent.

return_extent_only

Optional boolean. If true, the response only includes the extent of the features that would be returned by the query. If returnCountOnly=true, the response will return both the count and the extent. The default is false. This parameter applies only if the supportsReturningQueryExtent property of the layer is true.

order_by_fields

Optional string. One or more field names on which the features/records need to be ordered. Use ASC or DESC for ascending or descending, respectively, following every field to control the ordering. example: STATE_NAME ASC, RACE DESC, GENDER

group_by_fields_for_statistics

Optional string. One or more field names on which the values need to be grouped for calculating the statistics. example: STATE_NAME, GENDER

out_statistics

Optional string. The definitions for one or more field-based statistics to be calculated.

Syntax:

[
{

“statisticType”: “<count | sum | min | max | avg | stddev | var>”, “onStatisticField”: “Field1”, “outStatisticFieldName”: “Out_Field_Name1”

}, {

“statisticType”: “<count | sum | min | max | avg | stddev | var>”, “onStatisticField”: “Field2”, “outStatisticFieldName”: “Out_Field_Name2”

}

]

return_z

Optional boolean. If true, Z values are included in the results if the features have Z values. Otherwise, Z values are not returned. The default is False.

return_m

Optional boolean. If true, M values are included in the results if the features have M values. Otherwise, M values are not returned. The default is false.

multipatch_option

Optional x/y footprint. This option dictates how the geometry of a multipatch feature will be returned.

result_offset

Optional integer. This option can be used for fetching query results by skipping the specified number of records and starting from the next record (that is, resultOffset + 1th). This option is ignored if return_all_records is True (i.e. by default).

result_record_count

Optional integer. This option can be used for fetching query results up to the result_record_count specified. When result_offset is specified but this parameter is not, the map service defaults it to max_record_count. The maximum value for this parameter is the value of the layer’s max_record_count property. This option is ignored if return_all_records is True (i.e. by default).

quantization_parameters

Optional dict. Used to project the geometry onto a virtual grid, likely representing pixels on the screen.

return_centroid

Optional boolean. Used to return the geometry centroid associated with each feature returned. If true, the result includes the geometry centroid. The default is false.

return_all_records

Optional boolean. When True, the query operation will call the service until all records that satisfy the where_clause are returned. Note: result_offset and result_record_count will be ignored if return_all_records is True. Also, if return_count_only, return_ids_only, or return_extent_only are True, this parameter will be ignored.

result_type

Optional string. The result_type parameter can be used to control the number of features returned by the query operation. Values: None | standard | tile

historic_moment

Optional integer. The historic moment to query. This parameter applies only if the layer is archiving enabled and the supportsQueryWithHistoricMoment property is set to true. This property is provided in the layer resource.

If historic_moment is not specified, the query will apply to the current features.

sql_format

Optional string. The sql_format parameter can be either standard SQL92 standard or it can use the native SQL of the underlying datastore native. The default is none which means the sql_format depends on useStandardizedQuery parameter. Values: none | standard | native

return_true_curves

Optional boolean. When set to true, returns true curves in output geometries. When set to false, curves are converted to densified polylines or polygons.

return_exceeded_limit_features

Optional boolean. Optional parameter which is true by default. When set to true, features are returned even when the results include ‘exceededTransferLimit’: True.

When set to false and querying with resultType = tile features are not returned when the results include ‘exceededTransferLimit’: True. This allows a client to find the resolution in which the transfer limit is no longer exceeded without making multiple calls.

as_df

Optional boolean. If True, the results are returned as a DataFrame instead of a FeatureSet.

datum_transformation

Optional Integer/Dictionary. This parameter applies a datum transformation while projecting geometries in the results when out_sr is different than the layer’s spatial reference. When specifying transformations, you need to think about which datum transformation best projects the layer (not the feature service) to the outSR and sourceSpatialReference property in the layer properties. For a list of valid datum transformation ID values ad well-known text strings, see Coordinate systems and transformations. For more information on datum transformations, please see the transformation parameter in the Project operation.

Examples

Inputs

Description

WKID

Integer. Ex: datum_transformation=4326

WKT

Dict. Ex: datum_transformation={“wkt”: “<WKT>”}

Composite

Dict. Ex: datum_transformation=```{‘geoTransforms’:[{‘wkid’:<id>,’forward’:<true|false>},{‘wkt’:’<WKT>’,’forward’:<True|False>}]}```

kwargs

Optional dict. Optional parameters that can be passed to the Query function. This will allow users to pass additional parameters not explicitly implemented on the function. A complete list of functions available is documented on the Query REST API.

Returns

A FeatureSet containing the features matching the query unless another return type is specified, such as count

query_analytics(out_analytics, where='1=1', out_fields='*', analytic_where=None, geometry_filter=None, out_sr=None, return_geometry=True, order_by=None, result_type=None, cache_hint=None, result_offset=None, result_record_count=None, quantization_param=None, sql_format=None, future=True, **kwargs)

query_analytics exposes the standard SQL windows functions that compute aggregate and ranking values based on a group of rows called window partition. The window function is applied to the rows after the partitioning and ordering of the rows. query_analytics defines a window or user-specified set of rows within a query result set. query_analytics can be used to compute aggregated values such as moving averages, cumulative aggregates, or running totals.

SQL Windows Function

A window function performs a calculation across a set of rows (SQL partition or window) that are related to the current row. Unlike regular aggregate functions, use of a window function does not return single output row. The rows retain their separate identities with each calculation appended to the rows as a new field value. The window function can access more than just the current row of the query result.

query_analytics currently supports the following windows functions:
  • Aggregate functions

  • Analytic functions

  • Ranking functions

Aggregate Functions

Aggregate functions are deterministic function that perform a calculation on a set of values and return a single value. They are used in the select list with optional HAVING clause. GROUP BY clause can also be used to calculate the aggregation on categories of rows. query_analytics can be used to calculate the aggregation on a specific range of value. Supported aggregate functions are:

  • Min

  • Max

  • Sum

  • Count

  • AVG

  • STDDEV

  • VAR

Analytic Functions

Several analytic functions available now in all SQL vendors to compute an aggregate value based on a group of rows or windows partition. Unlike aggregation functions, analytic functions can return single or multiple rows for each group.

  • CUM_DIST

  • FIRST_VALUE

  • LAST_VALUE

  • LEAD

  • LAG

  • PERCENTILE_DISC

  • PERCENTILE_CONT

  • PERCENT_RANK

Ranking Functions

Ranking functions return a ranking value for each row in a partition. Depending on the function that is used, some rows might receive the same value as other rows.

  • RANK

  • NTILE

  • DENSE_RANK

  • ROW_NUMBER

Partitioning

Partitions are extremely useful when you need to calculate the same metric over different group of rows. It is very powerful and has many potential usages. For example, you can add partition by to your window specification to look at different groups of rows individually.

‘partitionBy’ clause divides the query result set into partitions and the sql window function is applied to each partition. The ‘partitionBy’ clause normally refers to the column by which the result is partitioned. ‘partitionBy’ can also be a value expression (column expression or function) that references any of the selected columns (not aliases).

Argument

Description

out_analytics

Required List. A set of analytics to calculate on the Feature Layer.

The definitions for one or more field-based or expression analytics to be computed. This parameter is supported only on layers/tables that indicate supportsAnalytics is true. Note: If outAnalyticFieldName is empty or missing, the server assigns a field name to the returned analytic field.

Syntax: An array of analytic definitions. An analytic definition specifies the type of analytic, the field or expression on which it is to be computed, and the resulting output field name. Syntax [

{

“analyticType”: “<COUNT | SUM | MIN | MAX | AVG | STDDEV | VAR | FIRST_VALUE, LAST_VALUE, LAG, LEAD, PERCENTILE_CONT, PERCENTILE_DISC, PERCENT_RANK, RANK, NTILE, DENSE_RANK, EXPRESSION>”, “onAnalyticField”: “Field1”, “outAnalyticFieldName”: “Out_Field_Name1”,

“analyticParameters”: {

“orderBy”: “<orderBy expression”, “value”: <double value>,// percentile value “partitionBy”: “<field name or expression>”, “offset”: <integer>, // used by LAG/LEAD “windowFrame”: {

“type”: “ROWS” | “RANGE”, “extent”: {

“extentType”: “PRECEDING” | “BOUNDARY”, “PRECEDING”: {

“type”: <”UNBOUNDED” |
“NUMERIC_CONSTANT” |

“CURRENT_ROW”>

“value”: <numeric constant value>

} “BOUNDARY”: {

“start”: “UNBOUNDED_PRECEDING”,
“NUMERIC_PRECEDING”,

“CURRENT_ROW”,

“startValue”: <numeric constant value>, “end”: <”UNBOUNDED_FOLLOWING” |

“NUMERIC_FOLLOWING” | “CURRENT_ROW”,

“endValue”: <numeric constant value>

}

}

}

}

}

}

]

Example: [{

“analyticType”: “FIRST_VALUE”, “onAnalyticField”: “POP1990”, “analyticParameters”: {

“orderBy”: “POP1990”, “partitionBy”: “state_name”

}, “outAnalyticFieldName”: “FirstValue”

}

]

where

Optional string. The default is 1=1. The selection sql statement.

out_fields

Optional List of field names to return. Field names can be specified either as a List of field names or as a comma separated string. The default is “*”, which returns all the fields.

analytic_where

Optional String. A where clause for the query filter that applies to the result set of applying the source where clause and all other params.

geometry_filter

Optional from arcgis.geometry.filter. Allows for the information to be filtered on spatial relationship with another geometry.

out_sr

Optional Integer. The WKID for the spatial reference of the returned geometry.

out_sr

Optional Integer. The output spatial reference wkid.

return_geometry

Optional boolean. If true, geometry is returned with the query. Default is true.

order_by

Optional string. One or more field names on which the features/records need to be ordered. Use ASC or DESC for ascending or descending, respectively, following every field to control the ordering. example: STATE_NAME ASC, RACE DESC, GENDER

result_type

Optional string. The result_type parameter can be used to control the number of features returned by the query operation. Values: None | standard | tile

cache_hint

Optional Boolean. If you are performing the same query multiple times, a user can ask the server to cache the call to obtain the results quicker. The default is False.

result_offset

Optional integer. This option can be used for fetching query results by skipping the specified number of records and starting from the next record (that is, resultOffset + 1th).

result_record_count

Optional integer. This option can be used for fetching query results up to the result_record_count specified. When result_offset is specified but this parameter is not, the map service defaults it to max_record_count. The maximum value for this parameter is the value of the layer’s max_record_count property.

quantization_parameters

Optional dict. Used to project the geometry onto a virtual grid, likely representing pixels on the screen.

sql_format

Optional string. The sql_format parameter can be either standard SQL92 standard or it can use the native SQL of the underlying datastore native. The default is none which means the sql_format depends on useStandardizedQuery parameter. Values: none | standard | native

future

Optional Boolean. This determines if a Future object is returned (True) the method returns the results directly (False).

Returns

pd.DataFrame

The Query operation is performed on a feature service layer resource. The result of this operation are feature sets grouped by source layer/table object IDs. Each feature set contains Feature objects including the values for the fields requested by the user. For related layers, if you request geometry information, the geometry of each feature is also returned in the feature set. For related tables, the feature set does not include geometries.

Argument

Description

object_ids

Required string. The object IDs of the table/layer to be queried

relationship_id

Required string. The ID of the relationship to be queried.

out_fields

Required string. the list of fields from the related table/layer to be included in the returned feature set. This list is a comma delimited list of field names. If you specify the shape field in the list of return fields, it is ignored. To request geometry, set return_geometry to true. You can also specify the wildcard “*” as the value of this parameter. In this case, the results will include all the field values.

definition_expression

Optional string. The definition expression to be applied to the related table/layer. From the list of objectIds, only those records that conform to this expression are queried for related records.

return_geometry

Optional boolean. If true, the feature set includes the geometry associated with each feature. The default is true.

max_allowable_offset

Optional float. This option can be used to specify the max_allowable_offset to be used for generalizing geometries returned by the query operation. The max_allowable_offset is in the units of the outSR. If out_wkid is not specified, then max_allowable_offset is assumed to be in the unit of the spatial reference of the map.

geometry_precision

Optional integer. This option can be used to specify the number of decimal places in the response geometries.

out_wkid

Optional Integer. The spatial reference of the returned geometry.

gdb_version

Optional string. The geodatabase version to query. This parameter applies only if the isDataVersioned property of the layer queried is true.

return_z

Optional boolean. If true, Z values are included in the results if the features have Z values. Otherwise, Z values are not returned. The default is false.

return_m

Optional boolean. If true, M values are included in the results if the features have M values. Otherwise, M values are not returned. The default is false.

historic_moment

Optional Integer/datetime. The historic moment to query. This parameter applies only if the supportsQueryWithHistoricMoment property of the layers being queried is set to true. This setting is provided in the layer resource.

If historic_moment is not specified, the query will apply to the current features.

Syntax: historic_moment=<Epoch time in milliseconds>

return_true_curves

Optional boolean. Optional parameter that is false by default. When set to true, returns true curves in output geometries; otherwise, curves are converted to densified polylines or polygons.

Returns

dict

query_top_features(top_filter=None, where=None, objectids=None, start_time=None, end_time=None, geometry_filter=None, out_fields='*', return_geometry=True, return_centroid=False, max_allowable_offset=None, out_sr=None, geometry_precision=None, return_ids_only=False, return_extents_only=False, order_by_field=None, return_z=False, return_m=False, result_type=None, as_df=True)

The query_top_features is performed on a feature layer. This operation returns a feature set or spatially enabled dataframe based on the top features by order within a group. For example, when querying counties in the United States, you want to return the top five counties by population in each state. To do this, you can use query_top_feaures to group by state name, order by desc on the population and return the first five rows from each group (state).

The top_filter parameter is used to set the group by, order by, and count criteria used in generating the result. The operation also has many of the same parameters (for example, where and geometry) as the layer query operation. However, unlike the layer query operation, query_top_feaures does not support parameters such as outStatistics and its related parameters or return distinct values. Consult the advancedQueryCapabilities layer property for more details.

If the feature layer collection supports the query_top_feaures operation, it will include “supportsTopFeaturesQuery”: true, in the advancedQueryCapabilities layer property.

Argument

Description

top_filter

Required Dict. The top_filter define the aggregation of the data.

  • groupByFields define the field or fields used to aggregate

    your data.

  • topCount defines the number of features returned from the top

    features query and is a numeric value.

  • orderByFields defines the order in which the top features will

    be returned. orderByFields can be specified in either ascending (asc) or descending (desc) order, ascending being the default.

Example: {“groupByFields”: “worker”, “topCount”: 1,

“orderByFields”: “employeeNumber”}

where

Optional String. A WHERE clause for the query filter. SQL ‘92 WHERE clause syntax on the fields in the layer is supported for most data sources.

objectids

Optional List. The object IDs of the layer or table to be queried.

start_time

Optional Datetime. The starting time to query for.

end_time

Optional Datetime. The end date to query for.

geometry_filter

Optional from arcgis.geometry.filter. Allows for the information to be filtered on spatial relationship with another geometry.

out_fields

Optional String. The list of fields to include in the return results.

return_geometry

Optional Boolean. If False, the query will not return geometries. The default is True.

return_centroid

Optional Boolean. If True, the centroid of the geometry will be added to the output.

max_allowable_offset

Optional float. This option can be used to specify the max_allowable_offset to be used for generalizing geometries returned by the query operation. The max_allowable_offset is in the units of out_sr. If out_sr is not specified, max_allowable_offset is assumed to be in the unit of the spatial reference of the layer.

out_sr

Optional Integer. The WKID for the spatial reference of the returned geometry.

geometry_precision

Optional Integer. This option can be used to specify the number of decimal places in the response geometries returned by the query operation. This applies to X and Y values only (not m or z-values).

return_ids_only

Optional boolean. Default is False. If true, the response only includes an array of object IDs. Otherwise, the response is a feature set.

return_extent_only

Optional boolean. If true, the response only includes the extent of the features that would be returned by the query. If returnCountOnly=true, the response will return both the count and the extent. The default is false. This parameter applies only if the supportsReturningQueryExtent property of the layer is true.

order_by_field

Optional Str. Optional string. One or more field names on which the features/records need to be ordered. Use ASC or DESC for ascending or descending, respectively, following every field to control the ordering. example: STATE_NAME ASC, RACE DESC, GENDER

return_z

Optional boolean. If true, Z values are included in the results if the features have Z values. Otherwise, Z values are not returned. The default is False.

return_m

Optional boolean. If true, M values are included in the results if the features have M values. Otherwise, M values are not returned. The default is false.

result_type

Optional String. The result_type can be used to control the number of features returned by the query operation. Values: none | standard | tile

as_df

Optional Boolean. If False, the result is returned as a FeatureSet. If True (default) the result is returned as a spatially enabled dataframe.

Returns

Default - pd.DataFrame, when as_df=False returns a FeatureSet. If return_count_only is True, the return type is Integer. If the return_ids_only is True, a list of value is returned.

property renderer

Get/Set the Renderer of the Feature Layer. This overrides the default symbology when displaying it on a webmap.

Returns

InsensitiveDict

property time_filter

Starting at Enterprise 10.7.1+, instead of querying time-enabled map service layers or time-enabled feature service layers, a time filter can be specified. Time can be filtered as a single instant or by separating the two ends of a time extent with a comma.

Input

Description

value

Required Datetime/List Datetime. This is a single or list of start/stop date.

Returns

String of datetime values as milliseconds from epoch

update_metadata(file_path)

Updates a Layer’s metadata from an xml file.

Argument

Description

file_path

Required String. The path to the .xml file that contains the metadata.

Returns

boolean

validate_sql(sql, sql_type='where')

The validate_sql operation validates an SQL-92 expression or WHERE clause. The validate_sql operation ensures that an SQL-92 expression, such as one written by a user through a user interface, is correct before performing another operation that uses the expression. For example, validateSQL can be used to validate information that is subsequently passed in as part of the where parameter of the calculate operation. validate_sql also prevents SQL injection. In addition, all table and field names used in the SQL expression or WHERE clause are validated to ensure they are valid tables and fields.

Argument

Description

sql

Required String. The SQL expression of WHERE clause to validate. Example: “Population > 300000”

sql_type

Optional String. Three SQL types are supported in validate_sql
  • where (default) - Represents the custom WHERE clause the user can compose when querying a layer or using calculate.

  • expression - Represents an SQL-92 expression. Currently, expression is used as a default value expression when adding a new field or using the calculate API.

  • statement - Represents the full SQL-92 statement that can be passed directly to the database. No current ArcGIS REST API resource or operation supports using the full SQL-92 SELECT statement directly. It has been added to the validateSQL for completeness. Values: where | expression | statement

Returns

dict

arcgis.features.Table

class arcgis.features.Table(url, gis=None, container=None, dynamic_layer=None)

Tables represent entity classes with uniform properties. In addition to working with “entities with location” as features, the GIS can also work with non-spatial entities as rows in tables.

Working with tables is similar to working with feature layers, except that the rows (Features) in a table do not have a geometry, and tables ignore any geometry related operation.

append(item_id=None, upload_format='featureCollection', source_table_name=None, field_mappings=None, edits=None, source_info=None, upsert=True, skip_updates=False, use_globalids=False, update_geometry=True, append_fields=None, rollback=False, skip_inserts=None, upsert_matching_field=None)

Only available in ArcGIS Online

Update an existing hosted feature layer using append.

Argument

Description

source_table_name

optional string. Required only when the source data contains more than one tables, e.g., for file geodatabase. Example: source_table_name= “Building”

item_id

optional string. The ID for the Portal item that contains the source file. Used in conjunction with editsUploadFormat.

field_mappings

optional list. Used to map source data to a destination layer. Syntax: fieldMappings=[{“name” : <”targerName”>,

“sourceName” : < “sourceName”>}, …]

Examples: fieldMappings=[{“name”“CountyID”,

“sourceName” : “GEOID10”}]

edits

optional string. Only feature collection json is supported. Append supports all format through the upload_id or item_id.

source_info

optional dictionary. This is only needed when appending data from excel or csv. The appendSourceInfo can be the publishing parameter returned from analyze the csv or excel file.

upsert

optional boolean. Optional parameter specifying whether the edits needs to be applied as updates if the feature already exists. Default is false.

skip_updates

Optional boolean. Parameter is used only when upsert is true.

use_globalids

Optional boolean. Specifying whether upsert needs to use GlobalId when matching features.

update_geometry

Optional boolean. The parameter is used only when upsert is true. Skip updating the geometry and update only the attributes for existing features if they match source features by objectId or globalId.(as specified by useGlobalIds parameter).

append_fields

Optional list. The list of destination fields to append to. This is supported when upsert=true or false. Values: [“fieldName1”, “fieldName2”,….]

upload_format

required string. The source append data format. The default is featureCollection format. Values: sqlite | shapefile | filegdb | featureCollection | geojson | csv | excel

rollback

Optional boolean. Optional parameter specifying whether the upsert edits needs to be rolled back in case of failure. Default is false.

skip_inserts

Used only when upsert is true. Used to skip inserts if the value is true. The default value is false.

upsert_matching_field

Optional string. The layer field to be used when matching features with upsert. ObjectId, GlobalId, and any other field that has a unique index can be used with upsert. This parameter overrides use_globalids; e.g., specifying upsert_matching_field will be used even if you specify use_globalids = True. Example: upsert_matching_field=”MyfieldWithUniqueIndex”

Returns

boolean

calculate(where, calc_expression, sql_format='standard', version=None, sessionid=None, return_edit_moment=None, future=False)

The calculate operation is performed on a feature layer resource. It updates the values of one or more fields in an existing feature service layer based on SQL expressions or scalar values. The calculate operation can only be used if the supportsCalculate property of the layer is true. Neither the Shape field nor system fields can be updated using calculate. System fields include ObjectId and GlobalId. See Calculate a field for more information on supported expressions

Inputs

Description

where

Required String. A where clause can be used to limit the updated records. Any legal SQL where clause operating on the fields in the layer is allowed.

calc_expression

Required List. The array of field/value info objects that contain the field or fields to update and their scalar values or SQL expression. Allowed types are dictionary and list. List must be a list of dictionary objects.

Calculation Format is as follows:

{“field” : “<field name>”, “value” : “<value>”}

sql_format

Optional String. The SQL format for the calc_expression. It can be either standard SQL92 (standard) or native SQL (native). The default is standard.

Values: standard, native

version

Optional String. The geodatabase version to apply the edits.

sessionid

Optional String. A parameter which is set by a client during long transaction editing on a branch version. The sessionid is a GUID value that clients establish at the beginning and use throughout the edit session. The sessonid ensures isolation during the edit session. This parameter applies only if the isDataBranchVersioned property of the layer is true.

return_edit_moment

Optional Boolean. This parameter specifies whether the response will report the time edits were applied. If true, the server will return the time edits were applied in the response’s edit moment key. This parameter applies only if the isDataBranchVersioned property of the layer is true.

future

Optional Boolean. If True, the result is returned as a future object and the results are obtained in an asynchronous fashion. False is the default.

This applies to 10.8+ only

# Usage Example 1:

print(fl.calculate(where="OBJECTID < 2",
                   calc_expression={"field": "ZONE", "value" : "R1"}))
# Usage Example 2:

print(fl.calculate(where="OBJECTID < 2001",
                   calc_expression={"field": "A",  "sqlExpression" : "B*3"}))

Output: dictionary with format {‘updatedFeatureCount’: 1, ‘success’: True}

property container

The feature layer collection to which this layer belongs.

delete_features(deletes=None, where=None, geometry_filter=None, gdb_version=None, rollback_on_failure=True, return_delete_results=True, future=False)

This operation deletes features in a feature layer or table

Argument

Description

deletes

Optional string. A comma seperated string of OIDs to remove from the service.

where

Optional string. A where clause for the query filter. Any legal SQL where clause operating on the fields in the layer is allowed. Features conforming to the specified where clause will be deleted.

geometry_filter

Optional SpatialFilter. A spatial filter from arcgis.geometry.filters module to filter results by a spatial relationship with another geometry.

gdb_version

Optional string. A Geodatabase version to apply the edits.

rollback_on_failure

Optional boolean. Optional parameter to specify if the edits should be applied only if all submitted edits succeed. If false, the server will apply the edits that succeed even if some of the submitted edits fail. If true, the server will apply the edits only if all edits succeed. The default value is true.

return_delete_results

Optional Boolean. Optional parameter that indicates whether a result is returned per deleted row when the deleteFeatures operation is run. The default is true.

future

Optional Boolean. If future=True, then the operation will occur asynchronously else the operation will occur synchronously. False is the default.

Returns

Dict if future=False (default), else a concurrent.Future class.

edit_features(adds=None, updates=None, deletes=None, gdb_version=None, use_global_ids=False, rollback_on_failure=True, return_edit_moment=False, attachments=None, true_curve_client=False, session_id=None, use_previous_moment=False, datum_transformation=None)

This operation adds, updates, and deletes features to the associated feature layer or table in a single call.

When making large number (250+ records at once) of edits, append should be used over edit_features to improve performance and ensure service stability.

Inputs

Description

adds

Optional FeatureSet/List. The array of features to be added.

updates

Optional FeatureSet/List. The array of features to be updated.

deletes

Optional FeatureSet/List. string of OIDs to remove from service

use_global_ids

Optional boolean. Instead of referencing the default Object ID field, the service will look at a GUID field to track changes. This means the GUIDs will be passed instead of OIDs for delete, update or add features.

gdb_version

Optional boolean. Geodatabase version to apply the edits.

rollback_on_failure

Optional boolean. Optional parameter to specify if the edits should be applied only if all submitted edits succeed. If false, the server will apply the edits that succeed even if some of the submitted edits fail. If true, the server will apply the edits only if all edits succeed. The default value is true.

return_edit_moment

Optional boolean. Introduced at 10.5, only applicable with ArcGIS Server services only. Specifies whether the response will report the time edits were applied. If set to true, the server will return the time in the response’s editMoment key. The default value is false.

attachments

Optional Dict. This parameter adds, updates, or deletes attachments. It applies only when the use_global_ids parameter is set to true. For adds, the globalIds of the attachments provided by the client are preserved. When useGlobalIds is true, updates and deletes are identified by each feature or attachment globalId, rather than their objectId or attachmentId. This parameter requires the layer’s supportsApplyEditsWithGlobalIds property to be true.

Attachments to be added or updated can use either pre-uploaded data or base 64 encoded data.

Inputs

Inputs

Description

adds

List of attachments to add.

updates

List of attachements to update

deletes

List of attachments to delete

Additional attachment information here.

true_curve_client

Optional boolean. Introduced at 10.5. Indicates to the server whether the client is true curve capable. When set to true, this indicates to the server that true curve geometries should be downloaded and that geometries containing true curves should be consumed by the map service without densifying it. When set to false, this indicates to the server that the client is not true curves capable. The default value is false.

session_id

Optional String. Introduced at 10.6. The session_id is a GUID value that clients establish at the beginning and use throughout the edit session. The sessonID ensures isolation during the edit session. The session_id parameter is set by a client during long transaction editing on a branch version.

use_previous_moment

Optional Boolean. Introduced at 10.6. The use_previous_moment parameter is used to apply the edits with the same edit moment as the previous set of edits. This allows an editor to apply single block of edits partially, complete another task and then complete the block of edits. This parameter is set by a client during long transaction editing on a branch version.

When set to true, the edits are applied with the same edit moment as the previous set of edits. When set to false or not set (default) the edits are applied with a new edit moment.

datum_transformation

Optional Integer/Dictionary. This parameter applies a datum transformation while projecting geometries in the results when out_sr is different than the layer’s spatial reference. When specifying transformations, you need to think about which datum transformation best projects the layer (not the feature service) to the outSR and sourceSpatialReference property in the layer properties. For a list of valid datum transformation ID values ad well-known text strings, see Coordinate systems and transformations. For more information on datum transformations, please see the transformation parameter in the Project operation.

Examples

Inputs

Description

WKID

Integer. Ex: datum_transformation=4326

WKT

Dict. Ex: datum_transformation={“wkt”: “<WKT>”}

Composite

Dict. Ex: datum_transformation=```{‘geoTransforms’:[{‘wkid’:<id>,’forward’:<true|false>},{‘wkt’:’<WKT>’,’forward’:<True|False>}]}```

Output: dictionary

export_attachments(output_folder, label_field=None)

Exports attachments from the feature layer in Imagenet format using the output_label_field.

Argument

Description

output_folder

Required. Output folder where the attachments will be stored.

label_field

Optional. Field which contains the label/category of each feature. If None, a default folder is created.

classmethod fromitem(item, table_id=0)

Creates a Table from a GIS Item. The type of item should be a ‘Feature Service’ that represents a FeatureLayerCollection. The layer_id is the id of the layer in feature layer collection (feature service).

generate_renderer(definition, where=None)

This operation groups data using the supplied definition (classification definition) and an optional where clause. The result is a renderer object. Use baseSymbol and colorRamp to define the symbols assigned to each class. If the operation is performed on a table, the result is a renderer object containing the data classes and no symbols.

Argument

Description

definition

required dict. The definition using the renderer that is generated. Use either class breaks or unique value classification definitions. See: https://resources.arcgis.com/en/help/rest/apiref/ms_classification.html

where

optional string. A where clause for which the data needs to be classified. Any legal SQL where clause operating on the fields in the dynamic layer/table is allowed.

Returns

dictionary

get_html_popup(oid)

The htmlPopup resource provides details about the HTML pop-up authored by the user using ArcGIS for Desktop.

Argument

Description

oid

Optional string. Object id of the feature to get the HTML popup.

Returns

string

get_unique_values(attribute, query_string='1=1')

Return a list of unique values for a given attribute

Argument

Description

attribute

Required string. The feature layer attribute to query.

query_string

Optional string. SQL Query that will be used to filter attributes before unique values are returned. ex. “name_2 like ‘%K%’”

property manager

Helper object to manage the feature layer, update it’s definition, etc

property metadata

The metadata property allows for the setting and downloading of the Feature Layer’s metadata. If metadata is disabled on the GIS or the layer does not support metdata, None value will be returned.

Argument

Description

value

Required String. (SET) Path to the metadata file.

Returns

String (GET)

property properties

The properties of this object

query(where='1=1', out_fields='*', time_filter=None, return_count_only=False, return_ids_only=False, return_distinct_values=False, group_by_fields_for_statistics=None, statistic_filter=None, result_offset=None, result_record_count=None, object_ids=None, gdb_version=None, order_by_fields=None, out_statistics=None, return_all_records=True, historic_moment=None, sql_format=None, return_exceeded_limit_features=None, as_df=False, having=None, **kwargs)

Queries a Table Layer based on a set of criteria.

Argument

Description

where

Optional string. The default is 1=1. The selection sql statement.

out_fields

Optional List of field names to return. Field names can be specified either as a List of field names or as a comma separated string. The default is “*”, which returns all the fields.

object_ids

Optional string. The object IDs of this layer or table to be queried. The object ID values should be a comma-separated string.

time_filter

Optional list. The format is of [<startTime>, <endTime>] using datetime.date, datetime.datetime or timestamp in milliseconds. Syntax: time_filter=[<startTime>, <endTime>] ; specified as

datetime.date, datetime.datetime or timestamp in milliseconds

gdb_version

Optional string. The geodatabase version to query. This parameter applies only if the isDataVersioned property of the layer is true. If this is not specified, the query will apply to the published map’s version.

return_geometry

Optional boolean. If true, geometry is returned with the query. Default is true.

return_distinct_values

Optional boolean. If true, it returns distinct values based on the fields specified in out_fields. This parameter applies only if the supportsAdvancedQueries property of the layer is true.

return_ids_only

Optional boolean. Default is False. If true, the response only includes an array of object IDs. Otherwise, the response is a feature set.

return_count_only

Optional boolean. If true, the response only includes the count (number of features/records) that would be returned by a query. Otherwise, the response is a feature set. The default is false. This option supersedes the returnIdsOnly parameter. If returnCountOnly = true, the response will return both the count and the extent.

order_by_fields

Optional string. One or more field names on which the features/records need to be ordered. Use ASC or DESC for ascending or descending, respectively, following every field to control the ordering. example: STATE_NAME ASC, RACE DESC, GENDER

group_by_fields_for_statistics

Optional string. One or more field names on which the values need to be grouped for calculating the statistics. example: STATE_NAME, GENDER

out_statistics

Optional string. The definitions for one or more field-based statistics to be calculated.

Syntax:

[
{

“statisticType”: “<count | sum | min | max | avg | stddev | var>”, “onStatisticField”: “Field1”, “outStatisticFieldName”: “Out_Field_Name1”

}, {

“statisticType”: “<count | sum | min | max | avg | stddev | var>”, “onStatisticField”: “Field2”, “outStatisticFieldName”: “Out_Field_Name2”

}

]

result_offset

Optional integer. This option can be used for fetching query results by skipping the specified number of records and starting from the next record (that is, resultOffset + 1th). This option is ignored if return_all_records is True (i.e. by default).

result_record_count

Optional integer. This option can be used for fetching query results up to the result_record_count specified. When result_offset is specified but this parameter is not, the map service defaults it to max_record_count. The maximum value for this parameter is the value of the layer’s max_record_count property. This option is ignored if return_all_records is True (i.e. by default).

return_all_records

Optional boolean. When True, the query operation will call the service until all records that satisfy the where_clause are returned. Note: result_offset and result_record_count will be ignored if return_all_records is True. Also, if return_count_only, return_ids_only, or return_extent_only are True, this parameter will be ignored.

historic_moment

Optional integer. The historic moment to query. This parameter applies only if the layer is archiving enabled and the supportsQueryWithHistoricMoment property is set to true. This property is provided in the layer resource.

If historic_moment is not specified, the query will apply to the current features.

sql_format

Optional string. The sql_format parameter can be either standard SQL92 standard or it can use the native SQL of the underlying datastore native. The default is none which means the sql_format depends on useStandardizedQuery parameter. Values: none | standard | native

return_exceeded_limit_features

Optional boolean. Optional parameter which is true by default. When set to true, features are returned even when the results include ‘exceededTransferLimit’: True.

When set to false and querying with resultType = tile features are not returned when the results include ‘exceededTransferLimit’: True. This allows a client to find the resolution in which the transfer limit is no longer exceeded without making multiple calls.

as_df

Optional boolean. If True, the results are returned as a DataFrame instead of a FeatureSet.

kwargs

Optional dict. Optional parameters that can be passed to the Query function. This will allow users to pass additional parameters not explicitly implemented on the function. A complete list of functions available is documented on the Query REST API.

Returns

A FeatureSet or Panda’s DataFrame containing the features matching the query unless another return type is specified, such as count

query_analytics(out_analytics, where='1=1', out_fields='*', analytic_where=None, geometry_filter=None, out_sr=None, return_geometry=True, order_by=None, result_type=None, cache_hint=None, result_offset=None, result_record_count=None, quantization_param=None, sql_format=None, future=True, **kwargs)

query_analytics exposes the standard SQL windows functions that compute aggregate and ranking values based on a group of rows called window partition. The window function is applied to the rows after the partitioning and ordering of the rows. query_analytics defines a window or user-specified set of rows within a query result set. query_analytics can be used to compute aggregated values such as moving averages, cumulative aggregates, or running totals.

SQL Windows Function

A window function performs a calculation across a set of rows (SQL partition or window) that are related to the current row. Unlike regular aggregate functions, use of a window function does not return single output row. The rows retain their separate identities with each calculation appended to the rows as a new field value. The window function can access more than just the current row of the query result.

query_analytics currently supports the following windows functions:
  • Aggregate functions

  • Analytic functions

  • Ranking functions

Aggregate Functions

Aggregate functions are deterministic function that perform a calculation on a set of values and return a single value. They are used in the select list with optional HAVING clause. GROUP BY clause can also be used to calculate the aggregation on categories of rows. query_analytics can be used to calculate the aggregation on a specific range of value. Supported aggregate functions are:

  • Min

  • Max

  • Sum

  • Count

  • AVG

  • STDDEV

  • VAR

Analytic Functions

Several analytic functions available now in all SQL vendors to compute an aggregate value based on a group of rows or windows partition. Unlike aggregation functions, analytic functions can return single or multiple rows for each group.

  • CUM_DIST

  • FIRST_VALUE

  • LAST_VALUE

  • LEAD

  • LAG

  • PERCENTILE_DISC

  • PERCENTILE_CONT

  • PERCENT_RANK

Ranking Functions

Ranking functions return a ranking value for each row in a partition. Depending on the function that is used, some rows might receive the same value as other rows.

  • RANK

  • NTILE

  • DENSE_RANK

  • ROW_NUMBER

Partitioning

Partitions are extremely useful when you need to calculate the same metric over different group of rows. It is very powerful and has many potential usages. For example, you can add partition by to your window specification to look at different groups of rows individually.

‘partitionBy’ clause divides the query result set into partitions and the sql window function is applied to each partition. The ‘partitionBy’ clause normally refers to the column by which the result is partitioned. ‘partitionBy’ can also be a value expression (column expression or function) that references any of the selected columns (not aliases).

Argument

Description

out_analytics

Required List. A set of analytics to calculate on the Feature Layer.

The definitions for one or more field-based or expression analytics to be computed. This parameter is supported only on layers/tables that indicate supportsAnalytics is true. Note: If outAnalyticFieldName is empty or missing, the server assigns a field name to the returned analytic field.

Syntax: An array of analytic definitions. An analytic definition specifies the type of analytic, the field or expression on which it is to be computed, and the resulting output field name. Syntax [

{

“analyticType”: “<COUNT | SUM | MIN | MAX | AVG | STDDEV | VAR | FIRST_VALUE, LAST_VALUE, LAG, LEAD, PERCENTILE_CONT, PERCENTILE_DISC, PERCENT_RANK, RANK, NTILE, DENSE_RANK, EXPRESSION>”, “onAnalyticField”: “Field1”, “outAnalyticFieldName”: “Out_Field_Name1”,

“analyticParameters”: {

“orderBy”: “<orderBy expression”, “value”: <double value>,// percentile value “partitionBy”: “<field name or expression>”, “offset”: <integer>, // used by LAG/LEAD “windowFrame”: {

“type”: “ROWS” | “RANGE”, “extent”: {

“extentType”: “PRECEDING” | “BOUNDARY”, “PRECEDING”: {

“type”: <”UNBOUNDED” |
“NUMERIC_CONSTANT” |

“CURRENT_ROW”>

“value”: <numeric constant value>

} “BOUNDARY”: {

“start”: “UNBOUNDED_PRECEDING”,
“NUMERIC_PRECEDING”,

“CURRENT_ROW”,

“startValue”: <numeric constant value>, “end”: <”UNBOUNDED_FOLLOWING” |

“NUMERIC_FOLLOWING” | “CURRENT_ROW”,

“endValue”: <numeric constant value>

}

}

}

}

}

}

]

Example: [{

“analyticType”: “FIRST_VALUE”, “onAnalyticField”: “POP1990”, “analyticParameters”: {

“orderBy”: “POP1990”, “partitionBy”: “state_name”

}, “outAnalyticFieldName”: “FirstValue”

}

]

where

Optional string. The default is 1=1. The selection sql statement.

out_fields

Optional List of field names to return. Field names can be specified either as a List of field names or as a comma separated string. The default is “*”, which returns all the fields.

analytic_where

Optional String. A where clause for the query filter that applies to the result set of applying the source where clause and all other params.

geometry_filter

Optional from arcgis.geometry.filter. Allows for the information to be filtered on spatial relationship with another geometry.

out_sr

Optional Integer. The WKID for the spatial reference of the returned geometry.

out_sr

Optional Integer. The output spatial reference wkid.

return_geometry

Optional boolean. If true, geometry is returned with the query. Default is true.

order_by

Optional string. One or more field names on which the features/records need to be ordered. Use ASC or DESC for ascending or descending, respectively, following every field to control the ordering. example: STATE_NAME ASC, RACE DESC, GENDER

result_type

Optional string. The result_type parameter can be used to control the number of features returned by the query operation. Values: None | standard | tile

cache_hint

Optional Boolean. If you are performing the same query multiple times, a user can ask the server to cache the call to obtain the results quicker. The default is False.

result_offset

Optional integer. This option can be used for fetching query results by skipping the specified number of records and starting from the next record (that is, resultOffset + 1th).

result_record_count

Optional integer. This option can be used for fetching query results up to the result_record_count specified. When result_offset is specified but this parameter is not, the map service defaults it to max_record_count. The maximum value for this parameter is the value of the layer’s max_record_count property.

quantization_parameters

Optional dict. Used to project the geometry onto a virtual grid, likely representing pixels on the screen.

sql_format

Optional string. The sql_format parameter can be either standard SQL92 standard or it can use the native SQL of the underlying datastore native. The default is none which means the sql_format depends on useStandardizedQuery parameter. Values: none | standard | native

future

Optional Boolean. This determines if a Future object is returned (True) the method returns the results directly (False).

Returns

pd.DataFrame

The Query operation is performed on a feature service layer resource. The result of this operation are feature sets grouped by source layer/table object IDs. Each feature set contains Feature objects including the values for the fields requested by the user. For related layers, if you request geometry information, the geometry of each feature is also returned in the feature set. For related tables, the feature set does not include geometries.

Argument

Description

object_ids

Required string. The object IDs of the table/layer to be queried

relationship_id

Required string. The ID of the relationship to be queried.

out_fields

Required string. the list of fields from the related table/layer to be included in the returned feature set. This list is a comma delimited list of field names. If you specify the shape field in the list of return fields, it is ignored. To request geometry, set return_geometry to true. You can also specify the wildcard “*” as the value of this parameter. In this case, the results will include all the field values.

definition_expression

Optional string. The definition expression to be applied to the related table/layer. From the list of objectIds, only those records that conform to this expression are queried for related records.

return_geometry

Optional boolean. If true, the feature set includes the geometry associated with each feature. The default is true.

max_allowable_offset

Optional float. This option can be used to specify the max_allowable_offset to be used for generalizing geometries returned by the query operation. The max_allowable_offset is in the units of the outSR. If out_wkid is not specified, then max_allowable_offset is assumed to be in the unit of the spatial reference of the map.

geometry_precision

Optional integer. This option can be used to specify the number of decimal places in the response geometries.

out_wkid

Optional Integer. The spatial reference of the returned geometry.

gdb_version

Optional string. The geodatabase version to query. This parameter applies only if the isDataVersioned property of the layer queried is true.

return_z

Optional boolean. If true, Z values are included in the results if the features have Z values. Otherwise, Z values are not returned. The default is false.

return_m

Optional boolean. If true, M values are included in the results if the features have M values. Otherwise, M values are not returned. The default is false.

historic_moment

Optional Integer/datetime. The historic moment to query. This parameter applies only if the supportsQueryWithHistoricMoment property of the layers being queried is set to true. This setting is provided in the layer resource.

If historic_moment is not specified, the query will apply to the current features.

Syntax: historic_moment=<Epoch time in milliseconds>

return_true_curves

Optional boolean. Optional parameter that is false by default. When set to true, returns true curves in output geometries; otherwise, curves are converted to densified polylines or polygons.

Returns

dict

query_top_features(top_filter=None, where=None, objectids=None, start_time=None, end_time=None, geometry_filter=None, out_fields='*', return_geometry=True, return_centroid=False, max_allowable_offset=None, out_sr=None, geometry_precision=None, return_ids_only=False, return_extents_only=False, order_by_field=None, return_z=False, return_m=False, result_type=None, as_df=True)

The query_top_features is performed on a feature layer. This operation returns a feature set or spatially enabled dataframe based on the top features by order within a group. For example, when querying counties in the United States, you want to return the top five counties by population in each state. To do this, you can use query_top_feaures to group by state name, order by desc on the population and return the first five rows from each group (state).

The top_filter parameter is used to set the group by, order by, and count criteria used in generating the result. The operation also has many of the same parameters (for example, where and geometry) as the layer query operation. However, unlike the layer query operation, query_top_feaures does not support parameters such as outStatistics and its related parameters or return distinct values. Consult the advancedQueryCapabilities layer property for more details.

If the feature layer collection supports the query_top_feaures operation, it will include “supportsTopFeaturesQuery”: true, in the advancedQueryCapabilities layer property.

Argument

Description

top_filter

Required Dict. The top_filter define the aggregation of the data.

  • groupByFields define the field or fields used to aggregate

    your data.

  • topCount defines the number of features returned from the top

    features query and is a numeric value.

  • orderByFields defines the order in which the top features will

    be returned. orderByFields can be specified in either ascending (asc) or descending (desc) order, ascending being the default.

Example: {“groupByFields”: “worker”, “topCount”: 1,

“orderByFields”: “employeeNumber”}

where

Optional String. A WHERE clause for the query filter. SQL ‘92 WHERE clause syntax on the fields in the layer is supported for most data sources.

objectids

Optional List. The object IDs of the layer or table to be queried.

start_time

Optional Datetime. The starting time to query for.

end_time

Optional Datetime. The end date to query for.

geometry_filter

Optional from arcgis.geometry.filter. Allows for the information to be filtered on spatial relationship with another geometry.

out_fields

Optional String. The list of fields to include in the return results.

return_geometry

Optional Boolean. If False, the query will not return geometries. The default is True.

return_centroid

Optional Boolean. If True, the centroid of the geometry will be added to the output.

max_allowable_offset

Optional float. This option can be used to specify the max_allowable_offset to be used for generalizing geometries returned by the query operation. The max_allowable_offset is in the units of out_sr. If out_sr is not specified, max_allowable_offset is assumed to be in the unit of the spatial reference of the layer.

out_sr

Optional Integer. The WKID for the spatial reference of the returned geometry.

geometry_precision

Optional Integer. This option can be used to specify the number of decimal places in the response geometries returned by the query operation. This applies to X and Y values only (not m or z-values).

return_ids_only

Optional boolean. Default is False. If true, the response only includes an array of object IDs. Otherwise, the response is a feature set.

return_extent_only

Optional boolean. If true, the response only includes the extent of the features that would be returned by the query. If returnCountOnly=true, the response will return both the count and the extent. The default is false. This parameter applies only if the supportsReturningQueryExtent property of the layer is true.

order_by_field

Optional Str. Optional string. One or more field names on which the features/records need to be ordered. Use ASC or DESC for ascending or descending, respectively, following every field to control the ordering. example: STATE_NAME ASC, RACE DESC, GENDER

return_z

Optional boolean. If true, Z values are included in the results if the features have Z values. Otherwise, Z values are not returned. The default is False.

return_m

Optional boolean. If true, M values are included in the results if the features have M values. Otherwise, M values are not returned. The default is false.

result_type

Optional String. The result_type can be used to control the number of features returned by the query operation. Values: none | standard | tile

as_df

Optional Boolean. If False, the result is returned as a FeatureSet. If True (default) the result is returned as a spatially enabled dataframe.

Returns

Default - pd.DataFrame, when as_df=False returns a FeatureSet. If return_count_only is True, the return type is Integer. If the return_ids_only is True, a list of value is returned.

property renderer

Get/Set the Renderer of the Feature Layer. This overrides the default symbology when displaying it on a webmap.

Returns

InsensitiveDict

property time_filter

Starting at Enterprise 10.7.1+, instead of querying time-enabled map service layers or time-enabled feature service layers, a time filter can be specified. Time can be filtered as a single instant or by separating the two ends of a time extent with a comma.

Input

Description

value

Required Datetime/List Datetime. This is a single or list of start/stop date.

Returns

String of datetime values as milliseconds from epoch

update_metadata(file_path)

Updates a Layer’s metadata from an xml file.

Argument

Description

file_path

Required String. The path to the .xml file that contains the metadata.

Returns

boolean

validate_sql(sql, sql_type='where')

The validate_sql operation validates an SQL-92 expression or WHERE clause. The validate_sql operation ensures that an SQL-92 expression, such as one written by a user through a user interface, is correct before performing another operation that uses the expression. For example, validateSQL can be used to validate information that is subsequently passed in as part of the where parameter of the calculate operation. validate_sql also prevents SQL injection. In addition, all table and field names used in the SQL expression or WHERE clause are validated to ensure they are valid tables and fields.

Argument

Description

sql

Required String. The SQL expression of WHERE clause to validate. Example: “Population > 300000”

sql_type

Optional String. Three SQL types are supported in validate_sql
  • where (default) - Represents the custom WHERE clause the user can compose when querying a layer or using calculate.

  • expression - Represents an SQL-92 expression. Currently, expression is used as a default value expression when adding a new field or using the calculate API.

  • statement - Represents the full SQL-92 statement that can be passed directly to the database. No current ArcGIS REST API resource or operation supports using the full SQL-92 SELECT statement directly. It has been added to the validateSQL for completeness. Values: where | expression | statement

Returns

dict

arcgis.features.FeatureLayerCollection

class arcgis.features.FeatureLayerCollection(url, gis=None)

A FeatureLayerCollection is a collection of feature layers and tables, with the associated relationships among the entities.

In a web GIS, a feature layer collection is exposed as a feature service with multiple feature layers.

Instances of FeatureDatasets can be obtained from feature service Items in the GIS using FeatureLayerCollection.fromitem(item), from feature service endpoints using the constructor, or by accessing the dataset attribute of feature layer objects.

FeatureDatasets can be configured and managed using their manager helper object.

If the dataset supports the sync operation, the replicas helper object allows management and synchronization of replicas for disconnected editing of the feature layer collection.

Note: You can use the layers and tables property to get to the individual layers and tables in this feature layer collection.

extract_changes(layers, servergen, queries=None, geometry=None, geometry_type=None, in_sr=None, version=None, return_inserts=False, return_updates=False, return_deletes=False, return_ids_only=False, return_extent_only=False, return_attachments=False, attachments_by_url=False, data_format='json', change_extent_grid_cell=None)

Feature service change tracking is an efficient change tracking mechanism for applications. Applications can use change tracking to query changes that have been made to the layers and tables in the service. For enterprise geodatabase based feature services published from ArcGIS Pro 2.2 or higher, the ChangeTracking capability requires all layers and tables to be either archive enabled or branch versioned and have globalid columns. Change tracking can also be enabled for ArcGIS Online hosted feature services. If all layers and tables in the service have the ChangeTracking capability, the extract_changes operation can be used to get changes.

Argument

Description

layers

Required List. The list of layers (by index value) and tables to include in the output.

servergen

Required List. The servergen numbers allow a client to specify the last layer generation numbers (a Unix epoch time value in milliseconds) for the changes received from the server. All changes made after this value will be returned.

  • minServerGen: It is the min generation of the server data changes. Clients with layerServerGens that is less than minServerGen cannot extract changes and would need to make a full server/layers query instead of extracting changes.

  • serverGen: It is the current server generation number of the changes. Every changed feature has a version or a generation number that is changed every time the feature is updated.

Syntax:

servergen= [{“id”: <layerId1>, “serverGen”: <genNum1>}, {“id”: <layerId2>, “serverGen”: <genNum2>}]

The id value for the layer is the index of the layer from the layers attribute on the FeatureLayerCollection. The serverGen value is a Unix epoch timestamp value in milliseconds.

# Usage Example:

servergen= [{"id": 0, "serverGen": 10500},
            {"id": 1, "serverGen": 1100},
            {"id": 2, "serverGen": 1200}]

queries

Optional Dictionary. In addition to the layers and geometry parameters, the queries parameter can be used to further define what changes to return. This parameter allows you to set query properties on a per-layer or per-table basis. If a layer’s ID is present in the layers parameter and missing from layer queries, it’s changed features that intersect with the filter geometry are returned.

The properties include the following:

  • where - Defines an attribute query for a layer or table. The default is no where clause.

  • useGeometry - Determines whether or not to apply the geometry for the layer. The default is true. If set to false, features from the layer that intersect the geometry are not added.

  • includeRelated - Determines whether or not to add related rows. The default is true. The value true is honored only for queryOption=none. This is only applicable if your data has relationship classes. Relationships are only processed in a forward direction from origin to destination.

  • queryOption - Defines whether or how filters will be applied to a layer. The queryOption was added in 10.2. See the Compatibility notes topic for more information. Valid values are None, useFilter, or all. See also the layerQueries column in the Request Parameters table in the Extract Changes (Feature Service) help for details and code samples.

  • When the value is none, no feature are returned based on where and filter geometry.

  • If includeRelated is false, no features are returned.

  • If includeRelated is true, features in this layer (that are related to the features in other layers in the replica) are returned.

  • When the value is useFilter, features that satisfy filtering based on geometry and where are returned. The value of includeRelated is ignored.

# Usage Example:

queries={Layer_or_tableID1:{"where":"attribute query",
                            "useGeometry": true | false,
                            "includeRelated": true | false},
         Layer_or_tableID2: {.}}

geometry

Optional Geometry/Extent. The geometry to apply as the spatial filter for the changes. All the changed features in layers intersecting this geometry will be returned. The structure of the geometry is the same as the structure of the JSON geometry objects returned by the ArcGIS REST API. In addition to the JSON structures, for envelopes and points you can specify the geometry with a simpler comma-separated syntax.

geometry_type

Optional String. The type of geometry specified by the geometry parameter. The geometry type can be an envelope, point, line or polygon. The default geometry type is an envelope.

Values: esriGeometryPoint, esriGeometryMultipoint, esriGeometryPolyline, esriGeometryPolygon, esriGeometryEnvelope

in_sr

Optional Integer. The spatial reference of the input geometry.

out_sr

Optional Integer/String. The output spatial reference of the returned changes.

version

Optional String. If branch versioning is enabled, a user can specify the branch version name to extract changes from.

return_inserts

Optional Boolean. If true, newly inserted features will be returned. The default is false.

return_updates

Optional Boolean. If true, updated features will be returned. The default is false.

return_deletes

Optional Boolean. If true, deleted features will be returned. The default is false.

return_ids_only

Optional Boolean. If true, the response includes an array of object IDs only. The default is false.

return_attachments

Optional Boolean. If true, attachments changes are returned in the response. Otherwise, attachments are not included. The default is false. This parameter is only applicable if the feature service has attachments.

attachments_by_url

Optional Boolean. If true, a reference to a URL will be provided for each attachment returned. Otherwise, attachments are embedded in the response. The default is true.

data_format

Optional String. The format of the changes returned in the response. The default is json. Values: sqllite or json

change_extent_grid_cell

Optional String. To optimize localizing changes extent, the value medium is an 8x8 grid that bound the changes extent. Used only when return_extent_only is true. The default is none. Values: None, large, medium, or small

Returns

dictionary containing the layerServerGens and an array of edits

#Usage Example for extracting all changes to a feaature layer in a particular version since the time the Feature Layer was created.

from arcgis.gis import GIS
from arcgis.features import FeatureLayerCollection

>>> gis = GIS(<url>, <username>, <password>)

# Search for the Feature Service item
>>> fl_item = gis.content.search('title:"my_feature_layer" type:"Feature Layer"')[0]
>>> created_time = fl_item.created

# Get the Feature Service url
>>> fs=gis.content.search('title:"my_feature_layer" type:"Feature"')[0].url

# Instantiate the a FeatureLayerCollection from the url
>>> flc=FeatureLayerCollection(fs, gis)

# Extract the changes for the version
>>> extracted_changes=flc.extract_changes(layers=[0],
                           servergen=[{"id": 0, "serverGen": created_time}],
                           version="<version_owner>.<version_name>",
                           return_ids_only=True,
                           return_inserts=True,
                           return_updates=True,
                           return_deletes=True,
                           data_format="json")

>>> extracted_changes

{'layerServerGens': [{'id': 0, 'serverGen': 1600713614620}],
 'edits': [{'id': 0,
   'objectIds': {'adds': [], 'updates': [194], 'deletes': []}}]}
classmethod fromitem(item)
property manager

helper object to manage the feature layer collection, update it’s definition, etc

property properties

The properties of this object

query(layer_defs_filter=None, geometry_filter=None, time_filter=None, return_geometry=True, return_ids_only=False, return_count_only=False, return_z=False, return_m=False, out_sr=None)

queries the feature layer collection

query_domains(layers)

The query_domains returns full domain information for the domains referenced by the layers in the feature layer collection. This operation is performed on a feature layer collection. The operation takes an array of layer IDs and returns the set of domains referenced by the layers.

Argument

Description

layers

Required List. An array of layers. The set of domains to return is based on the domains referenced by these layers. Example: [1,2,3,4]

Returns

list of dictionaries

The Query operation is performed on a feature service layer resource. The result of this operation are feature sets grouped by source layer/table object IDs. Each feature set contains Feature objects including the values for the fields requested by the user. For related layers, if you request geometry information, the geometry of each feature is also returned in the feature set. For related tables, the feature set does not include geometries.

Argument

Description

object_ids

Optional string. the object IDs of the table/layer to be queried.

relationship_id

Optional string. The ID of the relationship to be queried.

out_fields

Optional string.the list of fields from the related table/layer to be included in the returned feature set. This list is a comma delimited list of field names. If you specify the shape field in the list of return fields, it is ignored. To request geometry, set return_geometry to true. You can also specify the wildcard “*” as the value of this parameter. In this case, the results will include all the field values.

definition_expression

Optional string. The definition expression to be applied to the related table/layer. From the list of objectIds, only those records that conform to this expression are queried for related records.

return_geometry

Optional boolean. If true, the feature set includes the geometry associated with each feature. The default is true.

max_allowable_offset

Optional float. This option can be used to specify the max_allowable_offset to be used for generalizing geometries returned by the query operation. The max_allowable_offset is in the units of the outSR. If outSR is not specified, then max_allowable_offset is assumed to be in the unit of the spatial reference of the map.

geometry_precision

Optional integer. This option can be used to specify the number of decimal places in the response geometries.

out_wkid

Optional integer. The spatial reference of the returned geometry.

gdb_version

Optional string. The geodatabase version to query. This parameter applies only if the isDataVersioned property of the layer queried is true.

return_z

Optional boolean. If true, Z values are included in the results if the features have Z values. Otherwise, Z values are not returned. The default is false.

return_m

Optional boolean. If true, M values are included in the results if the features have M values. Otherwise, M values are not returned. The default is false.

Returns

dict

property relationships

The relationships property provides relationship information for the layers and tables in the feature layer collection.

The relationships resource includes information about relationship rules from the back-end relationship classes, in addition to the relationship information already found in the individual layers and tables.

Feature layer collections that support the relationships resource will have the “supportsRelationshipsResource”: true property on their properties.

Returns

List of Dictionaries

upload(path, description=None)

Uploads a new item to the server. Once the operation is completed successfully, the JSON structure of the uploaded item is returned.

Argument

Description

path

Optional string. Filepath of the file to upload.

description

Optional string. Descriptive text for the uploaded item.

Returns

boolean

property versions

Returns a VersionManager to create, update and use versions on a FeatureLayerCollection. If versioning is not enabled on the service, None is returned.

arcgis.features.FeatureSet

class arcgis.features.FeatureSet(features, fields=None, has_z=False, has_m=False, geometry_type=None, spatial_reference=None, display_field_name=None, object_id_field_name=None, global_id_field_name=None)

A set of features with information about their fields, field aliases, geometry type, spatial reference etc.

FeatureSets are commonly used as input/output with several Geoprocessing Tools, and can be the obtained through the query() methods of feature layers. A FeatureSet can be combined with a layer definition to compose a FeatureCollection.

FeatureSet contains Feature objects, including the values for the fields requested by the user. For layers, if you request geometry information, the geometry of each feature is also returned in the FeatureSet. For tables, the FeatureSet does not include geometries.

If a Spatial Reference is not specified at the FeatureSet level, the FeatureSet will assume the SpatialReference of its first feature. If the SpatialReference of the first feature is also not specified, the spatial reference will be UnknownCoordinateSystem.

property df

deprecated in v1.5.0 please use `sdf`

converts the FeatureSet to a Pandas dataframe. Requires pandas

property display_field_name

gets/sets the displayFieldName

property features

gets the features in the FeatureSet

property fields

gets the fields in the FeatureSet

static from_dataframe(df)

returns a featureset from a Pandas’ Data or Spatial DataFrame

static from_dict(featureset_dict)

returns a featureset from a dict

static from_geojson(geojson)

Converts a GeoJSON Feature Collection into a FeatureSet

static from_json(json_str)

returns a featureset from a JSON string

property geometry_type

gets/sets the geometry Type

property global_id_field_name

gets/sets the globalIdFieldName

property has_m

gets/set the M-property

property has_z

gets/sets the Z-property

property object_id_field_name

gets/sets the object id field

save(save_location, out_name, encoding=None)

Saves a featureset object to a feature class

Argument

Description

save_location

Required string. Path to export the FeatureSet to.

out_name

Required string. Name of the saved table.

encoding

Optional string. character encoding is used to represent a repertoire of characters by some kind of encoding system. The default is None.

Returns

string

property sdf

Converts the FeatureSet to a Spatially Enabled Pandas dataframe

property spatial_reference

gets the featureset’s spatial reference

to_dict()

converts the object to Python dictionary

property to_geojson

converts the object to GeoJSON

property to_json

converts the object to JSON

property value

returns object as dictionary

arcgis.features.FeatureCollection

class arcgis.features.FeatureCollection(dictdata)

FeatureCollection is an object with a layer definition and a feature set.

It is an in-memory collection of features with rendering information.

Feature Collections can be stored as Items in the GIS, added as layers to a map or scene, passed as inputs to feature analysis tools, and returned as results from feature analysis tools if an output name for a feature layer is not specified when calling the tool.

static from_featureset(fset, symbol=None, name=None)

Create a FeatureCollection object from a FeatureSet object.

Returns

A FeatureCollection object.

query()

Returns the data in this feature collection as a FeatureSet. Filtering by where clause is not supported for feature collections

arcgis.features.GeoAccessor

class arcgis.features.GeoAccessor(obj)

The DataFrame Accessor is a namespace that performs dataset operations. This includes visualization, spatial indexing, IO and dataset level properties.

property area

Returns the total area of the dataframe

Returns

float

>>> df.spatial.area
143.23427
property bbox

Returns the total length of the dataframe

Returns

Polygon

>>> df.spatial.bbox
{'rings' : [[[1,2], [2,3], [3,3],....]], 'spatialReference' {'wkid': 4326}}
property centroid

Returns the centroid of the dataframe

Returns

Geometry

>>> df.spatial.centroid
(-14.23427, 39)
distance_matrix(leaf_size=16, rebuild=False)

Creates a k-d tree to calculate the nearest-neighbor problem.

requires scipy

Argument

Description

leafsize

Optional Integer. The number of points at which the algorithm switches over to brute-force. Default: 16.

rebuild

Optional Boolean. If True, the current KDTree is erased. If false, any KD-Tree that exists will be returned.

Returns

scipy’s KDTree class

static from_df(df, address_column='address', geocoder=None, sr=None, geometry_column=None)

Returns a SpatialDataFrame from a dataframe with an address column.

Argument

Description

df

Required Pandas DataFrame. Source dataset

address_column

Optional String. The default is “address”. This is the name of a column in the specified dataframe that contains addresses (as strings). The addresses are batch geocoded using the GIS’s first configured geocoder and their locations used as the geometry of the spatial dataframe. Ignored if the ‘geometry_column’ is specified.

geocoder

Optional Geocoder. The geocoder to be used. If not specified, the active GIS’s first geocoder is used.

sr

Optional integer. The WKID of the spatial reference.

geometry_column

Optional String. The name of the geometry column to convert to the arcgis.Geometry Objects (new at version 1.8.1)

Returns

DataFrame

NOTE: Credits will be consumed for batch_geocoding, from the GIS to which the geocoder belongs.

static from_feather(path, spatial_column='SHAPE', columns=None, use_threads=True)

Load a feather-format object from the file path.

Argument

Description

path

String. Path object or file-like object. Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. A local file could be:

file://localhost/path/to/table.feather.

If you want to pass in a path object, pandas accepts any os.PathLike.

By file-like object, we refer to objects with a read() method, such as a file handler (e.g. via builtin open function) or StringIO.

spatial_column

Optional String. The default is SHAPE. Specifies the column containing the geo-spatial information.

columns

Sequence/List/Array. The default is None. If not provided, all columns are read.

use_threads

Boolean. The default is True. Whether to parallelize reading using multiple threads.

Returns

pd.DataFrame

static from_featureclass(location, **kwargs)

Returns a Spatially enabled pandas.DataFrame from a feature class.

Argument

Description

location

Required string or pathlib.Path. Full path to the feature class

Optional parameters when ArcPy library is available in the current environment:

Optional Argument

Description

sql_clause

sql clause to parse data down. To learn more see ArcPy Search Cursor

where_clause

where statement. To learn more see ArcPy SQL reference

fields

list of strings specifying the field names.

spatial_filter

A Geometry object that will filter the results. This requires arcpy to work.

Returns

pandas.core.frame.DataFrame

static from_geodataframe(geo_df, inplace=False, column_name='SHAPE')

Import Geopandas GeoDataFrame into an ArcGIS Spatially enabled DataFrame. Requires geopandas library be installed in current environment.

Argument

Description

geo_df

GeoDataFrame object, created using GeoPandas library

inplace

Optional Bool. When True, the existing GeoDataFrame is spatially

enabled and returned. When False, a new Spatially Enabled DataFrame object is returned. Default is False.

column_name

Optional String. Sets the name of the geometry column. Default

is SHAPE.

Returns

ArcGIS Spatially Enabled DataFrame object.

static from_layer(layer)

Imports a FeatureLayer to a Spatially Enabled DataFrame

This operation converts a FeatureLayer or TableLayer to a Pandas’ DataFrame

Argument

Description

layer

Required FeatureLayer or TableLayer. The service to convert to a Spatially enabled DataFrame.

Usage:

>>> from arcgis.features import FeatureLayer
>>> mylayer = FeatureLayer(("https://sampleserver6.arcgisonline.com/arcgis/rest"
                    "/services/CommercialDamageAssessment/FeatureServer/0"))
>>> df = from_layer(mylayer)
>>> print(df.head())
Returns

Pandas’ DataFrame

static from_table(filename, **kwargs)

Allows a user to read from a non-spatial table

Note: ArcPy is Required for this method

Argument

Description

filename

Required string or pathlib.Path. The path to the table.

Keyword Arguments

Argument

Description

fields

Optional List/Tuple. A list (or tuple) of field names. For a single field, you can use a string instead of a list of strings.

Use an asterisk (*) instead of a list of fields if you want to access all fields from the input table (raster and BLOB fields are excluded). However, for faster performance and reliable field order, it is recommended that the list of fields be narrowed to only those that are actually needed.

Geometry, raster, and BLOB fields are not supported.

where

Optional String. An optional expression that limits the records returned.

skip_nulls

Optional Boolean. This controls whether records using nulls are skipped.

null_value

Optional String/Integer/Float. Replaces null values from the input with a new value.

Returns

pd.DataFrame

static from_xy(df, x_column, y_column, sr=4326)

Converts a Pandas DataFrame into a Spatial DataFrame by providing the X/Y columns.

Argument

Description

df

Required Pandas DataFrame. Source dataset

x_column

Required string. The name of the X-coordinate series

y_column

Required string. The name of the Y-coordinate series

sr

Optional int. The wkid number of the spatial reference. 4326 is the default value.

Returns

DataFrame

property full_extent

Returns the extent of the dataframe

Returns

tuple

>>> df.spatial.full_extent
(-118, 32, -97, 33)
property geometry_type

Returns a list Geometry Types for the DataFrame

property has_m

Returns a boolean that determines if the datasets have Z values

Returns

Boolean

property has_z

Returns a boolean that determines if the datasets have Z values

Returns

Boolean

join(right_df, how='inner', op='intersects', left_tag='left', right_tag='right')

Joins the current DataFrame to another spatially enabled dataframes based on spatial location based.

Note

requires the SEDF to be in the same coordinate system

Argument

Description

right_df

Required pd.DataFrame. Spatially enabled dataframe to join.

how

Required string. The type of join:

  • left - use keys from current dataframe and retains only current geometry column

  • right - use keys from right_df; retain only right_df geometry column

  • inner - use intersection of keys from both dfs and retain only current geometry column

op

Required string. The operation to use to perform the join. The default is intersects.

supported perations: intersects, within, and contains

left_tag

Optional String. If the same column is in the left and right dataframe, this will append that string value to the field.

right_tag

Optional String. If the same column is in the left and right dataframe, this will append that string value to the field.

Returns

Spatially enabled Pandas’ DataFrame

property length

Returns the total length of the dataframe

Returns

float

>>> df.spatial.length
1.23427
property name

returns the name of the geometry column

overlay(sdf, op='union')

Performs spatial operation operations on two spatially enabled dataframes.

requires ArcPy or Shapely

Argument

Description

sdf

Required Spatially Enabled DataFrame. The geometry to perform the operation from.

op

Optional String. The spatial operation to perform. The allowed value are: union, erase, identity, intersection. union is the default operation.

Returns

Spatially enabled DataFrame (pd.DataFrame)

plot(map_widget=None, **kwargs)

Plot draws the data on a web map. The user can describe in simple terms how to renderer spatial data using symbol. To make the process simplier a pallette for which colors are drawn from can be used instead of explicit colors.

Explicit Argument

Description

map_widget

optional WebMap object. This is the map to display the data on.

palette

optional string/dict. Color mapping. For simple renderer, just provide a string. For more robust renderers like unique renderer, a dictionary can be given.

renderer_type

optional string. Determines the type of renderer to use for the provided dataset. The default is ‘s’ which is for simple renderers.

Allowed values:

  • ‘s’ - is a simple renderer that uses one symbol only.

  • ‘u’ - unique renderer symbolizes features based on one

    or more matching string attributes.

  • ‘c’ - A class breaks renderer symbolizes based on the

    value of some numeric attribute.

  • ‘h’ - heatmap renders point data into a raster

    visualization that emphasizes areas of higher density or weighted values.

symbol_type

optional string. This is the type of symbol the user needs to create. Valid inputs are: simple, picture, text, or carto. The default is simple.

symbol_type

optional string. This is the symbology used by the geometry. For example ‘s’ for a Line geometry is a solid line. And ‘-‘ is a dash line.

Allowed symbol types based on geometries:

Point Symbols

  • ‘o’ - Circle (default)

  • ‘+’ - Cross

  • ‘D’ - Diamond

  • ‘s’ - Square

  • ‘x’ - X

Polyline Symbols

  • ‘s’ - Solid (default)

  • ‘-‘ - Dash

  • ‘-.’ - Dash Dot

  • ‘-..’ - Dash Dot Dot

  • ‘.’ - Dot

  • ‘–’ - Long Dash

  • ‘–.’ - Long Dash Dot

  • ‘n’ - Null

  • ‘s-‘ - Short Dash

  • ‘s-.’ - Short Dash Dot

  • ‘s-..’ - Short Dash Dot Dot

  • ‘s.’ - Short Dot

Polygon Symbols

  • ‘s’ - Solid Fill (default)

  • ‘’ - Backward Diagonal

  • ‘/’ - Forward Diagonal

  • ‘|’ - Vertical Bar

  • ‘-‘ - Horizontal Bar

  • ‘x’ - Diagonal Cross

  • ‘+’ - Cross

col

optional string/list. Field or fields used for heatmap, class breaks, or unique renderers.

pallette

optional string. The color map to draw from in order to visualize the data. The default pallette is ‘jet’. To get a visual representation of the allowed color maps, use the display_colormaps method.

alpha

optional float. This is a value between 0 and 1 with 1 being the default value. The alpha sets the transparancy of the renderer when applicable.

** Render Syntax **

The render syntax allows for users to fully customize symbolizing the data.

** Simple Renderer**

A simple renderer is a renderer that uses one symbol only.

Optional Argument

Description

symbol_type

optional string. This is the type of symbol the user needs to create. Valid inputs are: simple, picture, text, or carto. The default is simple.

symbol_type

optional string. This is the symbology used by the geometry. For example ‘s’ for a Line geometry is a solid line. And ‘-‘ is a dash line.

Point Symbols

  • ‘o’ - Circle (default)

  • ‘+’ - Cross

  • ‘D’ - Diamond

  • ‘s’ - Square

  • ‘x’ - X

Polyline Symbols

  • ‘s’ - Solid (default)

  • ‘-‘ - Dash

  • ‘-.’ - Dash Dot

  • ‘-..’ - Dash Dot Dot

  • ‘.’ - Dot

  • ‘–’ - Long Dash

  • ‘–.’ - Long Dash Dot

  • ‘n’ - Null

  • ‘s-‘ - Short Dash

  • ‘s-.’ - Short Dash Dot

  • ‘s-..’ - Short Dash Dot Dot

  • ‘s.’ - Short Dot

Polygon Symbols

  • ‘s’ - Solid Fill (default)

  • ‘’ - Backward Diagonal

  • ‘/’ - Forward Diagonal

  • ‘|’ - Vertical Bar

  • ‘-‘ - Horizontal Bar

  • ‘x’ - Diagonal Cross

  • ‘+’ - Cross

description

Description of the renderer.

rotation_expression

A constant value or an expression that derives the angle of rotation based on a feature attribute value. When an attribute name is specified, it’s enclosed in square brackets.

rotation_type

String value which controls the origin and direction of rotation on point features. If the rotationType is defined as arithmetic, the symbol is rotated from East in a counter-clockwise direction where East is the 0 degree axis. If the rotationType is defined as geographic, the symbol is rotated from North in a clockwise direction where North is the 0 degree axis.

Must be one of the following values:

  • arithmetic

  • geographic

visual_variables

An array of objects used to set rendering properties.

Heatmap Renderer

The HeatmapRenderer renders point data into a raster visualization that emphasizes areas of higher density or weighted values.

Optional Argument

Description

blur_radius

The radius (in pixels) of the circle over which the majority of each point’s value is spread.

field

This is optional as this renderer can be created if no field is specified. Each feature gets the same value/importance/weight or with a field where each feature is weighted by the field’s value.

max_intensity

The pixel intensity value which is assigned the final color in the color ramp.

min_intensity

The pixel intensity value which is assigned the initial color in the color ramp.

ratio

A number between 0-1. Describes what portion along the gradient the colorStop is added.

Unique Renderer

This renderer symbolizes features based on one or more matching string attributes.

Optional Argument

Description

background_fill_symbol

A symbol used for polygon features as a background if the renderer uses point symbols, e.g. for bivariate types & size rendering. Only applicable to polygon layers. PictureFillSymbols can also be used outside of the Map Viewer for Size and Predominance and Size renderers.

default_label

Default label for the default symbol used to draw unspecified values.

default_symbol

Symbol used when a value cannot be matched.

field1, field2, field3

Attribute field renderer uses to match values.

field_delimiter

String inserted between the values if multiple attribute fields are specified.

rotation_expression

A constant value or an expression that derives the angle of rotation based on a feature attribute value. When an attribute name is specified, it’s enclosed in square brackets. Rotation is set using a visual variable of type rotation info with a specified field or value expression property.

rotation_type

String property which controls the origin and direction of rotation. If the rotation type is defined as arithmetic the symbol is rotated from East in a counter-clockwise direction where East is the 0 degree axis. If the rotation type is defined as geographic, the symbol is rotated from North in a clockwise direction where North is the 0 degree axis. Must be one of the following values:

  • arithmetic

  • geographic

arcade_expression

An Arcade expression evaluating to either a string or a number.

arcade_title

The title identifying and describing the associated Arcade expression as defined in the valueExpression property.

visual_variables

An array of objects used to set rendering properties.

Class Breaks Renderer

A class breaks renderer symbolizes based on the value of some numeric attribute.

Optional Argument

Description

background_fill_symbol

A symbol used for polygon features as a background if the renderer uses point symbols, e.g. for bivariate types & size rendering. Only applicable to polygon layers. PictureFillSymbols can also be used outside of the Map Viewer for Size and Predominance and Size renderers.

default_label

Default label for the default symbol used to draw unspecified values.

default_symbol

Symbol used when a value cannot be matched.

method

Determines the classification method that was used to generate class breaks.

Must be one of the following values:

  • esriClassifyDefinedInterval

  • esriClassifyEqualInterval

  • esriClassifyGeometricalInterval

  • esriClassifyNaturalBreaks

  • esriClassifyQuantile

  • esriClassifyStandardDeviation

  • esriClassifyManual

field

Attribute field used for renderer.

min_value

The minimum numeric data value needed to begin class breaks.

normalization_field

Used when normalizationType is field. The string value indicating the attribute field by which the data value is normalized.

normalization_total

Used when normalizationType is percent-of-total, this number property contains the total of all data values.

normalization_type

Determine how the data was normalized.

Must be one of the following values:

  • esriNormalizeByField

  • esriNormalizeByLog

  • esriNormalizeByPercentOfTotal

rotation_expression

A constant value or an expression that derives the angle of rotation based on a feature attribute value. When an attribute name is specified, it’s enclosed in square brackets.

rotation_type

A string property which controls the origin and direction of rotation. If the rotation_type is defined as arithmetic, the symbol is rotated from East in a couter-clockwise direction where East is the 0 degree axis. If the rotationType is defined as geographic, the symbol is rotated from North in a clockwise direction where North is the 0 degree axis.

Must be one of the following values:

  • arithmetic

  • geographic

arcade_expression

An Arcade expression evaluating to a number.

arcade_title

The title identifying and describing the associated Arcade expression as defined in the arcade_expression property.

visual_variables

An object used to set rendering options.

** Symbol Syntax **

Optional Argument

Description

symbol_type

optional string. This is the type of symbol the user needs to create. Valid inputs are: simple, picture, text, or carto. The default is simple.

symbol_type

optional string. This is the symbology used by the geometry. For example ‘s’ for a Line geometry is a solid line. And ‘-‘ is a dash line.

Point Symbols

  • ‘o’ - Circle (default)

  • ‘+’ - Cross

  • ‘D’ - Diamond

  • ‘s’ - Square

  • ‘x’ - X

Polyline Symbols

  • ‘s’ - Solid (default)

  • ‘-‘ - Dash

  • ‘-.’ - Dash Dot

  • ‘-..’ - Dash Dot Dot

  • ‘.’ - Dot

  • ‘–’ - Long Dash

  • ‘–.’ - Long Dash Dot

  • ‘n’ - Null

  • ‘s-‘ - Short Dash

  • ‘s-.’ - Short Dash Dot

  • ‘s-..’ - Short Dash Dot Dot

  • ‘s.’ - Short Dot

Polygon Symbols

  • ‘s’ - Solid Fill (default)

  • ‘’ - Backward Diagonal

  • ‘/’ - Forward Diagonal

  • ‘|’ - Vertical Bar

  • ‘-‘ - Horizontal Bar

  • ‘x’ - Diagonal Cross

  • ‘+’ - Cross

cmap

optional string or list. This is the color scheme a user can provide if the exact color is not needed, or a user can provide a list with the color defined as: [red, green blue, alpha]. The values red, green, blue are from 0-255 and alpha is a float value from 0 - 1. The default value is ‘jet’ color scheme.

cstep

optional integer. If provided, its the color location on the color scheme.

Simple Symbols

This is a list of optional parameters that can be given for point, line or polygon geometries.

Argument

Description

marker_size

optional float. Numeric size of the symbol given in points.

marker_angle

optional float. Numeric value used to rotate the symbol. The symbol is rotated counter-clockwise. For example, The following, angle=-30, in will create a symbol rotated -30 degrees counter-clockwise; that is, 30 degrees clockwise.

marker_xoffset

Numeric value indicating the offset on the x-axis in points.

marker_yoffset

Numeric value indicating the offset on the y-axis in points.

line_width

optional float. Numeric value indicating the width of the line in points

outline_style

Optional string. For polygon point, and line geometries , a customized outline type can be provided.

Allowed Styles:

  • ‘s’ - Solid (default)

  • ‘-‘ - Dash

  • ‘-.’ - Dash Dot

  • ‘-..’ - Dash Dot Dot

  • ‘.’ - Dot

  • ‘–’ - Long Dash

  • ‘–.’ - Long Dash Dot

  • ‘n’ - Null

  • ‘s-‘ - Short Dash

  • ‘s-.’ - Short Dash Dot

  • ‘s-..’ - Short Dash Dot Dot

  • ‘s.’ - Short Dot

outline_color

optional string or list. This is the same color as the cmap property, but specifically applies to the outline_color.

Picture Symbol

This type of symbol only applies to Points, MultiPoints and Polygons.

Argument

Description

marker_angle

Numeric value that defines the number of degrees ranging from 0-360, that a marker symbol is rotated. The rotation is from East in a counter-clockwise direction where East is the 0 axis.

marker_xoffset

Numeric value indicating the offset on the x-axis in points.

marker_yoffset

Numeric value indicating the offset on the y-axis in points.

height

Numeric value used if needing to resize the symbol. Specify a value in points. If images are to be displayed in their original size, leave this blank.

width

Numeric value used if needing to resize the symbol. Specify a value in points. If images are to be displayed in their original size, leave this blank.

url

String value indicating the URL of the image. The URL should be relative if working with static layers. A full URL should be used for map service dynamic layers. A relative URL can be dereferenced by accessing the map layer image resource or the feature layer image resource.

image_data

String value indicating the base64 encoded data.

xscale

Numeric value indicating the scale factor in x direction.

yscale

Numeric value indicating the scale factor in y direction.

outline_color

optional string or list. This is the same color as the cmap property, but specifically applies to the outline_color.

outline_style

Optional string. For polygon point, and line geometries , a customized outline type can be provided.

Allowed Styles:

  • ‘s’ - Solid (default)

  • ‘-‘ - Dash

  • ‘-.’ - Dash Dot

  • ‘-..’ - Dash Dot Dot

  • ‘.’ - Dot

  • ‘–’ - Long Dash

  • ‘–.’ - Long Dash Dot

  • ‘n’ - Null

  • ‘s-‘ - Short Dash

  • ‘s-.’ - Short Dash Dot

  • ‘s-..’ - Short Dash Dot Dot

  • ‘s.’ - Short Dot

outline_color

optional string or list. This is the same color as the cmap property, but specifically applies to the outline_color.

line_width

optional float. Numeric value indicating the width of the line in points

Text Symbol

This type of symbol only applies to Points, MultiPoints and Polygons.

Argument

Description

font_decoration

The text decoration. Must be one of the following values: - line-through - underline - none

font_family

Optional string. The font family.

font_size

Optional float. The font size in points.

font_style

Optional string. The text style. - italic - normal - oblique

font_weight

Optional string. The text weight. Must be one of the following values: - bold - bolder - lighter - normal

background_color

optional string/list. Background color is represented as a four-element array or string of a color map.

halo_color

Optional string/list. Color of the halo around the text. The default is None.

halo_size

Optional integer/float. The point size of a halo around the text symbol.

horizontal_alignment

optional string. One of the following string values representing the horizontal alignment of the text. Must be one of the following values: - left - right - center - justify

kerning

optional boolean. Boolean value indicating whether to adjust the spacing between characters in the text string.

line_color

optional string/list. Outline color is represented as a four-element array or string of a color map.

line_width

optional integer/float. Outline size.

marker_angle

optional int. A numeric value that defines the number of degrees (0 to 360) that a text symbol is rotated. The rotation is from East in a counter-clockwise direction where East is the 0 axis.

marker_xoffset

optional int/float.Numeric value indicating the offset on the x-axis in points.

marker_yoffset

optional int/float.Numeric value indicating the offset on the x-axis in points.

right_to_left

optional boolean. Set to true if using Hebrew or Arabic fonts.

rotated

optional boolean. Boolean value indicating whether every character in the text string is rotated.

text

Required string. Text Value to display next to geometry.

vertical_alignment

Optional string. One of the following string values representing the vertical alignment of the text. Must be one of the following values: - top - bottom - middle - baseline

Cartographic Symbol

This type of symbol only applies to line geometries.

Argument

Description

line_width

optional float. Numeric value indicating the width of the line in points

cap

Optional string. The cap style.

join

Optional string. The join style.

miter_limit

Optional string. Size threshold for showing mitered line joins.

The kwargs parameter accepts all parameters of the create_symbol method and the create_renderer method.

project(spatial_reference, transformation_name=None)

Reprojects the who dataset into a new spatial reference. This is an inplace operation meaning that it will update the defined geometry column from the set_geometry.

This requires ArcPy or pyproj v4

Argument

Description

spatial_reference

Required SpatialReference. The new spatial reference. This can be a SpatialReference object or the coordinate system name.

transformation_name

Optional String. The geotransformation name.

Returns

boolean

relationship(other, op, relation=None)

This method allows for dataframe to dataframe compairson using spatial relationships. The return is a pd.DataFrame that meet the operations’ requirements.

Argument

Description

other

Required Spatially Enabled DataFrame. The geometry to perform the operation from.

op

Optional String. The spatial operation to perform. The allowed value are: contains,crosses,disjoint,equals, overlaps,touches, or within.

  • contains - Indicates if the base geometry contains the comparison geometry.

  • crosses - Indicates if the two geometries intersect in a geometry of a lesser shape type.

  • disjoint - Indicates if the base and comparison geometries share no points in common.

  • equals - Indicates if the base and comparison geometries are of the same shape type and define the same set of points in the plane. This is a 2D comparison only; M and Z values are ignored.

  • overlaps - Indicates if the intersection of the two geometries has the same shape type as one of the input geometries and is not equivalent to either of the input geometries.

  • touches - Indicates if the boundaries of the geometries intersect.

  • within - Indicates if the base geometry is within the comparison geometry.

  • intersect - Intdicates if the base geometry has an intersection of the other geometry.

relation

Optional String. The spatial relationship type. The allowed values are: BOUNDARY, CLEMENTINI, and PROPER.

  • BOUNDARY - Relationship has no restrictions for interiors or boundaries.

  • CLEMENTINI - Interiors of geometries must intersect. This is the default.

  • PROPER - Boundaries of geometries must not intersect.

This only applies to contains,

Returns

Spatially enabled DataFrame (pd.DataFrame)

property renderer

Returns the renderer for the SeDF

Returns

InsensitiveDict

sanitize_column_names(convert_to_string=True, remove_special_char=True, inplace=False, use_snake_case=True)

Cleans column names by converting them to string, removing special characters, renaming columns without column names to ‘noname’, renaming duplicates with integer suffixes and switching spaces or Pascal or camel cases to Python’s favored snake_case style.

Snake_casing gives you consistent column names, no matter what the flavor of your backend database is when you publish the DataFrame as a Feature Layer in your web GIS.

Argument

Description

convert_to_string

Optional Boolean. Default is True. Converts column names to string

remove_special_char

Optional Boolean. Default is True. Removes any characters in column names that are not numeric or underscores. This also ensures column names begin with alphabets by removing numeral prefixes.

inplace

Optional Boolean. Default is False. If True, edits the DataFrame in place and returns Nothing. If False, returns a new DataFrame object.

use_snake_case

Optional Boolean. Default is True. Makes column names lower case, and replaces spaces between words with underscores. If column names are in PascalCase or camelCase, it replaces them to snake_case.

Returns

pd.DataFrame object if inplace=False. Else None.

select(other)

This operation performs a dataset wide selection by geometric intersection. A geometry or another Spatially enabled DataFrame can be given and select will return all rows that intersect that input geometry. The select operation uses a spatial index to complete the task, so if it is not built before the first run, the function will build a quadtree index on the fly.

requires ArcPy or Shapely

Returns

pd.DataFrame (spatially enabled)

set_geometry(col, sr=None)

Assigns the Geometry Column by Name or by List

sindex(stype='quadtree', reset=False, **kwargs)

Creates a spatial index for the given dataset.

By default the spatial index is a QuadTree spatial index.

If r-tree indexes should be used for large datasets. This will allow users to create very large out of memory indexes. To use r-tree indexes, the r-tree library must be installed. To do so, install via conda using the following command: conda install -c conda-forge rtree

property sr

gets/sets the spatial reference of the dataframe

to_feature_collection(name=None, drawing_info=None, extent=None, global_id_field=None)

Converts a spatially enabled pd.DataFrame to a Feature Collection

optional argument

Description

name

optional string. Name of the Feature Collection

drawing_info

Optional dictionary. This is the rendering information for a Feature Collection. Rendering information is a dictionary with the symbology, labelling and other properties defined. See: http://resources.arcgis.com/en/help/arcgis-rest-api/index.html#/Renderer_objects/02r30000019t000000/

extent

Optional dictionary. If desired, a custom extent can be provided to set where the map starts up when showing the data. The default is the full extent of the dataset in the Spatial DataFrame.

global_id_field

Optional string. The Global ID field of the dataset.

Returns

FeatureCollection object

to_featureclass(location, overwrite=True, has_z=None, has_m=None, sanitize_columns=True)

Exports a geo enabled dataframe to a feature class.

Argument

Description

location

Required string. The output of the table.

overwrite

Optional Boolean. If True and if the feature class exists, it will be deleted and overwritten. This is default. If False, the feature class and the feature class exists, and exception will be raised.

has_z

Optional Boolean. If True, the dataset will be forced to have Z based geometries. If a geometry is missing a Z value when true, a RuntimeError will be raised. When False, the API will not use the Z value.

has_m

Optional Boolean. If True, the dataset will be forced to have M based geometries. If a geometry is missing a M value when true, a RuntimeError will be raised. When False, the API will not use the M value.

sanitize_columns

Optional Boolean. If True, column names will be converted to string, invalid characters removed and other checks will be performed. The default is True.

Returns

String

to_featurelayer(title, gis=None, tags=None, folder=None)

publishes a spatial dataframe to a new feature layer

Argument

Description

title

Required string. The name of the service

gis

Optional GIS. The GIS connection object

tags

Optional list of strings. A comma seperated list of descriptive words for the service.

folder

Optional string. Name of the folder where the featurelayer item and imported data would be stored.

Returns

FeatureLayer

to_featureset()

Converts a spatial dataframe to a feature set object

to_table(location, overwrite=True)

Exports a geo enabled dataframe to a table.

Argument

Description

location

Required string. The output of the table.

overwrite

Optional Boolean. If True and if the table exists, it will be deleted and overwritten. This is default. If False, the table and the table exists, and exception will be raised.

Returns

String

property true_centroid

Returns the true centroid of the dataframe

Returns

Geometry

>>> df.spatial.true_centroid
(1.23427, 34)
validate(strict=False)

Determines if the Geo Accessor is Valid with Geometries in all values

voronoi()

Generates a voronoi diagram on the whole dataset. If the geometry is not a Point then the centroid is used for the geometry. The result is a polygon GeoArray Series that matches 1:1 to the original dataset.

requires scipy

Returns

pd.Series

arcgis.features.GeoSeriesAccessor

class arcgis.features.GeoSeriesAccessor(obj)
property JSON

Returns JSON string of Geometry

Returns

Series of strings

property WKB

Returns the Geometry as WKB

Returns

Series of Bytes

property WKT

Returns the Geometry as WKT

Returns

Series of String

angle_distance_to(second_geometry, method='GEODESIC')

Returns a tuple of angle and distance to another point using a measurement type.

Argument

Description

second_geometry

Required Geometry. A arcgis.Geometry object.

method

Optional String. PLANAR measurements reflect the projection of geographic data onto the 2D surface (in other words, they will not take into account the curvature of the earth). GEODESIC, GREAT_ELLIPTIC, LOXODROME, and PRESERVE_SHAPE measurement types may be chosen as an alternative, if desired.

Returns

a tuple of angle and distance to another point using a measurement type.

property area

Returns the features area

Returns

float in a series

property as_arcpy

Returns the features as ArcPy Geometry

Returns

arcpy.Geometry in a series

property as_shapely

Returns the features as Shapely Geometry

Returns

shapely.Geometry in a series

boundary()

Constructs the boundary of the geometry.

Returns

arcgis.geometry.Polyline

buffer(distance)

Constructs a polygon at a specified distance from the geometry.

Argument

Description

distance

Required float. The buffer distance. The buffer distance is in the same units as the geometry that is being buffered. A negative distance can only be specified against a polygon geometry.

Returns

arcgis.geometry.Polygon

property centroid

Returns the feature’s centroid

Returns

tuple (x,y) in series

clip(envelope)

Constructs the intersection of the geometry and the specified extent.

Argument

Description

envelope

required tuple. The tuple must have (XMin, YMin, XMax, YMax) each value represents the lower left bound and upper right bound of the extent.

Returns

output geometry clipped to extent

contains(second_geometry, relation=None)

Indicates if the base geometry contains the comparison geometry.

Argument

Description

second_geometry

Required arcgis.geometry.Geometry. A second geometry

relation

Optional string. The spatial relationship type.

  • BOUNDARY - Relationship has no restrictions for interiors or boundaries.

  • CLEMENTINI - Interiors of geometries must intersect. Specifying CLEMENTINI is equivalent to specifying None. This is the default.

  • PROPER - Boundaries of geometries must not intersect.

Returns

boolean

convex_hull()

Constructs the geometry that is the minimal bounding polygon such that all outer angles are convex.

crosses(second_geometry)

Indicates if the two geometries intersect in a geometry of a lesser shape type.

Argument

Description

second_geometry

Required arcgis.geometry.Geometry. A second geometry

Returns

boolean

cut(cutter)

Splits this geometry into a part left of the cutting polyline, and a part right of it.

Argument

Description

cutter

Required Polyline. The cuttin polyline geometry

Returns

a list of two geometries

densify(method, distance, deviation)

Creates a new geometry with added vertices

Argument

Description

method

Required String. The type of densification, DISTANCE, ANGLE, or GEODESIC

distance

Required float. The maximum distance between vertices. The actual distance between vertices will usually be less than the maximum distance as new vertices will be evenly distributed along the original segment. If using a type of DISTANCE or ANGLE, the distance is measured in the units of the geometry’s spatial reference. If using a type of GEODESIC, the distance is measured in meters.

deviation

Required float. Densify uses straight lines to approximate curves. You use deviation to control the accuracy of this approximation. The deviation is the maximum distance between the new segment and the original curve. The smaller its value, the more segments will be required to approximate the curve.

Returns

arcgis.geometry.Geometry

difference(second_geometry)

Constructs the geometry that is composed only of the region unique to the base geometry but not part of the other geometry. The following illustration shows the results when the red polygon is the source geometry.

Argument

Description

second_geometry

Required arcgis.geometry.Geometry. A second geometry

Returns

arcgis.geometry.Geometry

disjoint(second_geometry)

Indicates if the base and comparison geometries share no points in common.

Argument

Description

second_geometry

Required arcgis.geometry.Geometry. A second geometry

Returns

boolean

distance_to(second_geometry)

Returns the minimum distance between two geometries. If the geometries intersect, the minimum distance is 0. Both geometries must have the same projection.

Argument

Description

second_geometry

Required arcgis.geometry.Geometry. A second geometry

Returns

float

equals(second_geometry)

Indicates if the base and comparison geometries are of the same shape type and define the same set of points in the plane. This is a 2D comparison only; M and Z values are ignored.

Argument

Description

second_geometry

Required arcgis.geometry.Geometry. A second geometry

Returns

boolean

property extent

Returns the feature’s extent

Returns

tuple (xmin,ymin,xmax,ymax) in series

property first_point

Returns the feature’s first point

Returns

Geometry

generalize(max_offset)

Creates a new simplified geometry using a specified maximum offset tolerance. This only works on Polylines and Polygons.

Argument

Description

max_offset

Required float. The maximum offset tolerance.

Returns

arcgis.geometry.Geometry

property geoextent

A returns the geometry’s extents

Returns

Series of Floats

property geometry_type

returns the geometry types

Returns

Series of strings

get_area(method, units=None)

Returns the area of the feature using a measurement type.

Argument

Description

method

Required String. PLANAR measurements reflect the projection of geographic data onto the 2D surface (in other words, they will not take into account the curvature of the earth). GEODESIC, GREAT_ELLIPTIC, LOXODROME, and PRESERVE_SHAPE measurement types may be chosen as an alternative, if desired.

units

Optional String. Areal unit of measure keywords: ACRES | ARES | HECTARES | SQUARECENTIMETERS | SQUAREDECIMETERS | SQUAREINCHES | SQUAREFEET | SQUAREKILOMETERS | SQUAREMETERS | SQUAREMILES | SQUAREMILLIMETERS | SQUAREYARDS

Returns

float

get_length(method, units)

Returns the length of the feature using a measurement type.

Argument

Description

method

Required String. PLANAR measurements reflect the projection of geographic data onto the 2D surface (in other words, they will not take into account the curvature of the earth). GEODESIC, GREAT_ELLIPTIC, LOXODROME, and PRESERVE_SHAPE measurement types may be chosen as an alternative, if desired.

units

Required String. Linear unit of measure keywords: CENTIMETERS | DECIMETERS | FEET | INCHES | KILOMETERS | METERS | MILES | MILLIMETERS | NAUTICALMILES | YARDS

Returns

float

get_part(index=None)

Returns an array of point objects for a particular part of geometry or an array containing a number of arrays, one for each part.

requires arcpy

Argument

Description

index

Required Integer. The index position of the geometry.

Returns

arcpy.Array

property has_m

Determines if the geometry has a M value

Returns

Series of Boolean

property has_z

Determines if the geometry has a Z value

Returns

Series of Boolean

property hull_rectangle

A space-delimited string of the coordinate pairs of the convex hull

Returns

Series of strings

intersect(second_geometry, dimension=1)

Constructs a geometry that is the geometric intersection of the two input geometries. Different dimension values can be used to create different shape types. The intersection of two geometries of the same shape type is a geometry containing only the regions of overlap between the original geometries.

Argument

Description

second_geometry

Required arcgis.geometry.Geometry. A second geometry

dimension

Required Integer. The topological dimension (shape type) of the resulting geometry.

  • 1 -A zero-dimensional geometry (point or multipoint).

  • 2 -A one-dimensional geometry (polyline).

  • 4 -A two-dimensional geometry (polygon).

Returns

boolean

property is_empty

Returns True/False if feature is empty

Returns

Series of Booleans

property is_multipart

Returns True/False if features has multiple parts

Returns

Series of Booleans

property is_valid

Returns True/False if features geometry is valid

Returns

Series of Booleans

property label_point

Returns the geometry point for the optimal label location

Returns

Series of Geometries

property last_point

Returns the Geometry of the last point in a feature.

Returns

Series of Geometry

property length

Returns the length of the features

Returns

Series of float

property length3D

Returns the length of the features

Returns

Series of float

measure_on_line(second_geometry, as_percentage=False)

Returns a measure from the start point of this line to the in_point.

Argument

Description

second_geometry

Required arcgis.geometry.Geometry. A second geometry

as_percentage

Optional Boolean. If False, the measure will be returned as a distance; if True, the measure will be returned as a percentage.

Returns

float

overlaps(second_geometry)

Indicates if the intersection of the two geometries has the same shape type as one of the input geometries and is not equivalent to either of the input geometries.

Argument

Description

second_geometry

Required arcgis.geometry.Geometry. A second geometry

Returns

boolean

property part_count

Returns the number of parts in a feature’s geometry

Returns

Series of Integer

property point_count

Returns the number of points in a feature’s geometry

Returns

Series of Integer

point_from_angle_and_distance(angle, distance, method='GEODESCIC')

Returns a point at a given angle and distance in degrees and meters using the specified measurement type.

Argument

Description

angle

Required Float. The angle in degrees to the returned point.

distance

Required Float. The distance in meters to the returned point.

method

Optional String. PLANAR measurements reflect the projection of geographic data onto the 2D surface (in other words, they will not take into account the curvature of the earth). GEODESIC, GREAT_ELLIPTIC, LOXODROME, and PRESERVE_SHAPE measurement types may be chosen as an alternative, if desired.

Returns

arcgis.geometry.Geometry

position_along_line(value, use_percentage=False)

Returns a point on a line at a specified distance from the beginning of the line.

Argument

Description

value

Required Float. The distance along the line.

use_percentage

Optional Boolean. The distance may be specified as a fixed unit of measure or a ratio of the length of the line. If True, value is used as a percentage; if False, value is used as a distance. For percentages, the value should be expressed as a double from 0.0 (0%) to 1.0 (100%).

Returns

Geometry

project_as(spatial_reference, transformation_name=None)

Projects a geometry and optionally applies a geotransformation.

Argument

Description

spatial_reference

Required SpatialReference. The new spatial reference. This can be a SpatialReference object or the coordinate system name.

transformation_name

Required String. The geotransformation name.

Returns

arcgis.geometry.Geometry

query_point_and_distance(second_geometry, use_percentage=False)

Finds the point on the polyline nearest to the in_point and the distance between those points. Also returns information about the side of the line the in_point is on as well as the distance along the line where the nearest point occurs.

Argument

Description

second_geometry

Required arcgis.geometry.Geometry. A second geometry

as_percentage

Optional boolean - if False, the measure will be returned as distance, True, measure will be a percentage

Returns

tuple

segment_along_line(start_measure, end_measure, use_percentage=False)

Returns a Polyline between start and end measures. Similar to Polyline.positionAlongLine but will return a polyline segment between two points on the polyline instead of a single point.

Argument

Description

start_measure

Required Float. The starting distance from the beginning of the line.

end_measure

Required Float. The ending distance from the beginning of the line.

use_percentage

Optional Boolean. The start and end measures may be specified as fixed units or as a ratio. If True, start_measure and end_measure are used as a percentage; if False, start_measure and end_measure are used as a distance. For percentages, the measures should be expressed as a double from 0.0 (0 percent) to 1.0 (100 percent).

Returns

Geometry

snap_to_line(second_geometry)

Returns a new point based on in_point snapped to this geometry.

Argument

Description

second_geometry

Required arcgis.geometry.Geometry. A second geometry

Returns

arcgis.gis.Geometry

property spatial_reference

Returns the Spatial Reference of the Geometry

Returns

Series of SpatialReference

symmetric_difference(second_geometry)

Constructs the geometry that is the union of two geometries minus the instersection of those geometries.

The two input geometries must be the same shape type.

Argument

Description

second_geometry

Required arcgis.geometry.Geometry. A second geometry

Returns

arcgis.gis.Geometry

touches(second_geometry)

Indicates if the boundaries of the geometries intersect.

Argument

Description

second_geometry

Required arcgis.geometry.Geometry. A second geometry

Returns

boolean

property true_centroid

Returns the true centroid of the Geometry

Returns

Series of Points

union(second_geometry)

Constructs the geometry that is the set-theoretic union of the input geometries.

Argument

Description

second_geometry

Required arcgis.geometry.Geometry. A second geometry

Returns

arcgis.gis.Geometry

within(second_geometry, relation=None)

Indicates if the base geometry is within the comparison geometry.

Argument

Description

second_geometry

Required arcgis.geometry.Geometry. A second geometry

relation

Optional String. The spatial relationship type.

  • BOUNDARY - Relationship has no restrictions for interiors or boundaries.

  • CLEMENTINI - Interiors of geometries must intersect. Specifying CLEMENTINI is equivalent to specifying None. This is the default.

  • PROPER - Boundaries of geometries must not intersect.

Returns

boolean

arcgis.features.SpatialDataFrame

class arcgis.features.SpatialDataFrame(*args, **kwargs)

This class is deprecated infavor of the GeoAccessor/GeoSeriesAccessor Pattern

A Spatial Dataframe is an object to manipulate, manage and translate data into new forms of information for users.

Functionality of the Spatial DataFrame is determined by the Geometry Engine available to the object at creation. It will first leverage the arcpy geometry engine, then shapely, then it will create the geometry objects without any engine.

Scenerios

Engine Type

Functionality

ArcPy

Users will have the full functionality provided by the API.

Shapely

Users get a sub-set of operations, and all properties.

Valid Properties

  • JSON

  • WKT

  • WKB

  • area

  • centroid

  • extent

  • first_point

  • hull_rectangle

  • is_multipart

  • label_point

  • last_point

  • length

  • length3D

  • part_count

  • point_count

  • true_centroid

Valid Functions

  • boundary

  • buffer

  • contains

  • convex_hull

  • crosses

  • difference

  • disjoint

  • distance_to

  • equals

  • generalize

  • intersect

  • overlaps

  • symmetric_difference

  • touches

  • union

  • within

Everything else will return None

No Engine

Values will return None by default

Required Parameters:

None

Optional:
param data

panda’s dataframe containing attribute information

param geometry

list/array/geoseries of arcgis.geometry objects

param sr

spatial reference of the dataframe. This can be the factory code, WKT string, arcpy.SpatialReference object, or arcgis.SpatailReference object.

param gis

passing a gis.GIS object set to Pro will ensure arcpy is installed and a full swatch of functionality is available to the end user.

property JSON

Returns an Esri JSON representation of the geometry as a string.

property T
property WKB

Returns the well-known binary (WKB) representation for OGC geometry. It provides a portable representation of a geometry value as a contiguous stream of bytes.

property WKT

Returns the well-known text (WKT) representation for OGC geometry. It provides a portable representation of a geometry value as a text string.

abs()FrameOrSeries

Return a Series/DataFrame with absolute numeric value of each element.

This function only applies to elements that are all numeric.

abs

Series/DataFrame containing the absolute value of each element.

numpy.absolute : Calculate the absolute value element-wise.

For complex inputs, 1.2 + 1j, the absolute value is \(\sqrt{ a^2 + b^2 }\).

Absolute numeric values in a Series.

>>> s = pd.Series([-1.10, 2, -3.33, 4])
>>> s.abs()
0    1.10
1    2.00
2    3.33
3    4.00
dtype: float64

Absolute numeric values in a Series with complex numbers.

>>> s = pd.Series([1.2 + 1j])
>>> s.abs()
0    1.56205
dtype: float64

Absolute numeric values in a Series with a Timedelta element.

>>> s = pd.Series([pd.Timedelta('1 days')])
>>> s.abs()
0   1 days
dtype: timedelta64[ns]

Select rows with data closest to certain value using argsort (from StackOverflow).

>>> df = pd.DataFrame({
...     'a': [4, 5, 6, 7],
...     'b': [10, 20, 30, 40],
...     'c': [100, 50, -30, -50]
... })
>>> df
     a    b    c
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50
>>> df.loc[(df.c - 43).abs().argsort()]
     a    b    c
1    5   20   50
0    4   10  100
2    6   30  -30
3    7   40  -50
add(other, axis='columns', level=None, fill_value=None)

Get Addition of dataframe and other, element-wise (binary operator add).

Equivalent to dataframe + other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, radd.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

axis{0 or ‘index’, 1 or ‘columns’}

Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

levelint or label

Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

DataFrame

Result of the arithmetic operation.

DataFrame.add : Add DataFrames. DataFrame.sub : Subtract DataFrames. DataFrame.mul : Multiply DataFrames. DataFrame.div : Divide DataFrames (float division). DataFrame.truediv : Divide DataFrames (float division). DataFrame.floordiv : Divide DataFrames (integer division). DataFrame.mod : Calculate modulo (remainder after division). DataFrame.pow : Calculate exponential power.

Mismatched indices will be unioned together.

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
add_prefix(prefix: str)FrameOrSeries

Prefix labels with string prefix.

For Series, the row labels are prefixed. For DataFrame, the column labels are prefixed.

prefixstr

The string to add before each label.

Series or DataFrame

New Series or DataFrame with updated labels.

Series.add_suffix: Suffix row labels with string suffix. DataFrame.add_suffix: Suffix column labels with string suffix.

>>> s = pd.Series([1, 2, 3, 4])
>>> s
0    1
1    2
2    3
3    4
dtype: int64
>>> s.add_prefix('item_')
item_0    1
item_1    2
item_2    3
item_3    4
dtype: int64
>>> df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})
>>> df
   A  B
0  1  3
1  2  4
2  3  5
3  4  6
>>> df.add_prefix('col_')
     col_A  col_B
0       1       3
1       2       4
2       3       5
3       4       6
add_suffix(suffix: str)FrameOrSeries

Suffix labels with string suffix.

For Series, the row labels are suffixed. For DataFrame, the column labels are suffixed.

suffixstr

The string to add after each label.

Series or DataFrame

New Series or DataFrame with updated labels.

Series.add_prefix: Prefix row labels with string prefix. DataFrame.add_prefix: Prefix column labels with string prefix.

>>> s = pd.Series([1, 2, 3, 4])
>>> s
0    1
1    2
2    3
3    4
dtype: int64
>>> s.add_suffix('_item')
0_item    1
1_item    2
2_item    3
3_item    4
dtype: int64
>>> df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})
>>> df
   A  B
0  1  3
1  2  4
2  3  5
3  4  6
>>> df.add_suffix('_col')
     A_col  B_col
0       1       3
1       2       4
2       3       5
3       4       6
agg(func=None, axis=0, *args, **kwargs)

Aggregate using one or more operations over the specified axis.

funcfunction, str, list or dict

Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.

Accepted combinations are:

  • function

  • string function name

  • list of functions and/or function names, e.g. [np.sum, 'mean']

  • dict of axis labels -> functions, function names or list of such.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.

*args

Positional arguments to pass to func.

**kwargs

Keyword arguments to pass to func.

scalar, Series or DataFrame

The return can be:

  • scalar : when Series.agg is called with single function

  • Series : when DataFrame.agg is called with a single function

  • DataFrame : when DataFrame.agg is called with several functions

Return scalar, Series or DataFrame.

The aggregation operations are always performed over an axis, either the index (default) or the column axis. This behavior is different from numpy aggregation functions (mean, median, prod, sum, std, var), where the default is to compute the aggregation of the flattened array, e.g., numpy.mean(arr_2d) as opposed to numpy.mean(arr_2d, axis=0).

agg is an alias for aggregate. Use the alias.

DataFrame.apply : Perform any type of operations. DataFrame.transform : Perform transformation type operations. core.groupby.GroupBy : Perform operations over groups. core.resample.Resampler : Perform operations over resampled bins. core.window.Rolling : Perform operations over rolling window. core.window.Expanding : Perform operations over expanding window. core.window.ExponentialMovingWindow : Perform operation over exponential weighted

window.

agg is an alias for aggregate. Use the alias.

A passed user-defined-function will be passed a Series for evaluation.

>>> df = pd.DataFrame([[1, 2, 3],
...                    [4, 5, 6],
...                    [7, 8, 9],
...                    [np.nan, np.nan, np.nan]],
...                   columns=['A', 'B', 'C'])

Aggregate these functions over the rows.

>>> df.agg(['sum', 'min'])
        A     B     C
sum  12.0  15.0  18.0
min   1.0   2.0   3.0

Different aggregations per column.

>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
        A    B
sum  12.0  NaN
min   1.0  2.0
max   NaN  8.0

Aggregate different functions over the columns and rename the index of the resulting DataFrame.

>>> df.agg(x=('A', max), y=('B', 'min'), z=('C', np.mean))
     A    B    C
x  7.0  NaN  NaN
y  NaN  2.0  NaN
z  NaN  NaN  6.0

Aggregate over the columns.

>>> df.agg("mean", axis="columns")
0    2.0
1    5.0
2    8.0
3    NaN
dtype: float64
aggregate(func=None, axis=0, *args, **kwargs)

Aggregate using one or more operations over the specified axis.

funcfunction, str, list or dict

Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.

Accepted combinations are:

  • function

  • string function name

  • list of functions and/or function names, e.g. [np.sum, 'mean']

  • dict of axis labels -> functions, function names or list of such.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.

*args

Positional arguments to pass to func.

**kwargs

Keyword arguments to pass to func.

scalar, Series or DataFrame

The return can be:

  • scalar : when Series.agg is called with single function

  • Series : when DataFrame.agg is called with a single function

  • DataFrame : when DataFrame.agg is called with several functions

Return scalar, Series or DataFrame.

The aggregation operations are always performed over an axis, either the index (default) or the column axis. This behavior is different from numpy aggregation functions (mean, median, prod, sum, std, var), where the default is to compute the aggregation of the flattened array, e.g., numpy.mean(arr_2d) as opposed to numpy.mean(arr_2d, axis=0).

agg is an alias for aggregate. Use the alias.

DataFrame.apply : Perform any type of operations. DataFrame.transform : Perform transformation type operations. core.groupby.GroupBy : Perform operations over groups. core.resample.Resampler : Perform operations over resampled bins. core.window.Rolling : Perform operations over rolling window. core.window.Expanding : Perform operations over expanding window. core.window.ExponentialMovingWindow : Perform operation over exponential weighted

window.

agg is an alias for aggregate. Use the alias.

A passed user-defined-function will be passed a Series for evaluation.

>>> df = pd.DataFrame([[1, 2, 3],
...                    [4, 5, 6],
...                    [7, 8, 9],
...                    [np.nan, np.nan, np.nan]],
...                   columns=['A', 'B', 'C'])

Aggregate these functions over the rows.

>>> df.agg(['sum', 'min'])
        A     B     C
sum  12.0  15.0  18.0
min   1.0   2.0   3.0

Different aggregations per column.

>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
        A    B
sum  12.0  NaN
min   1.0  2.0
max   NaN  8.0

Aggregate different functions over the columns and rename the index of the resulting DataFrame.

>>> df.agg(x=('A', max), y=('B', 'min'), z=('C', np.mean))
     A    B    C
x  7.0  NaN  NaN
y  NaN  2.0  NaN
z  NaN  NaN  6.0

Aggregate over the columns.

>>> df.agg("mean", axis="columns")
0    2.0
1    5.0
2    8.0
3    NaN
dtype: float64
align(other, join='outer', axis=None, level=None, copy=True, fill_value=None, method=None, limit=None, fill_axis=0, broadcast_axis=None)pandas.core.frame.DataFrame

Align two objects on their axes with the specified join method.

Join method is specified for each axis Index.

other : DataFrame or Series join : {‘outer’, ‘inner’, ‘left’, ‘right’}, default ‘outer’ axis : allowed axis of the other object, default None

Align on index (0), columns (1), or both (None).

levelint or level name, default None

Broadcast across a level, matching Index values on the passed MultiIndex level.

copybool, default True

Always returns new objects. If copy=False and no reindexing is required then original objects are returned.

fill_valuescalar, default np.NaN

Value to use for missing values. Defaults to NaN, but can be any “compatible” value.

method{‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None

Method to use for filling holes in reindexed Series:

  • pad / ffill: propagate last valid observation forward to next valid.

  • backfill / bfill: use NEXT valid observation to fill gap.

limitint, default None

If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.

fill_axis{0 or ‘index’, 1 or ‘columns’}, default 0

Filling axis, method and limit.

broadcast_axis{0 or ‘index’, 1 or ‘columns’}, default None

Broadcast values along this axis, if aligning two objects of different dimensions.

(left, right)(DataFrame, type of other)

Aligned objects.

all(axis=0, bool_only=None, skipna=True, level=None, **kwargs)

Return whether all elements are True, potentially over an axis.

Returns True unless there at least one element within a series or along a Dataframe axis that is False or equivalent (e.g. zero or empty).

axis{0 or ‘index’, 1 or ‘columns’, None}, default 0

Indicate which axis or axes should be reduced.

  • 0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.

  • 1 / ‘columns’ : reduce the columns, return a Series whose index is the original index.

  • None : reduce all axes, return a scalar.

bool_onlybool, default None

Include only boolean columns. If None, will attempt to use everything, then use only boolean data. Not implemented for Series.

skipnabool, default True

Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be True, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.

levelint or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

**kwargsany, default None

Additional keywords have no effect but might be accepted for compatibility with NumPy.

Series or DataFrame

If level is specified, then, DataFrame is returned; otherwise, Series is returned.

Series.all : Return True if all elements are True. DataFrame.any : Return True if one (or more) elements are True.

Series

>>> pd.Series([True, True]).all()
True
>>> pd.Series([True, False]).all()
False
>>> pd.Series([]).all()
True
>>> pd.Series([np.nan]).all()
True
>>> pd.Series([np.nan]).all(skipna=False)
True

DataFrames

Create a dataframe from a dictionary.

>>> df = pd.DataFrame({'col1': [True, True], 'col2': [True, False]})
>>> df
   col1   col2
0  True   True
1  True  False

Default behaviour checks if column-wise values all return True.

>>> df.all()
col1     True
col2    False
dtype: bool

Specify axis='columns' to check if row-wise values all return True.

>>> df.all(axis='columns')
0     True
1    False
dtype: bool

Or axis=None for whether every value is True.

>>> df.all(axis=None)
False
angle_distance_to(second_geometry, method='GEODESIC')

Returns a tuple of angle and distance to another point using a measurement type.

Paramters:
second_geometry
  • a second geometry

method
  • PLANAR measurements reflect the projection of geographic

data onto the 2D surface (in other words, they will not take into account the curvature of the earth). GEODESIC, GREAT_ELLIPTIC, LOXODROME, and PRESERVE_SHAPE measurement types may be chosen as an alternative, if desired.

any(axis=0, bool_only=None, skipna=True, level=None, **kwargs)

Return whether any element is True, potentially over an axis.

Returns False unless there is at least one element within a series or along a Dataframe axis that is True or equivalent (e.g. non-zero or non-empty).

axis{0 or ‘index’, 1 or ‘columns’, None}, default 0

Indicate which axis or axes should be reduced.

  • 0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.

  • 1 / ‘columns’ : reduce the columns, return a Series whose index is the original index.

  • None : reduce all axes, return a scalar.

bool_onlybool, default None

Include only boolean columns. If None, will attempt to use everything, then use only boolean data. Not implemented for Series.

skipnabool, default True

Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be False, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.

levelint or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

**kwargsany, default None

Additional keywords have no effect but might be accepted for compatibility with NumPy.

Series or DataFrame

If level is specified, then, DataFrame is returned; otherwise, Series is returned.

numpy.any : Numpy version of this method. Series.any : Return whether any element is True. Series.all : Return whether all elements are True. DataFrame.any : Return whether any element is True over requested axis. DataFrame.all : Return whether all elements are True over requested axis.

Series

For Series input, the output is a scalar indicating whether any element is True.

>>> pd.Series([False, False]).any()
False
>>> pd.Series([True, False]).any()
True
>>> pd.Series([]).any()
False
>>> pd.Series([np.nan]).any()
False
>>> pd.Series([np.nan]).any(skipna=False)
True

DataFrame

Whether each column contains at least one True element (the default).

>>> df = pd.DataFrame({"A": [1, 2], "B": [0, 2], "C": [0, 0]})
>>> df
   A  B  C
0  1  0  0
1  2  2  0
>>> df.any()
A     True
B     True
C    False
dtype: bool

Aggregating over the columns.

>>> df = pd.DataFrame({"A": [True, False], "B": [1, 2]})
>>> df
       A  B
0   True  1
1  False  2
>>> df.any(axis='columns')
0    True
1    True
dtype: bool
>>> df = pd.DataFrame({"A": [True, False], "B": [1, 0]})
>>> df
       A  B
0   True  1
1  False  0
>>> df.any(axis='columns')
0    True
1    False
dtype: bool

Aggregating over the entire DataFrame with axis=None.

>>> df.any(axis=None)
True

any for an empty DataFrame is an empty Series.

>>> pd.DataFrame([]).any()
Series([], dtype: bool)
append(other, ignore_index=False, verify_integrity=False, sort=False)pandas.core.frame.DataFrame

Append rows of other to the end of caller, returning a new object.

Columns in other that are not in the caller are added as new columns.

otherDataFrame or Series/dict-like object, or list of these

The data to append.

ignore_indexbool, default False

If True, the resulting axis will be labeled 0, 1, …, n - 1.

verify_integritybool, default False

If True, raise ValueError on creating index with duplicates.

sortbool, default False

Sort columns if the columns of self and other are not aligned.

Changed in version 1.0.0: Changed to not sort by default.

DataFrame

concat : General function to concatenate DataFrame or Series objects.

If a list of dict/series is passed and the keys are all contained in the DataFrame’s index, the order of the columns in the resulting DataFrame will be unchanged.

Iteratively appending rows to a DataFrame can be more computationally intensive than a single concatenate. A better solution is to append those rows to a list and then concatenate the list with the original DataFrame all at once.

>>> df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
>>> df
   A  B
0  1  2
1  3  4
>>> df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
>>> df.append(df2)
   A  B
0  1  2
1  3  4
0  5  6
1  7  8

With ignore_index set to True:

>>> df.append(df2, ignore_index=True)
   A  B
0  1  2
1  3  4
2  5  6
3  7  8

The following, while not recommended methods for generating DataFrames, show two ways to generate a DataFrame from multiple data sources.

Less efficient:

>>> df = pd.DataFrame(columns=['A'])
>>> for i in range(5):
...     df = df.append({'A': i}, ignore_index=True)
>>> df
   A
0  0
1  1
2  2
3  3
4  4

More efficient:

>>> pd.concat([pd.DataFrame([i], columns=['A']) for i in range(5)],
...           ignore_index=True)
   A
0  0
1  1
2  2
3  3
4  4
apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)

Apply a function along an axis of the DataFrame.

Objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument.

funcfunction

Function to apply to each column or row.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

Axis along which the function is applied:

  • 0 or ‘index’: apply function to each column.

  • 1 or ‘columns’: apply function to each row.

rawbool, default False

Determines if row or column is passed as a Series or ndarray object:

  • False : passes each row or column as a Series to the function.

  • True : the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance.

result_type{‘expand’, ‘reduce’, ‘broadcast’, None}, default None

These only act when axis=1 (columns):

  • ‘expand’ : list-like results will be turned into columns.

  • ‘reduce’ : returns a Series if possible rather than expanding list-like results. This is the opposite of ‘expand’.

  • ‘broadcast’ : results will be broadcast to the original shape of the DataFrame, the original index and columns will be retained.

The default behaviour (None) depends on the return value of the applied function: list-like results will be returned as a Series of those. However if the apply function returns a Series these are expanded to columns.

argstuple

Positional arguments to pass to func in addition to the array/series.

**kwds

Additional keyword arguments to pass as keywords arguments to func.

Series or DataFrame

Result of applying func along the given axis of the DataFrame.

DataFrame.applymap: For elementwise operations. DataFrame.aggregate: Only perform aggregating type operations. DataFrame.transform: Only perform transforming type operations.

>>> df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
>>> df
   A  B
0  4  9
1  4  9
2  4  9

Using a numpy universal function (in this case the same as np.sqrt(df)):

>>> df.apply(np.sqrt)
     A    B
0  2.0  3.0
1  2.0  3.0
2  2.0  3.0

Using a reducing function on either axis

>>> df.apply(np.sum, axis=0)
A    12
B    27
dtype: int64
>>> df.apply(np.sum, axis=1)
0    13
1    13
2    13
dtype: int64

Returning a list-like will result in a Series

>>> df.apply(lambda x: [1, 2], axis=1)
0    [1, 2]
1    [1, 2]
2    [1, 2]
dtype: object

Passing result_type='expand' will expand list-like results to columns of a Dataframe

>>> df.apply(lambda x: [1, 2], axis=1, result_type='expand')
   0  1
0  1  2
1  1  2
2  1  2

Returning a Series inside the function is similar to passing result_type='expand'. The resulting column names will be the Series index.

>>> df.apply(lambda x: pd.Series([1, 2], index=['foo', 'bar']), axis=1)
   foo  bar
0    1    2
1    1    2
2    1    2

Passing result_type='broadcast' will ensure the same shape result, whether list-like or scalar is returned by the function, and broadcast it along the axis. The resulting column names will be the originals.

>>> df.apply(lambda x: [1, 2], axis=1, result_type='broadcast')
   A  B
0  1  2
1  1  2
2  1  2
applymap(func, na_action: Optional[str] = None)pandas.core.frame.DataFrame

Apply a function to a Dataframe elementwise.

This method applies a function that accepts and returns a scalar to every element of a DataFrame.

funccallable

Python function, returns a single value from a single value.

na_action{None, ‘ignore’}, default None

If ‘ignore’, propagate NaN values, without passing them to func.

New in version 1.2.

DataFrame

Transformed DataFrame.

DataFrame.apply : Apply a function along input axis of DataFrame.

>>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])
>>> df
       0      1
0  1.000  2.120
1  3.356  4.567
>>> df.applymap(lambda x: len(str(x)))
   0  1
0  3  4
1  5  5

Like Series.map, NA values can be ignored:

>>> df_copy = df.copy()
>>> df_copy.iloc[0, 0] = pd.NA
>>> df_copy.applymap(lambda x: len(str(x)), na_action='ignore')
      0  1
0  <NA>  4
1     5  5

Note that a vectorized version of func often exists, which will be much faster. You could square each number elementwise.

>>> df.applymap(lambda x: x**2)
           0          1
0   1.000000   4.494400
1  11.262736  20.857489

But it’s better to avoid applymap in that case.

>>> df ** 2
           0          1
0   1.000000   4.494400
1  11.262736  20.857489
property area

The area of a polygon feature. Empty for all other feature types.

property as_arcpy

Returns an Esri ArcPy geometry in a Series

property as_shapely

Returns a Shapely Geometry Objects in a Series

asfreq(freq, method=None, how: Optional[str] = None, normalize: bool = False, fill_value=None)FrameOrSeries

Convert TimeSeries to specified frequency.

Optionally provide filling method to pad/backfill missing values.

Returns the original data conformed to a new index with the specified frequency. resample is more appropriate if an operation, such as summarization, is necessary to represent the data at the new frequency.

freqDateOffset or str

Frequency DateOffset or string.

method{‘backfill’/’bfill’, ‘pad’/’ffill’}, default None

Method to use for filling holes in reindexed Series (note this does not fill NaNs that already were present):

  • ‘pad’ / ‘ffill’: propagate last valid observation forward to next valid

  • ‘backfill’ / ‘bfill’: use NEXT valid observation to fill.

how{‘start’, ‘end’}, default end

For PeriodIndex only (see PeriodIndex.asfreq).

normalizebool, default False

Whether to reset output index to midnight.

fill_valuescalar, optional

Value to use for missing values, applied during upsampling (note this does not fill NaNs that already were present).

Same type as caller

Object converted to the specified frequency.

reindex : Conform DataFrame to new index with optional filling logic.

To learn more about the frequency strings, please see this link.

Start by creating a series with 4 one minute timestamps.

>>> index = pd.date_range('1/1/2000', periods=4, freq='T')
>>> series = pd.Series([0.0, None, 2.0, 3.0], index=index)
>>> df = pd.DataFrame({'s':series})
>>> df
                       s
2000-01-01 00:00:00    0.0
2000-01-01 00:01:00    NaN
2000-01-01 00:02:00    2.0
2000-01-01 00:03:00    3.0

Upsample the series into 30 second bins.

>>> df.asfreq(freq='30S')
                       s
2000-01-01 00:00:00    0.0
2000-01-01 00:00:30    NaN
2000-01-01 00:01:00    NaN
2000-01-01 00:01:30    NaN
2000-01-01 00:02:00    2.0
2000-01-01 00:02:30    NaN
2000-01-01 00:03:00    3.0

Upsample again, providing a fill value.

>>> df.asfreq(freq='30S', fill_value=9.0)
                       s
2000-01-01 00:00:00    0.0
2000-01-01 00:00:30    9.0
2000-01-01 00:01:00    NaN
2000-01-01 00:01:30    9.0
2000-01-01 00:02:00    2.0
2000-01-01 00:02:30    9.0
2000-01-01 00:03:00    3.0

Upsample again, providing a method.

>>> df.asfreq(freq='30S', method='bfill')
                       s
2000-01-01 00:00:00    0.0
2000-01-01 00:00:30    NaN
2000-01-01 00:01:00    NaN
2000-01-01 00:01:30    2.0
2000-01-01 00:02:00    2.0
2000-01-01 00:02:30    3.0
2000-01-01 00:03:00    3.0
asof(where, subset=None)

Return the last row(s) without any NaNs before where.

The last row (for each element in where, if list) without any NaN is taken. In case of a DataFrame, the last row without NaN considering only the subset of columns (if not None)

If there is no good value, NaN is returned for a Series or a Series of NaN values for a DataFrame

wheredate or array-like of dates

Date(s) before which the last row(s) are returned.

subsetstr or array-like of str, default None

For DataFrame, if not None, only use these columns to check for NaNs.

scalar, Series, or DataFrame

The return can be:

  • scalar : when self is a Series and where is a scalar

  • Series: when self is a Series and where is an array-like, or when self is a DataFrame and where is a scalar

  • DataFrame : when self is a DataFrame and where is an array-like

Return scalar, Series, or DataFrame.

merge_asof : Perform an asof merge. Similar to left join.

Dates are assumed to be sorted. Raises if this is not the case.

A Series and a scalar where.

>>> s = pd.Series([1, 2, np.nan, 4], index=[10, 20, 30, 40])
>>> s
10    1.0
20    2.0
30    NaN
40    4.0
dtype: float64
>>> s.asof(20)
2.0

For a sequence where, a Series is returned. The first value is NaN, because the first element of where is before the first index value.

>>> s.asof([5, 20])
5     NaN
20    2.0
dtype: float64

Missing values are not considered. The following is 2.0, not NaN, even though NaN is at the index location for 30.

>>> s.asof(30)
2.0

Take all columns into consideration

>>> df = pd.DataFrame({'a': [10, 20, 30, 40, 50],
...                    'b': [None, None, None, None, 500]},
...                   index=pd.DatetimeIndex(['2018-02-27 09:01:00',
...                                           '2018-02-27 09:02:00',
...                                           '2018-02-27 09:03:00',
...                                           '2018-02-27 09:04:00',
...                                           '2018-02-27 09:05:00']))
>>> df.asof(pd.DatetimeIndex(['2018-02-27 09:03:30',
...                           '2018-02-27 09:04:30']))
                      a   b
2018-02-27 09:03:30 NaN NaN
2018-02-27 09:04:30 NaN NaN

Take a single column into consideration

>>> df.asof(pd.DatetimeIndex(['2018-02-27 09:03:30',
...                           '2018-02-27 09:04:30']),
...         subset=['a'])
                         a   b
2018-02-27 09:03:30   30.0 NaN
2018-02-27 09:04:30   40.0 NaN
assign(**kwargs)pandas.core.frame.DataFrame

Assign new columns to a DataFrame.

Returns a new object with all original columns in addition to new ones. Existing columns that are re-assigned will be overwritten.

**kwargsdict of {str: callable or Series}

The column names are keywords. If the values are callable, they are computed on the DataFrame and assigned to the new columns. The callable must not change input DataFrame (though pandas doesn’t check it). If the values are not callable, (e.g. a Series, scalar, or array), they are simply assigned.

DataFrame

A new DataFrame with the new columns in addition to all the existing columns.

Assigning multiple columns within the same assign is possible. Later items in ‘**kwargs’ may refer to newly created or modified columns in ‘df’; items are computed and assigned into ‘df’ in order.

>>> df = pd.DataFrame({'temp_c': [17.0, 25.0]},
...                   index=['Portland', 'Berkeley'])
>>> df
          temp_c
Portland    17.0
Berkeley    25.0

Where the value is a callable, evaluated on df:

>>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0

Alternatively, the same behavior can be achieved by directly referencing an existing Series or sequence:

>>> df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0

You can create multiple columns within the same assign where one of the columns depends on another one defined within the same assign:

>>> df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
...           temp_k=lambda x: (x['temp_f'] +  459.67) * 5 / 9)
          temp_c  temp_f  temp_k
Portland    17.0    62.6  290.15
Berkeley    25.0    77.0  298.15
astype(dtype, copy: bool = True, errors: str = 'raise')FrameOrSeries

Cast a pandas object to a specified dtype dtype.

dtypedata type, or dict of column name -> data type

Use a numpy.dtype or Python type to cast entire pandas object to the same type. Alternatively, use {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type to cast one or more of the DataFrame’s columns to column-specific types.

copybool, default True

Return a copy when copy=True (be very careful setting copy=False as changes to values then may propagate to other pandas objects).

errors{‘raise’, ‘ignore’}, default ‘raise’

Control raising of exceptions on invalid data for provided dtype.

  • raise : allow exceptions to be raised

  • ignore : suppress exceptions. On error return original object.

casted : same type as caller

to_datetime : Convert argument to datetime. to_timedelta : Convert argument to timedelta. to_numeric : Convert argument to a numeric type. numpy.ndarray.astype : Cast a numpy array to a specified type.

Create a DataFrame:

>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df.dtypes
col1    int64
col2    int64
dtype: object

Cast all columns to int32:

>>> df.astype('int32').dtypes
col1    int32
col2    int32
dtype: object

Cast col1 to int32 using a dictionary:

>>> df.astype({'col1': 'int32'}).dtypes
col1    int32
col2    int64
dtype: object

Create a series:

>>> ser = pd.Series([1, 2], dtype='int32')
>>> ser
0    1
1    2
dtype: int32
>>> ser.astype('int64')
0    1
1    2
dtype: int64

Convert to categorical type:

>>> ser.astype('category')
0    1
1    2
dtype: category
Categories (2, int64): [1, 2]

Convert to ordered categorical type with custom ordering:

>>> cat_dtype = pd.api.types.CategoricalDtype(
...     categories=[2, 1], ordered=True)
>>> ser.astype(cat_dtype)
0    1
1    2
dtype: category
Categories (2, int64): [2 < 1]

Note that using copy=False and changing data on a new pandas object may propagate changes:

>>> s1 = pd.Series([1, 2])
>>> s2 = s1.astype('int64', copy=False)
>>> s2[0] = 10
>>> s1  # note that s1[0] has changed too
0    10
1     2
dtype: int64

Create a series of dates:

>>> ser_date = pd.Series(pd.date_range('20200101', periods=3))
>>> ser_date
0   2020-01-01
1   2020-01-02
2   2020-01-03
dtype: datetime64[ns]

Datetimes are localized to UTC first before converting to the specified timezone:

>>> ser_date.astype('datetime64[ns, US/Eastern]')
0   2019-12-31 19:00:00-05:00
1   2020-01-01 19:00:00-05:00
2   2020-01-02 19:00:00-05:00
dtype: datetime64[ns, US/Eastern]
property at

Access a single value for a row/column label pair.

Similar to loc, in that both provide label-based lookups. Use at if you only need to get or set a single value in a DataFrame or Series.

KeyError

If ‘label’ does not exist in DataFrame.

DataFrame.iatAccess a single value for a row/column pair by integer

position.

DataFrame.loc : Access a group of rows and columns by label(s). Series.at : Access a single value using a label.

>>> df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
...                   index=[4, 5, 6], columns=['A', 'B', 'C'])
>>> df
    A   B   C
4   0   2   3
5   0   4   1
6  10  20  30

Get value at specified row/column pair

>>> df.at[4, 'B']
2

Set value at specified row/column pair

>>> df.at[4, 'B'] = 10
>>> df.at[4, 'B']
10

Get value within a Series

>>> df.loc[5].at['B']
4
at_time(time, asof: bool = False, axis=None)FrameOrSeries

Select values at particular time of day (e.g., 9:30AM).

time : datetime.time or str axis : {0 or ‘index’, 1 or ‘columns’}, default 0

New in version 0.24.0.

Series or DataFrame

TypeError

If the index is not a DatetimeIndex

between_time : Select values between particular times of the day. first : Select initial periods of time series based on a date offset. last : Select final periods of time series based on a date offset. DatetimeIndex.indexer_at_time : Get just the index locations for

values at particular time of the day.

>>> i = pd.date_range('2018-04-09', periods=4, freq='12H')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts
                     A
2018-04-09 00:00:00  1
2018-04-09 12:00:00  2
2018-04-10 00:00:00  3
2018-04-10 12:00:00  4
>>> ts.at_time('12:00')
                     A
2018-04-09 12:00:00  2
2018-04-10 12:00:00  4
property attrs

Dictionary of global attributes of this dataset.

Warning

attrs is experimental and may change without warning.

DataFrame.flags : Global flags applying to this object.

property axes

Return a list representing the axes of the DataFrame.

It has the row axis labels and column axis labels as the only members. They are returned in that order.

>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.axes
[RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'],
dtype='object')]
backfill(axis=None, inplace: bool = False, limit=None, downcast=None)Optional[FrameOrSeries]

Synonym for DataFrame.fillna() with method='bfill'.

Series/DataFrame or None

Object with missing values filled or None if inplace=True.

between_time(start_time, end_time, include_start: bool = True, include_end: bool = True, axis=None)FrameOrSeries

Select values between particular times of the day (e.g., 9:00-9:30 AM).

By setting start_time to be later than end_time, you can get the times that are not between the two times.

start_timedatetime.time or str

Initial time as a time filter limit.

end_timedatetime.time or str

End time as a time filter limit.

include_startbool, default True

Whether the start time needs to be included in the result.

include_endbool, default True

Whether the end time needs to be included in the result.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

Determine range time on index or columns value.

New in version 0.24.0.

Series or DataFrame

Data from the original object filtered to the specified dates range.

TypeError

If the index is not a DatetimeIndex

at_time : Select values at a particular time of the day. first : Select initial periods of time series based on a date offset. last : Select final periods of time series based on a date offset. DatetimeIndex.indexer_between_time : Get just the index locations for

values between particular times of the day.

>>> i = pd.date_range('2018-04-09', periods=4, freq='1D20min')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts
                     A
2018-04-09 00:00:00  1
2018-04-10 00:20:00  2
2018-04-11 00:40:00  3
2018-04-12 01:00:00  4
>>> ts.between_time('0:15', '0:45')
                     A
2018-04-10 00:20:00  2
2018-04-11 00:40:00  3

You get the times that are not between two times by setting start_time later than end_time:

>>> ts.between_time('0:45', '0:15')
                     A
2018-04-09 00:00:00  1
2018-04-12 01:00:00  4
bfill(axis=None, inplace: bool = False, limit=None, downcast=None)Optional[FrameOrSeries]

Synonym for DataFrame.fillna() with method='bfill'.

Series/DataFrame or None

Object with missing values filled or None if inplace=True.

bool()

Return the bool of a single element Series or DataFrame.

This must be a boolean scalar value, either True or False. It will raise a ValueError if the Series or DataFrame does not have exactly 1 element, or that element is not boolean (integer values 0 and 1 will also raise an exception).

bool

The value in the Series or DataFrame.

Series.astype : Change the data type of a Series, including to boolean. DataFrame.astype : Change the data type of a DataFrame, including to boolean. numpy.bool_ : NumPy boolean data type, used by pandas for boolean values.

The method will only work for single element objects with a boolean value:

>>> pd.Series([True]).bool()
True
>>> pd.Series([False]).bool()
False
>>> pd.DataFrame({'col': [True]}).bool()
True
>>> pd.DataFrame({'col': [False]}).bool()
False
boundary()

Constructs the boundary of the geometry.

property bounds

Return a DataFrame of minx, miny, maxx, maxy values of geometry objects

boxplot(column=None, by=None, ax=None, fontsize=None, rot=0, grid=True, figsize=None, layout=None, return_type=None, backend=None, **kwargs)

Make a box plot from DataFrame columns.

Make a box-and-whisker plot from DataFrame columns, optionally grouped by some other columns. A box plot is a method for graphically depicting groups of numerical data through their quartiles. The box extends from the Q1 to Q3 quartile values of the data, with a line at the median (Q2). The whiskers extend from the edges of box to show the range of the data. By default, they extend no more than 1.5 * IQR (IQR = Q3 - Q1) from the edges of the box, ending at the farthest data point within that interval. Outliers are plotted as separate dots.

For further details see Wikipedia’s entry for boxplot.

columnstr or list of str, optional

Column name or list of names, or vector. Can be any valid input to pandas.DataFrame.groupby().

bystr or array-like, optional

Column in the DataFrame to pandas.DataFrame.groupby(). One box-plot will be done per value of columns in by.

axobject of class matplotlib.axes.Axes, optional

The matplotlib axes to be used by boxplot.

fontsizefloat or str

Tick label font size in points or as a string (e.g., large).

rotint or float, default 0

The rotation angle of labels (in degrees) with respect to the screen coordinate system.

gridbool, default True

Setting this to True will show the grid.

figsizeA tuple (width, height) in inches

The size of the figure to create in matplotlib.

layouttuple (rows, columns), optional

For example, (3, 5) will display the subplots using 3 columns and 5 rows, starting from the top-left.

return_type{‘axes’, ‘dict’, ‘both’} or None, default ‘axes’

The kind of object to return. The default is axes.

  • ‘axes’ returns the matplotlib axes the boxplot is drawn on.

  • ‘dict’ returns a dictionary whose values are the matplotlib Lines of the boxplot.

  • ‘both’ returns a namedtuple with the axes and dict.

  • when grouping with by, a Series mapping columns to return_type is returned.

    If return_type is None, a NumPy array of axes with the same shape as layout is returned.

backendstr, default None

Backend to use instead of the backend specified in the option plotting.backend. For instance, ‘matplotlib’. Alternatively, to specify the plotting.backend for the whole session, set pd.options.plotting.backend.

New in version 1.0.0.

**kwargs

All other plotting keyword arguments to be passed to matplotlib.pyplot.boxplot().

result

See Notes.

Series.plot.hist: Make a histogram. matplotlib.pyplot.boxplot : Matplotlib equivalent plot.

The return type depends on the return_type parameter:

  • ‘axes’ : object of class matplotlib.axes.Axes

  • ‘dict’ : dict of matplotlib.lines.Line2D objects

  • ‘both’ : a namedtuple with structure (ax, lines)

For data grouped with by, return a Series of the above or a numpy array:

  • Series

  • array (for return_type = None)

Use return_type='dict' when you want to tweak the appearance of the lines after plotting. In this case a dict containing the Lines making up the boxes, caps, fliers, medians, and whiskers is returned.

Boxplots can be created for every column in the dataframe by df.boxplot() or indicating the columns to be used:

Boxplots of variables distributions grouped by the values of a third variable can be created using the option by. For instance:

A list of strings (i.e. ['X', 'Y']) can be passed to boxplot in order to group the data by combination of the variables in the x-axis:

The layout of boxplot can be adjusted giving a tuple to layout:

Additional formatting can be done to the boxplot, like suppressing the grid (grid=False), rotating the labels in the x-axis (i.e. rot=45) or changing the fontsize (i.e. fontsize=15):

The parameter return_type can be used to select the type of element returned by boxplot. When return_type='axes' is selected, the matplotlib axes on which the boxplot is drawn are returned:

>>> boxplot = df.boxplot(column=['Col1', 'Col2'], return_type='axes')
>>> type(boxplot)
<class 'matplotlib.axes._subplots.AxesSubplot'>

When grouping with by, a Series mapping columns to return_type is returned:

>>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X',
...                      return_type='axes')
>>> type(boxplot)
<class 'pandas.core.series.Series'>

If return_type is None, a NumPy array of axes with the same shape as layout is returned:

>>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X',
...                      return_type=None)
>>> type(boxplot)
<class 'numpy.ndarray'>
buffer(distance)

Constructs a polygon at a specified distance from the geometry.

Parameters:
distance
  • length in current projection. Only polygon accept

negative values.

property centroid

The true centroid if it is within or on the feature; otherwise, the label point is returned. Returns a point object.

clip(envelope)

Constructs the intersection of the geometry and the specified extent.

Parameters:
envelope
  • arcpy.Extent object

columns: Index

The column labels of the DataFrame.

combine(other: pandas.core.frame.DataFrame, func, fill_value=None, overwrite=True)pandas.core.frame.DataFrame

Perform column-wise combine with another DataFrame.

Combines a DataFrame with other DataFrame using func to element-wise combine columns. The row and column indexes of the resulting DataFrame will be the union of the two.

otherDataFrame

The DataFrame to merge column-wise.

funcfunction

Function that takes two series as inputs and return a Series or a scalar. Used to merge the two dataframes column by columns.

fill_valuescalar value, default None

The value to fill NaNs with prior to passing any column to the merge func.

overwritebool, default True

If True, columns in self that do not exist in other will be overwritten with NaNs.

DataFrame

Combination of the provided DataFrames.

DataFrame.combine_firstCombine two DataFrame objects and default to

non-null values in frame calling the method.

Combine using a simple function that chooses the smaller column.

>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2
>>> df1.combine(df2, take_smaller)
   A  B
0  0  3
1  0  3

Example using a true element-wise combine function.

>>> df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, np.minimum)
   A  B
0  1  2
1  0  3

Using fill_value fills Nones prior to passing the column to the merge function.

>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
   A    B
0  0 -5.0
1  0  4.0

However, if the same element in both dataframes is None, that None is preserved

>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [None, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
    A    B
0  0 -5.0
1  0  3.0

Example that demonstrates the use of overwrite and behavior when the axis differ between the dataframes.

>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [-10, 1], }, index=[1, 2])
>>> df1.combine(df2, take_smaller)
     A    B     C
0  NaN  NaN   NaN
1  NaN  3.0 -10.0
2  NaN  3.0   1.0
>>> df1.combine(df2, take_smaller, overwrite=False)
     A    B     C
0  0.0  NaN   NaN
1  0.0  3.0 -10.0
2  NaN  3.0   1.0

Demonstrating the preference of the passed in dataframe.

>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1], }, index=[1, 2])
>>> df2.combine(df1, take_smaller)
   A    B   C
0  0.0  NaN NaN
1  0.0  3.0 NaN
2  NaN  3.0 NaN
>>> df2.combine(df1, take_smaller, overwrite=False)
     A    B   C
0  0.0  NaN NaN
1  0.0  3.0 1.0
2  NaN  3.0 1.0
combine_first(other: pandas.core.frame.DataFrame)pandas.core.frame.DataFrame

Update null elements with value in the same location in other.

Combine two DataFrame objects by filling null values in one DataFrame with non-null values from other DataFrame. The row and column indexes of the resulting DataFrame will be the union of the two.

otherDataFrame

Provided DataFrame to use to fill null values.

DataFrame

DataFrame.combinePerform series-wise operation on two DataFrames

using a given function.

>>> df1 = pd.DataFrame({'A': [None, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine_first(df2)
     A    B
0  1.0  3.0
1  0.0  4.0

Null values still persist if the location of that null value does not exist in other

>>> df1 = pd.DataFrame({'A': [None, 0], 'B': [4, None]})
>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1]}, index=[1, 2])
>>> df1.combine_first(df2)
     A    B    C
0  NaN  4.0  NaN
1  0.0  3.0  1.0
2  NaN  3.0  1.0
compare(other: pandas.core.frame.DataFrame, align_axis: Union[str, int] = 1, keep_shape: bool = False, keep_equal: bool = False)pandas.core.frame.DataFrame

Compare to another DataFrame and show the differences.

New in version 1.1.0.

otherDataFrame

Object to compare with.

align_axis{0 or ‘index’, 1 or ‘columns’}, default 1

Determine which axis to align the comparison on.

  • 0, or ‘index’Resulting differences are stacked vertically

    with rows drawn alternately from self and other.

  • 1, or ‘columns’Resulting differences are aligned horizontally

    with columns drawn alternately from self and other.

keep_shapebool, default False

If true, all rows and columns are kept. Otherwise, only the ones with different values are kept.

keep_equalbool, default False

If true, the result keeps values that are equal. Otherwise, equal values are shown as NaNs.

DataFrame

DataFrame that shows the differences stacked side by side.

The resulting index will be a MultiIndex with ‘self’ and ‘other’ stacked alternately at the inner level.

ValueError

When the two DataFrames don’t have identical labels or shape.

Series.compare : Compare with another Series and show differences. DataFrame.equals : Test whether two objects contain the same elements.

Matching NaNs will not appear as a difference.

Can only compare identically-labeled (i.e. same shape, identical row and column labels) DataFrames

>>> df = pd.DataFrame(
...     {
...         "col1": ["a", "a", "b", "b", "a"],
...         "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0]
...     },
...     columns=["col1", "col2", "col3"],
... )
>>> df
  col1  col2  col3
0    a   1.0   1.0
1    a   2.0   2.0
2    b   3.0   3.0
3    b   NaN   4.0
4    a   5.0   5.0
>>> df2 = df.copy()
>>> df2.loc[0, 'col1'] = 'c'
>>> df2.loc[2, 'col3'] = 4.0
>>> df2
  col1  col2  col3
0    c   1.0   1.0
1    a   2.0   2.0
2    b   3.0   4.0
3    b   NaN   4.0
4    a   5.0   5.0

Align the differences on columns

>>> df.compare(df2)
  col1       col3
  self other self other
0    a     c  NaN   NaN
2  NaN   NaN  3.0   4.0

Stack the differences on rows

>>> df.compare(df2, align_axis=0)
        col1  col3
0 self     a   NaN
  other    c   NaN
2 self   NaN   3.0
  other  NaN   4.0

Keep the equal values

>>> df.compare(df2, keep_equal=True)
  col1       col3
  self other self other
0    a     c  1.0   1.0
2    b     b  3.0   4.0

Keep all original rows and columns

>>> df.compare(df2, keep_shape=True)
  col1       col2       col3
  self other self other self other
0    a     c  NaN   NaN  NaN   NaN
1  NaN   NaN  NaN   NaN  NaN   NaN
2  NaN   NaN  NaN   NaN  3.0   4.0
3  NaN   NaN  NaN   NaN  NaN   NaN
4  NaN   NaN  NaN   NaN  NaN   NaN

Keep all original rows and columns and also all original values

>>> df.compare(df2, keep_shape=True, keep_equal=True)
  col1       col2       col3
  self other self other self other
0    a     c  1.0   1.0  1.0   1.0
1    a     a  2.0   2.0  2.0   2.0
2    b     b  3.0   3.0  3.0   4.0
3    b     b  NaN   NaN  4.0   4.0
4    a     a  5.0   5.0  5.0   5.0
contains(second_geometry, relation=None)

Indicates if the base geometry contains the comparison geometry.

Paramters:
second_geometry
  • a second geometry

convert_dtypes(infer_objects: bool = True, convert_string: bool = True, convert_integer: bool = True, convert_boolean: bool = True, convert_floating: bool = True)FrameOrSeries

Convert columns to best possible dtypes using dtypes supporting pd.NA.

New in version 1.0.0.

infer_objectsbool, default True

Whether object dtypes should be converted to the best possible types.

convert_stringbool, default True

Whether object dtypes should be converted to StringDtype().

convert_integerbool, default True

Whether, if possible, conversion can be done to integer extension types.

convert_booleanbool, defaults True

Whether object dtypes should be converted to BooleanDtypes().

convert_floatingbool, defaults True

Whether, if possible, conversion can be done to floating extension types. If convert_integer is also True, preference will be give to integer dtypes if the floats can be faithfully casted to integers.

New in version 1.2.0.

Series or DataFrame

Copy of input object with new dtype.

infer_objects : Infer dtypes of objects. to_datetime : Convert argument to datetime. to_timedelta : Convert argument to timedelta. to_numeric : Convert argument to a numeric type.

By default, convert_dtypes will attempt to convert a Series (or each Series in a DataFrame) to dtypes that support pd.NA. By using the options convert_string, convert_integer, convert_boolean and convert_boolean, it is possible to turn off individual conversions to StringDtype, the integer extension types, BooleanDtype or floating extension types, respectively.

For object-dtyped columns, if infer_objects is True, use the inference rules as during normal Series/DataFrame construction. Then, if possible, convert to StringDtype, BooleanDtype or an appropriate integer or floating extension type, otherwise leave as object.

If the dtype is integer, convert to an appropriate integer extension type.

If the dtype is numeric, and consists of all integers, convert to an appropriate integer extension type. Otherwise, convert to an appropriate floating extension type.

Changed in version 1.2: Starting with pandas 1.2, this method also converts float columns to the nullable floating extension type.

In the future, as new dtypes are added that support pd.NA, the results of this method will change to support those new dtypes.

>>> df = pd.DataFrame(
...     {
...         "a": pd.Series([1, 2, 3], dtype=np.dtype("int32")),
...         "b": pd.Series(["x", "y", "z"], dtype=np.dtype("O")),
...         "c": pd.Series([True, False, np.nan], dtype=np.dtype("O")),
...         "d": pd.Series(["h", "i", np.nan], dtype=np.dtype("O")),
...         "e": pd.Series([10, np.nan, 20], dtype=np.dtype("float")),
...         "f": pd.Series([np.nan, 100.5, 200], dtype=np.dtype("float")),
...     }
... )

Start with a DataFrame with default dtypes.

>>> df
   a  b      c    d     e      f
0  1  x   True    h  10.0    NaN
1  2  y  False    i   NaN  100.5
2  3  z    NaN  NaN  20.0  200.0
>>> df.dtypes
a      int32
b     object
c     object
d     object
e    float64
f    float64
dtype: object

Convert the DataFrame to use best possible dtypes.

>>> dfn = df.convert_dtypes()
>>> dfn
   a  b      c     d     e      f
0  1  x   True     h    10   <NA>
1  2  y  False     i  <NA>  100.5
2  3  z   <NA>  <NA>    20  200.0
>>> dfn.dtypes
a      Int32
b     string
c    boolean
d     string
e      Int64
f    Float64
dtype: object

Start with a Series of strings and missing data represented by np.nan.

>>> s = pd.Series(["a", "b", np.nan])
>>> s
0      a
1      b
2    NaN
dtype: object

Obtain a Series with dtype StringDtype.

>>> s.convert_dtypes()
0       a
1       b
2    <NA>
dtype: string
convex_hull()

Constructs the geometry that is the minimal bounding polygon such that all outer angles are convex.

coordinates()

returns the point coordinates of the geometry as a np.array object

copy(deep=True)

Make a copy of this SpatialDataFrame object Parameters:

Deep

boolean, default True Make a deep copy, i.e. also copy data

Returns:
copy

of SpatialDataFrame

corr(method='pearson', min_periods=1)pandas.core.frame.DataFrame

Compute pairwise correlation of columns, excluding NA/null values.

method{‘pearson’, ‘kendall’, ‘spearman’} or callable

Method of correlation:

  • pearson : standard correlation coefficient

  • kendall : Kendall Tau correlation coefficient

  • spearman : Spearman rank correlation

  • callable: callable with input two 1d ndarrays

    and returning a float. Note that the returned matrix from corr will have 1 along the diagonals and will be symmetric regardless of the callable’s behavior.

    New in version 0.24.0.

min_periodsint, optional

Minimum number of observations required per pair of columns to have a valid result. Currently only available for Pearson and Spearman correlation.

DataFrame

Correlation matrix.

DataFrame.corrwithCompute pairwise correlation with another

DataFrame or Series.

Series.corr : Compute the correlation between two Series.

>>> def histogram_intersection(a, b):
...     v = np.minimum(a, b).sum().round(decimals=1)
...     return v
>>> df = pd.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)],
...                   columns=['dogs', 'cats'])
>>> df.corr(method=histogram_intersection)
      dogs  cats
dogs   1.0   0.3
cats   0.3   1.0
corrwith(other, axis=0, drop=False, method='pearson')pandas.core.series.Series

Compute pairwise correlation.

Pairwise correlation is computed between rows or columns of DataFrame with rows or columns of Series or DataFrame. DataFrames are first aligned along both axes before computing the correlations.

otherDataFrame, Series

Object with which to compute correlations.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

The axis to use. 0 or ‘index’ to compute column-wise, 1 or ‘columns’ for row-wise.

dropbool, default False

Drop missing indices from result.

method{‘pearson’, ‘kendall’, ‘spearman’} or callable

Method of correlation:

  • pearson : standard correlation coefficient

  • kendall : Kendall Tau correlation coefficient

  • spearman : Spearman rank correlation

  • callable: callable with input two 1d ndarrays

    and returning a float.

New in version 0.24.0.

Series

Pairwise correlations.

DataFrame.corr : Compute pairwise correlation of columns.

count(axis=0, level=None, numeric_only=False)

Count non-NA cells for each column or row.

The values None, NaN, NaT, and optionally numpy.inf (depending on pandas.options.mode.use_inf_as_na) are considered NA.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

If 0 or ‘index’ counts are generated for each column. If 1 or ‘columns’ counts are generated for each row.

levelint or str, optional

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame. A str specifies the level name.

numeric_onlybool, default False

Include only float, int or boolean data.

Series or DataFrame

For each column/row the number of non-NA/null entries. If level is specified returns a DataFrame.

Series.count: Number of non-NA elements in a Series. DataFrame.value_counts: Count unique combinations of columns. DataFrame.shape: Number of DataFrame rows and columns (including NA

elements).

DataFrame.isna: Boolean same-sized DataFrame showing places of NA

elements.

Constructing DataFrame from a dictionary:

>>> df = pd.DataFrame({"Person":
...                    ["John", "Myla", "Lewis", "John", "Myla"],
...                    "Age": [24., np.nan, 21., 33, 26],
...                    "Single": [False, True, True, True, False]})
>>> df
   Person   Age  Single
0    John  24.0   False
1    Myla   NaN    True
2   Lewis  21.0    True
3    John  33.0    True
4    Myla  26.0   False

Notice the uncounted NA values:

>>> df.count()
Person    5
Age       4
Single    5
dtype: int64

Counts for each row:

>>> df.count(axis='columns')
0    3
1    2
2    3
3    3
4    3
dtype: int64

Counts for one level of a MultiIndex:

>>> df.set_index(["Person", "Single"]).count(level="Person")
        Age
Person
John      2
Lewis     1
Myla      1
cov(min_periods: Optional[int] = None, ddof: Optional[int] = 1)pandas.core.frame.DataFrame

Compute pairwise covariance of columns, excluding NA/null values.

Compute the pairwise covariance among the series of a DataFrame. The returned data frame is the covariance matrix of the columns of the DataFrame.

Both NA and null values are automatically excluded from the calculation. (See the note below about bias from missing values.) A threshold can be set for the minimum number of observations for each value created. Comparisons with observations below this threshold will be returned as NaN.

This method is generally used for the analysis of time series data to understand the relationship between different measures across time.

min_periodsint, optional

Minimum number of observations required per pair of columns to have a valid result.

ddofint, default 1

Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

New in version 1.1.0.

DataFrame

The covariance matrix of the series of the DataFrame.

Series.cov : Compute covariance with another Series. core.window.ExponentialMovingWindow.cov: Exponential weighted sample covariance. core.window.Expanding.cov : Expanding sample covariance. core.window.Rolling.cov : Rolling sample covariance.

Returns the covariance matrix of the DataFrame’s time series. The covariance is normalized by N-ddof.

For DataFrames that have Series that are missing data (assuming that data is missing at random) the returned covariance matrix will be an unbiased estimate of the variance and covariance between the member Series.

However, for many applications this estimate may not be acceptable because the estimate covariance matrix is not guaranteed to be positive semi-definite. This could lead to estimate correlations having absolute values which are greater than one, and/or a non-invertible covariance matrix. See Estimation of covariance matrices for more details.

>>> df = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)],
...                   columns=['dogs', 'cats'])
>>> df.cov()
          dogs      cats
dogs  0.666667 -1.000000
cats -1.000000  1.666667
>>> np.random.seed(42)
>>> df = pd.DataFrame(np.random.randn(1000, 5),
...                   columns=['a', 'b', 'c', 'd', 'e'])
>>> df.cov()
          a         b         c         d         e
a  0.998438 -0.020161  0.059277 -0.008943  0.014144
b -0.020161  1.059352 -0.008543 -0.024738  0.009826
c  0.059277 -0.008543  1.010670 -0.001486 -0.000271
d -0.008943 -0.024738 -0.001486  0.921297 -0.013692
e  0.014144  0.009826 -0.000271 -0.013692  0.977795

Minimum number of periods

This method also supports an optional min_periods keyword that specifies the required minimum number of non-NA observations for each column pair in order to have a valid result:

>>> np.random.seed(42)
>>> df = pd.DataFrame(np.random.randn(20, 3),
...                   columns=['a', 'b', 'c'])
>>> df.loc[df.index[:5], 'a'] = np.nan
>>> df.loc[df.index[5:10], 'b'] = np.nan
>>> df.cov(min_periods=12)
          a         b         c
a  0.316741       NaN -0.150812
b       NaN  1.248003  0.191417
c -0.150812  0.191417  0.895202
crosses(second_geometry)

Indicates if the two geometries intersect in a geometry of a lesser shape type.

Paramters:
second_geometry
  • a second geometry

cummax(axis=None, skipna=True, *args, **kwargs)

Return cumulative maximum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative maximum.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

The index or the name of the axis. 0 is equivalent to None or ‘index’.

skipnabool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

*args, **kwargs

Additional keywords have no effect but might be accepted for compatibility with NumPy.

Series or DataFrame

Return cumulative maximum of Series or DataFrame.

core.window.Expanding.maxSimilar functionality

but ignores NaN values.

DataFrame.maxReturn the maximum over

DataFrame axis.

DataFrame.cummax : Return cumulative maximum over DataFrame axis. DataFrame.cummin : Return cumulative minimum over DataFrame axis. DataFrame.cumsum : Return cumulative sum over DataFrame axis. DataFrame.cumprod : Return cumulative product over DataFrame axis.

Series

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cummax()
0    2.0
1    NaN
2    5.0
3    5.0
4    5.0
dtype: float64

To include NA values in the operation, use skipna=False

>>> s.cummax(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                    columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the maximum in each column. This is equivalent to axis=None or axis='index'.

>>> df.cummax()
     A    B
0  2.0  1.0
1  3.0  NaN
2  3.0  1.0

To iterate over columns and find the maximum in each row, use axis=1

>>> df.cummax(axis=1)
     A    B
0  2.0  2.0
1  3.0  NaN
2  1.0  1.0
cummin(axis=None, skipna=True, *args, **kwargs)

Return cumulative minimum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative minimum.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

The index or the name of the axis. 0 is equivalent to None or ‘index’.

skipnabool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

*args, **kwargs

Additional keywords have no effect but might be accepted for compatibility with NumPy.

Series or DataFrame

Return cumulative minimum of Series or DataFrame.

core.window.Expanding.minSimilar functionality

but ignores NaN values.

DataFrame.minReturn the minimum over

DataFrame axis.

DataFrame.cummax : Return cumulative maximum over DataFrame axis. DataFrame.cummin : Return cumulative minimum over DataFrame axis. DataFrame.cumsum : Return cumulative sum over DataFrame axis. DataFrame.cumprod : Return cumulative product over DataFrame axis.

Series

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cummin()
0    2.0
1    NaN
2    2.0
3   -1.0
4   -1.0
dtype: float64

To include NA values in the operation, use skipna=False

>>> s.cummin(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                    columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the minimum in each column. This is equivalent to axis=None or axis='index'.

>>> df.cummin()
     A    B
0  2.0  1.0
1  2.0  NaN
2  1.0  0.0

To iterate over columns and find the minimum in each row, use axis=1

>>> df.cummin(axis=1)
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0
cumprod(axis=None, skipna=True, *args, **kwargs)

Return cumulative product over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative product.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

The index or the name of the axis. 0 is equivalent to None or ‘index’.

skipnabool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

*args, **kwargs

Additional keywords have no effect but might be accepted for compatibility with NumPy.

Series or DataFrame

Return cumulative product of Series or DataFrame.

core.window.Expanding.prodSimilar functionality

but ignores NaN values.

DataFrame.prodReturn the product over

DataFrame axis.

DataFrame.cummax : Return cumulative maximum over DataFrame axis. DataFrame.cummin : Return cumulative minimum over DataFrame axis. DataFrame.cumsum : Return cumulative sum over DataFrame axis. DataFrame.cumprod : Return cumulative product over DataFrame axis.

Series

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cumprod()
0     2.0
1     NaN
2    10.0
3   -10.0
4    -0.0
dtype: float64

To include NA values in the operation, use skipna=False

>>> s.cumprod(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                    columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the product in each column. This is equivalent to axis=None or axis='index'.

>>> df.cumprod()
     A    B
0  2.0  1.0
1  6.0  NaN
2  6.0  0.0

To iterate over columns and find the product in each row, use axis=1

>>> df.cumprod(axis=1)
     A    B
0  2.0  2.0
1  3.0  NaN
2  1.0  0.0
cumsum(axis=None, skipna=True, *args, **kwargs)

Return cumulative sum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative sum.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

The index or the name of the axis. 0 is equivalent to None or ‘index’.

skipnabool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

*args, **kwargs

Additional keywords have no effect but might be accepted for compatibility with NumPy.

Series or DataFrame

Return cumulative sum of Series or DataFrame.

core.window.Expanding.sumSimilar functionality

but ignores NaN values.

DataFrame.sumReturn the sum over

DataFrame axis.

DataFrame.cummax : Return cumulative maximum over DataFrame axis. DataFrame.cummin : Return cumulative minimum over DataFrame axis. DataFrame.cumsum : Return cumulative sum over DataFrame axis. DataFrame.cumprod : Return cumulative product over DataFrame axis.

Series

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cumsum()
0    2.0
1    NaN
2    7.0
3    6.0
4    6.0
dtype: float64

To include NA values in the operation, use skipna=False

>>> s.cumsum(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                    columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the sum in each column. This is equivalent to axis=None or axis='index'.

>>> df.cumsum()
     A    B
0  2.0  1.0
1  5.0  NaN
2  6.0  1.0

To iterate over columns and find the sum in each row, use axis=1

>>> df.cumsum(axis=1)
     A    B
0  2.0  3.0
1  3.0  NaN
2  1.0  1.0
cut(cutter)

Splits this geometry into a part left of the cutting polyline, and a part right of it.

Parameters:
cutter
  • The cutting polyline geometry.

densify(method, distance, deviation)

Creates a new geometry with added vertices

Parameters:
method
  • The type of densification, DISTANCE, ANGLE, or GEODESIC

distance
  • The maximum distance between vertices. The actual

distance between vertices will usually be less than the maximum distance as new vertices will be evenly distributed along the original segment. If using a type of DISTANCE or ANGLE, the distance is measured in the units of the geometry’s spatial reference. If using a type of GEODESIC, the distance is measured in meters.

deviation
  • Densify uses straight lines to approximate curves.

You use deviation to control the accuracy of this approximation. The deviation is the maximum distance between the new segment and the original curve. The smaller its value, the more segments will be required to approximate the curve.

describe(percentiles=None, include=None, exclude=None, datetime_is_numeric=False)FrameOrSeries

Generate descriptive statistics.

Descriptive statistics include those that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.

Analyzes both numeric and object series, as well as DataFrame column sets of mixed data types. The output will vary depending on what is provided. Refer to the notes below for more detail.

percentileslist-like of numbers, optional

The percentiles to include in the output. All should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.

include‘all’, list-like of dtypes or None (default), optional

A white list of data types to include in the result. Ignored for Series. Here are the options:

  • ‘all’ : All columns of the input will be included in the output.

  • A list-like of dtypes : Limits the results to the provided data types. To limit the result to numeric types submit numpy.number. To limit it instead to object columns submit the numpy.object data type. Strings can also be used in the style of select_dtypes (e.g. df.describe(include=['O'])). To select pandas categorical columns, use 'category'

  • None (default) : The result will include all numeric columns.

excludelist-like of dtypes or None (default), optional,

A black list of data types to omit from the result. Ignored for Series. Here are the options:

  • A list-like of dtypes : Excludes the provided data types from the result. To exclude numeric types submit numpy.number. To exclude object columns submit the data type numpy.object. Strings can also be used in the style of select_dtypes (e.g. df.describe(include=['O'])). To exclude pandas categorical columns, use 'category'

  • None (default) : The result will exclude nothing.

datetime_is_numericbool, default False

Whether to treat datetime dtypes as numeric. This affects statistics calculated for the column. For DataFrame input, this also controls whether datetime columns are included by default.

New in version 1.1.0.

Series or DataFrame

Summary statistics of the Series or Dataframe provided.

DataFrame.count: Count number of non-NA/null observations. DataFrame.max: Maximum of the values in the object. DataFrame.min: Minimum of the values in the object. DataFrame.mean: Mean of the values. DataFrame.std: Standard deviation of the observations. DataFrame.select_dtypes: Subset of a DataFrame including/excluding

columns based on their dtype.

For numeric data, the result’s index will include count, mean, std, min, max as well as lower, 50 and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50 percentile is the same as the median.

For object data (e.g. strings or timestamps), the result’s index will include count, unique, top, and freq. The top is the most common value. The freq is the most common value’s frequency. Timestamps also include the first and last items.

If multiple object values have the highest count, then the count and top results will be arbitrarily chosen from among those with the highest count.

For mixed data types provided via a DataFrame, the default is to return only an analysis of numeric columns. If the dataframe consists only of object and categorical data without any numeric columns, the default is to return an analysis of both the object and categorical columns. If include='all' is provided as an option, the result will include a union of attributes of each type.

The include and exclude parameters can be used to limit which columns in a DataFrame are analyzed for the output. The parameters are ignored when analyzing a Series.

Describing a numeric Series.

>>> s = pd.Series([1, 2, 3])
>>> s.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
dtype: float64

Describing a categorical Series.

>>> s = pd.Series(['a', 'a', 'b', 'c'])
>>> s.describe()
count     4
unique    3
top       a
freq      2
dtype: object

Describing a timestamp Series.

>>> s = pd.Series([
...   np.datetime64("2000-01-01"),
...   np.datetime64("2010-01-01"),
...   np.datetime64("2010-01-01")
... ])
>>> s.describe(datetime_is_numeric=True)
count                      3
mean     2006-09-01 08:00:00
min      2000-01-01 00:00:00
25%      2004-12-31 12:00:00
50%      2010-01-01 00:00:00
75%      2010-01-01 00:00:00
max      2010-01-01 00:00:00
dtype: object

Describing a DataFrame. By default only numeric fields are returned.

>>> df = pd.DataFrame({'categorical': pd.Categorical(['d','e','f']),
...                    'numeric': [1, 2, 3],
...                    'object': ['a', 'b', 'c']
...                   })
>>> df.describe()
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Describing all columns of a DataFrame regardless of data type.

>>> df.describe(include='all')  
       categorical  numeric object
count            3      3.0      3
unique           3      NaN      3
top              f      NaN      a
freq             1      NaN      1
mean           NaN      2.0    NaN
std            NaN      1.0    NaN
min            NaN      1.0    NaN
25%            NaN      1.5    NaN
50%            NaN      2.0    NaN
75%            NaN      2.5    NaN
max            NaN      3.0    NaN

Describing a column from a DataFrame by accessing it as an attribute.

>>> df.numeric.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
Name: numeric, dtype: float64

Including only numeric columns in a DataFrame description.

>>> df.describe(include=[np.number])
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Including only string columns in a DataFrame description.

>>> df.describe(include=[object])  
       object
count       3
unique      3
top         a
freq        1

Including only categorical columns from a DataFrame description.

>>> df.describe(include=['category'])
       categorical
count            3
unique           3
top              d
freq             1

Excluding numeric columns from a DataFrame description.

>>> df.describe(exclude=[np.number])  
       categorical object
count            3      3
unique           3      3
top              f      a
freq             1      1

Excluding object columns from a DataFrame description.

>>> df.describe(exclude=[object])  
       categorical  numeric
count            3      3.0
unique           3      NaN
top              f      NaN
freq             1      NaN
mean           NaN      2.0
std            NaN      1.0
min            NaN      1.0
25%            NaN      1.5
50%            NaN      2.0
75%            NaN      2.5
max            NaN      3.0
diff(periods: int = 1, axis: Union[str, int] = 0)pandas.core.frame.DataFrame

First discrete difference of element.

Calculates the difference of a Dataframe element compared with another element in the Dataframe (default is element in previous row).

periodsint, default 1

Periods to shift for calculating difference, accepts negative values.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

Take difference over rows (0) or columns (1).

Dataframe

First differences of the Series.

Dataframe.pct_change: Percent change over given number of periods. Dataframe.shift: Shift index by desired number of periods with an

optional time freq.

Series.diff: First discrete difference of object.

For boolean dtypes, this uses operator.xor() rather than operator.sub(). The result is calculated according to current dtype in Dataframe, however dtype of the result is always float64.

Difference with previous row

>>> df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
...                    'b': [1, 1, 2, 3, 5, 8],
...                    'c': [1, 4, 9, 16, 25, 36]})
>>> df
   a  b   c
0  1  1   1
1  2  1   4
2  3  2   9
3  4  3  16
4  5  5  25
5  6  8  36
>>> df.diff()
     a    b     c
0  NaN  NaN   NaN
1  1.0  0.0   3.0
2  1.0  1.0   5.0
3  1.0  1.0   7.0
4  1.0  2.0   9.0
5  1.0  3.0  11.0

Difference with previous column

>>> df.diff(axis=1)
    a  b   c
0 NaN  0   0
1 NaN -1   3
2 NaN -1   7
3 NaN -1  13
4 NaN  0  20
5 NaN  2  28

Difference with 3rd previous row

>>> df.diff(periods=3)
     a    b     c
0  NaN  NaN   NaN
1  NaN  NaN   NaN
2  NaN  NaN   NaN
3  3.0  2.0  15.0
4  3.0  4.0  21.0
5  3.0  6.0  27.0

Difference with following row

>>> df.diff(periods=-1)
     a    b     c
0 -1.0  0.0  -3.0
1 -1.0 -1.0  -5.0
2 -1.0 -1.0  -7.0
3 -1.0 -2.0  -9.0
4 -1.0 -3.0 -11.0
5  NaN  NaN   NaN

Overflow in input dtype

>>> df = pd.DataFrame({'a': [1, 0]}, dtype=np.uint8)
>>> df.diff()
       a
0    NaN
1  255.0
difference(second_geometry)

Constructs the geometry that is composed only of the region unique to the base geometry but not part of the other geometry. The following illustration shows the results when the red polygon is the source geometry.

Paramters:
second_geometry
  • a second geometry

disjoint(second_geometry)

Indicates if the base and comparison geometries share no points in common.

Paramters:
second_geometry
  • a second geometry

distance_to(second_geometry)

Returns the minimum distance between two geometries. If the geometries intersect, the minimum distance is 0. Both geometries must have the same projection.

Paramters:
second_geometry
  • a second geometry

div(other, axis='columns', level=None, fill_value=None)

Get Floating division of dataframe and other, element-wise (binary operator truediv).

Equivalent to dataframe / other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rtruediv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

axis{0 or ‘index’, 1 or ‘columns’}

Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

levelint or label

Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

DataFrame

Result of the arithmetic operation.

DataFrame.add : Add DataFrames. DataFrame.sub : Subtract DataFrames. DataFrame.mul : Multiply DataFrames. DataFrame.div : Divide DataFrames (float division). DataFrame.truediv : Divide DataFrames (float division). DataFrame.floordiv : Divide DataFrames (integer division). DataFrame.mod : Calculate modulo (remainder after division). DataFrame.pow : Calculate exponential power.

Mismatched indices will be unioned together.

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
divide(other, axis='columns', level=None, fill_value=None)

Get Floating division of dataframe and other, element-wise (binary operator truediv).

Equivalent to dataframe / other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rtruediv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

axis{0 or ‘index’, 1 or ‘columns’}

Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

levelint or label

Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

DataFrame

Result of the arithmetic operation.

DataFrame.add : Add DataFrames. DataFrame.sub : Subtract DataFrames. DataFrame.mul : Multiply DataFrames. DataFrame.div : Divide DataFrames (float division). DataFrame.truediv : Divide DataFrames (float division). DataFrame.floordiv : Divide DataFrames (integer division). DataFrame.mod : Calculate modulo (remainder after division). DataFrame.pow : Calculate exponential power.

Mismatched indices will be unioned together.

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
dot(other)

Compute the matrix multiplication between the DataFrame and other.

This method computes the matrix product between the DataFrame and the values of an other Series, DataFrame or a numpy array.

It can also be called using self @ other in Python >= 3.5.

otherSeries, DataFrame or array-like

The other object to compute the matrix product with.

Series or DataFrame

If other is a Series, return the matrix product between self and other as a Series. If other is a DataFrame or a numpy.array, return the matrix product of self and other in a DataFrame of a np.array.

Series.dot: Similar method for Series.

The dimensions of DataFrame and other must be compatible in order to compute the matrix multiplication. In addition, the column names of DataFrame and the index of other must contain the same values, as they will be aligned prior to the multiplication.

The dot method for Series computes the inner product, instead of the matrix product here.

Here we multiply a DataFrame with a Series.

>>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])
>>> s = pd.Series([1, 1, 2, 1])
>>> df.dot(s)
0    -4
1     5
dtype: int64

Here we multiply a DataFrame with another DataFrame.

>>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])
>>> df.dot(other)
    0   1
0   1   4
1   2   2

Note that the dot method give the same result as @

>>> df @ other
    0   1
0   1   4
1   2   2

The dot method works also if other is an np.array.

>>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])
>>> df.dot(arr)
    0   1
0   1   4
1   2   2

Note how shuffling of the objects does not change the result.

>>> s2 = s.reindex([1, 0, 2, 3])
>>> df.dot(s2)
0    -4
1     5
dtype: int64
drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

Drop specified labels from rows or columns.

Remove rows or columns by specifying label names and corresponding axis, or by specifying directly index or column names. When using a multi-index, labels on different levels can be removed by specifying the level.

labelssingle label or list-like

Index or column labels to drop.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

Whether to drop labels from the index (0 or ‘index’) or columns (1 or ‘columns’).

indexsingle label or list-like

Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).

columnssingle label or list-like

Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).

levelint or level name, optional

For MultiIndex, level from which the labels will be removed.

inplacebool, default False

If False, return a copy. Otherwise, do operation inplace and return None.

errors{‘ignore’, ‘raise’}, default ‘raise’

If ‘ignore’, suppress error and only existing labels are dropped.

DataFrame or None

DataFrame without the removed index or column labels or None if inplace=True.

KeyError

If any of the labels is not found in the selected axis.

DataFrame.loc : Label-location based indexer for selection by label. DataFrame.dropna : Return DataFrame with labels on given axis omitted

where (all or any) data are missing.

DataFrame.drop_duplicatesReturn DataFrame with duplicate rows

removed, optionally only considering certain columns.

Series.drop : Return Series with specified index labels removed.

>>> df = pd.DataFrame(np.arange(12).reshape(3, 4),
...                   columns=['A', 'B', 'C', 'D'])
>>> df
   A  B   C   D
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11

Drop columns

>>> df.drop(['B', 'C'], axis=1)
   A   D
0  0   3
1  4   7
2  8  11
>>> df.drop(columns=['B', 'C'])
   A   D
0  0   3
1  4   7
2  8  11

Drop a row by index

>>> df.drop([0, 1])
   A  B   C   D
2  8  9  10  11

Drop columns and/or rows of MultiIndex DataFrame

>>> midx = pd.MultiIndex(levels=[['lama', 'cow', 'falcon'],
...                              ['speed', 'weight', 'length']],
...                      codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
...                             [0, 1, 2, 0, 1, 2, 0, 1, 2]])
>>> df = pd.DataFrame(index=midx, columns=['big', 'small'],
...                   data=[[45, 30], [200, 100], [1.5, 1], [30, 20],
...                         [250, 150], [1.5, 0.8], [320, 250],
...                         [1, 0.8], [0.3, 0.2]])
>>> df
                big     small
lama    speed   45.0    30.0
        weight  200.0   100.0
        length  1.5     1.0
cow     speed   30.0    20.0
        weight  250.0   150.0
        length  1.5     0.8
falcon  speed   320.0   250.0
        weight  1.0     0.8
        length  0.3     0.2
>>> df.drop(index='cow', columns='small')
                big
lama    speed   45.0
        weight  200.0
        length  1.5
falcon  speed   320.0
        weight  1.0
        length  0.3
>>> df.drop(index='length', level=1)
                big     small
lama    speed   45.0    30.0
        weight  200.0   100.0
cow     speed   30.0    20.0
        weight  250.0   150.0
falcon  speed   320.0   250.0
        weight  1.0     0.8
drop_duplicates(subset: Optional[Union[Hashable, Sequence[Hashable]]] = None, keep: Union[str, bool] = 'first', inplace: bool = False, ignore_index: bool = False)Optional[pandas.core.frame.DataFrame]

Return DataFrame with duplicate rows removed.

Considering certain columns is optional. Indexes, including time indexes are ignored.

subsetcolumn label or sequence of labels, optional

Only consider certain columns for identifying duplicates, by default use all of the columns.

keep{‘first’, ‘last’, False}, default ‘first’

Determines which duplicates (if any) to keep. - first : Drop duplicates except for the first occurrence. - last : Drop duplicates except for the last occurrence. - False : Drop all duplicates.

inplacebool, default False

Whether to drop duplicates in place or to return a copy.

ignore_indexbool, default False

If True, the resulting axis will be labeled 0, 1, …, n - 1.

New in version 1.0.0.

DataFrame or None

DataFrame with duplicates removed or None if inplace=True.

DataFrame.value_counts: Count unique combinations of columns.

Consider dataset containing ramen rating.

>>> df = pd.DataFrame({
...     'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
...     'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
...     'rating': [4, 4, 3.5, 15, 5]
... })
>>> df
    brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

By default, it removes duplicate rows based on all columns.

>>> df.drop_duplicates()
    brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

To remove duplicates on specific column(s), use subset.

>>> df.drop_duplicates(subset=['brand'])
    brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5

To remove duplicates and keep last occurrences, use keep.

>>> df.drop_duplicates(subset=['brand', 'style'], keep='last')
    brand style  rating
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
4  Indomie  pack     5.0
droplevel(level, axis=0)FrameOrSeries

Return DataFrame with requested index / column level(s) removed.

New in version 0.24.0.

levelint, str, or list-like

If a string is given, must be the name of a level If list-like, elements must be names or positional indexes of levels.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

Axis along which the level(s) is removed:

  • 0 or ‘index’: remove level(s) in column.

  • 1 or ‘columns’: remove level(s) in row.

DataFrame

DataFrame with requested index / column level(s) removed.

>>> df = pd.DataFrame([
...     [1, 2, 3, 4],
...     [5, 6, 7, 8],
...     [9, 10, 11, 12]
... ]).set_index([0, 1]).rename_axis(['a', 'b'])
>>> df.columns = pd.MultiIndex.from_tuples([
...     ('c', 'e'), ('d', 'f')
... ], names=['level_1', 'level_2'])
>>> df
level_1   c   d
level_2   e   f
a b
1 2      3   4
5 6      7   8
9 10    11  12
>>> df.droplevel('a')
level_1   c   d
level_2   e   f
b
2        3   4
6        7   8
10      11  12
>>> df.droplevel('level_2', axis=1)
level_1   c   d
a b
1 2      3   4
5 6      7   8
9 10    11  12
dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Remove missing values.

See the User Guide for more on which values are considered missing, and how to work with missing data.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

Determine if rows or columns which contain missing values are removed.

  • 0, or ‘index’ : Drop rows which contain missing values.

  • 1, or ‘columns’ : Drop columns which contain missing value.

Changed in version 1.0.0: Pass tuple or list to drop on multiple axes. Only a single axis is allowed.

how{‘any’, ‘all’}, default ‘any’

Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.

  • ‘any’ : If any NA values are present, drop that row or column.

  • ‘all’ : If all values are NA, drop that row or column.

threshint, optional

Require that many non-NA values.

subsetarray-like, optional

Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include.

inplacebool, default False

If True, do operation inplace and return None.

DataFrame or None

DataFrame with NA entries dropped from it or None if inplace=True.

DataFrame.isna: Indicate missing values. DataFrame.notna : Indicate existing (non-missing) values. DataFrame.fillna : Replace missing values. Series.dropna : Drop missing values. Index.dropna : Drop missing indices.

>>> df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
...                    "toy": [np.nan, 'Batmobile', 'Bullwhip'],
...                    "born": [pd.NaT, pd.Timestamp("1940-04-25"),
...                             pd.NaT]})
>>> df
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Drop the rows where at least one element is missing.

>>> df.dropna()
     name        toy       born
1  Batman  Batmobile 1940-04-25

Drop the columns where at least one element is missing.

>>> df.dropna(axis='columns')
       name
0    Alfred
1    Batman
2  Catwoman

Drop the rows where all elements are missing.

>>> df.dropna(how='all')
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Keep only the rows with at least 2 non-NA values.

>>> df.dropna(thresh=2)
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Define in which columns to look for missing values.

>>> df.dropna(subset=['name', 'toy'])
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Keep the DataFrame with valid entries in the same variable.

>>> df.dropna(inplace=True)
>>> df
     name        toy       born
1  Batman  Batmobile 1940-04-25
property dtypes

Return the dtypes in the DataFrame.

This returns a Series with the data type of each column. The result’s index is the original DataFrame’s columns. Columns with mixed types are stored with the object dtype. See the User Guide for more.

pandas.Series

The data type of each column.

>>> df = pd.DataFrame({'float': [1.0],
...                    'int': [1],
...                    'datetime': [pd.Timestamp('20180310')],
...                    'string': ['foo']})
>>> df.dtypes
float              float64
int                  int64
datetime    datetime64[ns]
string              object
dtype: object
duplicated(subset: Optional[Union[Hashable, Sequence[Hashable]]] = None, keep: Union[str, bool] = 'first')pandas.core.series.Series

Return boolean Series denoting duplicate rows.

Considering certain columns is optional.

subsetcolumn label or sequence of labels, optional

Only consider certain columns for identifying duplicates, by default use all of the columns.

keep{‘first’, ‘last’, False}, default ‘first’

Determines which duplicates (if any) to mark.

  • first : Mark duplicates as True except for the first occurrence.

  • last : Mark duplicates as True except for the last occurrence.

  • False : Mark all duplicates as True.

Series

Boolean series for each duplicated rows.

Index.duplicated : Equivalent method on index. Series.duplicated : Equivalent method on Series. Series.drop_duplicates : Remove duplicate values from Series. DataFrame.drop_duplicates : Remove duplicate values from DataFrame.

Consider dataset containing ramen rating.

>>> df = pd.DataFrame({
...     'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
...     'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
...     'rating': [4, 4, 3.5, 15, 5]
... })
>>> df
    brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

By default, for each set of duplicated values, the first occurrence is set on False and all others on True.

>>> df.duplicated()
0    False
1     True
2    False
3    False
4    False
dtype: bool

By using ‘last’, the last occurrence of each set of duplicated values is set on False and all others on True.

>>> df.duplicated(keep='last')
0     True
1    False
2    False
3    False
4    False
dtype: bool

By setting keep on False, all duplicates are True.

>>> df.duplicated(keep=False)
0     True
1     True
2    False
3    False
4    False
dtype: bool

To find duplicates on specific column(s), use subset.

>>> df.duplicated(subset=['brand'])
0    False
1     True
2    False
3     True
4     True
dtype: bool
property empty

Indicator whether DataFrame is empty.

True if DataFrame is entirely empty (no items), meaning any of the axes are of length 0.

bool

If DataFrame is empty, return True, if not return False.

Series.dropna : Return series without null values. DataFrame.dropna : Return DataFrame with labels on given axis omitted

where (all or any) data are missing.

If DataFrame contains only NaNs, it is still not considered empty. See the example below.

An example of an actual empty DataFrame. Notice the index is empty:

>>> df_empty = pd.DataFrame({'A' : []})
>>> df_empty
Empty DataFrame
Columns: [A]
Index: []
>>> df_empty.empty
True

If we only have NaNs in our DataFrame, it is not considered empty! We will need to drop the NaNs to make the DataFrame empty:

>>> df = pd.DataFrame({'A' : [np.nan]})
>>> df
    A
0 NaN
>>> df.empty
False
>>> df.dropna().empty
True
eq(other, axis='columns', level=None)

Get Equal to of dataframe and other, element-wise (binary operator eq).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

axis{0 or ‘index’, 1 or ‘columns’}, default ‘columns’

Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).

levelint or label

Broadcast across a level, matching Index values on the passed MultiIndex level.

DataFrame of bool

Result of the comparison.

DataFrame.eq : Compare DataFrames for equality elementwise. DataFrame.ne : Compare DataFrames for inequality elementwise. DataFrame.le : Compare DataFrames for less than inequality

or equality elementwise.

DataFrame.ltCompare DataFrames for strictly less than

inequality elementwise.

DataFrame.geCompare DataFrames for greater than inequality

or equality elementwise.

DataFrame.gtCompare DataFrames for strictly greater than

inequality elementwise.

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
equals(second_geometry)

Indicates if the base and comparison geometries are of the same shape type and define the same set of points in the plane. This is a 2D comparison only; M and Z values are ignored. Paramters:

second_geometry
  • a second geometry

erase(other, inplace=False)

Erases

Argument

Description

other

Required Geometry. A geometry object to erase from other geometries.

inplace

Optional boolean. Default False. Modify the SpatialDataFrame in place (do not create a new object)

Returns

SpatialDataFrame

eval(expr, inplace=False, **kwargs)

Evaluate a string describing operations on DataFrame columns.

Operates on columns only, not specific rows or elements. This allows eval to run arbitrary code, which can make you vulnerable to code injection if you pass user input to this function.

exprstr

The expression string to evaluate.

inplacebool, default False

If the expression contains an assignment, whether to perform the operation inplace and mutate the existing DataFrame. Otherwise, a new DataFrame is returned.

**kwargs

See the documentation for eval() for complete details on the keyword arguments accepted by query().

ndarray, scalar, pandas object, or None

The result of the evaluation or None if inplace=True.

DataFrame.queryEvaluates a boolean expression to query the columns

of a frame.

DataFrame.assignCan evaluate an expression or function to create new

values for a column.

evalEvaluate a Python expression as a string using various

backends.

For more details see the API documentation for eval(). For detailed examples see enhancing performance with eval.

>>> df = pd.DataFrame({'A': range(1, 6), 'B': range(10, 0, -2)})
>>> df
   A   B
0  1  10
1  2   8
2  3   6
3  4   4
4  5   2
>>> df.eval('A + B')
0    11
1    10
2     9
3     8
4     7
dtype: int64

Assignment is allowed though by default the original DataFrame is not modified.

>>> df.eval('C = A + B')
   A   B   C
0  1  10  11
1  2   8  10
2  3   6   9
3  4   4   8
4  5   2   7
>>> df
   A   B
0  1  10
1  2   8
2  3   6
3  4   4
4  5   2

Use inplace=True to modify the original DataFrame.

>>> df.eval('C = A + B', inplace=True)
>>> df
   A   B   C
0  1  10  11
1  2   8  10
2  3   6   9
3  4   4   8
4  5   2   7

Multiple columns can be assigned to using multi-line expressions:

>>> df.eval(
...     '''
... C = A + B
... D = A - B
... '''
... )
   A   B   C  D
0  1  10  11 -9
1  2   8  10 -6
2  3   6   9 -3
3  4   4   8  0
4  5   2   7  3
ewm(com: Optional[float] = None, span: Optional[float] = None, halflife: Optional[Union[float, Timedelta, datetime.timedelta, numpy.timedelta64, int, numpy.int64, str]] = None, alpha: Optional[float] = None, min_periods: int = 0, adjust: bool = True, ignore_na: bool = False, axis: Union[str, int] = 0, times: Optional[Union[str, numpy.ndarray, FrameOrSeries]] = None)pandas.core.window.ewm.ExponentialMovingWindow

Provide exponential weighted (EW) functions.

Available EW functions: mean(), var(), std(), corr(), cov().

Exactly one parameter: com, span, halflife, or alpha must be provided.

comfloat, optional

Specify decay in terms of center of mass, \(\alpha = 1 / (1 + com)\), for \(com \geq 0\).

spanfloat, optional

Specify decay in terms of span, \(\alpha = 2 / (span + 1)\), for \(span \geq 1\).

halflifefloat, str, timedelta, optional

Specify decay in terms of half-life, \(\alpha = 1 - \exp\left(-\ln(2) / halflife\right)\), for \(halflife > 0\).

If times is specified, the time unit (str or timedelta) over which an observation decays to half its value. Only applicable to mean() and halflife value will not apply to the other functions.

New in version 1.1.0.

alphafloat, optional

Specify smoothing factor \(\alpha\) directly, \(0 < \alpha \leq 1\).

min_periodsint, default 0

Minimum number of observations in window required to have a value (otherwise result is NA).

adjustbool, default True

Divide by decaying adjustment factor in beginning periods to account for imbalance in relative weightings (viewing EWMA as a moving average).

  • When adjust=True (default), the EW function is calculated using weights \(w_i = (1 - \alpha)^i\). For example, the EW moving average of the series [\(x_0, x_1, ..., x_t\)] would be:

\[y_t = \frac{x_t + (1 - \alpha)x_{t-1} + (1 - \alpha)^2 x_{t-2} + ... + (1 - \alpha)^t x_0}{1 + (1 - \alpha) + (1 - \alpha)^2 + ... + (1 - \alpha)^t}\]
  • When adjust=False, the exponentially weighted function is calculated recursively:

\[\begin{split}\begin{split} y_0 &= x_0\\ y_t &= (1 - \alpha) y_{t-1} + \alpha x_t, \end{split}\end{split}\]
ignore_nabool, default False

Ignore missing values when calculating weights; specify True to reproduce pre-0.15.0 behavior.

  • When ignore_na=False (default), weights are based on absolute positions. For example, the weights of \(x_0\) and \(x_2\) used in calculating the final weighted average of [\(x_0\), None, \(x_2\)] are \((1-\alpha)^2\) and \(1\) if adjust=True, and \((1-\alpha)^2\) and \(\alpha\) if adjust=False.

  • When ignore_na=True (reproducing pre-0.15.0 behavior), weights are based on relative positions. For example, the weights of \(x_0\) and \(x_2\) used in calculating the final weighted average of [\(x_0\), None, \(x_2\)] are \(1-\alpha\) and \(1\) if adjust=True, and \(1-\alpha\) and \(\alpha\) if adjust=False.

axis{0, 1}, default 0

The axis to use. The value 0 identifies the rows, and 1 identifies the columns.

times : str, np.ndarray, Series, default None

New in version 1.1.0.

Times corresponding to the observations. Must be monotonically increasing and datetime64[ns] dtype.

If str, the name of the column in the DataFrame representing the times.

If 1-D array like, a sequence with the same shape as the observations.

Only applicable to mean().

DataFrame

A Window sub-classed for the particular operation.

rolling : Provides rolling window calculations. expanding : Provides expanding transformations.

More details can be found at: Exponentially weighted windows.

>>> df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
>>> df
     B
0  0.0
1  1.0
2  2.0
3  NaN
4  4.0
>>> df.ewm(com=0.5).mean()
          B
0  0.000000
1  0.750000
2  1.615385
3  1.615385
4  3.670213

Specifying times with a timedelta halflife when computing mean.

>>> times = ['2020-01-01', '2020-01-03', '2020-01-10', '2020-01-15', '2020-01-17']
>>> df.ewm(halflife='4 days', times=pd.DatetimeIndex(times)).mean()
          B
0  0.000000
1  0.585786
2  1.523889
3  1.523889
4  3.233686
expanding(min_periods: int = 1, center: Optional[bool] = None, axis: Union[str, int] = 0)pandas.core.window.expanding.Expanding

Provide expanding transformations.

min_periodsint, default 1

Minimum number of observations in window required to have a value (otherwise result is NA).

centerbool, default False

Set the labels at the center of the window.

axis : int or str, default 0

a Window sub-classed for the particular operation

rolling : Provides rolling window calculations. ewm : Provides exponential weighted functions.

By default, the result is set to the right edge of the window. This can be changed to the center of the window by setting center=True.

>>> df = pd.DataFrame({"B": [0, 1, 2, np.nan, 4]})
>>> df
     B
0  0.0
1  1.0
2  2.0
3  NaN
4  4.0
>>> df.expanding(2).sum()
     B
0  NaN
1  1.0
2  3.0
3  3.0
4  7.0
explode(column: Union[str, Tuple], ignore_index: bool = False)pandas.core.frame.DataFrame

Transform each element of a list-like to a row, replicating index values.

New in version 0.25.0.

columnstr or tuple

Column to explode.

ignore_indexbool, default False

If True, the resulting index will be labeled 0, 1, …, n - 1.

New in version 1.1.0.

DataFrame

Exploded lists to rows of the subset columns; index will be duplicated for these rows.

ValueError :

if columns of the frame are not unique.

DataFrame.unstackPivot a level of the (necessarily hierarchical)

index labels.

DataFrame.melt : Unpivot a DataFrame from wide format to long format. Series.explode : Explode a DataFrame from list-like columns to long format.

This routine will explode list-likes including lists, tuples, sets, Series, and np.ndarray. The result dtype of the subset rows will be object. Scalars will be returned unchanged, and empty list-likes will result in a np.nan for that row. In addition, the ordering of rows in the output will be non-deterministic when exploding sets.

>>> df = pd.DataFrame({'A': [[1, 2, 3], 'foo', [], [3, 4]], 'B': 1})
>>> df
           A  B
0  [1, 2, 3]  1
1        foo  1
2         []  1
3     [3, 4]  1
>>> df.explode('A')
     A  B
0    1  1
0    2  1
0    3  1
1  foo  1
2  NaN  1
3    3  1
3    4  1
property extent

the extent of the geometry

ffill(axis=None, inplace: bool = False, limit=None, downcast=None)Optional[FrameOrSeries]

Synonym for DataFrame.fillna() with method='ffill'.

Series/DataFrame or None

Object with missing values filled or None if inplace=True.

fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)Optional[pandas.core.frame.DataFrame]

Fill NA/NaN values using the specified method.

valuescalar, dict, Series, or DataFrame

Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). Values not in the dict/Series/DataFrame will not be filled. This value cannot be a list.

method{‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None

Method to use for filling holes in reindexed Series pad / ffill: propagate last valid observation forward to next valid backfill / bfill: use next valid observation to fill gap.

axis{0 or ‘index’, 1 or ‘columns’}

Axis along which to fill missing values.

inplacebool, default False

If True, fill in-place. Note: this will modify any other views on this object (e.g., a no-copy slice for a column in a DataFrame).

limitint, default None

If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.

downcastdict, default is None

A dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible).

DataFrame or None

Object with missing values filled or None if inplace=True.

interpolate : Fill NaN values using interpolation. reindex : Conform object to new index. asfreq : Convert TimeSeries to specified frequency.

>>> df = pd.DataFrame([[np.nan, 2, np.nan, 0],
...                    [3, 4, np.nan, 1],
...                    [np.nan, np.nan, np.nan, 5],
...                    [np.nan, 3, np.nan, 4]],
...                   columns=list('ABCD'))
>>> df
     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  NaN  NaN NaN  5
3  NaN  3.0 NaN  4

Replace all NaN elements with 0s.

>>> df.fillna(0)
    A   B   C   D
0   0.0 2.0 0.0 0
1   3.0 4.0 0.0 1
2   0.0 0.0 0.0 5
3   0.0 3.0 0.0 4

We can also propagate non-null values forward or backward.

>>> df.fillna(method='ffill')
    A   B   C   D
0   NaN 2.0 NaN 0
1   3.0 4.0 NaN 1
2   3.0 4.0 NaN 5
3   3.0 3.0 NaN 4

Replace all NaN elements in column ‘A’, ‘B’, ‘C’, and ‘D’, with 0, 1, 2, and 3 respectively.

>>> values = {'A': 0, 'B': 1, 'C': 2, 'D': 3}
>>> df.fillna(value=values)
    A   B   C   D
0   0.0 2.0 2.0 0
1   3.0 4.0 2.0 1
2   0.0 1.0 2.0 5
3   0.0 3.0 2.0 4

Only replace the first NaN element.

>>> df.fillna(value=values, limit=1)
    A   B   C   D
0   0.0 2.0 2.0 0
1   3.0 4.0 NaN 1
2   NaN 1.0 NaN 5
3   NaN 3.0 NaN 4
filter(items=None, like: Optional[str] = None, regex: Optional[str] = None, axis=None)FrameOrSeries

Subset the dataframe rows or columns according to the specified index labels.

Note that this routine does not filter a dataframe on its contents. The filter is applied to the labels of the index.

itemslist-like

Keep labels from axis which are in items.

likestr

Keep labels from axis for which “like in label == True”.

regexstr (regular expression)

Keep labels from axis for which re.search(regex, label) == True.

axis{0 or ‘index’, 1 or ‘columns’, None}, default None

The axis to filter on, expressed either as an index (int) or axis name (str). By default this is the info axis, ‘index’ for Series, ‘columns’ for DataFrame.

same type as input object

DataFrame.locAccess a group of rows and columns

by label(s) or a boolean array.

The items, like, and regex parameters are enforced to be mutually exclusive.

axis defaults to the info axis that is used when indexing with [].

>>> df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6])),
...                   index=['mouse', 'rabbit'],
...                   columns=['one', 'two', 'three'])
>>> df
        one  two  three
mouse     1    2      3
rabbit    4    5      6
>>> # select columns by name
>>> df.filter(items=['one', 'three'])
         one  three
mouse     1      3
rabbit    4      6
>>> # select columns by regular expression
>>> df.filter(regex='e$', axis=1)
         one  three
mouse     1      3
rabbit    4      6
>>> # select rows containing 'bbi'
>>> df.filter(like='bbi', axis=0)
         one  two  three
rabbit    4    5      6
first(offset)FrameOrSeries

Select initial periods of time series data based on a date offset.

When having a DataFrame with dates as index, this function can select the first few rows based on a date offset.

offsetstr, DateOffset or dateutil.relativedelta

The offset length of the data that will be selected. For instance, ‘1M’ will display all the rows having their index within the first month.

Series or DataFrame

A subset of the caller.

TypeError

If the index is not a DatetimeIndex

last : Select final periods of time series based on a date offset. at_time : Select values at a particular time of the day. between_time : Select values between particular times of the day.

>>> i = pd.date_range('2018-04-09', periods=4, freq='2D')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts
            A
2018-04-09  1
2018-04-11  2
2018-04-13  3
2018-04-15  4

Get the rows for the first 3 days:

>>> ts.first('3D')
            A
2018-04-09  1
2018-04-11  2

Notice the data for 3 first calendar days were returned, not the first 3 days observed in the dataset, and therefore data for 2018-04-13 was not returned.

property first_point

The first coordinate point of the geometry.

first_valid_index()

Return index for first non-NA/null value.

scalar : type of index

If all elements are non-NA/null, returns None. Also returns None for empty Series/DataFrame.

property flags

Get the properties associated with this pandas object.

The available flags are

  • Flags.allows_duplicate_labels

Flags : Flags that apply to pandas objects. DataFrame.attrs : Global metadata applying to this dataset.

“Flags” differ from “metadata”. Flags reflect properties of the pandas object (the Series or DataFrame). Metadata refer to properties of the dataset, and should be stored in DataFrame.attrs.

>>> df = pd.DataFrame({"A": [1, 2]})
>>> df.flags
<Flags(allows_duplicate_labels=True)>

Flags can be get or set using .

>>> df.flags.allows_duplicate_labels
True
>>> df.flags.allows_duplicate_labels = False

Or by slicing with a key

>>> df.flags["allows_duplicate_labels"]
False
>>> df.flags["allows_duplicate_labels"] = True
floordiv(other, axis='columns', level=None, fill_value=None)

Get Integer division of dataframe and other, element-wise (binary operator floordiv).

Equivalent to dataframe // other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rfloordiv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

axis{0 or ‘index’, 1 or ‘columns’}

Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

levelint or label

Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

DataFrame

Result of the arithmetic operation.

DataFrame.add : Add DataFrames. DataFrame.sub : Subtract DataFrames. DataFrame.mul : Multiply DataFrames. DataFrame.div : Divide DataFrames (float division). DataFrame.truediv : Divide DataFrames (float division). DataFrame.floordiv : Divide DataFrames (integer division). DataFrame.mod : Calculate modulo (remainder after division). DataFrame.pow : Calculate exponential power.

Mismatched indices will be unioned together.

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
static from_df(df, address_column='address', geocoder=None)

Returns a SpatialDataFrame from a dataframe with an address column.

Argument

Description

df

Required Pandas DataFrame. Source dataset

address_column

Optional String. The default is “address”. This is the name of a column in the specified dataframe that contains addresses (as strings). The addresses are batch geocoded using the GIS’s first configured geocoder and their locations used as the geometry of the spatial dataframe. Ignored if the ‘geometry’ parameter is also specified.

geocoder

Optional Geocoder. The geocoder to be used. If not specified, the active GIS’s first geocoder is used.

Returns

SpatialDataFrame

NOTE: Credits will be consumed for batch_geocoding, from the GIS to which the geocoder belongs.

classmethod from_dict(data, orient='columns', dtype=None, columns=None)pandas.core.frame.DataFrame

Construct DataFrame from dict of array-like or dicts.

Creates DataFrame object from dictionary by columns or by index allowing dtype specification.

datadict

Of the form {field : array-like} or {field : dict}.

orient{‘columns’, ‘index’}, default ‘columns’

The “orientation” of the data. If the keys of the passed dict should be the columns of the resulting DataFrame, pass ‘columns’ (default). Otherwise if the keys should be rows, pass ‘index’.

dtypedtype, default None

Data type to force, otherwise infer.

columnslist, default None

Column labels to use when orient='index'. Raises a ValueError if used with orient='columns'.

DataFrame

DataFrame.from_recordsDataFrame from structured ndarray, sequence

of tuples or dicts, or DataFrame.

DataFrame : DataFrame object creation using constructor.

By default the keys of the dict become the DataFrame columns:

>>> data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
>>> pd.DataFrame.from_dict(data)
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d

Specify orient='index' to create the DataFrame using dictionary keys as rows:

>>> data = {'row_1': [3, 2, 1, 0], 'row_2': ['a', 'b', 'c', 'd']}
>>> pd.DataFrame.from_dict(data, orient='index')
       0  1  2  3
row_1  3  2  1  0
row_2  a  b  c  d

When using the ‘index’ orientation, the column names can be specified manually:

>>> pd.DataFrame.from_dict(data, orient='index',
...                        columns=['A', 'B', 'C', 'D'])
       A  B  C  D
row_1  3  2  1  0
row_2  a  b  c  d
static from_featureclass(filename, **kwargs)

Returns a SpatialDataFrame from a feature class.

Argument

Description

filename

Required string. The full path to the feature class

sql_clause

Optional string. The sql clause to parse data down

where_clause

Optional string. A where statement

sr

Optional SpatialReference. A spatial reference object

Returns

SpatialDataFrame

static from_hdf(path_or_buf, key=None, **kwargs)

read from the store, close it if we opened it

Retrieve pandas object stored in file, optionally based on where criteria

path_or_bufpath (string), buffer, or path object (pathlib.Path or

py._path.local.LocalPath) to read from

New in version 0.19.0: support for pathlib, py.path.

keygroup identifier in the store. Can be omitted if the HDF file

contains a single pandas object.

where : list of Term (or convertable) objects, optional start : optional, integer (defaults to None), row number to start

selection

stopoptional, integer (defaults to None), row number to stop

selection

columnsoptional, a list of columns that if not None, will limit the

return columns

iterator : optional, boolean, return an iterator, default False chunksize : optional, nrows to include in iteration, return an iterator

The selected object

static from_layer(layer, **kwargs)

Returns a SpatialDataFrame/Pandas’ Dataframe from a FeatureLayer or Table object.

Arguments

Description

layer

required FeatureLayer/Table. This is the service endpoint object.

Returns

SpatialDataFrame for feature layers with geometry and Panda’s Dataframe for tables

classmethod from_records(data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)pandas.core.frame.DataFrame

Convert structured or record ndarray to DataFrame.

Creates a DataFrame object from a structured ndarray, sequence of tuples or dicts, or DataFrame.

datastructured ndarray, sequence of tuples or dicts, or DataFrame

Structured input data.

indexstr, list of fields, array-like

Field of array to use as the index, alternately a specific set of input labels to use.

excludesequence, default None

Columns or fields to exclude.

columnssequence, default None

Column names to use. If the passed data do not have names associated with them, this argument provides names for the columns. Otherwise this argument indicates the order of the columns in the result (any names not found in the data will become all-NA columns).

coerce_floatbool, default False

Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets.

nrowsint, default None

Number of rows to read if data is an iterator.

DataFrame

DataFrame.from_dict : DataFrame from dict of array-like or dicts. DataFrame : DataFrame object creation using constructor.

Data can be provided as a structured ndarray:

>>> data = np.array([(3, 'a'), (2, 'b'), (1, 'c'), (0, 'd')],
...                 dtype=[('col_1', 'i4'), ('col_2', 'U1')])
>>> pd.DataFrame.from_records(data)
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d

Data can be provided as a list of dicts:

>>> data = [{'col_1': 3, 'col_2': 'a'},
...         {'col_1': 2, 'col_2': 'b'},
...         {'col_1': 1, 'col_2': 'c'},
...         {'col_1': 0, 'col_2': 'd'}]
>>> pd.DataFrame.from_records(data)
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d

Data can be provided as a list of tuples with corresponding columns:

>>> data = [(3, 'a'), (2, 'b'), (1, 'c'), (0, 'd')]
>>> pd.DataFrame.from_records(data, columns=['col_1', 'col_2'])
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d
static from_xy(df, x_column, y_column, sr=4326)

Converts a Pandas DataFrame into a Spatial DataFrame by providing the X/Y columns.

Argument

Description

df

Required Pandas DataFrame. Source dataset

x_column

Required string. The name of the X-coordinate series

y_column

Required string. The name of the Y-coordinate series

sr

Optional int. The wkid number of the spatial reference.

Returns

SpatialDataFrame

ge(other, axis='columns', level=None)

Get Greater than or equal to of dataframe and other, element-wise (binary operator ge).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

axis{0 or ‘index’, 1 or ‘columns’}, default ‘columns’

Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).

levelint or label

Broadcast across a level, matching Index values on the passed MultiIndex level.

DataFrame of bool

Result of the comparison.

DataFrame.eq : Compare DataFrames for equality elementwise. DataFrame.ne : Compare DataFrames for inequality elementwise. DataFrame.le : Compare DataFrames for less than inequality

or equality elementwise.

DataFrame.ltCompare DataFrames for strictly less than

inequality elementwise.

DataFrame.geCompare DataFrames for greater than inequality

or equality elementwise.

DataFrame.gtCompare DataFrames for strictly greater than

inequality elementwise.

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
generalize(max_offset)

Creates a new simplified geometry using a specified maximum offset tolerance.

Parameters:
max_offset
  • The maximum offset tolerance.

property geoextent

returns the extent of the spatial dataframe

property geometry

Get/Set the geometry data for SpatialDataFrame

property geometry_type

The geometry type: polygon, polyline, point, multipoint, multipatch, dimension, or annotation

get(key, default=None)

Get item from object for given key (ex: DataFrame column).

Returns default value if not found.

key : object

value : same type as items contained in object

get_area(method, units=None)

Returns the area of the feature using a measurement type.

Parameters:
method
  • PLANAR measurements reflect the projection of

geographic data onto the 2D surface (in other words, they will not take into account the curvature of the earth). GEODESIC, GREAT_ELLIPTIC, LOXODROME, and PRESERVE_SHAPE measurement types may be chosen as an alternative, if desired.

units
  • Areal unit of measure keywords: ACRES | ARES | HECTARES

SQUARECENTIMETERS | SQUAREDECIMETERS | SQUAREINCHES | SQUAREFEET
SQUAREKILOMETERS | SQUAREMETERS | SQUAREMILES |

SQUAREMILLIMETERS | SQUAREYARDS

get_length(method, units)

Returns the length of the feature using a measurement type.

Parameters:
method
  • PLANAR measurements reflect the projection of

geographic data onto the 2D surface (in other words, they will not take into account the curvature of the earth). GEODESIC, GREAT_ELLIPTIC, LOXODROME, and PRESERVE_SHAPE measurement types may be chosen as an alternative, if desired.

units
  • Linear unit of measure keywords: CENTIMETERS |

DECIMETERS | FEET | INCHES | KILOMETERS | METERS | MILES | MILLIMETERS | NAUTICALMILES | YARDS

get_part(index=None)

Returns an array of point objects for a particular part of geometry or an array containing a number of arrays, one for each part.

Parameters:
index
  • The index position of the geometry.

groupby(by=None, axis=0, level=None, as_index: bool = True, sort: bool = True, group_keys: bool = True, squeeze: bool = <object object>, observed: bool = False, dropna: bool = True)DataFrameGroupBy

Group DataFrame using a mapper or by a Series of columns.

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.

bymapping, function, label, or list of labels

Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series’ values are first aligned; see .align() method). If an ndarray is passed, the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

Split along rows (0) or columns (1).

levelint, level name, or sequence of such, default None

If the axis is a MultiIndex (hierarchical), group by a particular level or levels.

as_indexbool, default True

For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output.

sortbool, default True

Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.

group_keysbool, default True

When calling apply, add group keys to index to identify pieces.

squeezebool, default False

Reduce the dimensionality of the return type if possible, otherwise return a consistent type.

Deprecated since version 1.1.0.

observedbool, default False

This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.

dropnabool, default True

If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups

New in version 1.1.0.

DataFrameGroupBy

Returns a groupby object that contains information about the groups.

resampleConvenience method for frequency conversion and resampling

of time series.

See the user guide for more.

>>> df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
...                               'Parrot', 'Parrot'],
...                    'Max Speed': [380., 370., 24., 26.]})
>>> df
   Animal  Max Speed
0  Falcon      380.0
1  Falcon      370.0
2  Parrot       24.0
3  Parrot       26.0
>>> df.groupby(['Animal']).mean()
        Max Speed
Animal
Falcon      375.0
Parrot       25.0

Hierarchical Indexes

We can groupby different levels of a hierarchical index using the level parameter:

>>> arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'],
...           ['Captive', 'Wild', 'Captive', 'Wild']]
>>> index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type'))
>>> df = pd.DataFrame({'Max Speed': [390., 350., 30., 20.]},
...                   index=index)
>>> df
                Max Speed
Animal Type
Falcon Captive      390.0
       Wild         350.0
Parrot Captive       30.0
       Wild          20.0
>>> df.groupby(level=0).mean()
        Max Speed
Animal
Falcon      370.0
Parrot       25.0
>>> df.groupby(level="Type").mean()
         Max Speed
Type
Captive      210.0
Wild         185.0

We can also choose to include NA in group keys or not by setting dropna parameter, the default setting is True:

>>> l = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]
>>> df = pd.DataFrame(l, columns=["a", "b", "c"])
>>> df.groupby(by=["b"]).sum()
    a   c
b
1.0 2   3
2.0 2   5
>>> df.groupby(by=["b"], dropna=False).sum()
    a   c
b
1.0 2   3
2.0 2   5
NaN 1   4
>>> l = [["a", 12, 12], [None, 12.3, 33.], ["b", 12.3, 123], ["a", 1, 1]]
>>> df = pd.DataFrame(l, columns=["a", "b", "c"])
>>> df.groupby(by="a").sum()
    b     c
a
a   13.0   13.0
b   12.3  123.0
>>> df.groupby(by="a", dropna=False).sum()
    b     c
a
a   13.0   13.0
b   12.3  123.0
NaN 12.3   33.0
gt(other, axis='columns', level=None)

Get Greater than of dataframe and other, element-wise (binary operator gt).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

axis{0 or ‘index’, 1 or ‘columns’}, default ‘columns’

Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).

levelint or label

Broadcast across a level, matching Index values on the passed MultiIndex level.

DataFrame of bool

Result of the comparison.

DataFrame.eq : Compare DataFrames for equality elementwise. DataFrame.ne : Compare DataFrames for inequality elementwise. DataFrame.le : Compare DataFrames for less than inequality

or equality elementwise.

DataFrame.ltCompare DataFrames for strictly less than

inequality elementwise.

DataFrame.geCompare DataFrames for greater than inequality

or equality elementwise.

DataFrame.gtCompare DataFrames for strictly greater than

inequality elementwise.

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
head(n: int = 5)FrameOrSeries

Return the first n rows.

This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.

For negative values of n, this function returns all rows except the last n rows, equivalent to df[:-n].

nint, default 5

Number of rows to select.

same type as caller

The first n rows of the caller object.

DataFrame.tail: Returns the last n rows.

>>> df = pd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion',
...                    'monkey', 'parrot', 'shark', 'whale', 'zebra']})
>>> df
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot
6      shark
7      whale
8      zebra

Viewing the first 5 lines

>>> df.head()
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey

Viewing the first n lines (three in this case)

>>> df.head(3)
      animal
0  alligator
1        bee
2     falcon

For negative values of n

>>> df.head(-3)
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot
hist(column: Union[Hashable, None, Sequence[Optional[Hashable]]] = None, by=None, grid: bool = True, xlabelsize: Optional[int] = None, xrot: Optional[float] = None, ylabelsize: Optional[int] = None, yrot: Optional[float] = None, ax=None, sharex: bool = False, sharey: bool = False, figsize: Optional[Tuple[int, int]] = None, layout: Optional[Tuple[int, int]] = None, bins: Union[int, Sequence[int]] = 10, backend: Optional[str] = None, legend: bool = False, **kwargs)

Make a histogram of the DataFrame’s.

A histogram is a representation of the distribution of data. This function calls matplotlib.pyplot.hist(), on each series in the DataFrame, resulting in one histogram per column.

dataDataFrame

The pandas object holding the data.

columnstr or sequence

If passed, will be used to limit data to a subset of columns.

byobject, optional

If passed, then used to form histograms for separate groups.

gridbool, default True

Whether to show axis grid lines.

xlabelsizeint, default None

If specified changes the x-axis label size.

xrotfloat, default None

Rotation of x axis labels. For example, a value of 90 displays the x labels rotated 90 degrees clockwise.

ylabelsizeint, default None

If specified changes the y-axis label size.

yrotfloat, default None

Rotation of y axis labels. For example, a value of 90 displays the y labels rotated 90 degrees clockwise.

axMatplotlib axes object, default None

The axes to plot the histogram on.

sharexbool, default True if ax is None else False

In case subplots=True, share x axis and set some x axis labels to invisible; defaults to True if ax is None otherwise False if an ax is passed in. Note that passing in both an ax and sharex=True will alter all x axis labels for all subplots in a figure.

shareybool, default False

In case subplots=True, share y axis and set some y axis labels to invisible.

figsizetuple

The size in inches of the figure to create. Uses the value in matplotlib.rcParams by default.

layouttuple, optional

Tuple of (rows, columns) for the layout of the histograms.

binsint or sequence, default 10

Number of histogram bins to be used. If an integer is given, bins + 1 bin edges are calculated and returned. If bins is a sequence, gives bin edges, including left edge of first bin and right edge of last bin. In this case, bins is returned unmodified.

backendstr, default None

Backend to use instead of the backend specified in the option plotting.backend. For instance, ‘matplotlib’. Alternatively, to specify the plotting.backend for the whole session, set pd.options.plotting.backend.

New in version 1.0.0.

legendbool, default False

Whether to show the legend.

New in version 1.1.0.

**kwargs

All other plotting keyword arguments to be passed to matplotlib.pyplot.hist().

matplotlib.AxesSubplot or numpy.ndarray of them

matplotlib.pyplot.hist : Plot a histogram using matplotlib.

This example draws a histogram based on the length and width of some animals, displayed in three bins

property hull_rectangle

A space-delimited string of the coordinate pairs of the convex hull rectangle.

property iat

Access a single value for a row/column pair by integer position.

Similar to iloc, in that both provide integer-based lookups. Use iat if you only need to get or set a single value in a DataFrame or Series.

IndexError

When integer position is out of bounds.

DataFrame.at : Access a single value for a row/column label pair. DataFrame.loc : Access a group of rows and columns by label(s). DataFrame.iloc : Access a group of rows and columns by integer position(s).

>>> df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
...                   columns=['A', 'B', 'C'])
>>> df
    A   B   C
0   0   2   3
1   0   4   1
2  10  20  30

Get value at specified row/column pair

>>> df.iat[1, 2]
1

Set value at specified row/column pair

>>> df.iat[1, 2] = 10
>>> df.iat[1, 2]
10

Get value within a series

>>> df.loc[0].iat[1]
2
idxmax(axis=0, skipna=True)pandas.core.series.Series

Return index of first occurrence of maximum over requested axis.

NA/null values are excluded.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

The axis to use. 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.

skipnabool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

Series

Indexes of maxima along the specified axis.

ValueError
  • If the row/column is empty

Series.idxmax : Return index of the maximum element.

This method is the DataFrame version of ndarray.argmax.

Consider a dataset containing food consumption in Argentina.

>>> df = pd.DataFrame({'consumption': [10.51, 103.11, 55.48],
...                    'co2_emissions': [37.2, 19.66, 1712]},
...                    index=['Pork', 'Wheat Products', 'Beef'])
>>> df
                consumption  co2_emissions
Pork                  10.51         37.20
Wheat Products       103.11         19.66
Beef                  55.48       1712.00

By default, it returns the index for the maximum value in each column.

>>> df.idxmax()
consumption     Wheat Products
co2_emissions             Beef
dtype: object

To return the index for the maximum value in each row, use axis="columns".

>>> df.idxmax(axis="columns")
Pork              co2_emissions
Wheat Products     consumption
Beef              co2_emissions
dtype: object
idxmin(axis=0, skipna=True)pandas.core.series.Series

Return index of first occurrence of minimum over requested axis.

NA/null values are excluded.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

The axis to use. 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.

skipnabool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

Series

Indexes of minima along the specified axis.

ValueError
  • If the row/column is empty

Series.idxmin : Return index of the minimum element.

This method is the DataFrame version of ndarray.argmin.

Consider a dataset containing food consumption in Argentina.

>>> df = pd.DataFrame({'consumption': [10.51, 103.11, 55.48],
...                    'co2_emissions': [37.2, 19.66, 1712]},
...                    index=['Pork', 'Wheat Products', 'Beef'])
>>> df
                consumption  co2_emissions
Pork                  10.51         37.20
Wheat Products       103.11         19.66
Beef                  55.48       1712.00

By default, it returns the index for the minimum value in each column.

>>> df.idxmin()
consumption                Pork
co2_emissions    Wheat Products
dtype: object

To return the index for the minimum value in each row, use axis="columns".

>>> df.idxmin(axis="columns")
Pork                consumption
Wheat Products    co2_emissions
Beef                consumption
dtype: object
property iloc

Purely integer-location based indexing for selection by position.

.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.

Allowed inputs are:

  • An integer, e.g. 5.

  • A list or array of integers, e.g. [4, 3, 0].

  • A slice object with ints, e.g. 1:7.

  • A boolean array.

  • A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above). This is useful in method chains, when you don’t have a reference to the calling object, but would like to base your selection on some value.

.iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers which allow out-of-bounds indexing (this conforms with python/numpy slice semantics).

See more at Selection by Position.

DataFrame.iat : Fast integer location scalar accessor. DataFrame.loc : Purely label-location based indexer for selection by label. Series.iloc : Purely integer-location based indexing for

selection by position.

>>> mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
...           {'a': 100, 'b': 200, 'c': 300, 'd': 400},
...           {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000 }]
>>> df = pd.DataFrame(mydict)
>>> df
      a     b     c     d
0     1     2     3     4
1   100   200   300   400
2  1000  2000  3000  4000

Indexing just the rows

With a scalar integer.

>>> type(df.iloc[0])
<class 'pandas.core.series.Series'>
>>> df.iloc[0]
a    1
b    2
c    3
d    4
Name: 0, dtype: int64

With a list of integers.

>>> df.iloc[[0]]
   a  b  c  d
0  1  2  3  4
>>> type(df.iloc[[0]])
<class 'pandas.core.frame.DataFrame'>
>>> df.iloc[[0, 1]]
     a    b    c    d
0    1    2    3    4
1  100  200  300  400

With a slice object.

>>> df.iloc[:3]
      a     b     c     d
0     1     2     3     4
1   100   200   300   400
2  1000  2000  3000  4000

With a boolean mask the same length as the index.

>>> df.iloc[[True, False, True]]
      a     b     c     d
0     1     2     3     4
2  1000  2000  3000  4000

With a callable, useful in method chains. The x passed to the lambda is the DataFrame being sliced. This selects the rows whose index label even.

>>> df.iloc[lambda x: x.index % 2 == 0]
      a     b     c     d
0     1     2     3     4
2  1000  2000  3000  4000

Indexing both axes

You can mix the indexer types for the index and columns. Use : to select the entire axis.

With scalar integers.

>>> df.iloc[0, 1]
2

With lists of integers.

>>> df.iloc[[0, 2], [1, 3]]
      b     d
0     2     4
2  2000  4000

With slice objects.

>>> df.iloc[1:3, 0:3]
      a     b     c
1   100   200   300
2  1000  2000  3000

With a boolean array whose length matches the columns.

>>> df.iloc[:, [True, False, True, False]]
      a     c
0     1     3
1   100   300
2  1000  3000

With a callable function that expects the Series or DataFrame.

>>> df.iloc[:, lambda df: [0, 2]]
      a     c
0     1     3
1   100   300
2  1000  3000
index: Index

The index (row labels) of the DataFrame.

infer_objects()FrameOrSeries

Attempt to infer better dtypes for object columns.

Attempts soft conversion of object-dtyped columns, leaving non-object and unconvertible columns unchanged. The inference rules are the same as during normal Series/DataFrame construction.

converted : same type as input object

to_datetime : Convert argument to datetime. to_timedelta : Convert argument to timedelta. to_numeric : Convert argument to numeric type. convert_dtypes : Convert argument to best possible dtype.

>>> df = pd.DataFrame({"A": ["a", 1, 2, 3]})
>>> df = df.iloc[1:]
>>> df
   A
1  1
2  2
3  3
>>> df.dtypes
A    object
dtype: object
>>> df.infer_objects().dtypes
A    int64
dtype: object
info(verbose=None, buf=None, max_cols=None, memory_usage=None, null_counts=None)

Concise summary of a DataFrame.

verbose{None, True, False}, optional

Whether to print the full summary. None follows the display.max_info_columns setting. True or False overrides the display.max_info_columns setting.

buf : writable buffer, defaults to sys.stdout max_cols : int, default None

Determines whether full summary or short summary is printed. None follows the display.max_info_columns setting.

memory_usageboolean/string, default None

Specifies whether total memory usage of the DataFrame elements (including index) should be displayed. None follows the display.memory_usage setting. True or False overrides the display.memory_usage setting. A value of ‘deep’ is equivalent of True, with deep introspection. Memory usage is shown in human-readable units (base-2 representation).

null_countsboolean, default None

Whether to show the non-null counts

  • If None, then only show if the frame is smaller than max_info_rows and max_info_columns.

  • If True, always show counts.

  • If False, never show counts.

insert(loc, column, value, allow_duplicates=False)None

Insert column into DataFrame at specified location.

Raises a ValueError if column is already contained in the DataFrame, unless allow_duplicates is set to True.

locint

Insertion index. Must verify 0 <= loc <= len(columns).

columnstr, number, or hashable object

Label of the inserted column.

value : int, Series, or array-like allow_duplicates : bool, optional

interpolate(method: str = 'linear', axis: Union[str, int] = 0, limit: Optional[int] = None, inplace: bool = False, limit_direction: Optional[str] = None, limit_area: Optional[str] = None, downcast: Optional[str] = None, **kwargs)Optional[FrameOrSeries]

Fill NaN values using an interpolation method.

Please note that only method='linear' is supported for DataFrame/Series with a MultiIndex.

methodstr, default ‘linear’

Interpolation technique to use. One of:

  • ‘linear’: Ignore the index and treat the values as equally spaced. This is the only method supported on MultiIndexes.

  • ‘time’: Works on daily and higher resolution data to interpolate given length of interval.

  • ‘index’, ‘values’: use the actual numerical values of the index.

  • ‘pad’: Fill in NaNs using existing values.

  • ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘spline’, ‘barycentric’, ‘polynomial’: Passed to scipy.interpolate.interp1d. These methods use the numerical values of the index. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=5).

  • ‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’, ‘akima’, ‘cubicspline’: Wrappers around the SciPy interpolation methods of similar names. See Notes.

  • ‘from_derivatives’: Refers to scipy.interpolate.BPoly.from_derivatives which replaces ‘piecewise_polynomial’ interpolation method in scipy 0.18.

axis{{0 or ‘index’, 1 or ‘columns’, None}}, default None

Axis to interpolate along.

limitint, optional

Maximum number of consecutive NaNs to fill. Must be greater than 0.

inplacebool, default False

Update the data in place if possible.

limit_direction{{‘forward’, ‘backward’, ‘both’}}, Optional

Consecutive NaNs will be filled in this direction.

If limit is specified:
  • If ‘method’ is ‘pad’ or ‘ffill’, ‘limit_direction’ must be ‘forward’.

  • If ‘method’ is ‘backfill’ or ‘bfill’, ‘limit_direction’ must be ‘backwards’.

If ‘limit’ is not specified:
  • If ‘method’ is ‘backfill’ or ‘bfill’, the default is ‘backward’

  • else the default is ‘forward’

Changed in version 1.1.0: raises ValueError if limit_direction is ‘forward’ or ‘both’ and method is ‘backfill’ or ‘bfill’. raises ValueError if limit_direction is ‘backward’ or ‘both’ and method is ‘pad’ or ‘ffill’.

limit_area{{None, ‘inside’, ‘outside’}}, default None

If limit is specified, consecutive NaNs will be filled with this restriction.

  • None: No fill restriction.

  • ‘inside’: Only fill NaNs surrounded by valid values (interpolate).

  • ‘outside’: Only fill NaNs outside valid values (extrapolate).

downcastoptional, ‘infer’ or None, defaults to None

Downcast dtypes if possible.

**kwargsoptional

Keyword arguments to pass on to the interpolating function.

Series or DataFrame or None

Returns the same object type as the caller, interpolated at some or all NaN values or None if inplace=True.

fillna : Fill missing values using different methods. scipy.interpolate.Akima1DInterpolator : Piecewise cubic polynomials

(Akima interpolator).

scipy.interpolate.BPoly.from_derivativesPiecewise polynomial in the

Bernstein basis.

scipy.interpolate.interp1d : Interpolate a 1-D function. scipy.interpolate.KroghInterpolator : Interpolate polynomial (Krogh

interpolator).

scipy.interpolate.PchipInterpolatorPCHIP 1-d monotonic cubic

interpolation.

scipy.interpolate.CubicSpline : Cubic spline data interpolator.

The ‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’ and ‘akima’ methods are wrappers around the respective SciPy implementations of similar names. These use the actual numerical values of the index. For more information on their behavior, see the SciPy documentation and SciPy tutorial.

Filling in NaN in a Series via linear interpolation.

>>> s = pd.Series([0, 1, np.nan, 3])
>>> s
0    0.0
1    1.0
2    NaN
3    3.0
dtype: float64
>>> s.interpolate()
0    0.0
1    1.0
2    2.0
3    3.0
dtype: float64

Filling in NaN in a Series by padding, but filling at most two consecutive NaN at a time.

>>> s = pd.Series([np.nan, "single_one", np.nan,
...                "fill_two_more", np.nan, np.nan, np.nan,
...                4.71, np.nan])
>>> s
0              NaN
1       single_one
2              NaN
3    fill_two_more
4              NaN
5              NaN
6              NaN
7             4.71
8              NaN
dtype: object
>>> s.interpolate(method='pad', limit=2)
0              NaN
1       single_one
2       single_one
3    fill_two_more
4    fill_two_more
5    fill_two_more
6              NaN
7             4.71
8             4.71
dtype: object

Filling in NaN in a Series via polynomial interpolation or splines: Both ‘polynomial’ and ‘spline’ methods require that you also specify an order (int).

>>> s = pd.Series([0, 2, np.nan, 8])
>>> s.interpolate(method='polynomial', order=2)
0    0.000000
1    2.000000
2    4.666667
3    8.000000
dtype: float64

Fill the DataFrame forward (that is, going down) along each column using linear interpolation.

Note how the last entry in column ‘a’ is interpolated differently, because there is no entry after it to use for interpolation. Note how the first entry in column ‘b’ remains NaN, because there is no entry before it to use for interpolation.

>>> df = pd.DataFrame([(0.0, np.nan, -1.0, 1.0),
...                    (np.nan, 2.0, np.nan, np.nan),
...                    (2.0, 3.0, np.nan, 9.0),
...                    (np.nan, 4.0, -4.0, 16.0)],
...                   columns=list('abcd'))
>>> df
     a    b    c     d
0  0.0  NaN -1.0   1.0
1  NaN  2.0  NaN   NaN
2  2.0  3.0  NaN   9.0
3  NaN  4.0 -4.0  16.0
>>> df.interpolate(method='linear', limit_direction='forward', axis=0)
     a    b    c     d
0  0.0  NaN -1.0   1.0
1  1.0  2.0 -2.0   5.0
2  2.0  3.0 -3.0   9.0
3  2.0  4.0 -4.0  16.0

Using polynomial interpolation.

>>> df['d'].interpolate(method='polynomial', order=2)
0     1.0
1     4.0
2     9.0
3    16.0
Name: d, dtype: float64
intersect(second_geometry, dimension)

Constructs a geometry that is the geometric intersection of the two input geometries. Different dimension values can be used to create different shape types. The intersection of two geometries of the same shape type is a geometry containing only the regions of overlap between the original geometries.

Paramters:
second_geometry
  • a second geometry

dimension
  • The topological dimension (shape type) of the

resulting geometry.

1 -A zero-dimensional geometry (point or multipoint). 2 -A one-dimensional geometry (polyline). 4 -A two-dimensional geometry (polygon).

property is_empty

Return True for each empty geometry, False for non-empty

property is_multipart

True, if the number of parts for the geometry is more than 1

isin(values)pandas.core.frame.DataFrame

Whether each element in the DataFrame is contained in values.

valuesiterable, Series, DataFrame or dict

The result will only be true at a location if all the labels match. If values is a Series, that’s the index. If values is a dict, the keys must be the column names, which must match. If values is a DataFrame, then both the index and column labels must match.

DataFrame

DataFrame of booleans showing whether each element in the DataFrame is contained in values.

DataFrame.eq: Equality test for DataFrame. Series.isin: Equivalent method on Series. Series.str.contains: Test if pattern or regex is contained within a

string of a Series or Index.

>>> df = pd.DataFrame({'num_legs': [2, 4], 'num_wings': [2, 0]},
...                   index=['falcon', 'dog'])
>>> df
        num_legs  num_wings
falcon         2          2
dog            4          0

When values is a list check whether every value in the DataFrame is present in the list (which animals have 0 or 2 legs or wings)

>>> df.isin([0, 2])
        num_legs  num_wings
falcon      True       True
dog        False       True

When values is a dict, we can pass values to check for each column separately:

>>> df.isin({'num_wings': [0, 3]})
        num_legs  num_wings
falcon     False      False
dog        False       True

When values is a Series or DataFrame the index and column must match. Note that ‘falcon’ does not match based on the number of legs in df2.

>>> other = pd.DataFrame({'num_legs': [8, 2], 'num_wings': [0, 2]},
...                      index=['spider', 'falcon'])
>>> df.isin(other)
        num_legs  num_wings
falcon      True       True
dog        False      False
isna()pandas.core.frame.DataFrame

Detect missing values.

Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, gets mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).

DataFrame

Mask of bool values for each element in DataFrame that indicates whether an element is an NA value.

DataFrame.isnull : Alias of isna. DataFrame.notna : Boolean inverse of isna. DataFrame.dropna : Omit axes labels with missing values. isna : Top-level isna.

Show which entries in a DataFrame are NA.

>>> df = pd.DataFrame(dict(age=[5, 6, np.NaN],
...                    born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                          pd.Timestamp('1940-04-25')],
...                    name=['Alfred', 'Batman', ''],
...                    toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = pd.Series([5, 6, np.NaN])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64
>>> ser.isna()
0    False
1    False
2     True
dtype: bool
isnull()pandas.core.frame.DataFrame

Detect missing values.

Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, gets mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).

DataFrame

Mask of bool values for each element in DataFrame that indicates whether an element is an NA value.

DataFrame.isnull : Alias of isna. DataFrame.notna : Boolean inverse of isna. DataFrame.dropna : Omit axes labels with missing values. isna : Top-level isna.

Show which entries in a DataFrame are NA.

>>> df = pd.DataFrame(dict(age=[5, 6, np.NaN],
...                    born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                          pd.Timestamp('1940-04-25')],
...                    name=['Alfred', 'Batman', ''],
...                    toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = pd.Series([5, 6, np.NaN])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64
>>> ser.isna()
0    False
1    False
2     True
dtype: bool
items()Iterable[Tuple[Optional[Hashable], pandas.core.series.Series]]

Iterate over (column name, Series) pairs.

Iterates over the DataFrame columns, returning a tuple with the column name and the content as a Series.

labelobject

The column names for the DataFrame being iterated over.

contentSeries

The column entries belonging to each label, as a Series.

DataFrame.iterrowsIterate over DataFrame rows as

(index, Series) pairs.

DataFrame.itertuplesIterate over DataFrame rows as namedtuples

of the values.

>>> df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
...                   'population': [1864, 22000, 80000]},
...                   index=['panda', 'polar', 'koala'])
>>> df
        species   population
panda   bear      1864
polar   bear      22000
koala   marsupial 80000
>>> for label, content in df.items():
...     print(f'label: {label}')
...     print(f'content: {content}', sep='\n')
...
label: species
content:
panda         bear
polar         bear
koala    marsupial
Name: species, dtype: object
label: population
content:
panda     1864
polar    22000
koala    80000
Name: population, dtype: int64
iteritems()Iterable[Tuple[Optional[Hashable], pandas.core.series.Series]]

Iterate over (column name, Series) pairs.

Iterates over the DataFrame columns, returning a tuple with the column name and the content as a Series.

labelobject

The column names for the DataFrame being iterated over.

contentSeries

The column entries belonging to each label, as a Series.

DataFrame.iterrowsIterate over DataFrame rows as

(index, Series) pairs.

DataFrame.itertuplesIterate over DataFrame rows as namedtuples

of the values.

>>> df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
...                   'population': [1864, 22000, 80000]},
...                   index=['panda', 'polar', 'koala'])
>>> df
        species   population
panda   bear      1864
polar   bear      22000
koala   marsupial 80000
>>> for label, content in df.items():
...     print(f'label: {label}')
...     print(f'content: {content}', sep='\n')
...
label: species
content:
panda         bear
polar         bear
koala    marsupial
Name: species, dtype: object
label: population
content:
panda     1864
polar    22000
koala    80000
Name: population, dtype: int64
iterrows()Iterable[Tuple[Optional[Hashable], pandas.core.series.Series]]

Iterate over DataFrame rows as (index, Series) pairs.

indexlabel or tuple of label

The index of the row. A tuple for a MultiIndex.

dataSeries

The data of the row as a Series.

DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values. DataFrame.items : Iterate over (column name, Series) pairs.

  1. Because iterrows returns a Series for each row, it does not preserve dtypes across the rows (dtypes are preserved across columns for DataFrames). For example,

    >>> df = pd.DataFrame([[1, 1.5]], columns=['int', 'float'])
    >>> row = next(df.iterrows())[1]
    >>> row
    int      1.0
    float    1.5
    Name: 0, dtype: float64
    >>> print(row['int'].dtype)
    float64
    >>> print(df['int'].dtype)
    int64
    

    To preserve dtypes while iterating over the rows, it is better to use itertuples() which returns namedtuples of the values and which is generally faster than iterrows.

  2. You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.

itertuples(index: bool = True, name: Optional[str] = 'Pandas')

Iterate over DataFrame rows as namedtuples.

indexbool, default True

If True, return the index as the first element of the tuple.

namestr or None, default “Pandas”

The name of the returned namedtuples or None to return regular tuples.

iterator

An object to iterate over namedtuples for each row in the DataFrame with the first field possibly being the index and following fields being the column values.

DataFrame.iterrowsIterate over DataFrame rows as (index, Series)

pairs.

DataFrame.items : Iterate over (column name, Series) pairs.

The column names will be renamed to positional names if they are invalid Python identifiers, repeated, or start with an underscore. On python versions < 3.7 regular tuples are returned for DataFrames with a large number of columns (>254).

>>> df = pd.DataFrame({'num_legs': [4, 2], 'num_wings': [0, 2]},
...                   index=['dog', 'hawk'])
>>> df
      num_legs  num_wings
dog          4          0
hawk         2          2
>>> for row in df.itertuples():
...     print(row)
...
Pandas(Index='dog', num_legs=4, num_wings=0)
Pandas(Index='hawk', num_legs=2, num_wings=2)

By setting the index parameter to False we can remove the index as the first element of the tuple:

>>> for row in df.itertuples(index=False):
...     print(row)
...
Pandas(num_legs=4, num_wings=0)
Pandas(num_legs=2, num_wings=2)

With the name parameter set we set a custom name for the yielded namedtuples:

>>> for row in df.itertuples(name='Animal'):
...     print(row)
...
Animal(Index='dog', num_legs=4, num_wings=0)
Animal(Index='hawk', num_legs=2, num_wings=2)
join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)pandas.core.frame.DataFrame

Join columns of another DataFrame.

Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.

otherDataFrame, Series, or list of DataFrame

Index should be similar to one of the columns in this one. If a Series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataFrame.

onstr, list of str, or array-like, optional

Column or index level name(s) in the caller to join on the index in other, otherwise joins index-on-index. If multiple values given, the other DataFrame must have a MultiIndex. Can pass an array as the join key if it is not already contained in the calling DataFrame. Like an Excel VLOOKUP operation.

how{‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’

How to handle the operation of the two objects.

  • left: use calling frame’s index (or column if on is specified)

  • right: use other’s index.

  • outer: form union of calling frame’s index (or column if on is specified) with other’s index, and sort it. lexicographically.

  • inner: form intersection of calling frame’s index (or column if on is specified) with other’s index, preserving the order of the calling’s one.

lsuffixstr, default ‘’

Suffix to use from left frame’s overlapping columns.

rsuffixstr, default ‘’

Suffix to use from right frame’s overlapping columns.

sortbool, default False

Order result DataFrame lexicographically by the join key. If False, the order of the join key depends on the join type (how keyword).

DataFrame

A dataframe containing columns from both the caller and other.

DataFrame.merge : For column(s)-on-column(s) operations.

Parameters on, lsuffix, and rsuffix are not supported when passing a list of DataFrame objects.

Support for specifying index levels as the on parameter was added in version 0.23.0.

>>> df = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
...                    'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
>>> df
  key   A
0  K0  A0
1  K1  A1
2  K2  A2
3  K3  A3
4  K4  A4
5  K5  A5
>>> other = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
...                       'B': ['B0', 'B1', 'B2']})
>>> other
  key   B
0  K0  B0
1  K1  B1
2  K2  B2

Join DataFrames using their indexes.

>>> df.join(other, lsuffix='_caller', rsuffix='_other')
  key_caller   A key_other    B
0         K0  A0        K0   B0
1         K1  A1        K1   B1
2         K2  A2        K2   B2
3         K3  A3       NaN  NaN
4         K4  A4       NaN  NaN
5         K5  A5       NaN  NaN

If we want to join using the key columns, we need to set key to be the index in both df and other. The joined DataFrame will have key as its index.

>>> df.set_index('key').join(other.set_index('key'))
      A    B
key
K0   A0   B0
K1   A1   B1
K2   A2   B2
K3   A3  NaN
K4   A4  NaN
K5   A5  NaN

Another option to join using the key columns is to use the on parameter. DataFrame.join always uses other’s index but we can use any column in df. This method preserves the original DataFrame’s index in the result.

>>> df.join(other.set_index('key'), on='key')
  key   A    B
0  K0  A0   B0
1  K1  A1   B1
2  K2  A2   B2
3  K3  A3  NaN
4  K4  A4  NaN
5  K5  A5  NaN
keys()

Get the ‘info axis’ (see Indexing for more).

This is index for Series, columns for DataFrame.

Index

Info axis.

kurt(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased kurtosis over requested axis.

Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

axis{index (0), columns (1)}

Axis for the function to be applied on.

skipnabool, default True

Exclude NA/null values when computing the result.

levelint or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

numeric_onlybool, default None

Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

**kwargs

Additional keyword arguments to be passed to the function.

Series or DataFrame (if level specified)

kurtosis(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased kurtosis over requested axis.

Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

axis{index (0), columns (1)}

Axis for the function to be applied on.

skipnabool, default True

Exclude NA/null values when computing the result.

levelint or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

numeric_onlybool, default None

Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

**kwargs

Additional keyword arguments to be passed to the function.

Series or DataFrame (if level specified)

property label_point

The point at which the label is located. The labelPoint is always located within or on a feature.

last(offset)FrameOrSeries

Select final periods of time series data based on a date offset.

When having a DataFrame with dates as index, this function can select the last few rows based on a date offset.

offsetstr, DateOffset, dateutil.relativedelta

The offset length of the data that will be selected. For instance, ‘3D’ will display all the rows having their index within the last 3 days.

Series or DataFrame

A subset of the caller.

TypeError

If the index is not a DatetimeIndex

first : Select initial periods of time series based on a date offset. at_time : Select values at a particular time of the day. between_time : Select values between particular times of the day.

>>> i = pd.date_range('2018-04-09', periods=4, freq='2D')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts
            A
2018-04-09  1
2018-04-11  2
2018-04-13  3
2018-04-15  4

Get the rows for the last 3 days:

>>> ts.last('3D')
            A
2018-04-13  3
2018-04-15  4

Notice the data for 3 last calendar days were returned, not the last 3 observed days in the dataset, and therefore data for 2018-04-11 was not returned.

property last_point

The last coordinate of the feature.

last_valid_index()

Return index for last non-NA/null value.

scalar : type of index

If all elements are non-NA/null, returns None. Also returns None for empty Series/DataFrame.

le(other, axis='columns', level=None)

Get Less than or equal to of dataframe and other, element-wise (binary operator le).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

axis{0 or ‘index’, 1 or ‘columns’}, default ‘columns’

Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).

levelint or label

Broadcast across a level, matching Index values on the passed MultiIndex level.

DataFrame of bool

Result of the comparison.

DataFrame.eq : Compare DataFrames for equality elementwise. DataFrame.ne : Compare DataFrames for inequality elementwise. DataFrame.le : Compare DataFrames for less than inequality

or equality elementwise.

DataFrame.ltCompare DataFrames for strictly less than

inequality elementwise.

DataFrame.geCompare DataFrames for greater than inequality

or equality elementwise.

DataFrame.gtCompare DataFrames for strictly greater than

inequality elementwise.

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
property length

The length of the linear feature. Zero for point and multipoint feature types.

property length3D

The 3D length of the linear feature. Zero for point and multipoint feature types.

property loc

Access a group of rows and columns by label(s) or a boolean array.

.loc[] is primarily label based, but may also be used with a boolean array.

Allowed inputs are:

  • A single label, e.g. 5 or 'a', (note that 5 is interpreted as a label of the index, and never as an integer position along the index).

  • A list or array of labels, e.g. ['a', 'b', 'c'].

  • A slice object with labels, e.g. 'a':'f'.

    Warning

    Note that contrary to usual python slices, both the start and the stop are included

  • A boolean array of the same length as the axis being sliced, e.g. [True, False, True].

  • An alignable boolean Series. The index of the key will be aligned before masking.

  • An alignable Index. The Index of the returned selection will be the input.

  • A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above)

See more at Selection by Label.

KeyError

If any items are not found.

IndexingError

If an indexed key is passed and its index is unalignable to the frame index.

DataFrame.at : Access a single value for a row/column label pair. DataFrame.iloc : Access group of rows and columns by integer position(s). DataFrame.xs : Returns a cross-section (row(s) or column(s)) from the

Series/DataFrame.

Series.loc : Access group of values using labels.

Getting values

>>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
...      index=['cobra', 'viper', 'sidewinder'],
...      columns=['max_speed', 'shield'])
>>> df
            max_speed  shield
cobra               1       2
viper               4       5
sidewinder          7       8

Single label. Note this returns the row as a Series.

>>> df.loc['viper']
max_speed    4
shield       5
Name: viper, dtype: int64

List of labels. Note using [[]] returns a DataFrame.

>>> df.loc[['viper', 'sidewinder']]
            max_speed  shield
viper               4       5
sidewinder          7       8

Single label for row and column

>>> df.loc['cobra', 'shield']
2

Slice with labels for row and single label for column. As mentioned above, note that both the start and stop of the slice are included.

>>> df.loc['cobra':'viper', 'max_speed']
cobra    1
viper    4
Name: max_speed, dtype: int64

Boolean list with the same length as the row axis

>>> df.loc[[False, False, True]]
            max_speed  shield
sidewinder          7       8

Alignable boolean Series:

>>> df.loc[pd.Series([False, True, False],
...        index=['viper', 'sidewinder', 'cobra'])]
            max_speed  shield
sidewinder          7       8

Index (same behavior as df.reindex)

>>> df.loc[pd.Index(["cobra", "viper"], name="foo")]
       max_speed  shield
foo
cobra          1       2
viper          4       5

Conditional that returns a boolean Series

>>> df.loc[df['shield'] > 6]
            max_speed  shield
sidewinder          7       8

Conditional that returns a boolean Series with column labels specified

>>> df.loc[df['shield'] > 6, ['max_speed']]
            max_speed
sidewinder          7

Callable that returns a boolean Series

>>> df.loc[lambda df: df['shield'] == 8]
            max_speed  shield
sidewinder          7       8

Setting values

Set value for all items matching the list of labels

>>> df.loc[['viper', 'sidewinder'], ['shield']] = 50
>>> df
            max_speed  shield
cobra               1       2
viper               4      50
sidewinder          7      50

Set value for an entire row

>>> df.loc['cobra'] = 10
>>> df
            max_speed  shield
cobra              10      10
viper               4      50
sidewinder          7      50

Set value for an entire column

>>> df.loc[:, 'max_speed'] = 30
>>> df
            max_speed  shield
cobra              30      10
viper              30      50
sidewinder         30      50

Set value for rows matching callable condition

>>> df.loc[df['shield'] > 35] = 0
>>> df
            max_speed  shield
cobra              30      10
viper               0       0
sidewinder          0       0

Getting values on a DataFrame with an index that has integer labels

Another example using integers for the index

>>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
...      index=[7, 8, 9], columns=['max_speed', 'shield'])
>>> df
   max_speed  shield
7          1       2
8          4       5
9          7       8

Slice with integer labels for rows. As mentioned above, note that both the start and stop of the slice are included.

>>> df.loc[7:9]
   max_speed  shield
7          1       2
8          4       5
9          7       8

Getting values with a MultiIndex

A number of examples using a DataFrame with a MultiIndex

>>> tuples = [
...    ('cobra', 'mark i'), ('cobra', 'mark ii'),
...    ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
...    ('viper', 'mark ii'), ('viper', 'mark iii')
... ]
>>> index = pd.MultiIndex.from_tuples(tuples)
>>> values = [[12, 2], [0, 4], [10, 20],
...         [1, 4], [7, 1], [16, 36]]
>>> df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)
>>> df
                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1
           mark iii         16      36

Single label. Note this returns a DataFrame with a single index.

>>> df.loc['cobra']
         max_speed  shield
mark i          12       2
mark ii          0       4

Single index tuple. Note this returns a Series.

>>> df.loc[('cobra', 'mark ii')]
max_speed    0
shield       4
Name: (cobra, mark ii), dtype: int64

Single label for row and column. Similar to passing in a tuple, this returns a Series.

>>> df.loc['cobra', 'mark i']
max_speed    12
shield        2
Name: (cobra, mark i), dtype: int64

Single tuple. Note using [[]] returns a DataFrame.

>>> df.loc[[('cobra', 'mark ii')]]
               max_speed  shield
cobra mark ii          0       4

Single tuple for the index with a single label for the column

>>> df.loc[('cobra', 'mark i'), 'shield']
2

Slice from index tuple to single label

>>> df.loc[('cobra', 'mark i'):'viper']
                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1
           mark iii         16      36

Slice from index tuple to index tuple

>>> df.loc[('cobra', 'mark i'):('viper', 'mark ii')]
                    max_speed  shield
cobra      mark i          12       2
           mark ii          0       4
sidewinder mark i          10      20
           mark ii          1       4
viper      mark ii          7       1
lookup(row_labels, col_labels)numpy.ndarray

Label-based “fancy indexing” function for DataFrame. Given equal-length arrays of row and column labels, return an array of the values corresponding to each (row, col) pair.

Deprecated since version 1.2.0: DataFrame.lookup is deprecated, use DataFrame.melt and DataFrame.loc instead. For further details see Looking up values by index/column labels.

row_labelssequence

The row labels to use for lookup.

col_labelssequence

The column labels to use for lookup.

numpy.ndarray

The found values.

lt(other, axis='columns', level=None)

Get Less than of dataframe and other, element-wise (binary operator lt).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

axis{0 or ‘index’, 1 or ‘columns’}, default ‘columns’

Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).

levelint or label

Broadcast across a level, matching Index values on the passed MultiIndex level.

DataFrame of bool

Result of the comparison.

DataFrame.eq : Compare DataFrames for equality elementwise. DataFrame.ne : Compare DataFrames for inequality elementwise. DataFrame.le : Compare DataFrames for less than inequality

or equality elementwise.

DataFrame.ltCompare DataFrames for strictly less than

inequality elementwise.

DataFrame.geCompare DataFrames for greater than inequality

or equality elementwise.

DataFrame.gtCompare DataFrames for strictly greater than

inequality elementwise.

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
mad(axis=None, skipna=None, level=None)

Return the mean absolute deviation of the values over the requested axis.

axis{index (0), columns (1)}

Axis for the function to be applied on.

skipnabool, default None

Exclude NA/null values when computing the result.

levelint or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

Series or DataFrame (if level specified)

mask(cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False)

Replace values where the condition is True.

condbool Series/DataFrame, array-like, or callable

Where cond is False, keep the original value. Where True, replace with corresponding value from other. If cond is callable, it is computed on the Series/DataFrame and should return boolean Series/DataFrame or array. The callable must not change input Series/DataFrame (though pandas doesn’t check it).

otherscalar, Series/DataFrame, or callable

Entries where cond is True are replaced with corresponding value from other. If other is callable, it is computed on the Series/DataFrame and should return scalar or Series/DataFrame. The callable must not change input Series/DataFrame (though pandas doesn’t check it).

inplacebool, default False

Whether to perform the operation in place on the data.

axisint, default None

Alignment axis if needed.

levelint, default None

Alignment level if needed.

errorsstr, {‘raise’, ‘ignore’}, default ‘raise’

Note that currently this parameter won’t affect the results and will always coerce to a suitable dtype.

  • ‘raise’ : allow exceptions to be raised.

  • ‘ignore’ : suppress exceptions. On error return original object.

try_castbool, default False

Try to cast the result back to the input type (if possible).

Same type as caller or None if inplace=True.

DataFrame.where()Return an object of same shape as

self.

The mask method is an application of the if-then idiom. For each element in the calling DataFrame, if cond is False the element is used; otherwise the corresponding element from the DataFrame other is used.

The signature for DataFrame.where() differs from numpy.where(). Roughly df1.where(m, df2) is equivalent to np.where(m, df1, df2).

For further details and examples see the mask documentation in indexing.

>>> s = pd.Series(range(5))
>>> s.where(s > 0)
0    NaN
1    1.0
2    2.0
3    3.0
4    4.0
dtype: float64
>>> s.mask(s > 0)
0    0.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64
>>> s.where(s > 1, 10)
0    10
1    10
2    2
3    3
4    4
dtype: int64
>>> s.mask(s > 1, 10)
0     0
1     1
2    10
3    10
4    10
dtype: int64
>>> df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
>>> df
   A  B
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9
>>> m = df % 3 == 0
>>> df.where(m, -df)
   A  B
0  0 -1
1 -2  3
2 -4 -5
3  6 -7
4 -8  9
>>> df.where(m, -df) == np.where(m, df, -df)
      A     B
0  True  True
1  True  True
2  True  True
3  True  True
4  True  True
>>> df.where(m, -df) == df.mask(~m, -df)
      A     B
0  True  True
1  True  True
2  True  True
3  True  True
4  True  True
max(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the maximum of the values over the requested axis.

If you want the index of the maximum, use idxmax. This isthe equivalent of the numpy.ndarray method argmax.

axis{index (0), columns (1)}

Axis for the function to be applied on.

skipnabool, default True

Exclude NA/null values when computing the result.

levelint or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

numeric_onlybool, default None

Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

**kwargs

Additional keyword arguments to be passed to the function.

Series or DataFrame (if level specified)

Series.sum : Return the sum. Series.min : Return the minimum. Series.max : Return the maximum. Series.idxmin : Return the index of the minimum. Series.idxmax : Return the index of the maximum. DataFrame.sum : Return the sum over the requested axis. DataFrame.min : Return the minimum over the requested axis. DataFrame.max : Return the maximum over the requested axis. DataFrame.idxmin : Return the index of the minimum over the requested axis. DataFrame.idxmax : Return the index of the maximum over the requested axis.

>>> idx = pd.MultiIndex.from_arrays([
...     ['warm', 'warm', 'cold', 'cold'],
...     ['dog', 'falcon', 'fish', 'spider']],
...     names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded  animal
warm     dog       4
         falcon    2
cold     fish      0
         spider    8
Name: legs, dtype: int64
>>> s.max()
8

Max using level names, as well as indices.

>>> s.max(level='blooded')
blooded
warm    4
cold    8
Name: legs, dtype: int64
>>> s.max(level=0)
blooded
warm    4
cold    8
Name: legs, dtype: int64
mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the mean of the values over the requested axis.

axis{index (0), columns (1)}

Axis for the function to be applied on.

skipnabool, default True

Exclude NA/null values when computing the result.

levelint or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

numeric_onlybool, default None

Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

**kwargs

Additional keyword arguments to be passed to the function.

Series or DataFrame (if level specified)

measure_on_line(second_geometry, as_percentage=False)

Returns a measure from the start point of this line to the in_point.

Paramters:
second_geometry
  • a second geometry

as_percentage
  • If False, the measure will be returned as a

distance; if True, the measure will be returned as a percentage.

median(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the median of the values over the requested axis.

axis{index (0), columns (1)}

Axis for the function to be applied on.

skipnabool, default True

Exclude NA/null values when computing the result.

levelint or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

numeric_onlybool, default None

Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

**kwargs

Additional keyword arguments to be passed to the function.

Series or DataFrame (if level specified)

melt(id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None, ignore_index=True)pandas.core.frame.DataFrame

Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.

This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars), are “unpivoted” to the row axis, leaving just two non-identifier columns, ‘variable’ and ‘value’.

id_varstuple, list, or ndarray, optional

Column(s) to use as identifier variables.

value_varstuple, list, or ndarray, optional

Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars.

var_namescalar

Name to use for the ‘variable’ column. If None it uses frame.columns.name or ‘variable’.

value_namescalar, default ‘value’

Name to use for the ‘value’ column.

col_levelint or str, optional

If columns are a MultiIndex then use this level to melt.

ignore_indexbool, default True

If True, original index is ignored. If False, the original index is retained. Index labels will be repeated as necessary.

New in version 1.1.0.

DataFrame

Unpivoted DataFrame.

melt : Identical method. pivot_table : Create a spreadsheet-style pivot table as a DataFrame. DataFrame.pivot : Return reshaped DataFrame organized

by given index / column values.

DataFrame.explodeExplode a DataFrame from list-like

columns to long format.

>>> df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
...                    'B': {0: 1, 1: 3, 2: 5},
...                    'C': {0: 2, 1: 4, 2: 6}})
>>> df
   A  B  C
0  a  1  2
1  b  3  4
2  c  5  6
>>> df.melt(id_vars=['A'], value_vars=['B'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
>>> df.melt(id_vars=['A'], value_vars=['B', 'C'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
3  a        C      2
4  b        C      4
5  c        C      6

The names of ‘variable’ and ‘value’ columns can be customized:

>>> df.melt(id_vars=['A'], value_vars=['B'],
...         var_name='myVarname', value_name='myValname')
   A myVarname  myValname
0  a         B          1
1  b         B          3
2  c         B          5

Original index values can be kept around:

>>> df.melt(id_vars=['A'], value_vars=['B', 'C'], ignore_index=False)
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
0  a        C      2
1  b        C      4
2  c        C      6

If you have multi-index columns:

>>> df.columns = [list('ABC'), list('DEF')]
>>> df
   A  B  C
   D  E  F
0  a  1  2
1  b  3  4
2  c  5  6
>>> df.melt(col_level=0, id_vars=['A'], value_vars=['B'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
>>> df.melt(id_vars=[('A', 'D')], value_vars=[('B', 'E')])
  (A, D) variable_0 variable_1  value
0      a          B          E      1
1      b          B          E      3
2      c          B          E      5
memory_usage(index=True, deep=False)pandas.core.series.Series

Return the memory usage of each column in bytes.

The memory usage can optionally include the contribution of the index and elements of object dtype.

This value is displayed in DataFrame.info by default. This can be suppressed by setting pandas.options.display.memory_usage to False.

indexbool, default True

Specifies whether to include the memory usage of the DataFrame’s index in returned Series. If index=True, the memory usage of the index is the first item in the output.

deepbool, default False

If True, introspect the data deeply by interrogating object dtypes for system-level memory consumption, and include it in the returned values.

Series

A Series whose index is the original column names and whose values is the memory usage of each column in bytes.

numpy.ndarray.nbytesTotal bytes consumed by the elements of an

ndarray.

Series.memory_usage : Bytes consumed by a Series. Categorical : Memory-efficient array for string values with

many repeated values.

DataFrame.info : Concise summary of a DataFrame.

>>> dtypes = ['int64', 'float64', 'complex128', 'object', 'bool']
>>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t))
...              for t in dtypes])
>>> df = pd.DataFrame(data)
>>> df.head()
   int64  float64            complex128  object  bool
0      1      1.0              1.0+0.0j       1  True
1      1      1.0              1.0+0.0j       1  True
2      1      1.0              1.0+0.0j       1  True
3      1      1.0              1.0+0.0j       1  True
4      1      1.0              1.0+0.0j       1  True
>>> df.memory_usage()
Index           128
int64         40000
float64       40000
complex128    80000
object        40000
bool           5000
dtype: int64
>>> df.memory_usage(index=False)
int64         40000
float64       40000
complex128    80000
object        40000
bool           5000
dtype: int64

The memory footprint of object dtype columns is ignored by default:

>>> df.memory_usage(deep=True)
Index            128
int64          40000
float64        40000
complex128     80000
object        180000
bool            5000
dtype: int64

Use a Categorical for efficient storage of an object-dtype column with many repeated values.

>>> df['object'].astype('category').memory_usage(deep=True)
5244
merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)pandas.core.frame.DataFrame

Merge DataFrame or named Series objects with a database-style join.

The join is done on columns or indexes. If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on. When performing a cross merge, no column specifications to merge on are allowed.

rightDataFrame or named Series

Object to merge with.

how{‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}, default ‘inner’

Type of merge to be performed.

  • left: use only keys from left frame, similar to a SQL left outer join; preserve key order.

  • right: use only keys from right frame, similar to a SQL right outer join; preserve key order.

  • outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.

  • inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys.

  • cross: creates the cartesian product from both frames, preserves the order of the left keys.

    New in version 1.2.0.

onlabel or list

Column or index level names to join on. These must be found in both DataFrames. If on is None and not merging on indexes then this defaults to the intersection of the columns in both DataFrames.

left_onlabel or list, or array-like

Column or index level names to join on in the left DataFrame. Can also be an array or list of arrays of the length of the left DataFrame. These arrays are treated as if they are columns.

right_onlabel or list, or array-like

Column or index level names to join on in the right DataFrame. Can also be an array or list of arrays of the length of the right DataFrame. These arrays are treated as if they are columns.

left_indexbool, default False

Use the index from the left DataFrame as the join key(s). If it is a MultiIndex, the number of keys in the other DataFrame (either the index or a number of columns) must match the number of levels.

right_indexbool, default False

Use the index from the right DataFrame as the join key. Same caveats as left_index.

sortbool, default False

Sort the join keys lexicographically in the result DataFrame. If False, the order of the join keys depends on the join type (how keyword).

suffixeslist-like, default is (“_x”, “_y”)

A length-2 sequence where each element is optionally a string indicating the suffix to add to overlapping column names in left and right respectively. Pass a value of None instead of a string to indicate that the column name from left or right should be left as-is, with no suffix. At least one of the values must not be None.

copybool, default True

If False, avoid copy if possible.

indicatorbool or str, default False

If True, adds a column to the output DataFrame called “_merge” with information on the source of each row. The column can be given a different name by providing a string argument. The column will have a Categorical type with the value of “left_only” for observations whose merge key only appears in the left DataFrame, “right_only” for observations whose merge key only appears in the right DataFrame, and “both” if the observation’s merge key is found in both DataFrames.

validatestr, optional

If specified, checks if merge is of specified type.

  • “one_to_one” or “1:1”: check if merge keys are unique in both left and right datasets.

  • “one_to_many” or “1:m”: check if merge keys are unique in left dataset.

  • “many_to_one” or “m:1”: check if merge keys are unique in right dataset.

  • “many_to_many” or “m:m”: allowed, but does not result in checks.

DataFrame

A DataFrame of the two merged objects.

merge_ordered : Merge with optional filling/interpolation. merge_asof : Merge on nearest keys. DataFrame.join : Similar method using indices.

Support for specifying index levels as the on, left_on, and right_on parameters was added in version 0.23.0 Support for merging named Series objects was added in version 0.24.0

>>> df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
...                     'value': [1, 2, 3, 5]})
>>> df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
...                     'value': [5, 6, 7, 8]})
>>> df1
    lkey value
0   foo      1
1   bar      2
2   baz      3
3   foo      5
>>> df2
    rkey value
0   foo      5
1   bar      6
2   baz      7
3   foo      8

Merge df1 and df2 on the lkey and rkey columns. The value columns have the default suffixes, _x and _y, appended.

>>> df1.merge(df2, left_on='lkey', right_on='rkey')
  lkey  value_x rkey  value_y
0  foo        1  foo        5
1  foo        1  foo        8
2  foo        5  foo        5
3  foo        5  foo        8
4  bar        2  bar        6
5  baz        3  baz        7

Merge DataFrames df1 and df2 with specified left and right suffixes appended to any overlapping columns.

>>> df1.merge(df2, left_on='lkey', right_on='rkey',
...           suffixes=('_left', '_right'))
  lkey  value_left rkey  value_right
0  foo           1  foo            5
1  foo           1  foo            8
2  foo           5  foo            5
3  foo           5  foo            8
4  bar           2  bar            6
5  baz           3  baz            7

Merge DataFrames df1 and df2, but raise an exception if the DataFrames have any overlapping columns.

>>> df1.merge(df2, left_on='lkey', right_on='rkey', suffixes=(False, False))
Traceback (most recent call last):
...
ValueError: columns overlap but no suffix specified:
    Index(['value'], dtype='object')
>>> df1 = pd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]})
>>> df2 = pd.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4]})
>>> df1
      a  b
0   foo  1
1   bar  2
>>> df2
      a  c
0   foo  3
1   baz  4
>>> df1.merge(df2, how='inner', on='a')
      a  b  c
0   foo  1  3
>>> df1.merge(df2, how='left', on='a')
      a  b  c
0   foo  1  3.0
1   bar  2  NaN
>>> df1 = pd.DataFrame({'left': ['foo', 'bar']})
>>> df2 = pd.DataFrame({'right': [7, 8]})
>>> df1
    left
0   foo
1   bar
>>> df2
    right
0   7
1   8
>>> df1.merge(df2, how='cross')
   left  right
0   foo      7
1   foo      8
2   bar      7
3   bar      8
merge_datasets(other)

This operation combines two dataframes into one new DataFrame. If the operation is combining two SpatialDataFrames, the geometry_type must match.

Argument

Description

other

Required SpatialDataFrame. Another SpatialDataFrame to combine.

Returns

SpatialDataFrame

min(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the minimum of the values over the requested axis.

If you want the index of the minimum, use idxmin. This isthe equivalent of the numpy.ndarray method argmin.

axis{index (0), columns (1)}

Axis for the function to be applied on.

skipnabool, default True

Exclude NA/null values when computing the result.

levelint or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

numeric_onlybool, default None

Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

**kwargs

Additional keyword arguments to be passed to the function.

Series or DataFrame (if level specified)

Series.sum : Return the sum. Series.min : Return the minimum. Series.max : Return the maximum. Series.idxmin : Return the index of the minimum. Series.idxmax : Return the index of the maximum. DataFrame.sum : Return the sum over the requested axis. DataFrame.min : Return the minimum over the requested axis. DataFrame.max : Return the maximum over the requested axis. DataFrame.idxmin : Return the index of the minimum over the requested axis. DataFrame.idxmax : Return the index of the maximum over the requested axis.

>>> idx = pd.MultiIndex.from_arrays([
...     ['warm', 'warm', 'cold', 'cold'],
...     ['dog', 'falcon', 'fish', 'spider']],
...     names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded  animal
warm     dog       4
         falcon    2
cold     fish      0
         spider    8
Name: legs, dtype: int64
>>> s.min()
0

Min using level names, as well as indices.

>>> s.min(level='blooded')
blooded
warm    2
cold    0
Name: legs, dtype: int64
>>> s.min(level=0)
blooded
warm    2
cold    0
Name: legs, dtype: int64
mod(other, axis='columns', level=None, fill_value=None)

Get Modulo of dataframe and other, element-wise (binary operator mod).

Equivalent to dataframe % other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmod.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

axis{0 or ‘index’, 1 or ‘columns’}

Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

levelint or label

Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

DataFrame

Result of the arithmetic operation.

DataFrame.add : Add DataFrames. DataFrame.sub : Subtract DataFrames. DataFrame.mul : Multiply DataFrames. DataFrame.div : Divide DataFrames (float division). DataFrame.truediv : Divide DataFrames (float division). DataFrame.floordiv : Divide DataFrames (integer division). DataFrame.mod : Calculate modulo (remainder after division). DataFrame.pow : Calculate exponential power.

Mismatched indices will be unioned together.

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
mode(axis=0, numeric_only=False, dropna=True)pandas.core.frame.DataFrame

Get the mode(s) of each element along the selected axis.

The mode of a set of values is the value that appears most often. It can be multiple values.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

The axis to iterate over while searching for the mode:

  • 0 or ‘index’ : get mode of each column

  • 1 or ‘columns’ : get mode of each row.

numeric_onlybool, default False

If True, only apply to numeric columns.

dropnabool, default True

Don’t consider counts of NaN/NaT.

New in version 0.24.0.

DataFrame

The modes of each column or row.

Series.mode : Return the highest frequency value in a Series. Series.value_counts : Return the counts of values in a Series.

>>> df = pd.DataFrame([('bird', 2, 2),
...                    ('mammal', 4, np.nan),
...                    ('arthropod', 8, 0),
...                    ('bird', 2, np.nan)],
...                   index=('falcon', 'horse', 'spider', 'ostrich'),
...                   columns=('species', 'legs', 'wings'))
>>> df
           species  legs  wings
falcon        bird     2    2.0
horse       mammal     4    NaN
spider   arthropod     8    0.0
ostrich       bird     2    NaN

By default, missing values are not considered, and the mode of wings are both 0 and 2. Because the resulting DataFrame has two rows, the second row of species and legs contains NaN.

>>> df.mode()
  species  legs  wings
0    bird   2.0    0.0
1     NaN   NaN    2.0

Setting dropna=False NaN values are considered and they can be the mode (like for wings).

>>> df.mode(dropna=False)
  species  legs  wings
0    bird     2    NaN

Setting numeric_only=True, only the mode of numeric columns is computed, and columns of other types are ignored.

>>> df.mode(numeric_only=True)
   legs  wings
0   2.0    0.0
1   NaN    2.0

To compute the mode over columns and not rows, use the axis parameter:

>>> df.mode(axis='columns', numeric_only=True)
           0    1
falcon   2.0  NaN
horse    4.0  NaN
spider   0.0  8.0
ostrich  2.0  NaN
mul(other, axis='columns', level=None, fill_value=None)

Get Multiplication of dataframe and other, element-wise (binary operator mul).

Equivalent to dataframe * other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmul.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

axis{0 or ‘index’, 1 or ‘columns’}

Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

levelint or label

Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

DataFrame

Result of the arithmetic operation.

DataFrame.add : Add DataFrames. DataFrame.sub : Subtract DataFrames. DataFrame.mul : Multiply DataFrames. DataFrame.div : Divide DataFrames (float division). DataFrame.truediv : Divide DataFrames (float division). DataFrame.floordiv : Divide DataFrames (integer division). DataFrame.mod : Calculate modulo (remainder after division). DataFrame.pow : Calculate exponential power.

Mismatched indices will be unioned together.

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
multiply(other, axis='columns', level=None, fill_value=None)

Get Multiplication of dataframe and other, element-wise (binary operator mul).

Equivalent to dataframe * other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmul.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

axis{0 or ‘index’, 1 or ‘columns’}

Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

levelint or label

Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

DataFrame

Result of the arithmetic operation.

DataFrame.add : Add DataFrames. DataFrame.sub : Subtract DataFrames. DataFrame.mul : Multiply DataFrames. DataFrame.div : Divide DataFrames (float division). DataFrame.truediv : Divide DataFrames (float division). DataFrame.floordiv : Divide DataFrames (integer division). DataFrame.mod : Calculate modulo (remainder after division). DataFrame.pow : Calculate exponential power.

Mismatched indices will be unioned together.

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
property ndim

Return an int representing the number of axes / array dimensions.

Return 1 if Series. Otherwise return 2 if DataFrame.

ndarray.ndim : Number of array dimensions.

>>> s = pd.Series({'a': 1, 'b': 2, 'c': 3})
>>> s.ndim
1
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.ndim
2
ne(other, axis='columns', level=None)

Get Not equal to of dataframe and other, element-wise (binary operator ne).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

axis{0 or ‘index’, 1 or ‘columns’}, default ‘columns’

Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).

levelint or label

Broadcast across a level, matching Index values on the passed MultiIndex level.

DataFrame of bool

Result of the comparison.

DataFrame.eq : Compare DataFrames for equality elementwise. DataFrame.ne : Compare DataFrames for inequality elementwise. DataFrame.le : Compare DataFrames for less than inequality

or equality elementwise.

DataFrame.ltCompare DataFrames for strictly less than

inequality elementwise.

DataFrame.geCompare DataFrames for greater than inequality

or equality elementwise.

DataFrame.gtCompare DataFrames for strictly greater than

inequality elementwise.

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
nlargest(n, columns, keep='first')pandas.core.frame.DataFrame

Return the first n rows ordered by columns in descending order.

Return the first n rows with the largest values in columns, in descending order. The columns that are not specified are returned as well, but not used for ordering.

This method is equivalent to df.sort_values(columns, ascending=False).head(n), but more performant.

nint

Number of rows to return.

columnslabel or list of labels

Column label(s) to order by.

keep{‘first’, ‘last’, ‘all’}, default ‘first’

Where there are duplicate values:

  • first : prioritize the first occurrence(s)

  • last : prioritize the last occurrence(s)

  • alldo not drop any duplicates, even it means

    selecting more than n items.

New in version 0.24.0.

DataFrame

The first n rows ordered by the given columns in descending order.

DataFrame.nsmallestReturn the first n rows ordered by columns in

ascending order.

DataFrame.sort_values : Sort DataFrame by the values. DataFrame.head : Return the first n rows without re-ordering.

This function cannot be used with all column types. For example, when specifying columns with object or category dtypes, TypeError is raised.

>>> df = pd.DataFrame({'population': [59000000, 65000000, 434000,
...                                   434000, 434000, 337000, 11300,
...                                   11300, 11300],
...                    'GDP': [1937894, 2583560 , 12011, 4520, 12128,
...                            17036, 182, 38, 311],
...                    'alpha-2': ["IT", "FR", "MT", "MV", "BN",
...                                "IS", "NR", "TV", "AI"]},
...                   index=["Italy", "France", "Malta",
...                          "Maldives", "Brunei", "Iceland",
...                          "Nauru", "Tuvalu", "Anguilla"])
>>> df
          population      GDP alpha-2
Italy       59000000  1937894      IT
France      65000000  2583560      FR
Malta         434000    12011      MT
Maldives      434000     4520      MV
Brunei        434000    12128      BN
Iceland       337000    17036      IS
Nauru          11300      182      NR
Tuvalu         11300       38      TV
Anguilla       11300      311      AI

In the following example, we will use nlargest to select the three rows having the largest values in column “population”.

>>> df.nlargest(3, 'population')
        population      GDP alpha-2
France    65000000  2583560      FR
Italy     59000000  1937894      IT
Malta       434000    12011      MT

When using keep='last', ties are resolved in reverse order:

>>> df.nlargest(3, 'population', keep='last')
        population      GDP alpha-2
France    65000000  2583560      FR
Italy     59000000  1937894      IT
Brunei      434000    12128      BN

When using keep='all', all duplicate items are maintained:

>>> df.nlargest(3, 'population', keep='all')
          population      GDP alpha-2
France      65000000  2583560      FR
Italy       59000000  1937894      IT
Malta         434000    12011      MT
Maldives      434000     4520      MV
Brunei        434000    12128      BN

To order by the largest values in column “population” and then “GDP”, we can specify multiple columns like in the next example.

>>> df.nlargest(3, ['population', 'GDP'])
        population      GDP alpha-2
France    65000000  2583560      FR
Italy     59000000  1937894      IT
Brunei      434000    12128      BN
notna()pandas.core.frame.DataFrame

Detect existing (non-missing) values.

Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True). NA values, such as None or numpy.NaN, get mapped to False values.

DataFrame

Mask of bool values for each element in DataFrame that indicates whether an element is not an NA value.

DataFrame.notnull : Alias of notna. DataFrame.isna : Boolean inverse of notna. DataFrame.dropna : Omit axes labels with missing values. notna : Top-level notna.

Show which entries in a DataFrame are not NA.

>>> df = pd.DataFrame(dict(age=[5, 6, np.NaN],
...                    born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                          pd.Timestamp('1940-04-25')],
...                    name=['Alfred', 'Batman', ''],
...                    toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.notna()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = pd.Series([5, 6, np.NaN])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64
>>> ser.notna()
0     True
1     True
2    False
dtype: bool
notnull()pandas.core.frame.DataFrame

Detect existing (non-missing) values.

Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True). NA values, such as None or numpy.NaN, get mapped to False values.

DataFrame

Mask of bool values for each element in DataFrame that indicates whether an element is not an NA value.

DataFrame.notnull : Alias of notna. DataFrame.isna : Boolean inverse of notna. DataFrame.dropna : Omit axes labels with missing values. notna : Top-level notna.

Show which entries in a DataFrame are not NA.

>>> df = pd.DataFrame(dict(age=[5, 6, np.NaN],
...                    born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                          pd.Timestamp('1940-04-25')],
...                    name=['Alfred', 'Batman', ''],
...                    toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.notna()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = pd.Series([5, 6, np.NaN])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64
>>> ser.notna()
0     True
1     True
2    False
dtype: bool
nsmallest(n, columns, keep='first')pandas.core.frame.DataFrame

Return the first n rows ordered by columns in ascending order.

Return the first n rows with the smallest values in columns, in ascending order. The columns that are not specified are returned as well, but not used for ordering.

This method is equivalent to df.sort_values(columns, ascending=True).head(n), but more performant.

nint

Number of items to retrieve.

columnslist or str

Column name or names to order by.

keep{‘first’, ‘last’, ‘all’}, default ‘first’

Where there are duplicate values:

  • first : take the first occurrence.

  • last : take the last occurrence.

  • all : do not drop any duplicates, even it means selecting more than n items.

New in version 0.24.0.

DataFrame

DataFrame.nlargestReturn the first n rows ordered by columns in

descending order.

DataFrame.sort_values : Sort DataFrame by the values. DataFrame.head : Return the first n rows without re-ordering.

>>> df = pd.DataFrame({'population': [59000000, 65000000, 434000,
...                                   434000, 434000, 337000, 337000,
...                                   11300, 11300],
...                    'GDP': [1937894, 2583560 , 12011, 4520, 12128,
...                            17036, 182, 38, 311],
...                    'alpha-2': ["IT", "FR", "MT", "MV", "BN",
...                                "IS", "NR", "TV", "AI"]},
...                   index=["Italy", "France", "Malta",
...                          "Maldives", "Brunei", "Iceland",
...                          "Nauru", "Tuvalu", "Anguilla"])
>>> df
          population      GDP alpha-2
Italy       59000000  1937894      IT
France      65000000  2583560      FR
Malta         434000    12011      MT
Maldives      434000     4520      MV
Brunei        434000    12128      BN
Iceland       337000    17036      IS
Nauru         337000      182      NR
Tuvalu         11300       38      TV
Anguilla       11300      311      AI

In the following example, we will use nsmallest to select the three rows having the smallest values in column “population”.

>>> df.nsmallest(3, 'population')
          population    GDP alpha-2
Tuvalu         11300     38      TV
Anguilla       11300    311      AI
Iceland       337000  17036      IS

When using keep='last', ties are resolved in reverse order:

>>> df.nsmallest(3, 'population', keep='last')
          population  GDP alpha-2
Anguilla       11300  311      AI
Tuvalu         11300   38      TV
Nauru         337000  182      NR

When using keep='all', all duplicate items are maintained:

>>> df.nsmallest(3, 'population', keep='all')
          population    GDP alpha-2
Tuvalu         11300     38      TV
Anguilla       11300    311      AI
Iceland       337000  17036      IS
Nauru         337000    182      NR

To order by the smallest values in column “population” and then “GDP”, we can specify multiple columns like in the next example.

>>> df.nsmallest(3, ['population', 'GDP'])
          population  GDP alpha-2
Tuvalu         11300   38      TV
Anguilla       11300  311      AI
Nauru         337000  182      NR
nunique(axis=0, dropna=True)pandas.core.series.Series

Count distinct observations over requested axis.

Return Series with number of distinct observations. Can ignore NaN values.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

The axis to use. 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.

dropnabool, default True

Don’t include NaN in the counts.

Series

Series.nunique: Method nunique for Series. DataFrame.count: Count non-NA cells for each column or row.

>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 1, 1]})
>>> df.nunique()
A    3
B    1
dtype: int64
>>> df.nunique(axis=1)
0    1
1    2
2    2
dtype: int64
overlaps(second_geometry)

Indicates if the intersection of the two geometries has the same shape type as one of the input geometries and is not equivalent to either of the input geometries.

Paramters:
second_geometry
  • a second geometry

pad(axis=None, inplace: bool = False, limit=None, downcast=None)Optional[FrameOrSeries]

Synonym for DataFrame.fillna() with method='ffill'.

Series/DataFrame or None

Object with missing values filled or None if inplace=True.

property part_count

The number of geometry parts for the feature.

pct_change(periods=1, fill_method='pad', limit=None, freq=None, **kwargs)FrameOrSeries

Percentage change between the current and a prior element.

Computes the percentage change from the immediately previous row by default. This is useful in comparing the percentage of change in a time series of elements.

periodsint, default 1

Periods to shift for forming percent change.

fill_methodstr, default ‘pad’

How to handle NAs before computing percent changes.

limitint, default None

The number of consecutive NAs to fill before stopping.

freqDateOffset, timedelta, or str, optional

Increment to use from time series API (e.g. ‘M’ or BDay()).

**kwargs

Additional keyword arguments are passed into DataFrame.shift or Series.shift.

chgSeries or DataFrame

The same type as the calling object.

Series.diff : Compute the difference of two elements in a Series. DataFrame.diff : Compute the difference of two elements in a DataFrame. Series.shift : Shift the index by some number of periods. DataFrame.shift : Shift the index by some number of periods.

Series

>>> s = pd.Series([90, 91, 85])
>>> s
0    90
1    91
2    85
dtype: int64
>>> s.pct_change()
0         NaN
1    0.011111
2   -0.065934
dtype: float64
>>> s.pct_change(periods=2)
0         NaN
1         NaN
2   -0.055556
dtype: float64

See the percentage change in a Series where filling NAs with last valid observation forward to next valid.

>>> s = pd.Series([90, 91, None, 85])
>>> s
0    90.0
1    91.0
2     NaN
3    85.0
dtype: float64
>>> s.pct_change(fill_method='ffill')
0         NaN
1    0.011111
2    0.000000
3   -0.065934
dtype: float64

DataFrame

Percentage change in French franc, Deutsche Mark, and Italian lira from 1980-01-01 to 1980-03-01.

>>> df = pd.DataFrame({
...     'FR': [4.0405, 4.0963, 4.3149],
...     'GR': [1.7246, 1.7482, 1.8519],
...     'IT': [804.74, 810.01, 860.13]},
...     index=['1980-01-01', '1980-02-01', '1980-03-01'])
>>> df
                FR      GR      IT
1980-01-01  4.0405  1.7246  804.74
1980-02-01  4.0963  1.7482  810.01
1980-03-01  4.3149  1.8519  860.13
>>> df.pct_change()
                  FR        GR        IT
1980-01-01       NaN       NaN       NaN
1980-02-01  0.013810  0.013684  0.006549
1980-03-01  0.053365  0.059318  0.061876

Percentage of change in GOOG and APPL stock volume. Shows computing the percentage change between columns.

>>> df = pd.DataFrame({
...     '2016': [1769950, 30586265],
...     '2015': [1500923, 40912316],
...     '2014': [1371819, 41403351]},
...     index=['GOOG', 'APPL'])
>>> df
          2016      2015      2014
GOOG   1769950   1500923   1371819
APPL  30586265  40912316  41403351
>>> df.pct_change(axis='columns')
      2016      2015      2014
GOOG   NaN -0.151997 -0.086016
APPL   NaN  0.337604  0.012002
pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

funcfunction

Function to apply to the Series/DataFrame. args, and kwargs are passed into func. Alternatively a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame.

argsiterable, optional

Positional arguments passed into func.

kwargsmapping, optional

A dictionary of keyword arguments passed into func.

object : the return type of func.

DataFrame.apply : Apply a function along input axis of DataFrame. DataFrame.applymap : Apply a function elementwise on a whole DataFrame. Series.map : Apply a mapping correspondence on a

Series.

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. Instead of writing

>>> func(g(h(df), arg1=a), arg2=b, arg3=c)  

You can write

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe(func, arg2=b, arg3=c)
... )  

If you have a function that takes the data as (say) the second argument, pass a tuple indicating which keyword expects the data. For example, suppose f takes its data as arg2:

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe((func, 'arg2'), arg1=a, arg3=c)
...  )  
pivot(index=None, columns=None, values=None)pandas.core.frame.DataFrame

Return reshaped DataFrame organized by given index / column values.

Reshape data (produce a “pivot” table) based on column values. Uses unique values from specified index / columns to form axes of the resulting DataFrame. This function does not support data aggregation, multiple values will result in a MultiIndex in the columns. See the User Guide for more on reshaping.

indexstr or object or a list of str, optional

Column to use to make new frame’s index. If None, uses existing index.

Changed in version 1.1.0: Also accept list of index names.

columnsstr or object or a list of str

Column to use to make new frame’s columns.

Changed in version 1.1.0: Also accept list of columns names.

valuesstr, object or a list of the previous, optional

Column(s) to use for populating new frame’s values. If not specified, all remaining columns will be used and the result will have hierarchically indexed columns.

DataFrame

Returns reshaped DataFrame.

ValueError:

When there are any index, columns combinations with multiple values. DataFrame.pivot_table when you need to aggregate.

DataFrame.pivot_tableGeneralization of pivot that can handle

duplicate values for one index/column pair.

DataFrame.unstackPivot based on the index values instead of a

column.

wide_to_longWide panel to long format. Less flexible but more

user-friendly than melt.

For finer-tuned control, see hierarchical indexing documentation along with the related stack/unstack methods.

>>> df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two',
...                            'two'],
...                    'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
...                    'baz': [1, 2, 3, 4, 5, 6],
...                    'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
>>> df
    foo   bar  baz  zoo
0   one   A    1    x
1   one   B    2    y
2   one   C    3    z
3   two   A    4    q
4   two   B    5    w
5   two   C    6    t
>>> df.pivot(index='foo', columns='bar', values='baz')
bar  A   B   C
foo
one  1   2   3
two  4   5   6
>>> df.pivot(index='foo', columns='bar')['baz']
bar  A   B   C
foo
one  1   2   3
two  4   5   6
>>> df.pivot(index='foo', columns='bar', values=['baz', 'zoo'])
      baz       zoo
bar   A  B  C   A  B  C
foo
one   1  2  3   x  y  z
two   4  5  6   q  w  t

You could also assign a list of column names or a list of index names.

>>> df = pd.DataFrame({
...        "lev1": [1, 1, 1, 2, 2, 2],
...        "lev2": [1, 1, 2, 1, 1, 2],
...        "lev3": [1, 2, 1, 2, 1, 2],
...        "lev4": [1, 2, 3, 4, 5, 6],
...        "values": [0, 1, 2, 3, 4, 5]})
>>> df
    lev1 lev2 lev3 lev4 values
0   1    1    1    1    0
1   1    1    2    2    1
2   1    2    1    3    2
3   2    1    2    4    3
4   2    1    1    5    4
5   2    2    2    6    5
>>> df.pivot(index="lev1", columns=["lev2", "lev3"],values="values")
lev2    1         2
lev3    1    2    1    2
lev1
1     0.0  1.0  2.0  NaN
2     4.0  3.0  NaN  5.0
>>> df.pivot(index=["lev1", "lev2"], columns=["lev3"],values="values")
      lev3    1    2
lev1  lev2
   1     1  0.0  1.0
         2  2.0  NaN
   2     1  4.0  3.0
         2  NaN  5.0

A ValueError is raised if there are any duplicates.

>>> df = pd.DataFrame({"foo": ['one', 'one', 'two', 'two'],
...                    "bar": ['A', 'A', 'B', 'C'],
...                    "baz": [1, 2, 3, 4]})
>>> df
   foo bar  baz
0  one   A    1
1  one   A    2
2  two   B    3
3  two   C    4

Notice that the first two rows are the same for our index and columns arguments.

>>> df.pivot(index='foo', columns='bar', values='baz')
Traceback (most recent call last):
   ...
ValueError: Index contains duplicate entries, cannot reshape
pivot_table(values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False)pandas.core.frame.DataFrame

Create a spreadsheet-style pivot table as a DataFrame.

The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame.

values : column to aggregate, optional index : column, Grouper, array, or list of the previous

If an array is passed, it must be the same length as the data. The list can contain any of the other types (except list). Keys to group by on the pivot table index. If an array is passed, it is being used as the same manner as column values.

columnscolumn, Grouper, array, or list of the previous

If an array is passed, it must be the same length as the data. The list can contain any of the other types (except list). Keys to group by on the pivot table column. If an array is passed, it is being used as the same manner as column values.

aggfuncfunction, list of functions, dict, default numpy.mean

If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves) If dict is passed, the key is column to aggregate and value is function or list of functions.

fill_valuescalar, default None

Value to replace missing values with (in the resulting pivot table, after aggregation).

marginsbool, default False

Add all row / columns (e.g. for subtotal / grand totals).

dropnabool, default True

Do not include columns whose entries are all NaN.

margins_namestr, default ‘All’

Name of the row / column that will contain the totals when margins is True.

observedbool, default False

This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.

Changed in version 0.25.0.

DataFrame

An Excel style pivot table.

DataFrame.pivotPivot without aggregation that can handle

non-numeric data.

DataFrame.melt: Unpivot a DataFrame from wide to long format,

optionally leaving identifiers set.

wide_to_longWide panel to long format. Less flexible but more

user-friendly than melt.

>>> df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
...                          "bar", "bar", "bar", "bar"],
...                    "B": ["one", "one", "one", "two", "two",
...                          "one", "one", "two", "two"],
...                    "C": ["small", "large", "large", "small",
...                          "small", "large", "small", "small",
...                          "large"],
...                    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
...                    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
>>> df
     A    B      C  D  E
0  foo  one  small  1  2
1  foo  one  large  2  4
2  foo  one  large  2  5
3  foo  two  small  3  5
4  foo  two  small  3  6
5  bar  one  large  4  6
6  bar  one  small  5  8
7  bar  two  small  6  9
8  bar  two  large  7  9

This first example aggregates values by taking the sum.

>>> table = pd.pivot_table(df, values='D', index=['A', 'B'],
...                     columns=['C'], aggfunc=np.sum)
>>> table
C        large  small
A   B
bar one    4.0    5.0
    two    7.0    6.0
foo one    4.0    1.0
    two    NaN    6.0

We can also fill missing values using the fill_value parameter.

>>> table = pd.pivot_table(df, values='D', index=['A', 'B'],
...                     columns=['C'], aggfunc=np.sum, fill_value=0)
>>> table
C        large  small
A   B
bar one      4      5
    two      7      6
foo one      4      1
    two      0      6

The next example aggregates by taking the mean across multiple columns.

>>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
...                     aggfunc={'D': np.mean,
...                              'E': np.mean})
>>> table
                D         E
A   C
bar large  5.500000  7.500000
    small  5.500000  8.500000
foo large  2.000000  4.500000
    small  2.333333  4.333333

We can also calculate multiple types of aggregations for any given value column.

>>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
...                     aggfunc={'D': np.mean,
...                              'E': [min, max, np.mean]})
>>> table
                D    E
            mean  max      mean  min
A   C
bar large  5.500000  9.0  7.500000  6.0
    small  5.500000  9.0  8.500000  8.0
foo large  2.000000  5.0  4.500000  4.0
    small  2.333333  6.0  4.333333  2.0
plot(*args, **kwargs)

Plot draws the data on a web map. The user can describe in simple terms how to renderer spatial data using symbol. To make the process simpler a palette for which colors are drawn from can be used instead of explicit colors.

Explicit Argument

Description

df

required SpatialDataFrame or GeoSeries. This is the data to map.

map_widget

optional WebMap object. This is the map to display the data on.

palette

optional string/dict. Color mapping. For simple renderer, just provide a string. For more robust renderers like unique renderer, a dictionary can be given.

renderer_type

optional string. Determines the type of renderer to use for the provided dataset. The default is ‘s’ which is for simple renderers.

Allowed values:

  • ‘s’ - is a simple renderer that uses one symbol only.

  • ‘u’ - unique renderer symbolizes features based on one

    or more matching string attributes.

  • ‘c’ - A class breaks renderer symbolizes based on the

    value of some numeric attribute.

  • ‘h’ - heatmap renders point data into a raster

    visualization that emphasizes areas of higher density or weighted values.

symbol_style

optional string. This is the type of symbol the user needs to create. Valid inputs are: simple, picture, text, or carto. The default is simple.

symbol_type

optional string. This is the symbology used by the geometry. For example ‘s’ for a Line geometry is a solid line. And ‘-‘ is a dash line.

Allowed symbol types based on geometries:

Point Symbols

  • ‘o’ - Circle (default)

  • ‘+’ - Cross

  • ‘D’ - Diamond

  • ‘s’ - Square

  • ‘x’ - X

Polyline Symbols

  • ‘s’ - Solid (default)

  • ‘-‘ - Dash

  • ‘-.’ - Dash Dot

  • ‘-..’ - Dash Dot Dot

  • ‘.’ - Dot

  • ‘–’ - Long Dash

  • ‘–.’ - Long Dash Dot

  • ‘n’ - Null

  • ‘s-‘ - Short Dash

  • ‘s-.’ - Short Dash Dot

  • ‘s-..’ - Short Dash Dot Dot

  • ‘s.’ - Short Dot

Polygon Symbols

  • ‘s’ - Solid Fill (default)

  • ‘’ - Backward Diagonal

  • ‘/’ - Forward Diagonal

  • ‘|’ - Vertical Bar

  • ‘-‘ - Horizontal Bar

  • ‘x’ - Diagonal Cross

  • ‘+’ - Cross

col

optional string/list. Field or fields used for heatmap, class breaks, or unique renderers.

palette

optional string. The color map to draw from in order to visualize the data. The default palette is ‘jet’. To get a visual representation of the allowed color maps, use the display_colormaps method.

alpha

optional float. This is a value between 0 and 1 with 1 being the default value. The alpha sets the transparancy of the renderer when applicable.

Render Syntax

The render syntax allows for users to fully customize symbolizing the data.

Simple Renderer

A simple renderer is a renderer that uses one symbol only.

Optional Argument

Description

symbol_style

optional string. This is the type of symbol the user needs to create. Valid inputs are: simple, picture, text, or carto. The default is simple.

symbol_type

optional string. This is the symbology used by the geometry. For example ‘s’ for a Line geometry is a solid line. And ‘-‘ is a dash line.

Point Symbols

  • ‘o’ - Circle (default)

  • ‘+’ - Cross

  • ‘D’ - Diamond

  • ‘s’ - Square

  • ‘x’ - X

Polyline Symbols

  • ‘s’ - Solid (default)

  • ‘-‘ - Dash

  • ‘-.’ - Dash Dot

  • ‘-..’ - Dash Dot Dot

  • ‘.’ - Dot

  • ‘–’ - Long Dash

  • ‘–.’ - Long Dash Dot

  • ‘n’ - Null

  • ‘s-‘ - Short Dash

  • ‘s-.’ - Short Dash Dot

  • ‘s-..’ - Short Dash Dot Dot

  • ‘s.’ - Short Dot

Polygon Symbols

  • ‘s’ - Solid Fill (default)

  • ‘’ - Backward Diagonal

  • ‘/’ - Forward Diagonal

  • ‘|’ - Vertical Bar

  • ‘-‘ - Horizontal Bar

  • ‘x’ - Diagonal Cross

  • ‘+’ - Cross

description

Description of the renderer.

rotation_expression

A constant value or an expression that derives the angle of rotation based on a feature attribute value. When an attribute name is specified, it’s enclosed in square brackets.

rotation_type

String value which controls the origin and direction of rotation on point features. If the rotationType is defined as arithmetic, the symbol is rotated from East in a counter-clockwise direction where East is the 0 degree axis. If the rotationType is defined as geographic, the symbol is rotated from North in a clockwise direction where North is the 0 degree axis.

Must be one of the following values:

  • arithmetic

  • geographic

visual_variables

An array of objects used to set rendering properties.

Heatmap Renderer

The HeatmapRenderer renders point data into a raster visualization that emphasizes areas of higher density or weighted values.

Optional Argument

Description

blur_radius

The radius (in pixels) of the circle over which the majority of each point’s value is spread.

field

This is optional as this renderer can be created if no field is specified. Each feature gets the same value/importance/weight or with a field where each feature is weighted by the field’s value.

max_intensity

The pixel intensity value which is assigned the final color in the color ramp.

min_intensity

The pixel intensity value which is assigned the initial color in the color ramp.

ratio

A number between 0-1. Describes what portion along the gradient the colorStop is added.

Unique Renderer

This renderer symbolizes features based on one or more matching string attributes.

Optional Argument

Description

background_fill_symbol

A symbol used for polygon features as a background if the renderer uses point symbols, e.g. for bivariate types & size rendering. Only applicable to polygon layers. PictureFillSymbols can also be used outside of the Map Viewer for Size and Predominance and Size renderers.

default_label

Default label for the default symbol used to draw unspecified values.

default_symbol

Symbol used when a value cannot be matched.

col

String or List of Strings. Attribute field(s) the renderer uses to match values.

field_delimiter

String inserted between the values if multiple attribute fields are specified.

rotation_expression

A constant value or an expression that derives the angle of rotation based on a feature attribute value. When an attribute name is specified, it’s enclosed in square brackets. Rotation is set using a visual variable of type rotation info with a specified field or value expression property.

rotation_type

String property which controls the origin and direction of rotation. If the rotation type is defined as arithmetic the symbol is rotated from East in a counter-clockwise direction where East is the 0 degree axis. If the rotation type is defined as geographic, the symbol is rotated from North in a clockwise direction where North is the 0 degree axis. Must be one of the following values:

  • arithmetic

  • geographic

arcade_expression

An Arcade expression evaluating to either a string or a number.

arcade_title

The title identifying and describing the associated Arcade expression as defined in the valueExpression property.

visual_variables

An array of objects used to set rendering properties.

Class Breaks Renderer

A class breaks renderer symbolizes based on the value of some numeric attribute.

Optional Argument

Description

background_fill_symbol

A symbol used for polygon features as a background if the renderer uses point symbols, e.g. for bivariate types & size rendering. Only applicable to polygon layers. PictureFillSymbols can also be used outside of the Map Viewer for Size and Predominance and Size renderers.

default_label

Default label for the default symbol used to draw unspecified values.

default_symbol

Symbol used when a value cannot be matched.

method

Determines the classification method that was used to generate class breaks.

Must be one of the following values:

  • esriClassifyDefinedInterval

  • esriClassifyEqualInterval

  • esriClassifyGeometricalInterval

  • esriClassifyNaturalBreaks

  • esriClassifyQuantile

  • esriClassifyStandardDeviation

  • esriClassifyManual

field

Attribute field used for renderer.

min_value

The minimum numeric data value needed to begin class breaks.

normalization_field

Used when normalizationType is field. The string value indicating the attribute field by which the data value is normalized.

normalization_total

Used when normalizationType is percent-of-total, this number property contains the total of all data values.

normalization_type

Determine how the data was normalized.

Must be one of the following values:

  • esriNormalizeByField

  • esriNormalizeByLog

  • esriNormalizeByPercentOfTotal

rotation_expression

A constant value or an expression that derives the angle of rotation based on a feature attribute value. When an attribute name is specified, it’s enclosed in square brackets.

rotation_type

A string property which controls the origin and direction of rotation. If the rotation_type is defined as arithmetic, the symbol is rotated from East in a couter-clockwise direction where East is the 0 degree axis. If the rotationType is defined as geographic, the symbol is rotated from North in a clockwise direction where North is the 0 degree axis.

Must be one of the following values:

  • arithmetic

  • geographic

arcade_expression

An Arcade expression evaluating to a number.

arcade_title

The title identifying and describing the associated Arcade expression as defined in the arcade_expression property.

visual_variables

An object used to set rendering options.

Symbol Syntax

Optional Argument

Description

symbol_style

optional string. This is the type of symbol the user needs to create. Valid inputs are: simple, picture, text, or carto. The default is simple.

symbol_type

optional string. This is the symbology used by the geometry. For example ‘s’ for a Line geometry is a solid line. And ‘-‘ is a dash line.

Point Symbols

  • ‘o’ - Circle (default)

  • ‘+’ - Cross

  • ‘D’ - Diamond

  • ‘s’ - Square

  • ‘x’ - X

Polyline Symbols

  • ‘s’ - Solid (default)

  • ‘-‘ - Dash

  • ‘-.’ - Dash Dot

  • ‘-..’ - Dash Dot Dot

  • ‘.’ - Dot

  • ‘–’ - Long Dash

  • ‘–.’ - Long Dash Dot

  • ‘n’ - Null

  • ‘s-‘ - Short Dash

  • ‘s-.’ - Short Dash Dot

  • ‘s-..’ - Short Dash Dot Dot

  • ‘s.’ - Short Dot

Polygon Symbols

  • ‘s’ - Solid Fill (default)

  • ‘’ - Backward Diagonal

  • ‘/’ - Forward Diagonal

  • ‘|’ - Vertical Bar

  • ‘-‘ - Horizontal Bar

  • ‘x’ - Diagonal Cross

  • ‘+’ - Cross

cmap

optional string or list. This is the color scheme a user can provide if the exact color is not needed, or a user can provide a list with the color defined as: [red, green blue, alpha]. The values red, green, blue are from 0-255 and alpha is a float value from 0 - 1. The default value is ‘jet’ color scheme.

cstep

optional integer. If provided, its the color location on the color scheme.

Simple Symbols

This is a list of optional parameters that can be given for point, line or polygon geometries.

Argument

Description

marker_size

optional float. Numeric size of the symbol given in points.

marker_angle

optional float. Numeric value used to rotate the symbol. The symbol is rotated counter-clockwise. For example, The following, angle=-30, in will create a symbol rotated -30 degrees counter-clockwise; that is, 30 degrees clockwise.

marker_xoffset

Numeric value indicating the offset on the x-axis in points.

marker_yoffset

Numeric value indicating the offset on the y-axis in points.

line_width

optional float. Numeric value indicating the width of the line in points

outline_style

Optional string. For polygon point, and line geometries , a customized outline type can be provided.

Allowed Styles:

  • ‘s’ - Solid (default)

  • ‘-‘ - Dash

  • ‘-.’ - Dash Dot

  • ‘-..’ - Dash Dot Dot

  • ‘.’ - Dot

  • ‘–’ - Long Dash

  • ‘–.’ - Long Dash Dot

  • ‘n’ - Null

  • ‘s-‘ - Short Dash

  • ‘s-.’ - Short Dash Dot

  • ‘s-..’ - Short Dash Dot Dot

  • ‘s.’ - Short Dot

outline_color

optional string or list. This is the same color as the cmap property, but specifically applies to the outline_color.

Picture Symbol

This type of symbol only applies to Points, MultiPoints and Polygons.

Argument

Description

marker_angle

Numeric value that defines the number of degrees ranging from 0-360, that a marker symbol is rotated. The rotation is from East in a counter-clockwise direction where East is the 0 axis.

marker_xoffset

Numeric value indicating the offset on the x-axis in points.

marker_yoffset

Numeric value indicating the offset on the y-axis in points.

height

Numeric value used if needing to resize the symbol. Specify a value in points. If images are to be displayed in their original size, leave this blank.

width

Numeric value used if needing to resize the symbol. Specify a value in points. If images are to be displayed in their original size, leave this blank.

url

String value indicating the URL of the image. The URL should be relative if working with static layers. A full URL should be used for map service dynamic layers. A relative URL can be dereferenced by accessing the map layer image resource or the feature layer image resource.

image_data

String value indicating the base64 encoded data.

xscale

Numeric value indicating the scale factor in x direction.

yscale

Numeric value indicating the scale factor in y direction.

outline_color

optional string or list. This is the same color as the cmap property, but specifically applies to the outline_color.

outline_style

Optional string. For polygon point, and line geometries , a customized outline type can be provided.

Allowed Styles:

  • ‘s’ - Solid (default)

  • ‘-‘ - Dash

  • ‘-.’ - Dash Dot

  • ‘-..’ - Dash Dot Dot

  • ‘.’ - Dot

  • ‘–’ - Long Dash

  • ‘–.’ - Long Dash Dot

  • ‘n’ - Null

  • ‘s-‘ - Short Dash

  • ‘s-.’ - Short Dash Dot

  • ‘s-..’ - Short Dash Dot Dot

  • ‘s.’ - Short Dot

outline_color

optional string or list. This is the same color as the cmap property, but specifically applies to the outline_color.

line_width

optional float. Numeric value indicating the width of the line in points

Text Symbol

This type of symbol only applies to Points, MultiPoints and Polygons.

Argument

Description

font_decoration

The text decoration. Must be one of the following values: - line-through - underline - none

font_family

Optional string. The font family.

font_size

Optional float. The font size in points.

font_style

Optional string. The text style. - italic - normal - oblique

font_weight

Optional string. The text weight. Must be one of the following values: - bold - bolder - lighter - normal

background_color

optional string/list. Background color is represented as a four-element array or string of a color map.

halo_color

Optional string/list. Color of the halo around the text. The default is None.

halo_size

Optional integer/float. The point size of a halo around the text symbol.

horizontal_alignment

optional string. One of the following string values representing the horizontal alignment of the text. Must be one of the following values: - left - right - center - justify

kerning

optional boolean. Boolean value indicating whether to adjust the spacing between characters in the text string.

line_color

optional string/list. Outline color is represented as a four-element array or string of a color map.

line_width

optional integer/float. Outline size.

marker_angle

optional int. A numeric value that defines the number of degrees (0 to 360) that a text symbol is rotated. The rotation is from East in a counter-clockwise direction where East is the 0 axis.

marker_xoffset

optional int/float.Numeric value indicating the offset on the x-axis in points.

marker_yoffset

optional int/float.Numeric value indicating the offset on the x-axis in points.

right_to_left

optional boolean. Set to true if using Hebrew or Arabic fonts.

rotated

optional boolean. Boolean value indicating whether every character in the text string is rotated.

text

Required string. Text Value to display next to geometry.

vertical_alignment

Optional string. One of the following string values representing the vertical alignment of the text. Must be one of the following values: - top - bottom - middle - baseline

Cartographic Symbol

This type of symbol only applies to line geometries.

Argument

Description

line_width

optional float. Numeric value indicating the width of the line in points

cap

Optional string. The cap style.

join

Optional string. The join style.

miter_limit

Optional string. Size threshold for showing mitered line joins.

The kwargs parameter accepts all parameters of the create_symbol method and the create_renderer method.

property point_count

The total number of points for the feature.

point_from_angle_and_distance(angle, distance, method='GEODESCIC')

Returns a point at a given angle and distance in degrees and meters using the specified measurement type.

Parameters:
angle
  • The angle in degrees to the returned point.

distance
  • The distance in meters to the returned point.

method
  • PLANAR measurements reflect the projection of geographic

data onto the 2D surface (in other words, they will not take into account the curvature of the earth). GEODESIC, GREAT_ELLIPTIC, LOXODROME, and PRESERVE_SHAPE measurement types may be chosen as an alternative, if desired.

pop(item: Optional[Hashable])pandas.core.series.Series

Return item and drop from frame. Raise KeyError if not found.

itemlabel

Label of column to be popped.

Series

>>> df = pd.DataFrame([('falcon', 'bird', 389.0),
...                    ('parrot', 'bird', 24.0),
...                    ('lion', 'mammal', 80.5),
...                    ('monkey', 'mammal', np.nan)],
...                   columns=('name', 'class', 'max_speed'))
>>> df
     name   class  max_speed
0  falcon    bird      389.0
1  parrot    bird       24.0
2    lion  mammal       80.5
3  monkey  mammal        NaN
>>> df.pop('class')
0      bird
1      bird
2    mammal
3    mammal
Name: class, dtype: object
>>> df
     name  max_speed
0  falcon      389.0
1  parrot       24.0
2    lion       80.5
3  monkey        NaN
position_along_line(value, use_percentage=False)

Returns a point on a line at a specified distance from the beginning of the line.

Parameters:
value
  • The distance along the line.

use_percentage
  • The distance may be specified as a fixed unit

of measure or a ratio of the length of the line. If True, value is used as a percentage; if False, value is used as a distance. For percentages, the value should be expressed as a double from 0.0 (0%) to 1.0 (100%).

pow(other, axis='columns', level=None, fill_value=None)

Get Exponential power of dataframe and other, element-wise (binary operator pow).

Equivalent to dataframe ** other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rpow.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

axis{0 or ‘index’, 1 or ‘columns’}

Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

levelint or label

Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

DataFrame

Result of the arithmetic operation.

DataFrame.add : Add DataFrames. DataFrame.sub : Subtract DataFrames. DataFrame.mul : Multiply DataFrames. DataFrame.div : Divide DataFrames (float division). DataFrame.truediv : Divide DataFrames (float division). DataFrame.floordiv : Divide DataFrames (integer division). DataFrame.mod : Calculate modulo (remainder after division). DataFrame.pow : Calculate exponential power.

Mismatched indices will be unioned together.

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
prod(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)

Return the product of the values over the requested axis.

axis{index (0), columns (1)}

Axis for the function to be applied on.

skipnabool, default True

Exclude NA/null values when computing the result.

levelint or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

numeric_onlybool, default None

Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

min_countint, default 0

The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

**kwargs

Additional keyword arguments to be passed to the function.

Series or DataFrame (if level specified)

Series.sum : Return the sum. Series.min : Return the minimum. Series.max : Return the maximum. Series.idxmin : Return the index of the minimum. Series.idxmax : Return the index of the maximum. DataFrame.sum : Return the sum over the requested axis. DataFrame.min : Return the minimum over the requested axis. DataFrame.max : Return the maximum over the requested axis. DataFrame.idxmin : Return the index of the minimum over the requested axis. DataFrame.idxmax : Return the index of the maximum over the requested axis.

By default, the product of an empty or all-NA Series is 1

>>> pd.Series([]).prod()
1.0

This can be controlled with the min_count parameter

>>> pd.Series([]).prod(min_count=1)
nan

Thanks to the skipna parameter, min_count handles all-NA and empty series identically.

>>> pd.Series([np.nan]).prod()
1.0
>>> pd.Series([np.nan]).prod(min_count=1)
nan
product(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)

Return the product of the values over the requested axis.

axis{index (0), columns (1)}

Axis for the function to be applied on.

skipnabool, default True

Exclude NA/null values when computing the result.

levelint or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

numeric_onlybool, default None

Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

min_countint, default 0

The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

**kwargs

Additional keyword arguments to be passed to the function.

Series or DataFrame (if level specified)

Series.sum : Return the sum. Series.min : Return the minimum. Series.max : Return the maximum. Series.idxmin : Return the index of the minimum. Series.idxmax : Return the index of the maximum. DataFrame.sum : Return the sum over the requested axis. DataFrame.min : Return the minimum over the requested axis. DataFrame.max : Return the maximum over the requested axis. DataFrame.idxmin : Return the index of the minimum over the requested axis. DataFrame.idxmax : Return the index of the maximum over the requested axis.

By default, the product of an empty or all-NA Series is 1

>>> pd.Series([]).prod()
1.0

This can be controlled with the min_count parameter

>>> pd.Series([]).prod(min_count=1)
nan

Thanks to the skipna parameter, min_count handles all-NA and empty series identically.

>>> pd.Series([np.nan]).prod()
1.0
>>> pd.Series([np.nan]).prod(min_count=1)
nan
project_as(spatial_reference, transformation_name=None)

Projects a geometry and optionally applies a geotransformation.

Parameter:
spatial_reference
  • The new spatial reference. This can be a

SpatialReference object or the coordinate system name.

transformation_name
  • The geotransformation name.

quantile(q=0.5, axis=0, numeric_only=True, interpolation='linear')

Return values at the given quantile over requested axis.

qfloat or array-like, default 0.5 (50% quantile)

Value between 0 <= q <= 1, the quantile(s) to compute.

axis{0, 1, ‘index’, ‘columns’}, default 0

Equals 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.

numeric_onlybool, default True

If False, the quantile of datetime and timedelta data will be computed as well.

interpolation{‘linear’, ‘lower’, ‘higher’, ‘midpoint’, ‘nearest’}

This optional parameter specifies the interpolation method to use, when the desired quantile lies between two data points i and j:

  • linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.

  • lower: i.

  • higher: j.

  • nearest: i or j whichever is nearest.

  • midpoint: (i + j) / 2.

Series or DataFrame

If q is an array, a DataFrame will be returned where the

index is q, the columns are the columns of self, and the values are the quantiles.

If q is a float, a Series will be returned where the

index is the columns of self and the values are the quantiles.

core.window.Rolling.quantile: Rolling quantile. numpy.percentile: Numpy function to compute the percentile.

>>> df = pd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]),
...                   columns=['a', 'b'])
>>> df.quantile(.1)
a    1.3
b    3.7
Name: 0.1, dtype: float64
>>> df.quantile([.1, .5])
       a     b
0.1  1.3   3.7
0.5  2.5  55.0

Specifying numeric_only=False will also compute the quantile of datetime and timedelta data.

>>> df = pd.DataFrame({'A': [1, 2],
...                    'B': [pd.Timestamp('2010'),
...                          pd.Timestamp('2011')],
...                    'C': [pd.Timedelta('1 days'),
...                          pd.Timedelta('2 days')]})
>>> df.quantile(0.5, numeric_only=False)
A                    1.5
B    2010-07-02 12:00:00
C        1 days 12:00:00
Name: 0.5, dtype: object
query(expr, inplace=False, **kwargs)

Query the columns of a DataFrame with a boolean expression.

exprstr

The query string to evaluate.

You can refer to variables in the environment by prefixing them with an ‘@’ character like @a + b.

You can refer to column names that are not valid Python variable names by surrounding them in backticks. Thus, column names containing spaces or punctuations (besides underscores) or starting with digits must be surrounded by backticks. (For example, a column named “Area (cm^2) would be referenced as Area (cm^2)). Column names which are Python keywords (like “list”, “for”, “import”, etc) cannot be used.

For example, if one of your columns is called a a and you want to sum it with b, your query should be `a a` + b.

New in version 0.25.0: Backtick quoting introduced.

New in version 1.0.0: Expanding functionality of backtick quoting for more than only spaces.

inplacebool

Whether the query should modify the data in place or return a modified copy.

**kwargs

See the documentation for eval() for complete details on the keyword arguments accepted by DataFrame.query().

DataFrame or None

DataFrame resulting from the provided query expression or None if inplace=True.

evalEvaluate a string describing operations on

DataFrame columns.

DataFrame.evalEvaluate a string describing operations on

DataFrame columns.

The result of the evaluation of this expression is first passed to DataFrame.loc and if that fails because of a multidimensional key (e.g., a DataFrame) then the result will be passed to DataFrame.__getitem__().

This method uses the top-level eval() function to evaluate the passed query.

The query() method uses a slightly modified Python syntax by default. For example, the & and | (bitwise) operators have the precedence of their boolean cousins, and and or. This is syntactically valid Python, however the semantics are different.

You can change the semantics of the expression by passing the keyword argument parser='python'. This enforces the same semantics as evaluation in Python space. Likewise, you can pass engine='python' to evaluate an expression using Python itself as a backend. This is not recommended as it is inefficient compared to using numexpr as the engine.

The DataFrame.index and DataFrame.columns attributes of the DataFrame instance are placed in the query namespace by default, which allows you to treat both the index and columns of the frame as a column in the frame. The identifier index is used for the frame index; you can also use the name of the index to identify it in a query. Please note that Python keywords may not be used as identifiers.

For further details and examples see the query documentation in indexing.

Backtick quoted variables

Backtick quoted variables are parsed as literal Python code and are converted internally to a Python valid identifier. This can lead to the following problems.

During parsing a number of disallowed characters inside the backtick quoted string are replaced by strings that are allowed as a Python identifier. These characters include all operators in Python, the space character, the question mark, the exclamation mark, the dollar sign, and the euro sign. For other characters that fall outside the ASCII range (U+0001..U+007F) and those that are not further specified in PEP 3131, the query parser will raise an error. This excludes whitespace different than the space character, but also the hashtag (as it is used for comments) and the backtick itself (backtick can also not be escaped).

In a special case, quotes that make a pair around a backtick can confuse the parser. For example, `it's` > `that's` will raise an error, as it forms a quoted string ('s > `that') with a backtick inside.

See also the Python documentation about lexical analysis (https://docs.python.org/3/reference/lexical_analysis.html) in combination with the source code in pandas.core.computation.parsing.

>>> df = pd.DataFrame({'A': range(1, 6),
...                    'B': range(10, 0, -2),
...                    'C C': range(10, 5, -1)})
>>> df
   A   B  C C
0  1  10   10
1  2   8    9
2  3   6    8
3  4   4    7
4  5   2    6
>>> df.query('A > B')
   A  B  C C
4  5  2    6

The previous expression is equivalent to

>>> df[df.A > df.B]
   A  B  C C
4  5  2    6

For columns with spaces in their name, you can use backtick quoting.

>>> df.query('B == `C C`')
   A   B  C C
0  1  10   10

The previous expression is equivalent to

>>> df[df.B == df['C C']]
   A   B  C C
0  1  10   10
query_point_and_distance(second_geometry, use_percentage=False)

Finds the point on the polyline nearest to the in_point and the distance between those points. Also returns information about the side of the line the in_point is on as well as the distance along the line where the nearest point occurs.

Paramters:
second_geometry
  • a second geometry

as_percentage
  • if False, the measure will be returned as

distance, True, measure will be a percentage

radd(other, axis='columns', level=None, fill_value=None)

Get Addition of dataframe and other, element-wise (binary operator radd).

Equivalent to other + dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, add.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

axis{0 or ‘index’, 1 or ‘columns’}

Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

levelint or label

Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

DataFrame

Result of the arithmetic operation.

DataFrame.add : Add DataFrames. DataFrame.sub : Subtract DataFrames. DataFrame.mul : Multiply DataFrames. DataFrame.div : Divide DataFrames (float division). DataFrame.truediv : Divide DataFrames (float division). DataFrame.floordiv : Divide DataFrames (integer division). DataFrame.mod : Calculate modulo (remainder after division). DataFrame.pow : Calculate exponential power.

Mismatched indices will be unioned together.

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
rank(axis=0, method: str = 'average', numeric_only: Optional[bool] = None, na_option: str = 'keep', ascending: bool = True, pct: bool = False)FrameOrSeries

Compute numerical data ranks (1 through n) along axis.

By default, equal values are assigned a rank that is the average of the ranks of those values.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

Index to direct ranking.

method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’

How to rank the group of records that have the same value (i.e. ties):

  • average: average rank of the group

  • min: lowest rank in the group

  • max: highest rank in the group

  • first: ranks assigned in order they appear in the array

  • dense: like ‘min’, but rank always increases by 1 between groups.

numeric_onlybool, optional

For DataFrame objects, rank only numeric columns if set to True.

na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’

How to rank NaN values:

  • keep: assign NaN rank to NaN values

  • top: assign smallest rank to NaN values if ascending

  • bottom: assign highest rank to NaN values if ascending.

ascendingbool, default True

Whether or not the elements should be ranked in ascending order.

pctbool, default False

Whether or not to display the returned rankings in percentile form.

same type as caller

Return a Series or DataFrame with data ranks as values.

core.groupby.GroupBy.rank : Rank of values within each group.

>>> df = pd.DataFrame(data={'Animal': ['cat', 'penguin', 'dog',
...                                    'spider', 'snake'],
...                         'Number_legs': [4, 2, 4, 8, np.nan]})
>>> df
    Animal  Number_legs
0      cat          4.0
1  penguin          2.0
2      dog          4.0
3   spider          8.0
4    snake          NaN

The following example shows how the method behaves with the above parameters:

  • default_rank: this is the default behaviour obtained without using any parameter.

  • max_rank: setting method = 'max' the records that have the same values are ranked using the highest rank (e.g.: since ‘cat’ and ‘dog’ are both in the 2nd and 3rd position, rank 3 is assigned.)

  • NA_bottom: choosing na_option = 'bottom', if there are records with NaN values they are placed at the bottom of the ranking.

  • pct_rank: when setting pct = True, the ranking is expressed as percentile rank.

>>> df['default_rank'] = df['Number_legs'].rank()
>>> df['max_rank'] = df['Number_legs'].rank(method='max')
>>> df['NA_bottom'] = df['Number_legs'].rank(na_option='bottom')
>>> df['pct_rank'] = df['Number_legs'].rank(pct=True)
>>> df
    Animal  Number_legs  default_rank  max_rank  NA_bottom  pct_rank
0      cat          4.0           2.5       3.0        2.5     0.625
1  penguin          2.0           1.0       1.0        1.0     0.250
2      dog          4.0           2.5       3.0        2.5     0.625
3   spider          8.0           4.0       4.0        4.0     1.000
4    snake          NaN           NaN       NaN        5.0       NaN
rdiv(other, axis='columns', level=None, fill_value=None)

Get Floating division of dataframe and other, element-wise (binary operator rtruediv).

Equivalent to other / dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, truediv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

axis{0 or ‘index’, 1 or ‘columns’}

Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

levelint or label

Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

DataFrame

Result of the arithmetic operation.

DataFrame.add : Add DataFrames. DataFrame.sub : Subtract DataFrames. DataFrame.mul : Multiply DataFrames. DataFrame.div : Divide DataFrames (float division). DataFrame.truediv : Divide DataFrames (float division). DataFrame.floordiv : Divide DataFrames (integer division). DataFrame.mod : Calculate modulo (remainder after division). DataFrame.pow : Calculate exponential power.

Mismatched indices will be unioned together.

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
reindex(labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None)pandas.core.frame.DataFrame

Conform Series/DataFrame to new index with optional filling logic.

Places NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.

keywords for axesarray-like, optional

New labels / index to conform to, should be specified using keywords. Preferably an Index object to avoid duplicating data.

method{None, ‘backfill’/’bfill’, ‘pad’/’ffill’, ‘nearest’}

Method to use for filling holes in reindexed DataFrame. Please note: this is only applicable to DataFrames/Series with a monotonically increasing/decreasing index.

  • None (default): don’t fill gaps

  • pad / ffill: Propagate last valid observation forward to next valid.

  • backfill / bfill: Use next valid observation to fill gap.

  • nearest: Use nearest valid observations to fill gap.

copybool, default True

Return a new object, even if the passed indexes are the same.

levelint or name

Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_valuescalar, default np.NaN

Value to use for missing values. Defaults to NaN, but can be any “compatible” value.

limitint, default None

Maximum number of consecutive elements to forward or backward fill.

toleranceoptional

Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations most satisfy the equation abs(index[indexer] - target) <= tolerance.

Tolerance may be a scalar value, which applies the same tolerance to all values, or list-like, which applies variable tolerance per element. List-like includes list, tuple, array, Series, and must be the same size as the index and its dtype must exactly match the index’s type.

Series/DataFrame with changed index.

DataFrame.set_index : Set row labels. DataFrame.reset_index : Remove row labels or move them to new columns. DataFrame.reindex_like : Change to same indices as other DataFrame.

DataFrame.reindex supports two calling conventions

  • (index=index_labels, columns=column_labels, ...)

  • (labels, axis={'index', 'columns'}, ...)

We highly recommend using keyword arguments to clarify your intent.

Create a dataframe with some fictional data.

>>> index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
>>> df = pd.DataFrame({'http_status': [200, 200, 404, 404, 301],
...                   'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
...                   index=index)
>>> df
           http_status  response_time
Firefox            200           0.04
Chrome             200           0.02
Safari             404           0.07
IE10               404           0.08
Konqueror          301           1.00

Create a new index and reindex the dataframe. By default values in the new index that do not have corresponding records in the dataframe are assigned NaN.

>>> new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
...              'Chrome']
>>> df.reindex(new_index)
               http_status  response_time
Safari               404.0           0.07
Iceweasel              NaN            NaN
Comodo Dragon          NaN            NaN
IE10                 404.0           0.08
Chrome               200.0           0.02

We can fill in the missing values by passing a value to the keyword fill_value. Because the index is not monotonically increasing or decreasing, we cannot use arguments to the keyword method to fill the NaN values.

>>> df.reindex(new_index, fill_value=0)
               http_status  response_time
Safari                 404           0.07
Iceweasel                0           0.00
Comodo Dragon            0           0.00
IE10                   404           0.08
Chrome                 200           0.02
>>> df.reindex(new_index, fill_value='missing')
              http_status response_time
Safari                404          0.07
Iceweasel         missing       missing
Comodo Dragon     missing       missing
IE10                  404          0.08
Chrome                200          0.02

We can also reindex the columns.

>>> df.reindex(columns=['http_status', 'user_agent'])
           http_status  user_agent
Firefox            200         NaN
Chrome             200         NaN
Safari             404         NaN
IE10               404         NaN
Konqueror          301         NaN

Or we can use “axis-style” keyword arguments

>>> df.reindex(['http_status', 'user_agent'], axis="columns")
           http_status  user_agent
Firefox            200         NaN
Chrome             200         NaN
Safari             404         NaN
IE10               404         NaN
Konqueror          301         NaN

To further illustrate the filling functionality in reindex, we will create a dataframe with a monotonically increasing index (for example, a sequence of dates).

>>> date_index = pd.date_range('1/1/2010', periods=6, freq='D')
>>> df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
...                    index=date_index)
>>> df2
            prices
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0

Suppose we decide to expand the dataframe to cover a wider date range.

>>> date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')
>>> df2.reindex(date_index2)
            prices
2009-12-29     NaN
2009-12-30     NaN
2009-12-31     NaN
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0
2010-01-07     NaN

The index entries that did not have a value in the original data frame (for example, ‘2009-12-29’) are by default filled with NaN. If desired, we can fill in the missing values using one of several options.

For example, to back-propagate the last valid value to fill the NaN values, pass bfill as an argument to the method keyword.

>>> df2.reindex(date_index2, method='bfill')
            prices
2009-12-29   100.0
2009-12-30   100.0
2009-12-31   100.0
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0
2010-01-07     NaN

Please note that the NaN value present in the original dataframe (at index value 2010-01-03) will not be filled by any of the value propagation schemes. This is because filling while reindexing does not look at dataframe values, but only compares the original and desired indexes. If you do want to fill in the NaN values present in the original dataframe, use the fillna() method.

See the user guide for more.

reindex_like(other, method: Optional[str] = None, copy: bool = True, limit=None, tolerance=None)FrameOrSeries

Return an object with matching indices as other object.

Conform the object to the same index on all axes. Optional filling logic, placing NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.

otherObject of the same data type

Its row and column indices are used to define the new indices of this object.

method{None, ‘backfill’/’bfill’, ‘pad’/’ffill’, ‘nearest’}

Method to use for filling holes in reindexed DataFrame. Please note: this is only applicable to DataFrames/Series with a monotonically increasing/decreasing index.

  • None (default): don’t fill gaps

  • pad / ffill: propagate last valid observation forward to next valid

  • backfill / bfill: use next valid observation to fill gap

  • nearest: use nearest valid observations to fill gap.

copybool, default True

Return a new object, even if the passed indexes are the same.

limitint, default None

Maximum number of consecutive labels to fill for inexact matches.

toleranceoptional

Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.

Tolerance may be a scalar value, which applies the same tolerance to all values, or list-like, which applies variable tolerance per element. List-like includes list, tuple, array, Series, and must be the same size as the index and its dtype must exactly match the index’s type.

Series or DataFrame

Same type as caller, but with changed indices on each axis.

DataFrame.set_index : Set row labels. DataFrame.reset_index : Remove row labels or move them to new columns. DataFrame.reindex : Change to new indices or expand indices.

Same as calling .reindex(index=other.index, columns=other.columns,...).

>>> df1 = pd.DataFrame([[24.3, 75.7, 'high'],
...                     [31, 87.8, 'high'],
...                     [22, 71.6, 'medium'],
...                     [35, 95, 'medium']],
...                    columns=['temp_celsius', 'temp_fahrenheit',
...                             'windspeed'],
...                    index=pd.date_range(start='2014-02-12',
...                                        end='2014-02-15', freq='D'))
>>> df1
            temp_celsius  temp_fahrenheit windspeed
2014-02-12          24.3             75.7      high
2014-02-13          31.0             87.8      high
2014-02-14          22.0             71.6    medium
2014-02-15          35.0             95.0    medium
>>> df2 = pd.DataFrame([[28, 'low'],
...                     [30, 'low'],
...                     [35.1, 'medium']],
...                    columns=['temp_celsius', 'windspeed'],
...                    index=pd.DatetimeIndex(['2014-02-12', '2014-02-13',
...                                            '2014-02-15']))
>>> df2
            temp_celsius windspeed
2014-02-12          28.0       low
2014-02-13          30.0       low
2014-02-15          35.1    medium
>>> df2.reindex_like(df1)
            temp_celsius  temp_fahrenheit windspeed
2014-02-12          28.0              NaN       low
2014-02-13          30.0              NaN       low
2014-02-14           NaN              NaN       NaN
2014-02-15          35.1              NaN    medium
rename(mapper: Optional[Union[Mapping[Optional[Hashable], Any], Callable[[Optional[Hashable]], Optional[Hashable]]]] = None, index: Optional[Union[Mapping[Optional[Hashable], Any], Callable[[Optional[Hashable]], Optional[Hashable]]]] = None, columns: Optional[Union[Mapping[Optional[Hashable], Any], Callable[[Optional[Hashable]], Optional[Hashable]]]] = None, axis: Optional[Union[str, int]] = None, copy: bool = True, inplace: bool = False, level: Union[Hashable, None, int] = None, errors: str = 'ignore')Optional[pandas.core.frame.DataFrame]

Alter axes labels.

Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is. Extra labels listed don’t throw an error.

See the user guide for more.

mapperdict-like or function

Dict-like or function transformations to apply to that axis’ values. Use either mapper and axis to specify the axis to target with mapper, or index and columns.

indexdict-like or function

Alternative to specifying axis (mapper, axis=0 is equivalent to index=mapper).

columnsdict-like or function

Alternative to specifying axis (mapper, axis=1 is equivalent to columns=mapper).

axis{0 or ‘index’, 1 or ‘columns’}, default 0

Axis to target with mapper. Can be either the axis name (‘index’, ‘columns’) or number (0, 1). The default is ‘index’.

copybool, default True

Also copy underlying data.

inplacebool, default False

Whether to return a new DataFrame. If True then value of copy is ignored.

levelint or level name, default None

In case of a MultiIndex, only rename labels in the specified level.

errors{‘ignore’, ‘raise’}, default ‘ignore’

If ‘raise’, raise a KeyError when a dict-like mapper, index, or columns contains labels that are not present in the Index being transformed. If ‘ignore’, existing keys will be renamed and extra keys will be ignored.

DataFrame or None

DataFrame with the renamed axis labels or None if inplace=True.

KeyError

If any of the labels is not found in the selected axis and “errors=’raise’”.

DataFrame.rename_axis : Set the name of the axis.

DataFrame.rename supports two calling conventions

  • (index=index_mapper, columns=columns_mapper, ...)

  • (mapper, axis={'index', 'columns'}, ...)

We highly recommend using keyword arguments to clarify your intent.

Rename columns using a mapping:

>>> df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
>>> df.rename(columns={"A": "a", "B": "c"})
   a  c
0  1  4
1  2  5
2  3  6

Rename index using a mapping:

>>> df.rename(index={0: "x", 1: "y", 2: "z"})
   A  B
x  1  4
y  2  5
z  3  6

Cast index labels to a different type:

>>> df.index
RangeIndex(start=0, stop=3, step=1)
>>> df.rename(index=str).index
Index(['0', '1', '2'], dtype='object')
>>> df.rename(columns={"A": "a", "B": "b", "C": "c"}, errors="raise")
Traceback (most recent call last):
KeyError: ['C'] not found in axis

Using axis-style parameters:

>>> df.rename(str.lower, axis='columns')
   a  b
0  1  4
1  2  5
2  3  6
>>> df.rename({1: 2, 2: 4}, axis='index')
   A  B
0  1  4
2  2  5
4  3  6
rename_axis(mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False)

Set the name of the axis for the index or columns.

mapperscalar, list-like, optional

Value to set the axis name attribute.

index, columnsscalar, list-like, dict-like or function, optional

A scalar, list-like, dict-like or functions transformations to apply to that axis’ values. Note that the columns parameter is not allowed if the object is a Series. This parameter only apply for DataFrame type objects.

Use either mapper and axis to specify the axis to target with mapper, or index and/or columns.

Changed in version 0.24.0.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

The axis to rename.

copybool, default True

Also copy underlying data.

inplacebool, default False

Modifies the object directly, instead of creating a new Series or DataFrame.

Series, DataFrame, or None

The same type as the caller or None if inplace=True.

Series.rename : Alter Series index labels or name. DataFrame.rename : Alter DataFrame index labels or name. Index.rename : Set new names on index.

DataFrame.rename_axis supports two calling conventions

  • (index=index_mapper, columns=columns_mapper, ...)

  • (mapper, axis={'index', 'columns'}, ...)

The first calling convention will only modify the names of the index and/or the names of the Index object that is the columns. In this case, the parameter copy is ignored.

The second calling convention will modify the names of the corresponding index if mapper is a list or a scalar. However, if mapper is dict-like or a function, it will use the deprecated behavior of modifying the axis labels.

We highly recommend using keyword arguments to clarify your intent.

Series

>>> s = pd.Series(["dog", "cat", "monkey"])
>>> s
0       dog
1       cat
2    monkey
dtype: object
>>> s.rename_axis("animal")
animal
0    dog
1    cat
2    monkey
dtype: object

DataFrame

>>> df = pd.DataFrame({"num_legs": [4, 4, 2],
...                    "num_arms": [0, 0, 2]},
...                   ["dog", "cat", "monkey"])
>>> df
        num_legs  num_arms
dog            4         0
cat            4         0
monkey         2         2
>>> df = df.rename_axis("animal")
>>> df
        num_legs  num_arms
animal
dog            4         0
cat            4         0
monkey         2         2
>>> df = df.rename_axis("limbs", axis="columns")
>>> df
limbs   num_legs  num_arms
animal
dog            4         0
cat            4         0
monkey         2         2

MultiIndex

>>> df.index = pd.MultiIndex.from_product([['mammal'],
...                                        ['dog', 'cat', 'monkey']],
...                                       names=['type', 'name'])
>>> df
limbs          num_legs  num_arms
type   name
mammal dog            4         0
       cat            4         0
       monkey         2         2
>>> df.rename_axis(index={'type': 'class'})
limbs          num_legs  num_arms
class  name
mammal dog            4         0
       cat            4         0
       monkey         2         2
>>> df.rename_axis(columns=str.upper)
LIMBS          num_legs  num_arms
type   name
mammal dog            4         0
       cat            4         0
       monkey         2         2
reorder_levels(order, axis=0)pandas.core.frame.DataFrame

Rearrange index levels using input order. May not drop or duplicate levels.

orderlist of int or list of str

List representing new level order. Reference level by number (position) or by key (label).

axis{0 or ‘index’, 1 or ‘columns’}, default 0

Where to reorder levels.

DataFrame

replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')

Replace values given in to_replace with value.

Values of the DataFrame are replaced with other values dynamically. This differs from updating with .loc or .iloc, which require you to specify a location to update with some value.

to_replacestr, regex, list, dict, Series, int, float, or None

How to find the values that will be replaced.

  • numeric, str or regex:

    • numeric: numeric values equal to to_replace will be replaced with value

    • str: string exactly matching to_replace will be replaced with value

    • regex: regexs matching to_replace will be replaced with value

  • list of str, regex, or numeric:

    • First, if to_replace and value are both lists, they must be the same length.

    • Second, if regex=True then all of the strings in both lists will be interpreted as regexs otherwise they will match directly. This doesn’t matter much for value since there are only a few possible substitution regexes you can use.

    • str, regex and numeric rules apply as above.

  • dict:

    • Dicts can be used to specify different replacement values for different existing values. For example, {'a': 'b', 'y': 'z'} replaces the value ‘a’ with ‘b’ and ‘y’ with ‘z’. To use a dict in this way the value parameter should be None.

    • For a DataFrame a dict can specify that different values should be replaced in different columns. For example, {'a': 1, 'b': 'z'} looks for the value 1 in column ‘a’ and the value ‘z’ in column ‘b’ and replaces these values with whatever is specified in value. The value parameter should not be None in this case. You can treat this as a special case of passing two lists except that you are specifying the column to search in.

    • For a DataFrame nested dictionaries, e.g., {'a': {'b': np.nan}}, are read as follows: look in column ‘a’ for the value ‘b’ and replace it with NaN. The value parameter should be None to use a nested dict in this way. You can nest regular expressions as well. Note that column names (the top-level dictionary keys in a nested dictionary) cannot be regular expressions.

  • None:

    • This means that the regex argument must be a string, compiled regular expression, or list, dict, ndarray or Series of such elements. If value is also None then this must be a nested dictionary or Series.

See the examples section for examples of each of these.

valuescalar, dict, list, str, regex, default None

Value to replace any values matching to_replace with. For a DataFrame a dict of values can be used to specify which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings and lists or dicts of such objects are also allowed.

inplacebool, default False

If True, in place. Note: this will modify any other views on this object (e.g. a column from a DataFrame). Returns the caller if this is True.

limitint or None, default None

Maximum size gap to forward or backward fill.

regexbool or same types as to_replace, default False

Whether to interpret to_replace and/or value as regular expressions. If this is True then to_replace must be a string. Alternatively, this could be a regular expression or a list, dict, or array of regular expressions in which case to_replace must be None.

method{‘pad’, ‘ffill’, ‘bfill’, None}

The method to use when for replacement, when to_replace is a scalar, list or tuple and value is None.

DataFrame or None

Object after replacement or None if inplace=True.

AssertionError
  • If regex is not a bool and to_replace is not None.

TypeError
  • If to_replace is not a scalar, array-like, dict, or None

  • If to_replace is a dict and value is not a list, dict, ndarray, or Series

  • If to_replace is None and regex is not compilable into a regular expression or is a list, dict, ndarray, or Series.

  • When replacing multiple bool or datetime64 objects and the arguments to to_replace does not match the type of the value being replaced

ValueError
  • If a list or an ndarray is passed to to_replace and value but they are not the same length.

DataFrame.fillna : Fill NA values. DataFrame.where : Replace values based on boolean condition. Series.str.replace : Simple string replacement.

  • Regex substitution is performed under the hood with re.sub. The rules for substitution for re.sub are the same.

  • Regular expressions will only substitute on strings, meaning you cannot provide, for example, a regular expression matching floating point numbers and expect the columns in your frame that have a numeric dtype to be matched. However, if those floating point numbers are strings, then you can do this.

  • This method has a lot of options. You are encouraged to experiment and play with this method to gain intuition about how it works.

  • When dict is used as the to_replace value, it is like key(s) in the dict are the to_replace part and value(s) in the dict are the value parameter.

Scalar `to_replace` and `value`

>>> s = pd.Series([0, 1, 2, 3, 4])
>>> s.replace(0, 5)
0    5
1    1
2    2
3    3
4    4
dtype: int64
>>> df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
...                    'B': [5, 6, 7, 8, 9],
...                    'C': ['a', 'b', 'c', 'd', 'e']})
>>> df.replace(0, 5)
   A  B  C
0  5  5  a
1  1  6  b
2  2  7  c
3  3  8  d
4  4  9  e

List-like `to_replace`

>>> df.replace([0, 1, 2, 3], 4)
   A  B  C
0  4  5  a
1  4  6  b
2  4  7  c
3  4  8  d
4  4  9  e
>>> df.replace([0, 1, 2, 3], [4, 3, 2, 1])
   A  B  C
0  4  5  a
1  3  6  b
2  2  7  c
3  1  8  d
4  4  9  e
>>> s.replace([1, 2], method='bfill')
0    0
1    3
2    3
3    3
4    4
dtype: int64

dict-like `to_replace`

>>> df.replace({0: 10, 1: 100})
     A  B  C
0   10  5  a
1  100  6  b
2    2  7  c
3    3  8  d
4    4  9  e
>>> df.replace({'A': 0, 'B': 5}, 100)
     A    B  C
0  100  100  a
1    1    6  b
2    2    7  c
3    3    8  d
4    4    9  e
>>> df.replace({'A': {0: 100, 4: 400}})
     A  B  C
0  100  5  a
1    1  6  b
2    2  7  c
3    3  8  d
4  400  9  e

Regular expression `to_replace`

>>> df = pd.DataFrame({'A': ['bat', 'foo', 'bait'],
...                    'B': ['abc', 'bar', 'xyz']})
>>> df.replace(to_replace=r'^ba.$', value='new', regex=True)
      A    B
0   new  abc
1   foo  new
2  bait  xyz
>>> df.replace({'A': r'^ba.$'}, {'A': 'new'}, regex=True)
      A    B
0   new  abc
1   foo  bar
2  bait  xyz
>>> df.replace(regex=r'^ba.$', value='new')
      A    B
0   new  abc
1   foo  new
2  bait  xyz
>>> df.replace(regex={r'^ba.$': 'new', 'foo': 'xyz'})
      A    B
0   new  abc
1   xyz  new
2  bait  xyz
>>> df.replace(regex=[r'^ba.$', 'foo'], value='new')
      A    B
0   new  abc
1   new  new
2  bait  xyz

Compare the behavior of s.replace({'a': None}) and s.replace('a', None) to understand the peculiarities of the to_replace parameter:

>>> s = pd.Series([10, 'a', 'a', 'b', 'a'])

When one uses a dict as the to_replace value, it is like the value(s) in the dict are equal to the value parameter. s.replace({'a': None}) is equivalent to s.replace(to_replace={'a': None}, value=None, method=None):

>>> s.replace({'a': None})
0      10
1    None
2    None
3       b
4    None
dtype: object

When value=None and to_replace is a scalar, list or tuple, replace uses the method parameter (default ‘pad’) to do the replacement. So this is why the ‘a’ values are being replaced by 10 in rows 1 and 2 and ‘b’ in row 4 in this case. The command s.replace('a', None) is actually equivalent to s.replace(to_replace='a', value=None, method='pad'):

>>> s.replace('a', None)
0    10
1    10
2    10
3     b
4     b
dtype: object
reproject(spatial_reference, transformation=None, inplace=False)

Reprojects a given dataframe into a new coordinate system.

Argument

Description

spatial_reference

Required Integer/SpatialReference. The spatial reference the data should be reprojected into.

transformation

Optional string. The optional transformation string.

inplace

Optional boolean. Default False. Modify the SpatialDataFrame in place (do not create a new object)

Returns

SpatialDataFrame

resample(rule, axis=0, closed: Optional[str] = None, label: Optional[str] = None, convention: str = 'start', kind: Optional[str] = None, loffset=None, base: Optional[int] = None, on=None, level=None, origin: Union[str, TimestampConvertibleTypes] = 'start_day', offset: Optional[TimedeltaConvertibleTypes] = None)Resampler

Resample time-series data.

Convenience method for frequency conversion and resampling of time series. Object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or pass datetime-like values to the on or level keyword.

ruleDateOffset, Timedelta or str

The offset string or object representing target conversion.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

Which axis to use for up- or down-sampling. For Series this will default to 0, i.e. along the rows. Must be DatetimeIndex, TimedeltaIndex or PeriodIndex.

closed{‘right’, ‘left’}, default None

Which side of bin interval is closed. The default is ‘left’ for all frequency offsets except for ‘M’, ‘A’, ‘Q’, ‘BM’, ‘BA’, ‘BQ’, and ‘W’ which all have a default of ‘right’.

label{‘right’, ‘left’}, default None

Which bin edge label to label bucket with. The default is ‘left’ for all frequency offsets except for ‘M’, ‘A’, ‘Q’, ‘BM’, ‘BA’, ‘BQ’, and ‘W’ which all have a default of ‘right’.

convention{‘start’, ‘end’, ‘s’, ‘e’}, default ‘start’

For PeriodIndex only, controls whether to use the start or end of rule.

kind{‘timestamp’, ‘period’}, optional, default None

Pass ‘timestamp’ to convert the resulting index to a DateTimeIndex or ‘period’ to convert it to a PeriodIndex. By default the input representation is retained.

loffsettimedelta, default None

Adjust the resampled time labels.

Deprecated since version 1.1.0: You should add the loffset to the df.index after the resample. See below.

baseint, default 0

For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for ‘5min’ frequency, base could range from 0 through 4. Defaults to 0.

Deprecated since version 1.1.0: The new arguments that you should use are ‘offset’ or ‘origin’.

onstr, optional

For a DataFrame, column to use instead of index for resampling. Column must be datetime-like.

levelstr or int, optional

For a MultiIndex, level (name or number) to use for resampling. level must be datetime-like.

origin{‘epoch’, ‘start’, ‘start_day’}, Timestamp or str, default ‘start_day’

The timestamp on which to adjust the grouping. The timezone of origin must match the timezone of the index. If a timestamp is not used, these values are also supported:

  • ‘epoch’: origin is 1970-01-01

  • ‘start’: origin is the first value of the timeseries

  • ‘start_day’: origin is the first day at midnight of the timeseries

New in version 1.1.0.

offsetTimedelta or str, default is None

An offset timedelta added to the origin.

New in version 1.1.0.

Resampler object

groupby : Group by mapping, function, label, or list of labels. Series.resample : Resample a Series. DataFrame.resample: Resample a DataFrame.

See the user guide for more.

To learn more about the offset strings, please see this link.

Start by creating a series with 9 one minute timestamps.

>>> index = pd.date_range('1/1/2000', periods=9, freq='T')
>>> series = pd.Series(range(9), index=index)
>>> series
2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8
Freq: T, dtype: int64

Downsample the series into 3 minute bins and sum the values of the timestamps falling into a bin.

>>> series.resample('3T').sum()
2000-01-01 00:00:00     3
2000-01-01 00:03:00    12
2000-01-01 00:06:00    21
Freq: 3T, dtype: int64

Downsample the series into 3 minute bins as above, but label each bin using the right edge instead of the left. Please note that the value in the bucket used as the label is not included in the bucket, which it labels. For example, in the original series the bucket 2000-01-01 00:03:00 contains the value 3, but the summed value in the resampled bucket with the label 2000-01-01 00:03:00 does not include 3 (if it did, the summed value would be 6, not 3). To include this value close the right side of the bin interval as illustrated in the example below this one.

>>> series.resample('3T', label='right').sum()
2000-01-01 00:03:00     3
2000-01-01 00:06:00    12
2000-01-01 00:09:00    21
Freq: 3T, dtype: int64

Downsample the series into 3 minute bins as above, but close the right side of the bin interval.

>>> series.resample('3T', label='right', closed='right').sum()
2000-01-01 00:00:00     0
2000-01-01 00:03:00     6
2000-01-01 00:06:00    15
2000-01-01 00:09:00    15
Freq: 3T, dtype: int64

Upsample the series into 30 second bins.

>>> series.resample('30S').asfreq()[0:5]   # Select first 5 rows
2000-01-01 00:00:00   0.0
2000-01-01 00:00:30   NaN
2000-01-01 00:01:00   1.0
2000-01-01 00:01:30   NaN
2000-01-01 00:02:00   2.0
Freq: 30S, dtype: float64

Upsample the series into 30 second bins and fill the NaN values using the pad method.

>>> series.resample('30S').pad()[0:5]
2000-01-01 00:00:00    0
2000-01-01 00:00:30    0
2000-01-01 00:01:00    1
2000-01-01 00:01:30    1
2000-01-01 00:02:00    2
Freq: 30S, dtype: int64

Upsample the series into 30 second bins and fill the NaN values using the bfill method.

>>> series.resample('30S').bfill()[0:5]
2000-01-01 00:00:00    0
2000-01-01 00:00:30    1
2000-01-01 00:01:00    1
2000-01-01 00:01:30    2
2000-01-01 00:02:00    2
Freq: 30S, dtype: int64

Pass a custom function via apply

>>> def custom_resampler(array_like):
...     return np.sum(array_like) + 5
...
>>> series.resample('3T').apply(custom_resampler)
2000-01-01 00:00:00     8
2000-01-01 00:03:00    17
2000-01-01 00:06:00    26
Freq: 3T, dtype: int64

For a Series with a PeriodIndex, the keyword convention can be used to control whether to use the start or end of rule.

Resample a year by quarter using ‘start’ convention. Values are assigned to the first quarter of the period.

>>> s = pd.Series([1, 2], index=pd.period_range('2012-01-01',
...                                             freq='A',
...                                             periods=2))
>>> s
2012    1
2013    2
Freq: A-DEC, dtype: int64
>>> s.resample('Q', convention='start').asfreq()
2012Q1    1.0
2012Q2    NaN
2012Q3    NaN
2012Q4    NaN
2013Q1    2.0
2013Q2    NaN
2013Q3    NaN
2013Q4    NaN
Freq: Q-DEC, dtype: float64

Resample quarters by month using ‘end’ convention. Values are assigned to the last month of the period.

>>> q = pd.Series([1, 2, 3, 4], index=pd.period_range('2018-01-01',
...                                                   freq='Q',
...                                                   periods=4))
>>> q
2018Q1    1
2018Q2    2
2018Q3    3
2018Q4    4
Freq: Q-DEC, dtype: int64
>>> q.resample('M', convention='end').asfreq()
2018-03    1.0
2018-04    NaN
2018-05    NaN
2018-06    2.0
2018-07    NaN
2018-08    NaN
2018-09    3.0
2018-10    NaN
2018-11    NaN
2018-12    4.0
Freq: M, dtype: float64

For DataFrame objects, the keyword on can be used to specify the column instead of the index for resampling.

>>> d = {'price': [10, 11, 9, 13, 14, 18, 17, 19],
...      'volume': [50, 60, 40, 100, 50, 100, 40, 50]}
>>> df = pd.DataFrame(d)
>>> df['week_starting'] = pd.date_range('01/01/2018',
...                                     periods=8,
...                                     freq='W')
>>> df
   price  volume week_starting
0     10      50    2018-01-07
1     11      60    2018-01-14
2      9      40    2018-01-21
3     13     100    2018-01-28
4     14      50    2018-02-04
5     18     100    2018-02-11
6     17      40    2018-02-18
7     19      50    2018-02-25
>>> df.resample('M', on='week_starting').mean()
               price  volume
week_starting
2018-01-31     10.75    62.5
2018-02-28     17.00    60.0

For a DataFrame with MultiIndex, the keyword level can be used to specify on which level the resampling needs to take place.

>>> days = pd.date_range('1/1/2000', periods=4, freq='D')
>>> d2 = {'price': [10, 11, 9, 13, 14, 18, 17, 19],
...       'volume': [50, 60, 40, 100, 50, 100, 40, 50]}
>>> df2 = pd.DataFrame(d2,
...                    index=pd.MultiIndex.from_product([days,
...                                                     ['morning',
...                                                      'afternoon']]
...                                                     ))
>>> df2
                      price  volume
2000-01-01 morning       10      50
           afternoon     11      60
2000-01-02 morning        9      40
           afternoon     13     100
2000-01-03 morning       14      50
           afternoon     18     100
2000-01-04 morning       17      40
           afternoon     19      50
>>> df2.resample('D', level=0).sum()
            price  volume
2000-01-01     21     110
2000-01-02     22     140
2000-01-03     32     150
2000-01-04     36      90

If you want to adjust the start of the bins based on a fixed timestamp:

>>> start, end = '2000-10-01 23:30:00', '2000-10-02 00:30:00'
>>> rng = pd.date_range(start, end, freq='7min')
>>> ts = pd.Series(np.arange(len(rng)) * 3, index=rng)
>>> ts
2000-10-01 23:30:00     0
2000-10-01 23:37:00     3
2000-10-01 23:44:00     6
2000-10-01 23:51:00     9
2000-10-01 23:58:00    12
2000-10-02 00:05:00    15
2000-10-02 00:12:00    18
2000-10-02 00:19:00    21
2000-10-02 00:26:00    24
Freq: 7T, dtype: int64
>>> ts.resample('17min').sum()
2000-10-01 23:14:00     0
2000-10-01 23:31:00     9
2000-10-01 23:48:00    21
2000-10-02 00:05:00    54
2000-10-02 00:22:00    24
Freq: 17T, dtype: int64
>>> ts.resample('17min', origin='epoch').sum()
2000-10-01 23:18:00     0
2000-10-01 23:35:00    18
2000-10-01 23:52:00    27
2000-10-02 00:09:00    39
2000-10-02 00:26:00    24
Freq: 17T, dtype: int64
>>> ts.resample('17min', origin='2000-01-01').sum()
2000-10-01 23:24:00     3
2000-10-01 23:41:00    15
2000-10-01 23:58:00    45
2000-10-02 00:15:00    45
Freq: 17T, dtype: int64

If you want to adjust the start of the bins with an offset Timedelta, the two following lines are equivalent:

>>> ts.resample('17min', origin='start').sum()
2000-10-01 23:30:00     9
2000-10-01 23:47:00    21
2000-10-02 00:04:00    54
2000-10-02 00:21:00    24
Freq: 17T, dtype: int64
>>> ts.resample('17min', offset='23h30min').sum()
2000-10-01 23:30:00     9
2000-10-01 23:47:00    21
2000-10-02 00:04:00    54
2000-10-02 00:21:00    24
Freq: 17T, dtype: int64

To replace the use of the deprecated base argument, you can now use offset, in this example it is equivalent to have base=2:

>>> ts.resample('17min', offset='2min').sum()
2000-10-01 23:16:00     0
2000-10-01 23:33:00     9
2000-10-01 23:50:00    36
2000-10-02 00:07:00    39
2000-10-02 00:24:00    24
Freq: 17T, dtype: int64

To replace the use of the deprecated loffset argument:

>>> from pandas.tseries.frequencies import to_offset
>>> loffset = '19min'
>>> ts_out = ts.resample('17min').sum()
>>> ts_out.index = ts_out.index + to_offset(loffset)
>>> ts_out
2000-10-01 23:33:00     0
2000-10-01 23:50:00     9
2000-10-02 00:07:00    21
2000-10-02 00:24:00    54
2000-10-02 00:41:00    24
Freq: 17T, dtype: int64
reset_index(level: Optional[Union[Hashable, Sequence[Hashable]]] = None, drop: bool = False, inplace: bool = False, col_level: Hashable = 0, col_fill: Optional[Hashable] = '')Optional[pandas.core.frame.DataFrame]

Reset the index, or a level of it.

Reset the index of the DataFrame, and use the default one instead. If the DataFrame has a MultiIndex, this method can remove one or more levels.

levelint, str, tuple, or list, default None

Only remove the given levels from the index. Removes all levels by default.

dropbool, default False

Do not try to insert index into dataframe columns. This resets the index to the default integer index.

inplacebool, default False

Modify the DataFrame in place (do not create a new object).

col_levelint or str, default 0

If the columns have multiple levels, determines which level the labels are inserted into. By default it is inserted into the first level.

col_fillobject, default ‘’

If the columns have multiple levels, determines how the other levels are named. If None then the index name is repeated.

DataFrame or None

DataFrame with the new index or None if inplace=True.

DataFrame.set_index : Opposite of reset_index. DataFrame.reindex : Change to new indices or expand indices. DataFrame.reindex_like : Change to same indices as other DataFrame.

>>> df = pd.DataFrame([('bird', 389.0),
...                    ('bird', 24.0),
...                    ('mammal', 80.5),
...                    ('mammal', np.nan)],
...                   index=['falcon', 'parrot', 'lion', 'monkey'],
...                   columns=('class', 'max_speed'))
>>> df
         class  max_speed
falcon    bird      389.0
parrot    bird       24.0
lion    mammal       80.5
monkey  mammal        NaN

When we reset the index, the old index is added as a column, and a new sequential index is used:

>>> df.reset_index()
    index   class  max_speed
0  falcon    bird      389.0
1  parrot    bird       24.0
2    lion  mammal       80.5
3  monkey  mammal        NaN

We can use the drop parameter to avoid the old index being added as a column:

>>> df.reset_index(drop=True)
    class  max_speed
0    bird      389.0
1    bird       24.0
2  mammal       80.5
3  mammal        NaN

You can also use reset_index with MultiIndex.

>>> index = pd.MultiIndex.from_tuples([('bird', 'falcon'),
...                                    ('bird', 'parrot'),
...                                    ('mammal', 'lion'),
...                                    ('mammal', 'monkey')],
...                                   names=['class', 'name'])
>>> columns = pd.MultiIndex.from_tuples([('speed', 'max'),
...                                      ('species', 'type')])
>>> df = pd.DataFrame([(389.0, 'fly'),
...                    ( 24.0, 'fly'),
...                    ( 80.5, 'run'),
...                    (np.nan, 'jump')],
...                   index=index,
...                   columns=columns)
>>> df
               speed species
                 max    type
class  name
bird   falcon  389.0     fly
       parrot   24.0     fly
mammal lion     80.5     run
       monkey    NaN    jump

If the index has multiple levels, we can reset a subset of them:

>>> df.reset_index(level='class')
         class  speed species
                  max    type
name
falcon    bird  389.0     fly
parrot    bird   24.0     fly
lion    mammal   80.5     run
monkey  mammal    NaN    jump

If we are not dropping the index, by default, it is placed in the top level. We can place it in another level:

>>> df.reset_index(level='class', col_level=1)
                speed species
         class    max    type
name
falcon    bird  389.0     fly
parrot    bird   24.0     fly
lion    mammal   80.5     run
monkey  mammal    NaN    jump

When the index is inserted under another level, we can specify under which one with the parameter col_fill:

>>> df.reset_index(level='class', col_level=1, col_fill='species')
              species  speed species
                class    max    type
name
falcon           bird  389.0     fly
parrot           bird   24.0     fly
lion           mammal   80.5     run
monkey         mammal    NaN    jump

If we specify a nonexistent level for col_fill, it is created:

>>> df.reset_index(level='class', col_level=1, col_fill='genus')
                genus  speed species
                class    max    type
name
falcon           bird  389.0     fly
parrot           bird   24.0     fly
lion           mammal   80.5     run
monkey         mammal    NaN    jump
rfloordiv(other, axis='columns', level=None, fill_value=None)

Get Integer division of dataframe and other, element-wise (binary operator rfloordiv).

Equivalent to other // dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, floordiv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

axis{0 or ‘index’, 1 or ‘columns’}

Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

levelint or label

Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

DataFrame

Result of the arithmetic operation.

DataFrame.add : Add DataFrames. DataFrame.sub : Subtract DataFrames. DataFrame.mul : Multiply DataFrames. DataFrame.div : Divide DataFrames (float division). DataFrame.truediv : Divide DataFrames (float division). DataFrame.floordiv : Divide DataFrames (integer division). DataFrame.mod : Calculate modulo (remainder after division). DataFrame.pow : Calculate exponential power.

Mismatched indices will be unioned together.

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
rmod(other, axis='columns', level=None, fill_value=None)

Get Modulo of dataframe and other, element-wise (binary operator rmod).

Equivalent to other % dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, mod.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

axis{0 or ‘index’, 1 or ‘columns’}

Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

levelint or label

Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

DataFrame

Result of the arithmetic operation.

DataFrame.add : Add DataFrames. DataFrame.sub : Subtract DataFrames. DataFrame.mul : Multiply DataFrames. DataFrame.div : Divide DataFrames (float division). DataFrame.truediv : Divide DataFrames (float division). DataFrame.floordiv : Divide DataFrames (integer division). DataFrame.mod : Calculate modulo (remainder after division). DataFrame.pow : Calculate exponential power.

Mismatched indices will be unioned together.

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
rmul(other, axis='columns', level=None, fill_value=None)

Get Multiplication of dataframe and other, element-wise (binary operator rmul).

Equivalent to other * dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, mul.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

axis{0 or ‘index’, 1 or ‘columns’}

Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

levelint or label

Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

DataFrame

Result of the arithmetic operation.

DataFrame.add : Add DataFrames. DataFrame.sub : Subtract DataFrames. DataFrame.mul : Multiply DataFrames. DataFrame.div : Divide DataFrames (float division). DataFrame.truediv : Divide DataFrames (float division). DataFrame.floordiv : Divide DataFrames (integer division). DataFrame.mod : Calculate modulo (remainder after division). DataFrame.pow : Calculate exponential power.

Mismatched indices will be unioned together.

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
rolling(window: Union[int, timedelta, BaseOffset, BaseIndexer], min_periods: Optional[int] = None, center: bool_t = False, win_type: Optional[str] = None, on: Optional[str] = None, axis: Axis = 0, closed: Optional[str] = None)

Provide rolling window calculations.

windowint, offset, or BaseIndexer subclass

Size of the moving window. This is the number of observations used for calculating the statistic. Each window will be a fixed size.

If its an offset then this will be the time period of each window. Each window will be a variable sized based on the observations included in the time-period. This is only valid for datetimelike indexes.

If a BaseIndexer subclass is passed, calculates the window boundaries based on the defined get_window_bounds method. Additional rolling keyword arguments, namely min_periods, center, and closed will be passed to get_window_bounds.

min_periodsint, default None

Minimum number of observations in window required to have a value (otherwise result is NA). For a window that is specified by an offset, min_periods will default to 1. Otherwise, min_periods will default to the size of the window.

centerbool, default False

Set the labels at the center of the window.

win_typestr, default None

Provide a window type. If None, all points are evenly weighted. See the notes below for further information.

onstr, optional

For a DataFrame, a datetime-like column or MultiIndex level on which to calculate the rolling window, rather than the DataFrame’s index. Provided integer column is ignored and excluded from result since an integer index is not used to calculate the rolling window.

axis : int or str, default 0 closed : str, default None

Make the interval closed on the ‘right’, ‘left’, ‘both’ or ‘neither’ endpoints. Defaults to ‘right’.

Changed in version 1.2.0: The closed parameter with fixed windows is now supported.

a Window or Rolling sub-classed for the particular operation

expanding : Provides expanding transformations. ewm : Provides exponential weighted functions.

By default, the result is set to the right edge of the window. This can be changed to the center of the window by setting center=True.

To learn more about the offsets & frequency strings, please see this link.

If win_type=None, all points are evenly weighted; otherwise, win_type can accept a string of any scipy.signal window function.

Certain Scipy window types require additional parameters to be passed in the aggregation function. The additional parameters must match the keywords specified in the Scipy window type method signature. Please see the third example below on how to add the additional parameters.

>>> df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
>>> df
     B
0  0.0
1  1.0
2  2.0
3  NaN
4  4.0

Rolling sum with a window length of 2, using the ‘triang’ window type.

>>> df.rolling(2, win_type='triang').sum()
     B
0  NaN
1  0.5
2  1.5
3  NaN
4  NaN

Rolling sum with a window length of 2, using the ‘gaussian’ window type (note how we need to specify std).

>>> df.rolling(2, win_type='gaussian').sum(std=3)
          B
0       NaN
1  0.986207
2  2.958621
3       NaN
4       NaN

Rolling sum with a window length of 2, min_periods defaults to the window length.

>>> df.rolling(2).sum()
     B
0  NaN
1  1.0
2  3.0
3  NaN
4  NaN

Same as above, but explicitly set the min_periods

>>> df.rolling(2, min_periods=1).sum()
     B
0  0.0
1  1.0
2  3.0
3  2.0
4  4.0

Same as above, but with forward-looking windows

>>> indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=2)
>>> df.rolling(window=indexer, min_periods=1).sum()
     B
0  1.0
1  3.0
2  2.0
3  4.0
4  4.0

A ragged (meaning not-a-regular frequency), time-indexed DataFrame

>>> df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]},
...                   index = [pd.Timestamp('20130101 09:00:00'),
...                            pd.Timestamp('20130101 09:00:02'),
...                            pd.Timestamp('20130101 09:00:03'),
...                            pd.Timestamp('20130101 09:00:05'),
...                            pd.Timestamp('20130101 09:00:06')])
>>> df
                       B
2013-01-01 09:00:00  0.0
2013-01-01 09:00:02  1.0
2013-01-01 09:00:03  2.0
2013-01-01 09:00:05  NaN
2013-01-01 09:00:06  4.0

Contrasting to an integer rolling window, this will roll a variable length window corresponding to the time period. The default for min_periods is 1.

>>> df.rolling('2s').sum()
                       B
2013-01-01 09:00:00  0.0
2013-01-01 09:00:02  1.0
2013-01-01 09:00:03  3.0
2013-01-01 09:00:05  NaN
2013-01-01 09:00:06  4.0
round(decimals=0, *args, **kwargs)pandas.core.frame.DataFrame

Round a DataFrame to a variable number of decimal places.

decimalsint, dict, Series

Number of decimal places to round each column to. If an int is given, round each column to the same number of places. Otherwise dict and Series round to variable numbers of places. Column names should be in the keys if decimals is a dict-like, or in the index if decimals is a Series. Any columns not included in decimals will be left as is. Elements of decimals which are not columns of the input will be ignored.

*args

Additional keywords have no effect but might be accepted for compatibility with numpy.

**kwargs

Additional keywords have no effect but might be accepted for compatibility with numpy.

DataFrame

A DataFrame with the affected columns rounded to the specified number of decimal places.

numpy.around : Round a numpy array to the given number of decimals. Series.round : Round a Series to the given number of decimals.

>>> df = pd.DataFrame([(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
...                   columns=['dogs', 'cats'])
>>> df
    dogs  cats
0  0.21  0.32
1  0.01  0.67
2  0.66  0.03
3  0.21  0.18

By providing an integer each column is rounded to the same number of decimal places

>>> df.round(1)
    dogs  cats
0   0.2   0.3
1   0.0   0.7
2   0.7   0.0
3   0.2   0.2

With a dict, the number of places for specific columns can be specified with the column names as key and the number of decimal places as value

>>> df.round({'dogs': 1, 'cats': 0})
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0

Using a Series, the number of places for specific columns can be specified with the column names as index and the number of decimal places as value

>>> decimals = pd.Series([0, 1], index=['cats', 'dogs'])
>>> df.round(decimals)
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0
rpow(other, axis='columns', level=None, fill_value=None)

Get Exponential power of dataframe and other, element-wise (binary operator rpow).

Equivalent to other ** dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, pow.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

axis{0 or ‘index’, 1 or ‘columns’}

Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

levelint or label

Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

DataFrame

Result of the arithmetic operation.

DataFrame.add : Add DataFrames. DataFrame.sub : Subtract DataFrames. DataFrame.mul : Multiply DataFrames. DataFrame.div : Divide DataFrames (float division). DataFrame.truediv : Divide DataFrames (float division). DataFrame.floordiv : Divide DataFrames (integer division). DataFrame.mod : Calculate modulo (remainder after division). DataFrame.pow : Calculate exponential power.

Mismatched indices will be unioned together.

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
rsub(other, axis='columns', level=None, fill_value=None)

Get Subtraction of dataframe and other, element-wise (binary operator rsub).

Equivalent to other - dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, sub.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

axis{0 or ‘index’, 1 or ‘columns’}

Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

levelint or label

Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

DataFrame

Result of the arithmetic operation.

DataFrame.add : Add DataFrames. DataFrame.sub : Subtract DataFrames. DataFrame.mul : Multiply DataFrames. DataFrame.div : Divide DataFrames (float division). DataFrame.truediv : Divide DataFrames (float division). DataFrame.floordiv : Divide DataFrames (integer division). DataFrame.mod : Calculate modulo (remainder after division). DataFrame.pow : Calculate exponential power.

Mismatched indices will be unioned together.

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
rtruediv(other, axis='columns', level=None, fill_value=None)

Get Floating division of dataframe and other, element-wise (binary operator rtruediv).

Equivalent to other / dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, truediv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

axis{0 or ‘index’, 1 or ‘columns’}

Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

levelint or label

Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

DataFrame

Result of the arithmetic operation.

DataFrame.add : Add DataFrames. DataFrame.sub : Subtract DataFrames. DataFrame.mul : Multiply DataFrames. DataFrame.div : Divide DataFrames (float division). DataFrame.truediv : Divide DataFrames (float division). DataFrame.floordiv : Divide DataFrames (integer division). DataFrame.mod : Calculate modulo (remainder after division). DataFrame.pow : Calculate exponential power.

Mismatched indices will be unioned together.

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)FrameOrSeries

Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

nint, optional

Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

fracfloat, optional

Fraction of axis items to return. Cannot be used with n.

replacebool, default False

Allow or disallow sampling of the same row more than once.

weightsstr or ndarray-like, optional

Default ‘None’ results in equal probability weighting. If passed a Series, will align with target object on index. Index values in weights not found in sampled object will be ignored and index values in sampled object not in weights will be assigned weights of zero. If called on a DataFrame, will accept the name of a column when axis = 0. Unless weights are a Series, weights must be same length as axis being sampled. If weights do not sum to 1, they will be normalized to sum to 1. Missing values in the weights column will be treated as zero. Infinite values not allowed.

random_stateint, array-like, BitGenerator, np.random.RandomState, optional

If int, array-like, or BitGenerator (NumPy>=1.17), seed for random number generator If np.random.RandomState, use as numpy RandomState object.

Changed in version 1.1.0: array-like and BitGenerator (for NumPy>=1.17) object now passed to np.random.RandomState() as seed

axis{0 or ‘index’, 1 or ‘columns’, None}, default None

Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames).

Series or DataFrame

A new object of same type as caller containing n items randomly sampled from the caller object.

DataFrameGroupBy.sample: Generates random samples from each group of a

DataFrame object.

SeriesGroupBy.sample: Generates random samples from each group of a

Series object.

numpy.random.choice: Generates a random sample from a given 1-D numpy

array.

If frac > 1, replacement should be set to True.

>>> df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
...                    'num_wings': [2, 0, 0, 0],
...                    'num_specimen_seen': [10, 2, 1, 8]},
...                   index=['falcon', 'dog', 'spider', 'fish'])
>>> df
        num_legs  num_wings  num_specimen_seen
falcon         2          2                 10
dog            4          0                  2
spider         8          0                  1
fish           0          0                  8

Extract 3 random elements from the Series df['num_legs']: Note that we use random_state to ensure the reproducibility of the examples.

>>> df['num_legs'].sample(n=3, random_state=1)
fish      0
spider    8
falcon    2
Name: num_legs, dtype: int64

A random 50% sample of the DataFrame with replacement:

>>> df.sample(frac=0.5, replace=True, random_state=1)
      num_legs  num_wings  num_specimen_seen
dog          4          0                  2
fish         0          0                  8

An upsample sample of the DataFrame with replacement: Note that replace parameter has to be True for frac parameter > 1.

>>> df.sample(frac=2, replace=True, random_state=1)
        num_legs  num_wings  num_specimen_seen
dog            4          0                  2
fish           0          0                  8
falcon         2          2                 10
falcon         2          2                 10
fish           0          0                  8
dog            4          0                  2
fish           0          0                  8
dog            4          0                  2

Using a DataFrame column as weights. Rows with larger value in the num_specimen_seen column are more likely to be sampled.

>>> df.sample(n=2, weights='num_specimen_seen', random_state=1)
        num_legs  num_wings  num_specimen_seen
falcon         2          2                 10
fish           0          0                  8
segment_along_line(start_measure, end_measure, use_percentage=False)

Returns a Polyline between start and end measures. Similar to Polyline.positionAlongLine but will return a polyline segment between two points on the polyline instead of a single point.

Parameters:
start_measure
  • The starting distance from the beginning of the

line.

end_measure
  • The ending distance from the beginning of the

line.

use_percentage
  • The start and end measures may be specified as

fixed units or as a ratio. If True, start_measure and end_measure are used as a percentage; if False, start_measure and end_measure are used as a distance. For percentages, the measures should be expressed as a double from 0.0 (0 percent) to 1.0 (100 percent).

select_by_location(other, matches_only=True)

Selects all rows in a given SpatialDataFrame based on a given geometry

Argument

Description

other

Required Geometry. A geometry object to check for intersection.

matches_only

Optional boolean. if true, only matched records will be returned, else a field called ‘select_by_location’ will be added to the dataframe with the results of the select by location.

Returns

SpatialDataFrame

select_dtypes(include=None, exclude=None)pandas.core.frame.DataFrame

Return a subset of the DataFrame’s columns based on the column dtypes.

include, excludescalar or list-like

A selection of dtypes or strings to be included/excluded. At least one of these parameters must be supplied.

DataFrame

The subset of the frame including the dtypes in include and excluding the dtypes in exclude.

ValueError
  • If both of include and exclude are empty

  • If include and exclude have overlapping elements

  • If any kind of string dtype is passed in.

DataFrame.dtypes: Return Series with the data type of each column.

  • To select all numeric types, use np.number or 'number'

  • To select strings you must use the object dtype, but note that this will return all object dtype columns

  • See the numpy dtype hierarchy

  • To select datetimes, use np.datetime64, 'datetime' or 'datetime64'

  • To select timedeltas, use np.timedelta64, 'timedelta' or 'timedelta64'

  • To select Pandas categorical dtypes, use 'category'

  • To select Pandas datetimetz dtypes, use 'datetimetz' (new in 0.20.0) or 'datetime64[ns, tz]'

>>> df = pd.DataFrame({'a': [1, 2] * 3,
...                    'b': [True, False] * 3,
...                    'c': [1.0, 2.0] * 3})
>>> df
        a      b  c
0       1   True  1.0
1       2  False  2.0
2       1   True  1.0
3       2  False  2.0
4       1   True  1.0
5       2  False  2.0
>>> df.select_dtypes(include='bool')
   b
0  True
1  False
2  True
3  False
4  True
5  False
>>> df.select_dtypes(include=['float64'])
   c
0  1.0
1  2.0
2  1.0
3  2.0
4  1.0
5  2.0
>>> df.select_dtypes(exclude=['int64'])
       b    c
0   True  1.0
1  False  2.0
2   True  1.0
3  False  2.0
4   True  1.0
5  False  2.0
sem(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)

Return unbiased standard error of the mean over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument

axis : {index (0), columns (1)} skipna : bool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

levelint or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

ddofint, default 1

Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

numeric_onlybool, default None

Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

Series or DataFrame (if level specified)

To have the same behaviour as numpy.std, use ddof=0 (instead of the default ddof=1)

property series_extent

Return a single bounding box (xmin, ymin, xmax, ymax) for all geometries

This is a shortcut for calculating the min/max x and y bounds individually.

set_axis(labels, axis: Union[str, int] = 0, inplace: bool = False)

Assign desired index to given axis.

Indexes for column or row labels can be changed by assigning a list-like or Index.

labelslist-like, Index

The values for the new index.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

The axis to update. The value 0 identifies the rows, and 1 identifies the columns.

inplacebool, default False

Whether to return a new DataFrame instance.

renamedDataFrame or None

An object of type DataFrame or None if inplace=True.

DataFrame.rename_axis : Alter the name of the index or columns.

>>> df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

Change the row labels.

>>> df.set_axis(['a', 'b', 'c'], axis='index')
   A  B
a  1  4
b  2  5
c  3  6

Change the column labels.

>>> df.set_axis(['I', 'II'], axis='columns')
   I  II
0  1   4
1  2   5
2  3   6

Now, update the labels inplace.

>>> df.set_axis(['i', 'ii'], axis='columns', inplace=True)
>>> df
   i  ii
0  1   4
1  2   5
2  3   6
set_flags(*, copy: bool = False, allows_duplicate_labels: Optional[bool] = None)FrameOrSeries

Return a new object with updated flags.

allows_duplicate_labelsbool, optional

Whether the returned object allows duplicate labels.

Series or DataFrame

The same type as the caller.

DataFrame.attrs : Global metadata applying to this dataset. DataFrame.flags : Global flags applying to this object.

This method returns a new object that’s a view on the same data as the input. Mutating the input or the output values will be reflected in the other.

This method is intended to be used in method chains.

“Flags” differ from “metadata”. Flags reflect properties of the pandas object (the Series or DataFrame). Metadata refer to properties of the dataset, and should be stored in DataFrame.attrs.

>>> df = pd.DataFrame({"A": [1, 2]})
>>> df.flags.allows_duplicate_labels
True
>>> df2 = df.set_flags(allows_duplicate_labels=False)
>>> df2.flags.allows_duplicate_labels
False
set_geometry(col, drop=False, inplace=False, sr=None)

Set the SpatialDataFrame geometry using either an existing column or the specified input. By default yields a new object.

The original geometry column is replaced with the input.

Argument

Description

col

Required string/np.array. column label or array

drop

Optional boolean. Default True. Delete column to be used as the new geometry

inplace

Optional boolean. Default False. Modify the SpatialDataFrame in place (do not create a new object)

sr

Optional SpatialReference/Integer. The wkid value Coordinate system to use. If passed, overrides both DataFrame and col’s sr. Otherwise, tries to get sr from passed col values or DataFrame.

Returns

SpatialDataFrame

set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

Set the DataFrame index using existing columns.

Set the DataFrame index (row labels) using one or more existing columns or arrays (of the correct length). The index can replace the existing index or expand on it.

keyslabel or array-like or list of labels/arrays

This parameter can be either a single column key, a single array of the same length as the calling DataFrame, or a list containing an arbitrary combination of column keys and arrays. Here, “array” encompasses Series, Index, np.ndarray, and instances of Iterator.

dropbool, default True

Delete columns to be used as the new index.

appendbool, default False

Whether to append columns to existing index.

inplacebool, default False

If True, modifies the DataFrame in place (do not create a new object).

verify_integritybool, default False

Check the new index for duplicates. Otherwise defer the check until necessary. Setting to False will improve the performance of this method.

DataFrame or None

Changed row labels or None if inplace=True.

DataFrame.reset_index : Opposite of set_index. DataFrame.reindex : Change to new indices or expand indices. DataFrame.reindex_like : Change to same indices as other DataFrame.

>>> df = pd.DataFrame({'month': [1, 4, 7, 10],
...                    'year': [2012, 2014, 2013, 2014],
...                    'sale': [55, 40, 84, 31]})
>>> df
   month  year  sale
0      1  2012    55
1      4  2014    40
2      7  2013    84
3     10  2014    31

Set the index to become the ‘month’ column:

>>> df.set_index('month')
       year  sale
month
1      2012    55
4      2014    40
7      2013    84
10     2014    31

Create a MultiIndex using columns ‘year’ and ‘month’:

>>> df.set_index(['year', 'month'])
            sale
year  month
2012  1     55
2014  4     40
2013  7     84
2014  10    31

Create a MultiIndex using an Index and a column:

>>> df.set_index([pd.Index([1, 2, 3, 4]), 'year'])
         month  sale
   year
1  2012  1      55
2  2014  4      40
3  2013  7      84
4  2014  10     31

Create a MultiIndex using two Series:

>>> s = pd.Series([1, 2, 3, 4])
>>> df.set_index([s, s**2])
      month  year  sale
1 1       1  2012    55
2 4       4  2014    40
3 9       7  2013    84
4 16     10  2014    31
property shape

Return a tuple representing the dimensionality of the DataFrame.

ndarray.shape : Tuple of array dimensions.

>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.shape
(2, 2)
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4],
...                    'col3': [5, 6]})
>>> df.shape
(2, 3)
shift(periods=1, freq=None, axis=0, fill_value=<object object>)pandas.core.frame.DataFrame

Shift index by desired number of periods with an optional time freq.

When freq is not passed, shift the index without realigning the data. If freq is passed (in this case, the index must be date or datetime, or it will raise a NotImplementedError), the index will be increased using the periods and the freq. freq can be inferred when specified as “infer” as long as either freq or inferred_freq attribute is set in the index.

periodsint

Number of periods to shift. Can be positive or negative.

freqDateOffset, tseries.offsets, timedelta, or str, optional

Offset to use from the tseries module or time rule (e.g. ‘EOM’). If freq is specified then the index values are shifted but the data is not realigned. That is, use freq if you would like to extend the index when shifting and preserve the original data. If freq is specified as “infer” then it will be inferred from the freq or inferred_freq attributes of the index. If neither of those attributes exist, a ValueError is thrown.

axis{0 or ‘index’, 1 or ‘columns’, None}, default None

Shift direction.

fill_valueobject, optional

The scalar value to use for newly introduced missing values. the default depends on the dtype of self. For numeric data, np.nan is used. For datetime, timedelta, or period data, etc. NaT is used. For extension dtypes, self.dtype.na_value is used.

Changed in version 1.1.0.

DataFrame

Copy of input object, shifted.

Index.shift : Shift values of Index. DatetimeIndex.shift : Shift values of DatetimeIndex. PeriodIndex.shift : Shift values of PeriodIndex. tshift : Shift the time index, using the index’s frequency if

available.

>>> df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
...                    "Col2": [13, 23, 18, 33, 48],
...                    "Col3": [17, 27, 22, 37, 52]},
...                   index=pd.date_range("2020-01-01", "2020-01-05"))
>>> df
            Col1  Col2  Col3
2020-01-01    10    13    17
2020-01-02    20    23    27
2020-01-03    15    18    22
2020-01-04    30    33    37
2020-01-05    45    48    52
>>> df.shift(periods=3)
            Col1  Col2  Col3
2020-01-01   NaN   NaN   NaN
2020-01-02   NaN   NaN   NaN
2020-01-03   NaN   NaN   NaN
2020-01-04  10.0  13.0  17.0
2020-01-05  20.0  23.0  27.0
>>> df.shift(periods=1, axis="columns")
            Col1  Col2  Col3
2020-01-01   NaN    10    13
2020-01-02   NaN    20    23
2020-01-03   NaN    15    18
2020-01-04   NaN    30    33
2020-01-05   NaN    45    48
>>> df.shift(periods=3, fill_value=0)
            Col1  Col2  Col3
2020-01-01     0     0     0
2020-01-02     0     0     0
2020-01-03     0     0     0
2020-01-04    10    13    17
2020-01-05    20    23    27
>>> df.shift(periods=3, freq="D")
            Col1  Col2  Col3
2020-01-04    10    13    17
2020-01-05    20    23    27
2020-01-06    15    18    22
2020-01-07    30    33    37
2020-01-08    45    48    52
>>> df.shift(periods=3, freq="infer")
            Col1  Col2  Col3
2020-01-04    10    13    17
2020-01-05    20    23    27
2020-01-06    15    18    22
2020-01-07    30    33    37
2020-01-08    45    48    52
property sindex
property size

Return an int representing the number of elements in this object.

Return the number of rows if Series. Otherwise return the number of rows times number of columns if DataFrame.

ndarray.size : Number of elements in the array.

>>> s = pd.Series({'a': 1, 'b': 2, 'c': 3})
>>> s.size
3
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.size
4
skew(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased skew over requested axis.

Normalized by N-1.

axis{index (0), columns (1)}

Axis for the function to be applied on.

skipnabool, default True

Exclude NA/null values when computing the result.

levelint or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

numeric_onlybool, default None

Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

**kwargs

Additional keyword arguments to be passed to the function.

Series or DataFrame (if level specified)

slice_shift(periods: int = 1, axis=0)FrameOrSeries

Equivalent to shift without copying data. The shifted data will not include the dropped periods and the shifted axis will be smaller than the original.

Deprecated since version 1.2.0: slice_shift is deprecated, use DataFrame/Series.shift instead.

periodsint

Number of periods to move, can be positive or negative.

shifted : same type as caller

While the slice_shift is faster than shift, you may pay for it later during alignment.

snap_to_line(second_geometry)

Returns a new point based on in_point snapped to this geometry.

Paramters:
second_geometry
  • a second geometry

sort_index(axis=0, level=None, ascending: Union[bool, int, Sequence[Union[bool, int]]] = True, inplace: bool = False, kind: str = 'quicksort', na_position: str = 'last', sort_remaining: bool = True, ignore_index: bool = False, key: Optional[Callable[[Index], Union[Index, AnyArrayLike]]] = None)

Sort object by labels (along an axis).

Returns a new DataFrame sorted by label if inplace argument is False, otherwise updates the original DataFrame and returns None.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

The axis along which to sort. The value 0 identifies the rows, and 1 identifies the columns.

levelint or level name or list of ints or list of level names

If not None, sort on values in specified index level(s).

ascendingbool or list-like of bools, default True

Sort ascending vs. descending. When the index is a MultiIndex the sort direction can be controlled for each level individually.

inplacebool, default False

If True, perform operation in-place.

kind{‘quicksort’, ‘mergesort’, ‘heapsort’}, default ‘quicksort’

Choice of sorting algorithm. See also ndarray.np.sort for more information. mergesort is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label.

na_position{‘first’, ‘last’}, default ‘last’

Puts NaNs at the beginning if first; last puts NaNs at the end. Not implemented for MultiIndex.

sort_remainingbool, default True

If True and sorting by level and index is multilevel, sort by other levels too (in order) after sorting by specified level.

ignore_indexbool, default False

If True, the resulting axis will be labeled 0, 1, …, n - 1.

New in version 1.0.0.

keycallable, optional

If not None, apply the key function to the index values before sorting. This is similar to the key argument in the builtin sorted() function, with the notable difference that this key function should be vectorized. It should expect an Index and return an Index of the same shape. For MultiIndex inputs, the key is applied per level.

New in version 1.1.0.

DataFrame or None

The original DataFrame sorted by the labels or None if inplace=True.

Series.sort_index : Sort Series by the index. DataFrame.sort_values : Sort DataFrame by the value. Series.sort_values : Sort Series by the value.

>>> df = pd.DataFrame([1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150],
...                   columns=['A'])
>>> df.sort_index()
     A
1    4
29   2
100  1
150  5
234  3

By default, it sorts in ascending order, to sort in descending order, use ascending=False

>>> df.sort_index(ascending=False)
     A
234  3
150  5
100  1
29   2
1    4

A key function can be specified which is applied to the index before sorting. For a MultiIndex this is applied to each level separately.

>>> df = pd.DataFrame({"a": [1, 2, 3, 4]}, index=['A', 'b', 'C', 'd'])
>>> df.sort_index(key=lambda x: x.str.lower())
   a
A  1
b  2
C  3
d  4
sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key: Optional[Callable[[Series], Union[Series, AnyArrayLike]]] = None)

Sort by the values along either axis.

bystr or list of str

Name or list of names to sort by.

  • if axis is 0 or ‘index’ then by may contain index levels and/or column labels.

  • if axis is 1 or ‘columns’ then by may contain column levels and/or index labels.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

Axis to be sorted.

ascendingbool or list of bool, default True

Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, must match the length of the by.

inplacebool, default False

If True, perform operation in-place.

kind{‘quicksort’, ‘mergesort’, ‘heapsort’}, default ‘quicksort’

Choice of sorting algorithm. See also ndarray.np.sort for more information. mergesort is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label.

na_position{‘first’, ‘last’}, default ‘last’

Puts NaNs at the beginning if first; last puts NaNs at the end.

ignore_indexbool, default False

If True, the resulting axis will be labeled 0, 1, …, n - 1.

New in version 1.0.0.

keycallable, optional

Apply the key function to the values before sorting. This is similar to the key argument in the builtin sorted() function, with the notable difference that this key function should be vectorized. It should expect a Series and return a Series with the same shape as the input. It will be applied to each column in by independently.

New in version 1.1.0.

DataFrame or None

DataFrame with sorted values or None if inplace=True.

DataFrame.sort_index : Sort a DataFrame by the index. Series.sort_values : Similar method for a Series.

>>> df = pd.DataFrame({
...     'col1': ['A', 'A', 'B', np.nan, 'D', 'C'],
...     'col2': [2, 1, 9, 8, 7, 4],
...     'col3': [0, 1, 9, 4, 2, 3],
...     'col4': ['a', 'B', 'c', 'D', 'e', 'F']
... })
>>> df
  col1  col2  col3 col4
0    A     2     0    a
1    A     1     1    B
2    B     9     9    c
3  NaN     8     4    D
4    D     7     2    e
5    C     4     3    F

Sort by col1

>>> df.sort_values(by=['col1'])
  col1  col2  col3 col4
0    A     2     0    a
1    A     1     1    B
2    B     9     9    c
5    C     4     3    F
4    D     7     2    e
3  NaN     8     4    D

Sort by multiple columns

>>> df.sort_values(by=['col1', 'col2'])
  col1  col2  col3 col4
1    A     1     1    B
0    A     2     0    a
2    B     9     9    c
5    C     4     3    F
4    D     7     2    e
3  NaN     8     4    D

Sort Descending

>>> df.sort_values(by='col1', ascending=False)
  col1  col2  col3 col4
4    D     7     2    e
5    C     4     3    F
2    B     9     9    c
0    A     2     0    a
1    A     1     1    B
3  NaN     8     4    D

Putting NAs first

>>> df.sort_values(by='col1', ascending=False, na_position='first')
  col1  col2  col3 col4
3  NaN     8     4    D
4    D     7     2    e
5    C     4     3    F
2    B     9     9    c
0    A     2     0    a
1    A     1     1    B

Sorting with a key function

>>> df.sort_values(by='col4', key=lambda col: col.str.lower())
   col1  col2  col3 col4
0    A     2     0    a
1    A     1     1    B
2    B     9     9    c
3  NaN     8     4    D
4    D     7     2    e
5    C     4     3    F

Natural sort with the key argument, using the natsort <https://github.com/SethMMorton/natsort> package.

>>> df = pd.DataFrame({
...    "time": ['0hr', '128hr', '72hr', '48hr', '96hr'],
...    "value": [10, 20, 30, 40, 50]
... })
>>> df
    time  value
0    0hr     10
1  128hr     20
2   72hr     30
3   48hr     40
4   96hr     50
>>> from natsort import index_natsorted
>>> df.sort_values(
...    by="time",
...    key=lambda x: np.argsort(index_natsorted(df["time"]))
... )
    time  value
0    0hr     10
3   48hr     40
2   72hr     30
4   96hr     50
1  128hr     20
sparse

alias of pandas.core.arrays.sparse.accessor.SparseFrameAccessor

spatial

alias of arcgis.features.geo._accessor.GeoAccessor

property spatial_reference

The spatial reference of the geometry.

squeeze(axis=None)

Squeeze 1 dimensional axis objects into scalars.

Series or DataFrames with a single element are squeezed to a scalar. DataFrames with a single column or a single row are squeezed to a Series. Otherwise the object is unchanged.

This method is most useful when you don’t know if your object is a Series or DataFrame, but you do know it has just a single column. In that case you can safely call squeeze to ensure you have a Series.

axis{0 or ‘index’, 1 or ‘columns’, None}, default None

A specific axis to squeeze. By default, all length-1 axes are squeezed.

DataFrame, Series, or scalar

The projection after squeezing axis or all the axes.

Series.iloc : Integer-location based indexing for selecting scalars. DataFrame.iloc : Integer-location based indexing for selecting Series. Series.to_frame : Inverse of DataFrame.squeeze for a

single-column DataFrame.

>>> primes = pd.Series([2, 3, 5, 7])

Slicing might produce a Series with a single value:

>>> even_primes = primes[primes % 2 == 0]
>>> even_primes
0    2
dtype: int64
>>> even_primes.squeeze()
2

Squeezing objects with more than one value in every axis does nothing:

>>> odd_primes = primes[primes % 2 == 1]
>>> odd_primes
1    3
2    5
3    7
dtype: int64
>>> odd_primes.squeeze()
1    3
2    5
3    7
dtype: int64

Squeezing is even more effective when used with DataFrames.

>>> df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
>>> df
   a  b
0  1  2
1  3  4

Slicing a single column will produce a DataFrame with the columns having only one value:

>>> df_a = df[['a']]
>>> df_a
   a
0  1
1  3

So the columns can be squeezed down, resulting in a Series:

>>> df_a.squeeze('columns')
0    1
1    3
Name: a, dtype: int64

Slicing a single row from a single column will produce a single scalar DataFrame:

>>> df_0a = df.loc[df.index < 1, ['a']]
>>> df_0a
   a
0  1

Squeezing the rows produces a single scalar Series:

>>> df_0a.squeeze('rows')
a    1
Name: 0, dtype: int64

Squeezing all axes will project directly into a scalar:

>>> df_0a.squeeze()
1
stack(level=- 1, dropna=True)

Stack the prescribed level(s) from columns to index.

Return a reshaped DataFrame or Series having a multi-level index with one or more new inner-most levels compared to the current DataFrame. The new inner-most levels are created by pivoting the columns of the current dataframe:

  • if the columns have a single level, the output is a Series;

  • if the columns have multiple levels, the new index level(s) is (are) taken from the prescribed level(s) and the output is a DataFrame.

levelint, str, list, default -1

Level(s) to stack from the column axis onto the index axis, defined as one index or label, or a list of indices or labels.

dropnabool, default True

Whether to drop rows in the resulting Frame/Series with missing values. Stacking a column level onto the index axis can create combinations of index and column values that are missing from the original dataframe. See Examples section.

DataFrame or Series

Stacked dataframe or series.

DataFrame.unstackUnstack prescribed level(s) from index axis

onto column axis.

DataFrame.pivotReshape dataframe from long format to wide

format.

DataFrame.pivot_tableCreate a spreadsheet-style pivot table

as a DataFrame.

The function is named by analogy with a collection of books being reorganized from being side by side on a horizontal position (the columns of the dataframe) to being stacked vertically on top of each other (in the index of the dataframe).

Single level columns

>>> df_single_level_cols = pd.DataFrame([[0, 1], [2, 3]],
...                                     index=['cat', 'dog'],
...                                     columns=['weight', 'height'])

Stacking a dataframe with a single level column axis returns a Series:

>>> df_single_level_cols
     weight height
cat       0      1
dog       2      3
>>> df_single_level_cols.stack()
cat  weight    0
     height    1
dog  weight    2
     height    3
dtype: int64

Multi level columns: simple case

>>> multicol1 = pd.MultiIndex.from_tuples([('weight', 'kg'),
...                                        ('weight', 'pounds')])
>>> df_multi_level_cols1 = pd.DataFrame([[1, 2], [2, 4]],
...                                     index=['cat', 'dog'],
...                                     columns=multicol1)

Stacking a dataframe with a multi-level column axis:

>>> df_multi_level_cols1
     weight
         kg    pounds
cat       1        2
dog       2        4
>>> df_multi_level_cols1.stack()
            weight
cat kg           1
    pounds       2
dog kg           2
    pounds       4

Missing values

>>> multicol2 = pd.MultiIndex.from_tuples([('weight', 'kg'),
...                                        ('height', 'm')])
>>> df_multi_level_cols2 = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
...                                     index=['cat', 'dog'],
...                                     columns=multicol2)

It is common to have missing values when stacking a dataframe with multi-level columns, as the stacked dataframe typically has more values than the original dataframe. Missing values are filled with NaNs:

>>> df_multi_level_cols2
    weight height
        kg      m
cat    1.0    2.0
dog    3.0    4.0
>>> df_multi_level_cols2.stack()
        height  weight
cat kg     NaN     1.0
    m      2.0     NaN
dog kg     NaN     3.0
    m      4.0     NaN

Prescribing the level(s) to be stacked

The first parameter controls which level or levels are stacked:

>>> df_multi_level_cols2.stack(0)
             kg    m
cat height  NaN  2.0
    weight  1.0  NaN
dog height  NaN  4.0
    weight  3.0  NaN
>>> df_multi_level_cols2.stack([0, 1])
cat  height  m     2.0
     weight  kg    1.0
dog  height  m     4.0
     weight  kg    3.0
dtype: float64

Dropping missing values

>>> df_multi_level_cols3 = pd.DataFrame([[None, 1.0], [2.0, 3.0]],
...                                     index=['cat', 'dog'],
...                                     columns=multicol2)

Note that rows where all values are missing are dropped by default but this behaviour can be controlled via the dropna keyword parameter:

>>> df_multi_level_cols3
    weight height
        kg      m
cat    NaN    1.0
dog    2.0    3.0
>>> df_multi_level_cols3.stack(dropna=False)
        height  weight
cat kg     NaN     NaN
    m      1.0     NaN
dog kg     NaN     2.0
    m      3.0     NaN
>>> df_multi_level_cols3.stack(dropna=True)
        height  weight
cat m      1.0     NaN
dog kg     NaN     2.0
    m      3.0     NaN
std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)

Return sample standard deviation over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument

axis : {index (0), columns (1)} skipna : bool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

levelint or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

ddofint, default 1

Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

numeric_onlybool, default None

Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

Series or DataFrame (if level specified)

To have the same behaviour as numpy.std, use ddof=0 (instead of the default ddof=1)

property style

Returns a Styler object.

Contains methods for building a styled HTML representation of the DataFrame.

io.formats.style.StylerHelps style a DataFrame or Series according to the

data with HTML and CSS.

sub(other, axis='columns', level=None, fill_value=None)

Get Subtraction of dataframe and other, element-wise (binary operator sub).

Equivalent to dataframe - other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rsub.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

axis{0 or ‘index’, 1 or ‘columns’}

Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

levelint or label

Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

DataFrame

Result of the arithmetic operation.

DataFrame.add : Add DataFrames. DataFrame.sub : Subtract DataFrames. DataFrame.mul : Multiply DataFrames. DataFrame.div : Divide DataFrames (float division). DataFrame.truediv : Divide DataFrames (float division). DataFrame.floordiv : Divide DataFrames (integer division). DataFrame.mod : Calculate modulo (remainder after division). DataFrame.pow : Calculate exponential power.

Mismatched indices will be unioned together.

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
subtract(other, axis='columns', level=None, fill_value=None)

Get Subtraction of dataframe and other, element-wise (binary operator sub).

Equivalent to dataframe - other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rsub.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

axis{0 or ‘index’, 1 or ‘columns’}

Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

levelint or label

Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

DataFrame

Result of the arithmetic operation.

DataFrame.add : Add DataFrames. DataFrame.sub : Subtract DataFrames. DataFrame.mul : Multiply DataFrames. DataFrame.div : Divide DataFrames (float division). DataFrame.truediv : Divide DataFrames (float division). DataFrame.floordiv : Divide DataFrames (integer division). DataFrame.mod : Calculate modulo (remainder after division). DataFrame.pow : Calculate exponential power.

Mismatched indices will be unioned together.

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)

Return the sum of the values over the requested axis.

This is equivalent to the method numpy.sum.

axis{index (0), columns (1)}

Axis for the function to be applied on.

skipnabool, default True

Exclude NA/null values when computing the result.

levelint or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

numeric_onlybool, default None

Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

min_countint, default 0

The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

**kwargs

Additional keyword arguments to be passed to the function.

Series or DataFrame (if level specified)

Series.sum : Return the sum. Series.min : Return the minimum. Series.max : Return the maximum. Series.idxmin : Return the index of the minimum. Series.idxmax : Return the index of the maximum. DataFrame.sum : Return the sum over the requested axis. DataFrame.min : Return the minimum over the requested axis. DataFrame.max : Return the maximum over the requested axis. DataFrame.idxmin : Return the index of the minimum over the requested axis. DataFrame.idxmax : Return the index of the maximum over the requested axis.

>>> idx = pd.MultiIndex.from_arrays([
...     ['warm', 'warm', 'cold', 'cold'],
...     ['dog', 'falcon', 'fish', 'spider']],
...     names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded  animal
warm     dog       4
         falcon    2
cold     fish      0
         spider    8
Name: legs, dtype: int64
>>> s.sum()
14

Sum using level names, as well as indices.

>>> s.sum(level='blooded')
blooded
warm    6
cold    8
Name: legs, dtype: int64
>>> s.sum(level=0)
blooded
warm    6
cold    8
Name: legs, dtype: int64

By default, the sum of an empty or all-NA Series is 0.

>>> pd.Series([]).sum()  # min_count=0 is the default
0.0

This can be controlled with the min_count parameter. For example, if you’d like the sum of an empty series to be NaN, pass min_count=1.

>>> pd.Series([]).sum(min_count=1)
nan

Thanks to the skipna parameter, min_count handles all-NA and empty series identically.

>>> pd.Series([np.nan]).sum()
0.0
>>> pd.Series([np.nan]).sum(min_count=1)
nan
swapaxes(axis1, axis2, copy=True)FrameOrSeries

Interchange axes and swap values axes appropriately.

y : same as input

swaplevel(i=- 2, j=- 1, axis=0)pandas.core.frame.DataFrame

Swap levels i and j in a MultiIndex on a particular axis.

i, jint or str

Levels of the indices to be swapped. Can pass level name as string.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

The axis to swap levels on. 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.

DataFrame

symmetric_difference(second_geometry)

Constructs the geometry that is the union of two geometries minus the instersection of those geometries. The two input geometries must be the same shape type. Parameters:

second_geometry
  • a second geometry

tail(n: int = 5)FrameOrSeries

Return the last n rows.

This function returns last n rows from the object based on position. It is useful for quickly verifying data, for example, after sorting or appending rows.

For negative values of n, this function returns all rows except the first n rows, equivalent to df[n:].

nint, default 5

Number of rows to select.

type of caller

The last n rows of the caller object.

DataFrame.head : The first n rows of the caller object.

>>> df = pd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion',
...                    'monkey', 'parrot', 'shark', 'whale', 'zebra']})
>>> df
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot
6      shark
7      whale
8      zebra

Viewing the last 5 lines

>>> df.tail()
   animal
4  monkey
5  parrot
6   shark
7   whale
8   zebra

Viewing the last n lines (three in this case)

>>> df.tail(3)
  animal
6  shark
7  whale
8  zebra

For negative values of n

>>> df.tail(-3)
   animal
3    lion
4  monkey
5  parrot
6   shark
7   whale
8   zebra
take(indices, axis=0, is_copy: Optional[bool] = None, **kwargs)FrameOrSeries

Return the elements in the given positional indices along an axis.

This means that we are not indexing according to actual values in the index attribute of the object. We are indexing according to the actual position of the element in the object.

indicesarray-like

An array of ints indicating which positions to take.

axis{0 or ‘index’, 1 or ‘columns’, None}, default 0

The axis on which to select elements. 0 means that we are selecting rows, 1 means that we are selecting columns.

is_copybool

Before pandas 1.0, is_copy=False can be specified to ensure that the return value is an actual copy. Starting with pandas 1.0, take always returns a copy, and the keyword is therefore deprecated.

Deprecated since version 1.0.0.

**kwargs

For compatibility with numpy.take(). Has no effect on the output.

takensame type as caller

An array-like containing the elements taken from the object.

DataFrame.loc : Select a subset of a DataFrame by labels. DataFrame.iloc : Select a subset of a DataFrame by positions. numpy.take : Take elements from an array along an axis.

>>> df = pd.DataFrame([('falcon', 'bird', 389.0),
...                    ('parrot', 'bird', 24.0),
...                    ('lion', 'mammal', 80.5),
...                    ('monkey', 'mammal', np.nan)],
...                   columns=['name', 'class', 'max_speed'],
...                   index=[0, 2, 3, 1])
>>> df
     name   class  max_speed
0  falcon    bird      389.0
2  parrot    bird       24.0
3    lion  mammal       80.5
1  monkey  mammal        NaN

Take elements at positions 0 and 3 along the axis 0 (default).

Note how the actual indices selected (0 and 1) do not correspond to our selected indices 0 and 3. That’s because we are selecting the 0th and 3rd rows, not rows whose indices equal 0 and 3.

>>> df.take([0, 3])
     name   class  max_speed
0  falcon    bird      389.0
1  monkey  mammal        NaN

Take elements at indices 1 and 2 along the axis 1 (column selection).

>>> df.take([1, 2], axis=1)
    class  max_speed
0    bird      389.0
2    bird       24.0
3  mammal       80.5
1  mammal        NaN

We may take elements using negative integers for positive indices, starting from the end of the object, just like with Python lists.

>>> df.take([-1, -2])
     name   class  max_speed
1  monkey  mammal        NaN
3    lion  mammal       80.5
to_clipboard(excel: bool = True, sep: Optional[str] = None, **kwargs)None

Copy object to the system clipboard.

Write a text representation of object to the system clipboard. This can be pasted into Excel, for example.

excelbool, default True

Produce output in a csv format for easy pasting into excel.

  • True, use the provided separator for csv pasting.

  • False, write a string representation of the object to the clipboard.

sepstr, default '\t'

Field delimiter.

**kwargs

These parameters will be passed to DataFrame.to_csv.

DataFrame.to_csvWrite a DataFrame to a comma-separated values

(csv) file.

read_clipboard : Read text from clipboard and pass to read_table.

Requirements for your platform.

  • Linux : xclip, or xsel (with PyQt4 modules)

  • Windows : none

  • OS X : none

Copy the contents of a DataFrame to the clipboard.

>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['A', 'B', 'C'])
>>> df.to_clipboard(sep=',')  
... # Wrote the following to the system clipboard:
... # ,A,B,C
... # 0,1,2,3
... # 1,4,5,6

We can omit the index by passing the keyword index and setting it to false.

>>> df.to_clipboard(sep=',', index=False)  
... # Wrote the following to the system clipboard:
... # A,B,C
... # 1,2,3
... # 4,5,6
to_csv(path_or_buf: Optional[Union[PathLike[str], str, IO[T], io.RawIOBase, io.BufferedIOBase, io.TextIOBase, _io.TextIOWrapper, mmap.mmap]] = None, sep: str = ',', na_rep: str = '', float_format: Optional[str] = None, columns: Optional[Sequence[Optional[Hashable]]] = None, header: Union[bool, List[str]] = True, index: bool = True, index_label: Union[Hashable, None, Sequence[Optional[Hashable]]] = None, mode: str = 'w', encoding: Optional[str] = None, compression: Optional[Union[str, Dict[str, Any]]] = 'infer', quoting: Optional[int] = None, quotechar: str = '"', line_terminator: Optional[str] = None, chunksize: Optional[int] = None, date_format: Optional[str] = None, doublequote: bool = True, escapechar: Optional[str] = None, decimal: str = '.', errors: str = 'strict', storage_options: Optional[Dict[str, Any]] = None)Optional[str]

Write object to a comma-separated values (csv) file.

Changed in version 0.24.0: The order of arguments for Series was changed.

path_or_bufstr or file handle, default None

File path or object, if None is provided the result is returned as a string. If a non-binary file object is passed, it should be opened with newline=’’, disabling universal newlines. If a binary file object is passed, mode might need to contain a ‘b’.

Changed in version 0.24.0: Was previously named “path” for Series.

Changed in version 1.2.0: Support for binary file objects was introduced.

sepstr, default ‘,’

String of length 1. Field delimiter for the output file.

na_repstr, default ‘’

Missing data representation.

float_formatstr, default None

Format string for floating point numbers.

columnssequence, optional

Columns to write.

headerbool or list of str, default True

Write out the column names. If a list of strings is given it is assumed to be aliases for the column names.

Changed in version 0.24.0: Previously defaulted to False for Series.

indexbool, default True

Write row names (index).

index_labelstr or sequence, or False, default None

Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the object uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R.

modestr

Python write mode, default ‘w’.

encodingstr, optional

A string representing the encoding to use in the output file, defaults to ‘utf-8’. encoding is not supported if path_or_buf is a non-binary file object.

compressionstr or dict, default ‘infer’

If str, represents compression mode. If dict, value at ‘method’ is the compression mode. Compression mode may be any of the following possible values: {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}. If compression mode is ‘infer’ and path_or_buf is path-like, then detect compression mode from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’ or ‘.xz’. (otherwise no compression). If dict given and mode is one of {‘zip’, ‘gzip’, ‘bz2’}, or inferred as one of the above, other entries passed as additional compression options.

Changed in version 1.0.0: May now be a dict with key ‘method’ as compression mode and other entries as additional compression options if compression mode is ‘zip’.

Changed in version 1.1.0: Passing compression options as keys in dict is supported for compression modes ‘gzip’ and ‘bz2’ as well as ‘zip’.

Changed in version 1.2.0: Compression is supported for binary file objects.

Changed in version 1.2.0: Previous versions forwarded dict entries for ‘gzip’ to gzip.open instead of gzip.GzipFile which prevented setting mtime.

quotingoptional constant from csv module

Defaults to csv.QUOTE_MINIMAL. If you have set a float_format then floats are converted to strings and thus csv.QUOTE_NONNUMERIC will treat them as non-numeric.

quotecharstr, default ‘"’

String of length 1. Character used to quote fields.

line_terminatorstr, optional

The newline character or character sequence to use in the output file. Defaults to os.linesep, which depends on the OS in which this method is called (‘n’ for linux, ‘rn’ for Windows, i.e.).

Changed in version 0.24.0.

chunksizeint or None

Rows to write at a time.

date_formatstr, default None

Format string for datetime objects.

doublequotebool, default True

Control quoting of quotechar inside a field.

escapecharstr, default None

String of length 1. Character used to escape sep and quotechar when appropriate.

decimalstr, default ‘.’

Character recognized as decimal separator. E.g. use ‘,’ for European data.

errorsstr, default ‘strict’

Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.

New in version 1.1.0.

storage_optionsdict, optional

Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc., if using a URL that will be parsed by fsspec, e.g., starting “s3://”, “gcs://”. An error will be raised if providing this argument with a non-fsspec URL. See the fsspec and backend storage implementation docs for the set of allowed keys and values.

New in version 1.2.0.

None or str

If path_or_buf is None, returns the resulting csv format as a string. Otherwise returns None.

read_csv : Load a CSV file into a DataFrame. to_excel : Write DataFrame to an Excel file.

>>> df = pd.DataFrame({'name': ['Raphael', 'Donatello'],
...                    'mask': ['red', 'purple'],
...                    'weapon': ['sai', 'bo staff']})
>>> df.to_csv(index=False)
'name,mask,weapon\nRaphael,red,sai\nDonatello,purple,bo staff\n'

Create ‘out.zip’ containing ‘out.csv’

>>> compression_opts = dict(method='zip',
...                         archive_name='out.csv')  
>>> df.to_csv('out.zip', index=False,
...           compression=compression_opts)  
to_dict(orient='dict', into=<class 'dict'>)

Convert the DataFrame to a dictionary.

The type of the key-value pairs can be customized with the parameters (see below).

orientstr {‘dict’, ‘list’, ‘series’, ‘split’, ‘records’, ‘index’}

Determines the type of the values of the dictionary.

  • ‘dict’ (default) : dict like {column -> {index -> value}}

  • ‘list’ : dict like {column -> [values]}

  • ‘series’ : dict like {column -> Series(values)}

  • ‘split’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values]}

  • ‘records’ : list like [{column -> value}, … , {column -> value}]

  • ‘index’ : dict like {index -> {column -> value}}

Abbreviations are allowed. s indicates series and sp indicates split.

intoclass, default dict

The collections.abc.Mapping subclass used for all Mappings in the return value. Can be the actual class or an empty instance of the mapping type you want. If you want a collections.defaultdict, you must pass it initialized.

dict, list or collections.abc.Mapping

Return a collections.abc.Mapping object representing the DataFrame. The resulting transformation depends on the orient parameter.

DataFrame.from_dict: Create a DataFrame from a dictionary. DataFrame.to_json: Convert a DataFrame to JSON format.

>>> df = pd.DataFrame({'col1': [1, 2],
...                    'col2': [0.5, 0.75]},
...                   index=['row1', 'row2'])
>>> df
      col1  col2
row1     1  0.50
row2     2  0.75
>>> df.to_dict()
{'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}

You can specify the return orientation.

>>> df.to_dict('series')
{'col1': row1    1
         row2    2
Name: col1, dtype: int64,
'col2': row1    0.50
        row2    0.75
Name: col2, dtype: float64}
>>> df.to_dict('split')
{'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],
 'data': [[1, 0.5], [2, 0.75]]}
>>> df.to_dict('records')
[{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]
>>> df.to_dict('index')
{'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}

You can also specify the mapping type.

>>> from collections import OrderedDict, defaultdict
>>> df.to_dict(into=OrderedDict)
OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])),
             ('col2', OrderedDict([('row1', 0.5), ('row2', 0.75)]))])

If you want a defaultdict, you need to initialize it:

>>> dd = defaultdict(list)
>>> df.to_dict('records', into=dd)
[defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),
 defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]
to_excel(excel_writer, sheet_name: str = 'Sheet1', na_rep: str = '', float_format: Optional[str] = None, columns=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True, encoding=None, inf_rep='inf', verbose=True, freeze_panes=None, storage_options: Optional[Dict[str, Any]] = None)None

Write object to an Excel sheet.

To write a single object to an Excel .xlsx file it is only necessary to specify a target file name. To write to multiple sheets it is necessary to create an ExcelWriter object with a target file name, and specify a sheet in the file to write to.

Multiple sheets may be written to by specifying unique sheet_name. With all data written to the file it is necessary to save the changes. Note that creating an ExcelWriter object with a file name that already exists will result in the contents of the existing file being erased.

excel_writerpath-like, file-like, or ExcelWriter object

File path or existing ExcelWriter.

sheet_namestr, default ‘Sheet1’

Name of sheet which will contain DataFrame.

na_repstr, default ‘’

Missing data representation.

float_formatstr, optional

Format string for floating point numbers. For example float_format="%.2f" will format 0.1234 to 0.12.

columnssequence or list of str, optional

Columns to write.

headerbool or list of str, default True

Write out the column names. If a list of string is given it is assumed to be aliases for the column names.

indexbool, default True

Write row names (index).

index_labelstr or sequence, optional

Column label for index column(s) if desired. If not specified, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.

startrowint, default 0

Upper left cell row to dump data frame.

startcolint, default 0

Upper left cell column to dump data frame.

enginestr, optional

Write engine to use, ‘openpyxl’ or ‘xlsxwriter’. You can also set this via the options io.excel.xlsx.writer, io.excel.xls.writer, and io.excel.xlsm.writer.

Deprecated since version 1.2.0: As the xlwt package is no longer maintained, the xlwt engine will be removed in a future version of pandas.

merge_cellsbool, default True

Write MultiIndex and Hierarchical Rows as merged cells.

encodingstr, optional

Encoding of the resulting excel file. Only necessary for xlwt, other writers support unicode natively.

inf_repstr, default ‘inf’

Representation for infinity (there is no native representation for infinity in Excel).

verbosebool, default True

Display more information in the error logs.

freeze_panestuple of int (length 2), optional

Specifies the one-based bottommost row and rightmost column that is to be frozen.

storage_optionsdict, optional

Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc., if using a URL that will be parsed by fsspec, e.g., starting “s3://”, “gcs://”. An error will be raised if providing this argument with a non-fsspec URL. See the fsspec and backend storage implementation docs for the set of allowed keys and values.

New in version 1.2.0.

to_csv : Write DataFrame to a comma-separated values (csv) file. ExcelWriter : Class for writing DataFrame objects into excel sheets. read_excel : Read an Excel file into a pandas DataFrame. read_csv : Read a comma-separated values (csv) file into DataFrame.

For compatibility with to_csv(), to_excel serializes lists and dicts to strings before writing.

Once a workbook has been saved it is not possible write further data without rewriting the whole workbook.

Create, write to and save a workbook:

>>> df1 = pd.DataFrame([['a', 'b'], ['c', 'd']],
...                    index=['row 1', 'row 2'],
...                    columns=['col 1', 'col 2'])
>>> df1.to_excel("output.xlsx")  

To specify the sheet name:

>>> df1.to_excel("output.xlsx",
...              sheet_name='Sheet_name_1')  

If you wish to write to more than one sheet in the workbook, it is necessary to specify an ExcelWriter object:

>>> df2 = df1.copy()
>>> with pd.ExcelWriter('output.xlsx') as writer:  
...     df1.to_excel(writer, sheet_name='Sheet_name_1')
...     df2.to_excel(writer, sheet_name='Sheet_name_2')

ExcelWriter can also be used to append to an existing Excel file:

>>> with pd.ExcelWriter('output.xlsx',
...                     mode='a') as writer:  
...     df.to_excel(writer, sheet_name='Sheet_name_3')

To set the library that is used to write the Excel file, you can pass the engine keyword (the default engine is automatically chosen depending on the file extension):

>>> df1.to_excel('output1.xlsx', engine='xlsxwriter')  
to_feather(path: Union[PathLike[str], str, IO, io.RawIOBase, io.BufferedIOBase, io.TextIOBase, _io.TextIOWrapper, mmap.mmap], **kwargs)None

Write a DataFrame to the binary Feather format.

pathstr or file-like object

If a string, it will be used as Root Directory path.

**kwargs :

Additional keywords passed to pyarrow.feather.write_feather(). Starting with pyarrow 0.17, this includes the compression, compression_level, chunksize and version keywords.

New in version 1.1.0.

to_feature_collection(name=None, drawing_info=None, extent=None, global_id_field=None)

converts a Spatial DataFrame to a Feature Collection

optional argument

Description

name

optional string. Name of the Feature Collection

drawing_info

Optional dictionary. This is the rendering information for a Feature Collection. Rendering information is a dictionary with the symbology, labelling and other properties defined. See: http://resources.arcgis.com/en/help/arcgis-rest-api/index.html#/Renderer_objects/02r30000019t000000/

extent

Optional dictionary. If desired, a custom extent can be provided to set where the map starts up when showing the data. The default is the full extent of the dataset in the Spatial DataFrame.

global_id_field

Optional string. The Global ID field of the dataset.

Returns

FeatureCollection object

to_featureclass(out_location, out_name, overwrite=True, skip_invalid=True)

converts a SpatialDataFrame to a feature class

Argument

Description

out_location

Required string. A save location workspace

out_name

Required string. The name of the feature class to save as

overwrite

Optional boolean. True means to erase and replace value, false means to append

skip_invalids

Optional boolean. If True, any bad rows will be ignored.

Returns

string

to_featurelayer(title, gis=None, tags=None)

publishes a spatial dataframe to a new feature layer

Argument

Description

title

Required string. The name of the service

gis

Optional GIS. The GIS connection object

tags

Optional string. A comma seperated list of descriptive words for the service

Returns

FeatureLayer

to_featureset()

Converts a spatial dataframe to a feature set object

to_gbq(destination_table, project_id=None, chunksize=None, reauth=False, if_exists='fail', auth_local_webserver=False, table_schema=None, location=None, progress_bar=True, credentials=None)None

Write a DataFrame to a Google BigQuery table.

This function requires the pandas-gbq package.

See the How to authenticate with Google BigQuery guide for authentication instructions.

destination_tablestr

Name of table to be written, in the form dataset.tablename.

project_idstr, optional

Google BigQuery Account project ID. Optional when available from the environment.

chunksizeint, optional

Number of rows to be inserted in each chunk from the dataframe. Set to None to load the whole dataframe at once.

reauthbool, default False

Force Google BigQuery to re-authenticate the user. This is useful if multiple accounts are used.

if_existsstr, default ‘fail’

Behavior when the destination table exists. Value can be one of:

'fail'

If table exists raise pandas_gbq.gbq.TableCreationError.

'replace'

If table exists, drop it, recreate it, and insert data.

'append'

If table exists, insert data. Create if does not exist.

auth_local_webserverbool, default False

Use the local webserver flow instead of the console flow when getting user credentials.

New in version 0.2.0 of pandas-gbq.

table_schemalist of dicts, optional

List of BigQuery table fields to which according DataFrame columns conform to, e.g. [{'name': 'col1', 'type': 'STRING'},...]. If schema is not provided, it will be generated according to dtypes of DataFrame columns. See BigQuery API documentation on available names of a field.

New in version 0.3.1 of pandas-gbq.

locationstr, optional

Location where the load job should run. See the BigQuery locations documentation for a list of available locations. The location must match that of the target dataset.

New in version 0.5.0 of pandas-gbq.

progress_barbool, default True

Use the library tqdm to show the progress bar for the upload, chunk by chunk.

New in version 0.5.0 of pandas-gbq.

credentialsgoogle.auth.credentials.Credentials, optional

Credentials for accessing Google APIs. Use this parameter to override default credentials, such as to use Compute Engine google.auth.compute_engine.Credentials or Service Account google.oauth2.service_account.Credentials directly.

New in version 0.8.0 of pandas-gbq.

New in version 0.24.0.

pandas_gbq.to_gbq : This function in the pandas-gbq library. read_gbq : Read a DataFrame from Google BigQuery.

to_hdf(path_or_buf, key, **kwargs)

Write the contained data to an HDF5 file using HDFStore.

path_or_buf : the path (string) or HDFStore object key : string

indentifier for the group in the store

mode : optional, {‘a’, ‘w’, ‘r+’}, default ‘a’

'w'

Write; a new file is created (an existing file with the same name would be deleted).

'a'

Append; an existing file is opened for reading and writing, and if the file does not exist it is created.

'r+'

It is similar to 'a', but the file must already exist.

format‘fixed(f)|table(t)’, default is ‘fixed’
fixed(f)Fixed format

Fast writing/reading. Not-appendable, nor searchable

table(t)Table format

Write as a PyTables Table structure which may perform worse but allow more flexible operations like searching / selecting subsets of the data

appendboolean, default False

For Table formats, append the input data to the existing

data_columnslist of columns, or True, default None

List of columns to create as indexed data columns for on-disk queries, or True to use all columns. By default only the axes of the object are indexed. See here.

Applicable only to format=’table’.

complevelint, 1-9, default 0

If a complib is specified compression will be applied where possible

complib{‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’, None}, default None

If complevel is > 0 apply compression to objects written in the store wherever possible

fletcher32bool, default False

If applying compression use the fletcher32 checksum

dropnaboolean, default False.

If true, ALL nan rows will not be written to store.

to_html(buf=None, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, max_rows=None, max_cols=None, show_dimensions=False, decimal='.', bold_rows=True, classes=None, escape=True, notebook=False, border=None, table_id=None, render_links=False, encoding=None)

Render a DataFrame as an HTML table.

bufstr, Path or StringIO-like, optional, default None

Buffer to write to. If None, the output is returned as a string.

columnssequence, optional, default None

The subset of columns to write. Writes all columns by default.

col_spacestr or int, list or dict of int or str, optional

The minimum width of each column in CSS length units. An int is assumed to be px units.

New in version 0.25.0: Ability to use str.

headerbool, optional

Whether to print column labels, default True.

indexbool, optional, default True

Whether to print index (row) labels.

na_repstr, optional, default ‘NaN’

String representation of NaN to use.

formatterslist, tuple or dict of one-param. functions, optional

Formatter functions to apply to columns’ elements by position or name. The result of each function must be a unicode string. List/tuple must be of length equal to the number of columns.

float_formatone-parameter function, optional, default None

Formatter function to apply to columns’ elements if they are floats. This function must return a unicode string and will be applied only to the non-NaN elements, with NaN being handled by na_rep.

Changed in version 1.2.0.

sparsifybool, optional, default True

Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row.

index_namesbool, optional, default True

Prints the names of the indexes.

justifystr, default None

How to justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box. Valid values are

  • left

  • right

  • center

  • justify

  • justify-all

  • start

  • end

  • inherit

  • match-parent

  • initial

  • unset.

max_rowsint, optional

Maximum number of rows to display in the console.

min_rowsint, optional

The number of rows to display in the console in a truncated repr (when number of rows is above max_rows).

max_colsint, optional

Maximum number of columns to display in the console.

show_dimensionsbool, default False

Display DataFrame dimensions (number of rows by number of columns).

decimalstr, default ‘.’

Character recognized as decimal separator, e.g. ‘,’ in Europe.

bold_rowsbool, default True

Make the row labels bold in the output.

classesstr or list or tuple, default None

CSS class(es) to apply to the resulting html table.

escapebool, default True

Convert the characters <, >, and & to HTML-safe sequences.

notebook{True, False}, default False

Whether the generated HTML is for IPython Notebook.

borderint

A border=border attribute is included in the opening <table> tag. Default pd.options.display.html.border.

encodingstr, default “utf-8”

Set character encoding.

New in version 1.0.

table_idstr, optional

A css id is included in the opening <table> tag if specified.

render_linksbool, default False

Convert URLs to HTML links.

New in version 0.24.0.

str or None

If buf is None, returns the result as a string. Otherwise returns None.

to_string : Convert DataFrame to a string.

to_json(path_or_buf: Optional[Union[PathLike[str], str, IO[T], io.RawIOBase, io.BufferedIOBase, io.TextIOBase, _io.TextIOWrapper, mmap.mmap]] = None, orient: Optional[str] = None, date_format: Optional[str] = None, double_precision: int = 10, force_ascii: bool = True, date_unit: str = 'ms', default_handler: Optional[Callable[[Any], Optional[Union[str, int, float, bool, List, Dict]]]] = None, lines: bool = False, compression: Optional[Union[str, Dict[str, Any]]] = 'infer', index: bool = True, indent: Optional[int] = None, storage_options: Optional[Dict[str, Any]] = None)Optional[str]

Convert the object to a JSON string.

Note NaN’s and None will be converted to null and datetime objects will be converted to UNIX timestamps.

path_or_bufstr or file handle, optional

File path or object. If not specified, the result is returned as a string.

orientstr

Indication of expected JSON string format.

  • Series:

    • default is ‘index’

    • allowed values are: {‘split’, ‘records’, ‘index’, ‘table’}.

  • DataFrame:

    • default is ‘columns’

    • allowed values are: {‘split’, ‘records’, ‘index’, ‘columns’, ‘values’, ‘table’}.

  • The format of the JSON string:

    • ‘split’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values]}

    • ‘records’ : list like [{column -> value}, … , {column -> value}]

    • ‘index’ : dict like {index -> {column -> value}}

    • ‘columns’ : dict like {column -> {index -> value}}

    • ‘values’ : just the values array

    • ‘table’ : dict like {‘schema’: {schema}, ‘data’: {data}}

    Describing the data, where data component is like orient='records'.

date_format{None, ‘epoch’, ‘iso’}

Type of date conversion. ‘epoch’ = epoch milliseconds, ‘iso’ = ISO8601. The default depends on the orient. For orient='table', the default is ‘iso’. For all other orients, the default is ‘epoch’.

double_precisionint, default 10

The number of decimal places to use when encoding floating point values.

force_asciibool, default True

Force encoded string to be ASCII.

date_unitstr, default ‘ms’ (milliseconds)

The time unit to encode to, governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’, ‘ns’ for second, millisecond, microsecond, and nanosecond respectively.

default_handlercallable, default None

Handler to call if object cannot otherwise be converted to a suitable format for JSON. Should receive a single argument which is the object to convert and return a serialisable object.

linesbool, default False

If ‘orient’ is ‘records’ write out line delimited json format. Will throw ValueError if incorrect ‘orient’ since others are not list like.

compression : {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}

A string representing the compression to use in the output file, only used when the first argument is a filename. By default, the compression is inferred from the filename.

Changed in version 0.24.0: ‘infer’ option added and set to default

indexbool, default True

Whether to include the index values in the JSON string. Not including the index (index=False) is only supported when orient is ‘split’ or ‘table’.

indentint, optional

Length of whitespace used to indent each record.

New in version 1.0.0.

storage_optionsdict, optional

Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc., if using a URL that will be parsed by fsspec, e.g., starting “s3://”, “gcs://”. An error will be raised if providing this argument with a non-fsspec URL. See the fsspec and backend storage implementation docs for the set of allowed keys and values.

New in version 1.2.0.

None or str

If path_or_buf is None, returns the resulting json format as a string. Otherwise returns None.

read_json : Convert a JSON string to pandas object.

The behavior of indent=0 varies from the stdlib, which does not indent the output but does insert newlines. Currently, indent=0 and the default indent=None are equivalent in pandas, though this may change in a future release.

orient='table' contains a ‘pandas_version’ field under ‘schema’. This stores the version of pandas used in the latest revision of the schema.

>>> import json
>>> df = pd.DataFrame(
...     [["a", "b"], ["c", "d"]],
...     index=["row 1", "row 2"],
...     columns=["col 1", "col 2"],
... )
>>> result = df.to_json(orient="split")
>>> parsed = json.loads(result)
>>> json.dumps(parsed, indent=4)  
{
    "columns": [
        "col 1",
        "col 2"
    ],
    "index": [
        "row 1",
        "row 2"
    ],
    "data": [
        [
            "a",
            "b"
        ],
        [
            "c",
            "d"
        ]
    ]
}

Encoding/decoding a Dataframe using 'records' formatted JSON. Note that index labels are not preserved with this encoding.

>>> result = df.to_json(orient="records")
>>> parsed = json.loads(result)
>>> json.dumps(parsed, indent=4)  
[
    {
        "col 1": "a",
        "col 2": "b"
    },
    {
        "col 1": "c",
        "col 2": "d"
    }
]

Encoding/decoding a Dataframe using 'index' formatted JSON:

>>> result = df.to_json(orient="index")
>>> parsed = json.loads(result)
>>> json.dumps(parsed, indent=4)  
{
    "row 1": {
        "col 1": "a",
        "col 2": "b"
    },
    "row 2": {
        "col 1": "c",
        "col 2": "d"
    }
}

Encoding/decoding a Dataframe using 'columns' formatted JSON:

>>> result = df.to_json(orient="columns")
>>> parsed = json.loads(result)
>>> json.dumps(parsed, indent=4)  
{
    "col 1": {
        "row 1": "a",
        "row 2": "c"
    },
    "col 2": {
        "row 1": "b",
        "row 2": "d"
    }
}

Encoding/decoding a Dataframe using 'values' formatted JSON:

>>> result = df.to_json(orient="values")
>>> parsed = json.loads(result)
>>> json.dumps(parsed, indent=4)  
[
    [
        "a",
        "b"
    ],
    [
        "c",
        "d"
    ]
]

Encoding with Table Schema:

>>> result = df.to_json(orient="table")
>>> parsed = json.loads(result)
>>> json.dumps(parsed, indent=4)  
{
    "schema": {
        "fields": [
            {
                "name": "index",
                "type": "string"
            },
            {
                "name": "col 1",
                "type": "string"
            },
            {
                "name": "col 2",
                "type": "string"
            }
        ],
        "primaryKey": [
            "index"
        ],
        "pandas_version": "0.20.0"
    },
    "data": [
        {
            "index": "row 1",
            "col 1": "a",
            "col 2": "b"
        },
        {
            "index": "row 2",
            "col 1": "c",
            "col 2": "d"
        }
    ]
}
to_latex(buf=None, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, bold_rows=False, column_format=None, longtable=None, escape=None, encoding=None, decimal='.', multicolumn=None, multicolumn_format=None, multirow=None, caption=None, label=None, position=None)

Render object to a LaTeX tabular, longtable, or nested table/tabular.

Requires \usepackage{booktabs}. The output can be copy/pasted into a main LaTeX document or read from an external file with \input{table.tex}.

Changed in version 1.0.0: Added caption and label arguments.

Changed in version 1.2.0: Added position argument, changed meaning of caption argument.

bufstr, Path or StringIO-like, optional, default None

Buffer to write to. If None, the output is returned as a string.

columnslist of label, optional

The subset of columns to write. Writes all columns by default.

col_spaceint, optional

The minimum width of each column.

headerbool or list of str, default True

Write out the column names. If a list of strings is given, it is assumed to be aliases for the column names.

indexbool, default True

Write row names (index).

na_repstr, default ‘NaN’

Missing data representation.

formatterslist of functions or dict of {str: function}, optional

Formatter functions to apply to columns’ elements by position or name. The result of each function must be a unicode string. List must be of length equal to the number of columns.

float_formatone-parameter function or str, optional, default None

Formatter for floating point numbers. For example float_format="%.2f" and float_format="{:0.2f}".format will both result in 0.1234 being formatted as 0.12.

sparsifybool, optional

Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row. By default, the value will be read from the config module.

index_namesbool, default True

Prints the names of the indexes.

bold_rowsbool, default False

Make the row labels bold in the output.

column_formatstr, optional

The columns format as specified in LaTeX table format e.g. ‘rcl’ for 3 columns. By default, ‘l’ will be used for all columns except columns of numbers, which default to ‘r’.

longtablebool, optional

By default, the value will be read from the pandas config module. Use a longtable environment instead of tabular. Requires adding a usepackage{longtable} to your LaTeX preamble.

escapebool, optional

By default, the value will be read from the pandas config module. When set to False prevents from escaping latex special characters in column names.

encodingstr, optional

A string representing the encoding to use in the output file, defaults to ‘utf-8’.

decimalstr, default ‘.’

Character recognized as decimal separator, e.g. ‘,’ in Europe.

multicolumnbool, default True

Use multicolumn to enhance MultiIndex columns. The default will be read from the config module.

multicolumn_formatstr, default ‘l’

The alignment for multicolumns, similar to column_format The default will be read from the config module.

multirowbool, default False

Use multirow to enhance MultiIndex rows. Requires adding a usepackage{multirow} to your LaTeX preamble. Will print centered labels (instead of top-aligned) across the contained rows, separating groups via clines. The default will be read from the pandas config module.

captionstr or tuple, optional

Tuple (full_caption, short_caption), which results in \caption[short_caption]{full_caption}; if a single string is passed, no short caption will be set.

New in version 1.0.0.

Changed in version 1.2.0: Optionally allow caption to be a tuple (full_caption, short_caption).

labelstr, optional

The LaTeX label to be placed inside \label{} in the output. This is used with \ref{} in the main .tex file.

New in version 1.0.0.

positionstr, optional

The LaTeX positional argument for tables, to be placed after \begin{} in the output.

str or None

If buf is None, returns the result as a string. Otherwise returns None.

DataFrame.to_stringRender a DataFrame to a console-friendly

tabular output.

DataFrame.to_html : Render a DataFrame as an HTML table.

>>> df = pd.DataFrame(dict(name=['Raphael', 'Donatello'],
...                   mask=['red', 'purple'],
...                   weapon=['sai', 'bo staff']))
>>> print(df.to_latex(index=False))  
\begin{tabular}{lll}
 \toprule
       name &    mask &    weapon \\
 \midrule
    Raphael &     red &       sai \\
  Donatello &  purple &  bo staff \\
\bottomrule
\end{tabular}
to_markdown(buf: Optional[Union[IO[str], str]] = None, mode: str = 'wt', index: bool = True, storage_options: Optional[Dict[str, Any]] = None, **kwargs)Optional[str]

Print DataFrame in Markdown-friendly format.

New in version 1.0.0.

bufstr, Path or StringIO-like, optional, default None

Buffer to write to. If None, the output is returned as a string.

modestr, optional

Mode in which file is opened, “wt” by default.

indexbool, optional, default True

Add index (row) labels.

New in version 1.1.0.

storage_optionsdict, optional

Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc., if using a URL that will be parsed by fsspec, e.g., starting “s3://”, “gcs://”. An error will be raised if providing this argument with a non-fsspec URL. See the fsspec and backend storage implementation docs for the set of allowed keys and values.

New in version 1.2.0.

**kwargs

These parameters will be passed to tabulate.

str

DataFrame in Markdown-friendly format.

Requires the tabulate package.

>>> s = pd.Series(["elk", "pig", "dog", "quetzal"], name="animal")
>>> print(s.to_markdown())
|    | animal   |
|---:|:---------|
|  0 | elk      |
|  1 | pig      |
|  2 | dog      |
|  3 | quetzal  |

Output markdown with a tabulate option.

>>> print(s.to_markdown(tablefmt="grid"))
+----+----------+
|    | animal   |
+====+==========+
|  0 | elk      |
+----+----------+
|  1 | pig      |
+----+----------+
|  2 | dog      |
+----+----------+
|  3 | quetzal  |
+----+----------+
to_numpy(dtype=None, copy: bool = False, na_value=<object object>)numpy.ndarray

Convert the DataFrame to a NumPy array.

New in version 0.24.0.

By default, the dtype of the returned array will be the common NumPy dtype of all types in the DataFrame. For example, if the dtypes are float16 and float32, the results dtype will be float32. This may require copying data and coercing values, which may be expensive.

dtypestr or numpy.dtype, optional

The dtype to pass to numpy.asarray().

copybool, default False

Whether to ensure that the returned value is not a view on another array. Note that copy=False does not ensure that to_numpy() is no-copy. Rather, copy=True ensure that a copy is made, even if not strictly necessary.

na_valueAny, optional

The value to use for missing values. The default value depends on dtype and the dtypes of the DataFrame columns.

New in version 1.1.0.

numpy.ndarray

Series.to_numpy : Similar method for Series.

>>> pd.DataFrame({"A": [1, 2], "B": [3, 4]}).to_numpy()
array([[1, 3],
       [2, 4]])

With heterogeneous data, the lowest common type will have to be used.

>>> df = pd.DataFrame({"A": [1, 2], "B": [3.0, 4.5]})
>>> df.to_numpy()
array([[1. , 3. ],
       [2. , 4.5]])

For a mix of numeric and non-numeric types, the output array will have object dtype.

>>> df['C'] = pd.date_range('2000', periods=2)
>>> df.to_numpy()
array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],
       [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)
to_parquet(path: Optional[Union[PathLike[str], str, IO[T], io.RawIOBase, io.BufferedIOBase, io.TextIOBase, _io.TextIOWrapper, mmap.mmap]] = None, engine: str = 'auto', compression: Optional[str] = 'snappy', index: Optional[bool] = None, partition_cols: Optional[List[str]] = None, storage_options: Optional[Dict[str, Any]] = None, **kwargs)Optional[bytes]

Write a DataFrame to the binary parquet format.

This function writes the dataframe as a parquet file. You can choose different parquet backends, and have the option of compression. See the user guide for more details.

pathstr or file-like object, default None

If a string, it will be used as Root Directory path when writing a partitioned dataset. By file-like object, we refer to objects with a write() method, such as a file handle (e.g. via builtin open function) or io.BytesIO. The engine fastparquet does not accept file-like objects. If path is None, a bytes object is returned.

Changed in version 1.2.0.

Previously this was “fname”

engine{‘auto’, ‘pyarrow’, ‘fastparquet’}, default ‘auto’

Parquet library to use. If ‘auto’, then the option io.parquet.engine is used. The default io.parquet.engine behavior is to try ‘pyarrow’, falling back to ‘fastparquet’ if ‘pyarrow’ is unavailable.

compression{‘snappy’, ‘gzip’, ‘brotli’, None}, default ‘snappy’

Name of the compression to use. Use None for no compression.

indexbool, default None

If True, include the dataframe’s index(es) in the file output. If False, they will not be written to the file. If None, similar to True the dataframe’s index(es) will be saved. However, instead of being saved as values, the RangeIndex will be stored as a range in the metadata so it doesn’t require much space and is faster. Other indexes will be included as columns in the file output.

New in version 0.24.0.

partition_colslist, optional, default None

Column names by which to partition the dataset. Columns are partitioned in the order they are given. Must be None if path is not a string.

New in version 0.24.0.

storage_optionsdict, optional

Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc., if using a URL that will be parsed by fsspec, e.g., starting “s3://”, “gcs://”. An error will be raised if providing this argument with a non-fsspec URL. See the fsspec and backend storage implementation docs for the set of allowed keys and values.

New in version 1.2.0.

**kwargs

Additional arguments passed to the parquet library. See pandas io for more details.

bytes if no path argument is provided else None

read_parquet : Read a parquet file. DataFrame.to_csv : Write a csv file. DataFrame.to_sql : Write to a sql table. DataFrame.to_hdf : Write to hdf.

This function requires either the fastparquet or pyarrow library.

>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_parquet('df.parquet.gzip',
...               compression='gzip')  
>>> pd.read_parquet('df.parquet.gzip')  
   col1  col2
0     1     3
1     2     4

If you want to get a buffer to the parquet content you can use a io.BytesIO object, as long as you don’t use partition_cols, which creates multiple files.

>>> import io
>>> f = io.BytesIO()
>>> df.to_parquet(f)
>>> f.seek(0)
0
>>> content = f.read()
to_period(freq=None, axis: Union[str, int] = 0, copy: bool = True)pandas.core.frame.DataFrame

Convert DataFrame from DatetimeIndex to PeriodIndex.

Convert DataFrame from DatetimeIndex to PeriodIndex with desired frequency (inferred from index if not passed).

freqstr, default

Frequency of the PeriodIndex.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

The axis to convert (the index by default).

copybool, default True

If False then underlying input data is not copied.

DataFrame with PeriodIndex

to_pickle(path, compression: Optional[Union[str, Dict[str, Any]]] = 'infer', protocol: int = 4, storage_options: Optional[Dict[str, Any]] = None)None

Pickle (serialize) object to file.

pathstr

File path where the pickled object will be stored.

compression{‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}, default ‘infer’

A string representing the compression to use in the output file. By default, infers from the file extension in specified path. Compression mode may be any of the following possible values: {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}. If compression mode is ‘infer’ and path_or_buf is path-like, then detect compression mode from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’ or ‘.xz’. (otherwise no compression). If dict given and mode is ‘zip’ or inferred as ‘zip’, other entries passed as additional compression options.

protocolint

Int which indicates which protocol should be used by the pickler, default HIGHEST_PROTOCOL (see [1]_ paragraph 12.1.2). The possible values are 0, 1, 2, 3, 4, 5. A negative value for the protocol parameter is equivalent to setting its value to HIGHEST_PROTOCOL.

1

https://docs.python.org/3/library/pickle.html.

storage_optionsdict, optional

Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc., if using a URL that will be parsed by fsspec, e.g., starting “s3://”, “gcs://”. An error will be raised if providing this argument with a non-fsspec URL. See the fsspec and backend storage implementation docs for the set of allowed keys and values.

New in version 1.2.0.

read_pickle : Load pickled pandas object (or any object) from file. DataFrame.to_hdf : Write DataFrame to an HDF5 file. DataFrame.to_sql : Write DataFrame to a SQL database. DataFrame.to_parquet : Write a DataFrame to the binary parquet format.

>>> original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})
>>> original_df
   foo  bar
0    0    5
1    1    6
2    2    7
3    3    8
4    4    9
>>> original_df.to_pickle("./dummy.pkl")
>>> unpickled_df = pd.read_pickle("./dummy.pkl")
>>> unpickled_df
   foo  bar
0    0    5
1    1    6
2    2    7
3    3    8
4    4    9
>>> import os
>>> os.remove("./dummy.pkl")
to_records(index=True, column_dtypes=None, index_dtypes=None)numpy.recarray

Convert DataFrame to a NumPy record array.

Index will be included as the first field of the record array if requested.

indexbool, default True

Include index in resulting record array, stored in ‘index’ field or using the index label, if set.

column_dtypesstr, type, dict, default None

New in version 0.24.0.

If a string or type, the data type to store all columns. If a dictionary, a mapping of column names and indices (zero-indexed) to specific data types.

index_dtypesstr, type, dict, default None

New in version 0.24.0.

If a string or type, the data type to store all index levels. If a dictionary, a mapping of index level names and indices (zero-indexed) to specific data types.

This mapping is applied only if index=True.

numpy.recarray

NumPy ndarray with the DataFrame labels as fields and each row of the DataFrame as entries.

DataFrame.from_records: Convert structured or record ndarray

to DataFrame.

numpy.recarray: An ndarray that allows field access using

attributes, analogous to typed columns in a spreadsheet.

>>> df = pd.DataFrame({'A': [1, 2], 'B': [0.5, 0.75]},
...                   index=['a', 'b'])
>>> df
   A     B
a  1  0.50
b  2  0.75
>>> df.to_records()
rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
          dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])

If the DataFrame index has no label then the recarray field name is set to ‘index’. If the index has a label then this is used as the field name:

>>> df.index = df.index.rename("I")
>>> df.to_records()
rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
          dtype=[('I', 'O'), ('A', '<i8'), ('B', '<f8')])

The index can be excluded from the record array:

>>> df.to_records(index=False)
rec.array([(1, 0.5 ), (2, 0.75)],
          dtype=[('A', '<i8'), ('B', '<f8')])

Data types can be specified for the columns:

>>> df.to_records(column_dtypes={"A": "int32"})
rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
          dtype=[('I', 'O'), ('A', '<i4'), ('B', '<f8')])

As well as for the index:

>>> df.to_records(index_dtypes="<S2")
rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],
          dtype=[('I', 'S2'), ('A', '<i8'), ('B', '<f8')])
>>> index_dtypes = f"<S{df.index.str.len().max()}"
>>> df.to_records(index_dtypes=index_dtypes)
rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],
          dtype=[('I', 'S1'), ('A', '<i8'), ('B', '<f8')])
to_sql(name: str, con, schema=None, if_exists: str = 'fail', index: bool = True, index_label=None, chunksize=None, dtype=None, method=None)None

Write records stored in a DataFrame to a SQL database.

Databases supported by SQLAlchemy [1]_ are supported. Tables can be newly created, appended to, or overwritten.

namestr

Name of SQL table.

consqlalchemy.engine.(Engine or Connection) or sqlite3.Connection

Using SQLAlchemy makes it possible to use any DB supported by that library. Legacy support is provided for sqlite3.Connection objects. The user is responsible for engine disposal and connection closure for the SQLAlchemy connectable See here.

schemastr, optional

Specify the schema (if database flavor supports this). If None, use default schema.

if_exists{‘fail’, ‘replace’, ‘append’}, default ‘fail’

How to behave if the table already exists.

  • fail: Raise a ValueError.

  • replace: Drop the table before inserting new values.

  • append: Insert new values to the existing table.

indexbool, default True

Write DataFrame index as a column. Uses index_label as the column name in the table.

index_labelstr or sequence, default None

Column label for index column(s). If None is given (default) and index is True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.

chunksizeint, optional

Specify the number of rows in each batch to be written at a time. By default, all rows will be written at once.

dtypedict or scalar, optional

Specifying the datatype for columns. If a dictionary is used, the keys should be the column names and the values should be the SQLAlchemy types or strings for the sqlite3 legacy mode. If a scalar is provided, it will be applied to all columns.

method{None, ‘multi’, callable}, optional

Controls the SQL insertion clause used:

  • None : Uses standard SQL INSERT clause (one per row).

  • ‘multi’: Pass multiple values in a single INSERT clause.

  • callable with signature (pd_table, conn, keys, data_iter).

Details and a sample callable implementation can be found in the section insert method.

New in version 0.24.0.

ValueError

When the table already exists and if_exists is ‘fail’ (the default).

read_sql : Read a DataFrame from a table.

Timezone aware datetime columns will be written as Timestamp with timezone type with SQLAlchemy if supported by the database. Otherwise, the datetimes will be stored as timezone unaware timestamps local to the original timezone.

New in version 0.24.0.

1

https://docs.sqlalchemy.org

2

https://www.python.org/dev/peps/pep-0249/

Create an in-memory SQLite database.

>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite://', echo=False)

Create a table from scratch with 3 rows.

>>> df = pd.DataFrame({'name' : ['User 1', 'User 2', 'User 3']})
>>> df
     name
0  User 1
1  User 2
2  User 3
>>> df.to_sql('users', con=engine)
>>> engine.execute("SELECT * FROM users").fetchall()
[(0, 'User 1'), (1, 'User 2'), (2, 'User 3')]

An sqlalchemy.engine.Connection can also be passed to con:

>>> with engine.begin() as connection:
...     df1 = pd.DataFrame({'name' : ['User 4', 'User 5']})
...     df1.to_sql('users', con=connection, if_exists='append')

This is allowed to support operations that require that the same DBAPI connection is used for the entire operation.

>>> df2 = pd.DataFrame({'name' : ['User 6', 'User 7']})
>>> df2.to_sql('users', con=engine, if_exists='append')
>>> engine.execute("SELECT * FROM users").fetchall()
[(0, 'User 1'), (1, 'User 2'), (2, 'User 3'),
 (0, 'User 4'), (1, 'User 5'), (0, 'User 6'),
 (1, 'User 7')]

Overwrite the table with just df2.

>>> df2.to_sql('users', con=engine, if_exists='replace',
...            index_label='id')
>>> engine.execute("SELECT * FROM users").fetchall()
[(0, 'User 6'), (1, 'User 7')]

Specify the dtype (especially useful for integers with missing values). Notice that while pandas is forced to store the data as floating point, the database supports nullable integers. When fetching the data with Python, we get back integer scalars.

>>> df = pd.DataFrame({"A": [1, None, 2]})
>>> df
     A
0  1.0
1  NaN
2  2.0
>>> from sqlalchemy.types import Integer
>>> df.to_sql('integers', con=engine, index=False,
...           dtype={"A": Integer()})
>>> engine.execute("SELECT * FROM integers").fetchall()
[(1,), (None,), (2,)]
to_stata(path: Union[PathLike[str], str, IO[T], io.RawIOBase, io.BufferedIOBase, io.TextIOBase, _io.TextIOWrapper, mmap.mmap], convert_dates: Optional[Dict[Optional[Hashable], str]] = None, write_index: bool = True, byteorder: Optional[str] = None, time_stamp: Optional[datetime.datetime] = None, data_label: Optional[str] = None, variable_labels: Optional[Dict[Optional[Hashable], str]] = None, version: Optional[int] = 114, convert_strl: Optional[Sequence[Optional[Hashable]]] = None, compression: Optional[Union[str, Dict[str, Any]]] = 'infer', storage_options: Optional[Dict[str, Any]] = None)None

Export DataFrame object to Stata dta format.

Writes the DataFrame to a Stata dataset file. “dta” files contain a Stata dataset.

pathstr, buffer or path object

String, path object (pathlib.Path or py._path.local.LocalPath) or object implementing a binary write() function. If using a buffer then the buffer will not be automatically closed after the file data has been written.

Changed in version 1.0.0.

Previously this was “fname”

convert_datesdict

Dictionary mapping columns containing datetime types to stata internal format to use when writing the dates. Options are ‘tc’, ‘td’, ‘tm’, ‘tw’, ‘th’, ‘tq’, ‘ty’. Column can be either an integer or a name. Datetime columns that do not have a conversion type specified will be converted to ‘tc’. Raises NotImplementedError if a datetime column has timezone information.

write_indexbool

Write the index to Stata dataset.

byteorderstr

Can be “>”, “<”, “little”, or “big”. default is sys.byteorder.

time_stampdatetime

A datetime to use as file creation date. Default is the current time.

data_labelstr, optional

A label for the data set. Must be 80 characters or smaller.

variable_labelsdict

Dictionary containing columns as keys and variable labels as values. Each label must be 80 characters or smaller.

version{114, 117, 118, 119, None}, default 114

Version to use in the output dta file. Set to None to let pandas decide between 118 or 119 formats depending on the number of columns in the frame. Version 114 can be read by Stata 10 and later. Version 117 can be read by Stata 13 or later. Version 118 is supported in Stata 14 and later. Version 119 is supported in Stata 15 and later. Version 114 limits string variables to 244 characters or fewer while versions 117 and later allow strings with lengths up to 2,000,000 characters. Versions 118 and 119 support Unicode characters, and version 119 supports more than 32,767 variables.

Version 119 should usually only be used when the number of variables exceeds the capacity of dta format 118. Exporting smaller datasets in format 119 may have unintended consequences, and, as of November 2020, Stata SE cannot read version 119 files.

Changed in version 1.0.0: Added support for formats 118 and 119.

convert_strllist, optional

List of column names to convert to string columns to Stata StrL format. Only available if version is 117. Storing strings in the StrL format can produce smaller dta files if strings have more than 8 characters and values are repeated.

compressionstr or dict, default ‘infer’

For on-the-fly compression of the output dta. If string, specifies compression mode. If dict, value at key ‘method’ specifies compression mode. Compression mode must be one of {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}. If compression mode is ‘infer’ and fname is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, or ‘.xz’ (otherwise no compression). If dict and compression mode is one of {‘zip’, ‘gzip’, ‘bz2’}, or inferred as one of the above, other entries passed as additional compression options.

New in version 1.1.0.

storage_optionsdict, optional

Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc., if using a URL that will be parsed by fsspec, e.g., starting “s3://”, “gcs://”. An error will be raised if providing this argument with a non-fsspec URL. See the fsspec and backend storage implementation docs for the set of allowed keys and values.

New in version 1.2.0.

NotImplementedError
  • If datetimes contain timezone information

  • Column dtype is not representable in Stata

ValueError
  • Columns listed in convert_dates are neither datetime64[ns] or datetime.datetime

  • Column listed in convert_dates is not in DataFrame

  • Categorical label contains more than 32,000 characters

read_stata : Import Stata data files. io.stata.StataWriter : Low-level writer for Stata data files. io.stata.StataWriter117 : Low-level writer for version 117 files.

>>> df = pd.DataFrame({'animal': ['falcon', 'parrot', 'falcon',
...                               'parrot'],
...                    'speed': [350, 18, 361, 15]})
>>> df.to_stata('animals.dta')  
to_string(buf: Optional[Union[PathLike[str], str, IO[str], io.RawIOBase, io.BufferedIOBase, io.TextIOBase, _io.TextIOWrapper, mmap.mmap]] = None, columns: Optional[Sequence[str]] = None, col_space: Optional[int] = None, header: Union[bool, Sequence[str]] = True, index: bool = True, na_rep: str = 'NaN', formatters: Optional[Union[List[Callable], Tuple[Callable, ], Mapping[Union[str, int], Callable]]] = None, float_format: Optional[Union[str, Callable, EngFormatter]] = None, sparsify: Optional[bool] = None, index_names: bool = True, justify: Optional[str] = None, max_rows: Optional[int] = None, min_rows: Optional[int] = None, max_cols: Optional[int] = None, show_dimensions: bool = False, decimal: str = '.', line_width: Optional[int] = None, max_colwidth: Optional[int] = None, encoding: Optional[str] = None)Optional[str]

Render a DataFrame to a console-friendly tabular output.

bufstr, Path or StringIO-like, optional, default None

Buffer to write to. If None, the output is returned as a string.

columnssequence, optional, default None

The subset of columns to write. Writes all columns by default.

col_spaceint, list or dict of int, optional

The minimum width of each column.

headerbool or sequence, optional

Write out the column names. If a list of strings is given, it is assumed to be aliases for the column names.

indexbool, optional, default True

Whether to print index (row) labels.

na_repstr, optional, default ‘NaN’

String representation of NaN to use.

formatterslist, tuple or dict of one-param. functions, optional

Formatter functions to apply to columns’ elements by position or name. The result of each function must be a unicode string. List/tuple must be of length equal to the number of columns.

float_formatone-parameter function, optional, default None

Formatter function to apply to columns’ elements if they are floats. This function must return a unicode string and will be applied only to the non-NaN elements, with NaN being handled by na_rep.

Changed in version 1.2.0.

sparsifybool, optional, default True

Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row.

index_namesbool, optional, default True

Prints the names of the indexes.

justifystr, default None

How to justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box. Valid values are

  • left

  • right

  • center

  • justify

  • justify-all

  • start

  • end

  • inherit

  • match-parent

  • initial

  • unset.

max_rowsint, optional

Maximum number of rows to display in the console.

min_rowsint, optional

The number of rows to display in the console in a truncated repr (when number of rows is above max_rows).

max_colsint, optional

Maximum number of columns to display in the console.

show_dimensionsbool, default False

Display DataFrame dimensions (number of rows by number of columns).

decimalstr, default ‘.’

Character recognized as decimal separator, e.g. ‘,’ in Europe.

line_widthint, optional

Width to wrap a line in characters.

max_colwidthint, optional

Max width to truncate each column in characters. By default, no limit.

New in version 1.0.0.

encodingstr, default “utf-8”

Set character encoding.

New in version 1.0.

str or None

If buf is None, returns the result as a string. Otherwise returns None.

to_html : Convert DataFrame to HTML.

>>> d = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
>>> df = pd.DataFrame(d)
>>> print(df.to_string())
   col1  col2
0     1     4
1     2     5
2     3     6
to_timestamp(freq=None, how: str = 'start', axis: Union[str, int] = 0, copy: bool = True)pandas.core.frame.DataFrame

Cast to DatetimeIndex of timestamps, at beginning of period.

freqstr, default frequency of PeriodIndex

Desired frequency.

how{‘s’, ‘e’, ‘start’, ‘end’}

Convention for converting period to timestamp; start of period vs. end.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

The axis to convert (the index by default).

copybool, default True

If False then underlying input data is not copied.

DataFrame with DatetimeIndex

to_xarray()

Return an xarray object from the pandas object.

xarray.DataArray or xarray.Dataset

Data in the pandas structure converted to Dataset if the object is a DataFrame, or a DataArray if the object is a Series.

DataFrame.to_hdf : Write DataFrame to an HDF5 file. DataFrame.to_parquet : Write a DataFrame to the binary parquet format.

See the xarray docs

>>> df = pd.DataFrame([('falcon', 'bird', 389.0, 2),
...                    ('parrot', 'bird', 24.0, 2),
...                    ('lion', 'mammal', 80.5, 4),
...                    ('monkey', 'mammal', np.nan, 4)],
...                   columns=['name', 'class', 'max_speed',
...                            'num_legs'])
>>> df
     name   class  max_speed  num_legs
0  falcon    bird      389.0         2
1  parrot    bird       24.0         2
2    lion  mammal       80.5         4
3  monkey  mammal        NaN         4
>>> df.to_xarray()
<xarray.Dataset>
Dimensions:    (index: 4)
Coordinates:
  * index      (index) int64 0 1 2 3
Data variables:
    name       (index) object 'falcon' 'parrot' 'lion' 'monkey'
    class      (index) object 'bird' 'bird' 'mammal' 'mammal'
    max_speed  (index) float64 389.0 24.0 80.5 nan
    num_legs   (index) int64 2 2 4 4
>>> df['max_speed'].to_xarray()
<xarray.DataArray 'max_speed' (index: 4)>
array([389. ,  24. ,  80.5,   nan])
Coordinates:
  * index    (index) int64 0 1 2 3
>>> dates = pd.to_datetime(['2018-01-01', '2018-01-01',
...                         '2018-01-02', '2018-01-02'])
>>> df_multiindex = pd.DataFrame({'date': dates,
...                               'animal': ['falcon', 'parrot',
...                                          'falcon', 'parrot'],
...                               'speed': [350, 18, 361, 15]})
>>> df_multiindex = df_multiindex.set_index(['date', 'animal'])
>>> df_multiindex
                   speed
date       animal
2018-01-01 falcon    350
           parrot     18
2018-01-02 falcon    361
           parrot     15
>>> df_multiindex.to_xarray()
<xarray.Dataset>
Dimensions:  (animal: 2, date: 2)
Coordinates:
  * date     (date) datetime64[ns] 2018-01-01 2018-01-02
  * animal   (animal) object 'falcon' 'parrot'
Data variables:
    speed    (date, animal) int64 350 18 361 15
touches(second_geometry)

Indicates if the boundaries of the geometries intersect.

Paramters:
second_geometry
  • a second geometry

transform(func: Union[Callable, str, List[Union[Callable, str]], Dict[Optional[Hashable], Union[Callable, str, List[Union[Callable, str]]]]], axis: Union[str, int] = 0, *args, **kwargs)pandas.core.frame.DataFrame

Call func on self producing a DataFrame with transformed values.

Produced DataFrame will have same axis length as self.

funcfunction, str, list-like or dict-like

Function to use for transforming the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply. If func is both list-like and dict-like, dict-like behavior takes precedence.

Accepted combinations are:

  • function

  • string function name

  • list-like of functions and/or function names, e.g. [np.exp, 'sqrt']

  • dict-like of axis labels -> functions, function names or list-like of such.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.

*args

Positional arguments to pass to func.

**kwargs

Keyword arguments to pass to func.

DataFrame

A DataFrame that must have the same length as self.

ValueError : If the returned DataFrame has a different length than self.

DataFrame.agg : Only perform aggregating type operations. DataFrame.apply : Invoke function on a DataFrame.

>>> df = pd.DataFrame({'A': range(3), 'B': range(1, 4)})
>>> df
   A  B
0  0  1
1  1  2
2  2  3
>>> df.transform(lambda x: x + 1)
   A  B
0  1  2
1  2  3
2  3  4

Even though the resulting DataFrame must have the same length as the input DataFrame, it is possible to provide several input functions:

>>> s = pd.Series(range(3))
>>> s
0    0
1    1
2    2
dtype: int64
>>> s.transform([np.sqrt, np.exp])
       sqrt        exp
0  0.000000   1.000000
1  1.000000   2.718282
2  1.414214   7.389056

You can call transform on a GroupBy object:

>>> df = pd.DataFrame({
...     "Date": [
...         "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05",
...         "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05"],
...     "Data": [5, 8, 6, 1, 50, 100, 60, 120],
... })
>>> df
         Date  Data
0  2015-05-08     5
1  2015-05-07     8
2  2015-05-06     6
3  2015-05-05     1
4  2015-05-08    50
5  2015-05-07   100
6  2015-05-06    60
7  2015-05-05   120
>>> df.groupby('Date')['Data'].transform('sum')
0     55
1    108
2     66
3    121
4     55
5    108
6     66
7    121
Name: Data, dtype: int64
>>> df = pd.DataFrame({
...     "c": [1, 1, 1, 2, 2, 2, 2],
...     "type": ["m", "n", "o", "m", "m", "n", "n"]
... })
>>> df
   c type
0  1    m
1  1    n
2  1    o
3  2    m
4  2    m
5  2    n
6  2    n
>>> df['size'] = df.groupby('c')['type'].transform(len)
>>> df
   c type size
0  1    m    3
1  1    n    3
2  1    o    3
3  2    m    4
4  2    m    4
5  2    n    4
6  2    n    4
transpose(*args, copy: bool = False)pandas.core.frame.DataFrame

Transpose index and columns.

Reflect the DataFrame over its main diagonal by writing rows as columns and vice-versa. The property T is an accessor to the method transpose().

*argstuple, optional

Accepted for compatibility with NumPy.

copybool, default False

Whether to copy the data after transposing, even for DataFrames with a single dtype.

Note that a copy is always required for mixed dtype DataFrames, or for DataFrames with any extension types.

DataFrame

The transposed DataFrame.

numpy.transpose : Permute the dimensions of a given array.

Transposing a DataFrame with mixed dtypes will result in a homogeneous DataFrame with the object dtype. In such a case, a copy of the data is always made.

Square DataFrame with homogeneous dtype

>>> d1 = {'col1': [1, 2], 'col2': [3, 4]}
>>> df1 = pd.DataFrame(data=d1)
>>> df1
   col1  col2
0     1     3
1     2     4
>>> df1_transposed = df1.T # or df1.transpose()
>>> df1_transposed
      0  1
col1  1  2
col2  3  4

When the dtype is homogeneous in the original DataFrame, we get a transposed DataFrame with the same dtype:

>>> df1.dtypes
col1    int64
col2    int64
dtype: object
>>> df1_transposed.dtypes
0    int64
1    int64
dtype: object

Non-square DataFrame with mixed dtypes

>>> d2 = {'name': ['Alice', 'Bob'],
...       'score': [9.5, 8],
...       'employed': [False, True],
...       'kids': [0, 0]}
>>> df2 = pd.DataFrame(data=d2)
>>> df2
    name  score  employed  kids
0  Alice    9.5     False     0
1    Bob    8.0      True     0
>>> df2_transposed = df2.T # or df2.transpose()
>>> df2_transposed
              0     1
name      Alice   Bob
score       9.5   8.0
employed  False  True
kids          0     0

When the DataFrame has mixed dtypes, we get a transposed DataFrame with the object dtype:

>>> df2.dtypes
name         object
score       float64
employed       bool
kids          int64
dtype: object
>>> df2_transposed.dtypes
0    object
1    object
dtype: object
property true_centroid

The center of gravity for a feature.

truediv(other, axis='columns', level=None, fill_value=None)

Get Floating division of dataframe and other, element-wise (binary operator truediv).

Equivalent to dataframe / other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rtruediv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

axis{0 or ‘index’, 1 or ‘columns’}

Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

levelint or label

Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

DataFrame

Result of the arithmetic operation.

DataFrame.add : Add DataFrames. DataFrame.sub : Subtract DataFrames. DataFrame.mul : Multiply DataFrames. DataFrame.div : Divide DataFrames (float division). DataFrame.truediv : Divide DataFrames (float division). DataFrame.floordiv : Divide DataFrames (integer division). DataFrame.mod : Calculate modulo (remainder after division). DataFrame.pow : Calculate exponential power.

Mismatched indices will be unioned together.

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
truncate(before=None, after=None, axis=None, copy: bool = True)FrameOrSeries

Truncate a Series or DataFrame before and after some index value.

This is a useful shorthand for boolean indexing based on index values above or below certain thresholds.

beforedate, str, int

Truncate all rows before this index value.

afterdate, str, int

Truncate all rows after this index value.

axis{0 or ‘index’, 1 or ‘columns’}, optional

Axis to truncate. Truncates the index (rows) by default.

copybool, default is True,

Return a copy of the truncated section.

type of caller

The truncated Series or DataFrame.

DataFrame.loc : Select a subset of a DataFrame by label. DataFrame.iloc : Select a subset of a DataFrame by position.

If the index being truncated contains only datetime values, before and after may be specified as strings instead of Timestamps.

>>> df = pd.DataFrame({'A': ['a', 'b', 'c', 'd', 'e'],
...                    'B': ['f', 'g', 'h', 'i', 'j'],
...                    'C': ['k', 'l', 'm', 'n', 'o']},
...                   index=[1, 2, 3, 4, 5])
>>> df
   A  B  C
1  a  f  k
2  b  g  l
3  c  h  m
4  d  i  n
5  e  j  o
>>> df.truncate(before=2, after=4)
   A  B  C
2  b  g  l
3  c  h  m
4  d  i  n

The columns of a DataFrame can be truncated.

>>> df.truncate(before="A", after="B", axis="columns")
   A  B
1  a  f
2  b  g
3  c  h
4  d  i
5  e  j

For Series, only rows can be truncated.

>>> df['A'].truncate(before=2, after=4)
2    b
3    c
4    d
Name: A, dtype: object

The index values in truncate can be datetimes or string dates.

>>> dates = pd.date_range('2016-01-01', '2016-02-01', freq='s')
>>> df = pd.DataFrame(index=dates, data={'A': 1})
>>> df.tail()
                     A
2016-01-31 23:59:56  1
2016-01-31 23:59:57  1
2016-01-31 23:59:58  1
2016-01-31 23:59:59  1
2016-02-01 00:00:00  1
>>> df.truncate(before=pd.Timestamp('2016-01-05'),
...             after=pd.Timestamp('2016-01-10')).tail()
                     A
2016-01-09 23:59:56  1
2016-01-09 23:59:57  1
2016-01-09 23:59:58  1
2016-01-09 23:59:59  1
2016-01-10 00:00:00  1

Because the index is a DatetimeIndex containing only dates, we can specify before and after as strings. They will be coerced to Timestamps before truncation.

>>> df.truncate('2016-01-05', '2016-01-10').tail()
                     A
2016-01-09 23:59:56  1
2016-01-09 23:59:57  1
2016-01-09 23:59:58  1
2016-01-09 23:59:59  1
2016-01-10 00:00:00  1

Note that truncate assumes a 0 value for any unspecified time component (midnight). This differs from partial string slicing, which returns any partially matching dates.

>>> df.loc['2016-01-05':'2016-01-10', :].tail()
                     A
2016-01-10 23:59:55  1
2016-01-10 23:59:56  1
2016-01-10 23:59:57  1
2016-01-10 23:59:58  1
2016-01-10 23:59:59  1
tshift(periods: int = 1, freq=None, axis: Union[str, int] = 0)FrameOrSeries

Shift the time index, using the index’s frequency if available.

Deprecated since version 1.1.0: Use shift instead.

periodsint

Number of periods to move, can be positive or negative.

freqDateOffset, timedelta, or str, default None

Increment to use from the tseries module or time rule expressed as a string (e.g. ‘EOM’).

axis{0 or ‘index’, 1 or ‘columns’, None}, default 0

Corresponds to the axis that contains the Index.

shifted : Series/DataFrame

If freq is not specified then tries to use the freq or inferred_freq attributes of the index. If neither of those attributes exist, a ValueError is thrown

tz_convert(tz, axis=0, level=None, copy: bool = True)FrameOrSeries

Convert tz-aware axis to target time zone.

tz : str or tzinfo object axis : the axis to convert level : int, str, default None

If axis is a MultiIndex, convert a specific level. Otherwise must be None.

copybool, default True

Also make a copy of the underlying data.

{klass}

Object with time zone converted axis.

TypeError

If the axis is tz-naive.

tz_localize(tz, axis=0, level=None, copy: bool = True, ambiguous='raise', nonexistent: str = 'raise')FrameOrSeries

Localize tz-naive index of a Series or DataFrame to target time zone.

This operation localizes the Index. To localize the values in a timezone-naive Series, use Series.dt.tz_localize().

tz : str or tzinfo axis : the axis to localize level : int, str, default None

If axis ia a MultiIndex, localize a specific level. Otherwise must be None.

copybool, default True

Also make a copy of the underlying data.

ambiguous‘infer’, bool-ndarray, ‘NaT’, default ‘raise’

When clocks moved backward due to DST, ambiguous times may arise. For example in Central European Time (UTC+01), when going from 03:00 DST to 02:00 non-DST, 02:30:00 local time occurs both at 00:30:00 UTC and at 01:30:00 UTC. In such a situation, the ambiguous parameter dictates how ambiguous times should be handled.

  • ‘infer’ will attempt to infer fall dst-transition hours based on order

  • bool-ndarray where True signifies a DST time, False designates a non-DST time (note that this flag is only applicable for ambiguous times)

  • ‘NaT’ will return NaT where there are ambiguous times

  • ‘raise’ will raise an AmbiguousTimeError if there are ambiguous times.

nonexistentstr, default ‘raise’

A nonexistent time does not exist in a particular timezone where clocks moved forward due to DST. Valid values are:

  • ‘shift_forward’ will shift the nonexistent time forward to the closest existing time

  • ‘shift_backward’ will shift the nonexistent time backward to the closest existing time

  • ‘NaT’ will return NaT where there are nonexistent times

  • timedelta objects will shift nonexistent times by the timedelta

  • ‘raise’ will raise an NonExistentTimeError if there are nonexistent times.

New in version 0.24.0.

Series or DataFrame

Same type as the input.

TypeError

If the TimeSeries is tz-aware and tz is not None.

Localize local times:

>>> s = pd.Series([1],
...               index=pd.DatetimeIndex(['2018-09-15 01:30:00']))
>>> s.tz_localize('CET')
2018-09-15 01:30:00+02:00    1
dtype: int64

Be careful with DST changes. When there is sequential data, pandas can infer the DST time:

>>> s = pd.Series(range(7),
...               index=pd.DatetimeIndex(['2018-10-28 01:30:00',
...                                       '2018-10-28 02:00:00',
...                                       '2018-10-28 02:30:00',
...                                       '2018-10-28 02:00:00',
...                                       '2018-10-28 02:30:00',
...                                       '2018-10-28 03:00:00',
...                                       '2018-10-28 03:30:00']))
>>> s.tz_localize('CET', ambiguous='infer')
2018-10-28 01:30:00+02:00    0
2018-10-28 02:00:00+02:00    1
2018-10-28 02:30:00+02:00    2
2018-10-28 02:00:00+01:00    3
2018-10-28 02:30:00+01:00    4
2018-10-28 03:00:00+01:00    5
2018-10-28 03:30:00+01:00    6
dtype: int64

In some cases, inferring the DST is impossible. In such cases, you can pass an ndarray to the ambiguous parameter to set the DST explicitly

>>> s = pd.Series(range(3),
...               index=pd.DatetimeIndex(['2018-10-28 01:20:00',
...                                       '2018-10-28 02:36:00',
...                                       '2018-10-28 03:46:00']))
>>> s.tz_localize('CET', ambiguous=np.array([True, True, False]))
2018-10-28 01:20:00+02:00    0
2018-10-28 02:36:00+02:00    1
2018-10-28 03:46:00+01:00    2
dtype: int64

If the DST transition causes nonexistent times, you can shift these dates forward or backward with a timedelta object or ‘shift_forward’ or ‘shift_backward’.

>>> s = pd.Series(range(2),
...               index=pd.DatetimeIndex(['2015-03-29 02:30:00',
...                                       '2015-03-29 03:30:00']))
>>> s.tz_localize('Europe/Warsaw', nonexistent='shift_forward')
2015-03-29 03:00:00+02:00    0
2015-03-29 03:30:00+02:00    1
dtype: int64
>>> s.tz_localize('Europe/Warsaw', nonexistent='shift_backward')
2015-03-29 01:59:59.999999999+01:00    0
2015-03-29 03:30:00+02:00              1
dtype: int64
>>> s.tz_localize('Europe/Warsaw', nonexistent=pd.Timedelta('1H'))
2015-03-29 03:30:00+02:00    0
2015-03-29 03:30:00+02:00    1
dtype: int64
union(second_geometry)

Constructs the geometry that is the set-theoretic union of the input geometries.

Paramters:
second_geometry
  • a second geometry

unstack(level=- 1, fill_value=None)

Pivot a level of the (necessarily hierarchical) index labels.

Returns a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels.

If the index is not a MultiIndex, the output will be a Series (the analogue of stack when the columns are not a MultiIndex).

levelint, str, or list of these, default -1 (last level)

Level(s) of index to unstack, can pass level name.

fill_valueint, str or dict

Replace NaN with this value if the unstack produces missing values.

Series or DataFrame

DataFrame.pivot : Pivot a table based on column values. DataFrame.stack : Pivot a level of the column labels (inverse operation

from unstack).

>>> index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'),
...                                    ('two', 'a'), ('two', 'b')])
>>> s = pd.Series(np.arange(1.0, 5.0), index=index)
>>> s
one  a   1.0
     b   2.0
two  a   3.0
     b   4.0
dtype: float64
>>> s.unstack(level=-1)
     a   b
one  1.0  2.0
two  3.0  4.0
>>> s.unstack(level=0)
   one  two
a  1.0   3.0
b  2.0   4.0
>>> df = s.unstack(level=0)
>>> df.unstack()
one  a  1.0
     b  2.0
two  a  3.0
     b  4.0
dtype: float64
update(other, join='left', overwrite=True, filter_func=None, errors='ignore')None

Modify in place using non-NA values from another DataFrame.

Aligns on indices. There is no return value.

otherDataFrame, or object coercible into a DataFrame

Should have at least one matching index/column label with the original DataFrame. If a Series is passed, its name attribute must be set, and that will be used as the column name to align with the original DataFrame.

join{‘left’}, default ‘left’

Only left join is implemented, keeping the index and columns of the original object.

overwritebool, default True

How to handle non-NA values for overlapping keys:

  • True: overwrite original DataFrame’s values with values from other.

  • False: only update values that are NA in the original DataFrame.

filter_funccallable(1d-array) -> bool 1d-array, optional

Can choose to replace values other than NA. Return True for values that should be updated.

errors{‘raise’, ‘ignore’}, default ‘ignore’

If ‘raise’, will raise a ValueError if the DataFrame and other both contain non-NA data in the same place.

Changed in version 0.24.0: Changed from raise_conflict=False|True to errors=’ignore’|’raise’.

None : method directly changes calling object

ValueError
  • When errors=’raise’ and there’s overlapping non-NA data.

  • When errors is not either ‘ignore’ or ‘raise’

NotImplementedError
  • If join != ‘left’

dict.update : Similar method for dictionaries. DataFrame.merge : For column(s)-on-column(s) operations.

>>> df = pd.DataFrame({'A': [1, 2, 3],
...                    'B': [400, 500, 600]})
>>> new_df = pd.DataFrame({'B': [4, 5, 6],
...                        'C': [7, 8, 9]})
>>> df.update(new_df)
>>> df
   A  B
0  1  4
1  2  5
2  3  6

The DataFrame’s length does not increase as a result of the update, only values at matching index/column labels are updated.

>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_df = pd.DataFrame({'B': ['d', 'e', 'f', 'g', 'h', 'i']})
>>> df.update(new_df)
>>> df
   A  B
0  a  d
1  b  e
2  c  f

For Series, its name attribute must be set.

>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_column = pd.Series(['d', 'e'], name='B', index=[0, 2])
>>> df.update(new_column)
>>> df
   A  B
0  a  d
1  b  y
2  c  e
>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_df = pd.DataFrame({'B': ['d', 'e']}, index=[1, 2])
>>> df.update(new_df)
>>> df
   A  B
0  a  x
1  b  d
2  c  e

If other contains NaNs the corresponding values are not updated in the original dataframe.

>>> df = pd.DataFrame({'A': [1, 2, 3],
...                    'B': [400, 500, 600]})
>>> new_df = pd.DataFrame({'B': [4, np.nan, 6]})
>>> df.update(new_df)
>>> df
   A      B
0  1    4.0
1  2  500.0
2  3    6.0
value_counts(subset: Optional[Sequence[Optional[Hashable]]] = None, normalize: bool = False, sort: bool = True, ascending: bool = False)

Return a Series containing counts of unique rows in the DataFrame.

New in version 1.1.0.

subsetlist-like, optional

Columns to use when counting unique combinations.

normalizebool, default False

Return proportions rather than frequencies.

sortbool, default True

Sort by frequencies.

ascendingbool, default False

Sort in ascending order.

Series

Series.value_counts: Equivalent method on Series.

The returned Series will have a MultiIndex with one level per input column. By default, rows that contain any NA values are omitted from the result. By default, the resulting Series will be in descending order so that the first element is the most frequently-occurring row.

>>> df = pd.DataFrame({'num_legs': [2, 4, 4, 6],
...                    'num_wings': [2, 0, 0, 0]},
...                   index=['falcon', 'dog', 'cat', 'ant'])
>>> df
        num_legs  num_wings
falcon         2          2
dog            4          0
cat            4          0
ant            6          0
>>> df.value_counts()
num_legs  num_wings
4         0            2
2         2            1
6         0            1
dtype: int64
>>> df.value_counts(sort=False)
num_legs  num_wings
2         2            1
4         0            2
6         0            1
dtype: int64
>>> df.value_counts(ascending=True)
num_legs  num_wings
2         2            1
6         0            1
4         0            2
dtype: int64
>>> df.value_counts(normalize=True)
num_legs  num_wings
4         0            0.50
2         2            0.25
6         0            0.25
dtype: float64
property values

Return a Numpy representation of the DataFrame.

Warning

We recommend using DataFrame.to_numpy() instead.

Only the values in the DataFrame will be returned, the axes labels will be removed.

numpy.ndarray

The values of the DataFrame.

DataFrame.to_numpy : Recommended alternative to this method. DataFrame.index : Retrieve the index labels. DataFrame.columns : Retrieving the column names.

The dtype will be a lower-common-denominator dtype (implicit upcasting); that is to say if the dtypes (even of numeric types) are mixed, the one that accommodates all will be chosen. Use this with care if you are not dealing with the blocks.

e.g. If the dtypes are float16 and float32, dtype will be upcast to float32. If dtypes are int32 and uint8, dtype will be upcast to int32. By numpy.find_common_type() convention, mixing int64 and uint64 will result in a float64 dtype.

A DataFrame where all columns are the same type (e.g., int64) results in an array of the same type.

>>> df = pd.DataFrame({'age':    [ 3,  29],
...                    'height': [94, 170],
...                    'weight': [31, 115]})
>>> df
   age  height  weight
0    3      94      31
1   29     170     115
>>> df.dtypes
age       int64
height    int64
weight    int64
dtype: object
>>> df.values
array([[  3,  94,  31],
       [ 29, 170, 115]])

A DataFrame with mixed type columns(e.g., str/object, int64, float32) results in an ndarray of the broadest type that accommodates these mixed types (e.g., object).

>>> df2 = pd.DataFrame([('parrot',   24.0, 'second'),
...                     ('lion',     80.5, 1),
...                     ('monkey', np.nan, None)],
...                   columns=('name', 'max_speed', 'rank'))
>>> df2.dtypes
name          object
max_speed    float64
rank          object
dtype: object
>>> df2.values
array([['parrot', 24.0, 'second'],
       ['lion', 80.5, 1],
       ['monkey', nan, None]], dtype=object)
var(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)

Return unbiased variance over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument

axis : {index (0), columns (1)} skipna : bool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

levelint or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

ddofint, default 1

Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

numeric_onlybool, default None

Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

Series or DataFrame (if level specified)

To have the same behaviour as numpy.std, use ddof=0 (instead of the default ddof=1)

where(cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False)

Replace values where the condition is False.

condbool Series/DataFrame, array-like, or callable

Where cond is True, keep the original value. Where False, replace with corresponding value from other. If cond is callable, it is computed on the Series/DataFrame and should return boolean Series/DataFrame or array. The callable must not change input Series/DataFrame (though pandas doesn’t check it).

otherscalar, Series/DataFrame, or callable

Entries where cond is False are replaced with corresponding value from other. If other is callable, it is computed on the Series/DataFrame and should return scalar or Series/DataFrame. The callable must not change input Series/DataFrame (though pandas doesn’t check it).

inplacebool, default False

Whether to perform the operation in place on the data.

axisint, default None

Alignment axis if needed.

levelint, default None

Alignment level if needed.

errorsstr, {‘raise’, ‘ignore’}, default ‘raise’

Note that currently this parameter won’t affect the results and will always coerce to a suitable dtype.

  • ‘raise’ : allow exceptions to be raised.

  • ‘ignore’ : suppress exceptions. On error return original object.

try_castbool, default False

Try to cast the result back to the input type (if possible).

Same type as caller or None if inplace=True.

DataFrame.mask()Return an object of same shape as

self.

The where method is an application of the if-then idiom. For each element in the calling DataFrame, if cond is True the element is used; otherwise the corresponding element from the DataFrame other is used.

The signature for DataFrame.where() differs from numpy.where(). Roughly df1.where(m, df2) is equivalent to np.where(m, df1, df2).

For further details and examples see the where documentation in indexing.

>>> s = pd.Series(range(5))
>>> s.where(s > 0)
0    NaN
1    1.0
2    2.0
3    3.0
4    4.0
dtype: float64
>>> s.mask(s > 0)
0    0.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64
>>> s.where(s > 1, 10)
0    10
1    10
2    2
3    3
4    4
dtype: int64
>>> s.mask(s > 1, 10)
0     0
1     1
2    10
3    10
4    10
dtype: int64
>>> df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
>>> df
   A  B
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9
>>> m = df % 3 == 0
>>> df.where(m, -df)
   A  B
0  0 -1
1 -2  3
2 -4 -5
3  6 -7
4 -8  9
>>> df.where(m, -df) == np.where(m, df, -df)
      A     B
0  True  True
1  True  True
2  True  True
3  True  True
4  True  True
>>> df.where(m, -df) == df.mask(~m, -df)
      A     B
0  True  True
1  True  True
2  True  True
3  True  True
4  True  True
within(second_geometry, relation=None)

Indicates if the base geometry is within the comparison geometry. Paramters:

second_geometry
  • a second geometry

relation
  • The spatial relationship type.

BOUNDARY - Relationship has no restrictions for interiors or boundaries. CLEMENTINI - Interiors of geometries must intersect. Specifying

CLEMENTINI is equivalent to specifying None. This is the default.

PROPER - Boundaries of geometries must not intersect.

xs(key, axis=0, level=None, drop_level: bool = True)

Return cross-section from the Series/DataFrame.

This method takes a key argument to select data at a particular level of a MultiIndex.

keylabel or tuple of label

Label contained in the index, or partially in a MultiIndex.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

Axis to retrieve cross-section on.

levelobject, defaults to first n levels (n=1 or len(key))

In case of a key partially contained in a MultiIndex, indicate which levels are used. Levels can be referred by label or position.

drop_levelbool, default True

If False, returns object with same levels as self.

Series or DataFrame

Cross-section from the original Series or DataFrame corresponding to the selected index levels.

DataFrame.locAccess a group of rows and columns

by label(s) or a boolean array.

DataFrame.ilocPurely integer-location based indexing

for selection by position.

xs can not be used to set values.

MultiIndex Slicers is a generic way to get/set values on any level or levels. It is a superset of xs functionality, see MultiIndex Slicers.

>>> d = {'num_legs': [4, 4, 2, 2],
...      'num_wings': [0, 0, 2, 2],
...      'class': ['mammal', 'mammal', 'mammal', 'bird'],
...      'animal': ['cat', 'dog', 'bat', 'penguin'],
...      'locomotion': ['walks', 'walks', 'flies', 'walks']}
>>> df = pd.DataFrame(data=d)
>>> df = df.set_index(['class', 'animal', 'locomotion'])
>>> df
                           num_legs  num_wings
class  animal  locomotion
mammal cat     walks              4          0
       dog     walks              4          0
       bat     flies              2          2
bird   penguin walks              2          2

Get values at specified index

>>> df.xs('mammal')
                   num_legs  num_wings
animal locomotion
cat    walks              4          0
dog    walks              4          0
bat    flies              2          2

Get values at several indexes

>>> df.xs(('mammal', 'dog'))
            num_legs  num_wings
locomotion
walks              4          0

Get values at specified index and level

>>> df.xs('cat', level=1)
                   num_legs  num_wings
class  locomotion
mammal walks              4          0

Get values at several indexes and levels

>>> df.xs(('bird', 'walks'),
...       level=[0, 'locomotion'])
         num_legs  num_wings
animal
penguin         2          2

Get values at specified column and axis

>>> df.xs('num_wings', axis=1)
class   animal   locomotion
mammal  cat      walks         0
        dog      walks         0
        bat      flies         2
bird    penguin  walks         2
Name: num_wings, dtype: int64