For many big datasets, location is crucial to understanding the underlying patterns and trends. Without location, datasets are less valuable or, in extreme cases, meaningless. GIS Tools for Hadoop works with big spatial data (big data with location) and allows you to complete spatial analysis using the power of distributed processing in Hadoop.
The GIS Tools for Hadoop toolkit allows you to leverage the Hadoop framework to complete spatial analysis on spatial data; for example:
- Run filter and aggregate operations on billions of spatial data records based on location.
- Define new areas represented as polygons, and run a point-in-polygon analysis on billions of spatial data records inside Hadoop.
- Visualize analysis results on a map and apply informative symbology.
- Integrate your maps in reports, or publish them as map applications online.
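To make the filter, aggregate, and point-in-polygon ideas above concrete, here is a minimal single-machine sketch in plain Python. It does not use the toolkit or Hadoop; the function names, the two sample areas, and the sample points are all hypothetical, and the ray-casting test is a standard textbook technique rather than the toolkit's own implementation.

```python
from collections import Counter


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_by_area(points, areas):
    """Aggregate: count how many points fall inside each named polygon."""
    counts = Counter()
    for x, y in points:
        for name, ring in areas.items():
            if point_in_polygon(x, y, ring):
                counts[name] += 1
                break  # assumption: the areas do not overlap
    return counts


# Hypothetical areas (simple squares) and point records.
areas = {
    "west": [(0, 0), (5, 0), (5, 5), (0, 5)],
    "east": [(5, 0), (10, 0), (10, 5), (5, 5)],
}
points = [(1, 1), (2, 4), (7, 3), (12, 1)]

counts = count_points_by_area(points, areas)
print(dict(counts))
```

The point at (12, 1) falls outside both areas and is filtered out of the counts. On a cluster, the same per-point test would be distributed across many records at once, which is where Hadoop comes in.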
An Overview of the Toolkit
GIS Tools for Hadoop is an open source toolkit that brings spatial analysis to your big data. The toolkit is composed of four projects:
- The Building Blocks
- The Framework
- The Connector
- The Toolkit
Esri Geometry API for Java: This library includes geometry objects (e.g., points, lines, and polygons), spatial operations (e.g., intersects, buffer), and spatial indexing. By deploying the Esri Geometry API library (as a jar) within Hadoop, you can build custom MapReduce applications in Java to analyze your spatial data. The library can be used on its own, or combined with the other projects described below to create a SQL-like workflow.
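As a rough illustration of how a custom MapReduce job over spatial records is structured, the sketch below mimics the map and reduce phases in plain Python by binning points into grid cells. A real job would be written as Java Mapper and Reducer classes using the Esri Geometry API; the function names, the cell size, and the sample records here are assumptions made for illustration only.

```python
from collections import defaultdict


def map_point(record, cell_size):
    """Map phase: emit ((col, row), 1) for the grid cell containing the point."""
    x, y = record
    key = (int(x // cell_size), int(y // cell_size))
    return key, 1


def reduce_counts(pairs):
    """Reduce phase: sum the emitted values for each grid-cell key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical point records as (x, y) tuples.
records = [(0.5, 0.5), (0.9, 0.1), (1.5, 0.2), (2.1, 2.9)]

pairs = [map_point(r, cell_size=1.0) for r in records]
cell_counts = reduce_counts(pairs)
print(cell_counts)
```

In Hadoop, the framework handles grouping the emitted keys and shipping them to reducers; here that shuffle step is folded into `reduce_counts` to keep the sketch short.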
Spatial Framework for Hadoop: This library includes user-defined functions (UDFs) that extend Hive and are built on the capabilities of the Esri Geometry API. By enabling this library in Hive, you can construct queries using Hive Query Language (HQL), which is very similar to SQL. This lets you avoid writing complicated MapReduce algorithms and stay with a more familiar workflow.
Geoprocessing Tools for Hadoop: These tools are downloaded as a toolbox and used in ArcMap, recreating a typical workflow for an ArcGIS user. Using these tools, you can move data between Hadoop and ArcGIS, submit workflow jobs, and convert data to and from JSON. You can then bring your Hadoop results into ArcGIS for visualization. If you have created a subset or a more manageable dataset, you can complete the analysis in ArcGIS Desktop and draw on the more than 1,000 tools available there. Additionally, you can take full advantage of the ArcGIS platform to publish your maps to Server or Online, create web and mobile apps, and integrate them with BI reports and more.
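The conversion to and from JSON mentioned above can be pictured with a small round-trip sketch in plain Python. The layout is loosely modeled on Esri's JSON feature representation, but the exact fields the toolbox reads and writes may differ; the function names, the `id` attribute, and the sample coordinates are hypothetical.

```python
import json


def points_to_feature_json(rows, wkid=4326):
    """Encode (id, x, y) rows as a JSON document of point features."""
    features = [
        {"attributes": {"id": row_id}, "geometry": {"x": x, "y": y}}
        for row_id, x, y in rows
    ]
    return json.dumps({
        "geometryType": "esriGeometryPoint",  # assumed layout, see lead-in
        "spatialReference": {"wkid": wkid},
        "features": features,
    })


def feature_json_to_points(text):
    """Decode the JSON document back into (id, x, y) rows."""
    doc = json.loads(text)
    return [
        (f["attributes"]["id"], f["geometry"]["x"], f["geometry"]["y"])
        for f in doc["features"]
    ]


# Hypothetical rows: an identifier plus longitude and latitude.
rows = [(1, -117.19, 34.05), (2, -118.24, 34.05)]

encoded = points_to_feature_json(rows)
decoded = feature_json_to_points(encoded)
print(decoded)
```

Keeping attributes and geometry side by side in each feature is what lets results computed in Hadoop be drawn on a map without losing the fields they were aggregated by.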
GIS Tools for Hadoop: This project synthesizes the above three projects into the toolkit. It includes samples and instructions that leverage the complete toolkit. The samples are available to help test your deployment of the spatial libraries with Hadoop and Hive and to ensure everything runs without issue before implementing your own solutions.
If you have any questions, please use our Big Data GeoNet page.
If you find a bug or have a request, please submit an issue on the respective project issue page.
Videos
Big Data and Analytics with ArcGIS – Esri UC 2014
Big Data: Using ArcGIS with Apache Hadoop – Dev Summit 2014
Big Data: Using ArcGIS with Apache Hadoop – Dev Summit 2013
ArcGIS Platform: Big Data and Big Analysis – Dev Summit 2013
Big Data in ArcGIS – Fed UC 2013
New Spatial Aggregation Tutorial for GIS Tools for Hadoop
Setting up a Small Budget Hadoop Cluster for Big Data Analysis
Big Data ST_Geometry Queries up to 20X Faster in Hive
ST_Geometry Aggregate Functions for Hive in Spatial Framework for Hadoop
Vehicle Trip Discovery with GIS Tools for Hadoop
GIS Tools for Hadoop