Technical Reference
calculate
pandarus.calculate.raster_statistics(vector_fp, identifying_field, raster, output=None, band=1, compress=True, fiona_kwargs={}, **kwargs)
Create statistics by matching raster against each spatial unit in self.from_map.
For each spatial unit in self.from_map, calculates the following statistics for values from raster: min, mean, max, and count. Count is the number of raster cells intersecting the vector spatial unit. No data values in the raster are not included in the generated statistics.
This function uses a fork of the rasterstats library that breaks each raster cell into 100 smaller cells, as a compromise approach to handle the fact that some raster cells are completely within a vector geometry, while others have only a small fraction of their cell area within the vector geometry. Each of the 100 small raster cells is weighted equally, and each is tested to make sure it intersects the vector geometry.
This function assumes that each smaller raster cell has the same area. This may change in the future.
Input parameters:
- vector_fp: str. Filepath of the vector dataset.
- identifying_field: str. Name of the field in vector_fp that uniquely identifies each feature.
- raster: str. Filepath of the raster dataset.
- output: str, optional. Filepath of the output file. Will be deleted if it exists already.
- band: int, optional. Raster band used for calculations. Default is 1.
- compress: bool, optional. Compress JSON results file. Default is True.
- fiona_kwargs: dict, optional. Additional arguments to pass to fiona when opening vector_fp.
Any additional kwargs are passed to gen_zonal_stats.
Output format:
Output is a (possibly compressed) JSON file with the following schema:
{
    'metadata': {
        'vector': {
            'field': 'name of uniquely identifying field',
            'path': 'path to vector input file',
            'sha256': 'sha256 hash of input file'
        },
        'raster': {
            'band': 'band used to calculate raster stats',
            'path': 'path to raster input file',
            'filename': 'name of raster file',
            'sha256': 'sha256 hash of input file'
        },
        'when': 'datetime this calculation finished, ISO format'
    },
    'data': [
        [
            'vector `identifying_field` value',
            {
                'count': 'number of raster cells included; a float, because fractional intersections are considered',
                'min': 'minimum raster value in this vector feature',
                'mean': 'average raster value in this vector feature',
                'max': 'maximum raster value in this vector feature'
            }
        ]
    ]
}
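The sub-cell weighting described for raster_statistics can be illustrated with a minimal pure-Python sketch. This is not the code of the rasterstats fork: the function name fractional_cell_weight, the choice to test the centre of each sub-cell, and the half-plane test geometry are all assumptions made for illustration.

```python
def fractional_cell_weight(cell_bounds, contains, n=10):
    """Approximate the fraction of one raster cell covered by a geometry.

    cell_bounds: (xmin, ymin, xmax, ymax) of the raster cell.
    contains: callable (x, y) -> bool, a stand-in for a real geometry test.
    n: sub-cells per side; n=10 gives 100 equally weighted sub-cells.
    """
    xmin, ymin, xmax, ymax = cell_bounds
    dx = (xmax - xmin) / n
    dy = (ymax - ymin) / n
    hits = 0
    for i in range(n):
        for j in range(n):
            # Assumption: a sub-cell counts if its centre is in the geometry
            x = xmin + (i + 0.5) * dx
            y = ymin + (j + 0.5) * dy
            if contains(x, y):
                hits += 1
    return hits / (n * n)


def everywhere(x, y):
    # Hypothetical geometry that covers the whole cell
    return True


def left_half(x, y):
    # Hypothetical geometry: the half-plane x < 0.5
    return x < 0.5


full = fractional_cell_weight((0, 0, 1, 1), everywhere)
half = fractional_cell_weight((0, 0, 1, 1), left_half)
print(full, half)
```

Summing weights like these over all cells touching a feature would give a fractional cell count, which is consistent with count being reported as a float.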
pandarus.calculate.intersect(first_fp, first_field, second_fp, second_field, first_kwargs={}, second_kwargs={}, dirpath=None, cpus=4, driver='GeoJSON', compress=True, log_dir=None)
Calculate the intersection of two vector spatial datasets.
The first spatial input file must have only one type of geometry, i.e. points, lines, or polygons, and must not contain geometry collections. Any of the following are allowed: Point, MultiPoint, LineString, LinearRing, MultiLineString, Polygon, MultiPolygon.
The second spatial input file must have either Polygons or MultiPolygons. Although no checks are made, this and other functions make a strong assumption that the spatial units in the second spatial dataset do not overlap.
Input parameters:
- first_fp: String. File path to the first spatial dataset.
- first_field: String. Name of field that uniquely identifies features in the first spatial dataset.
- second_fp: String. File path to the second spatial dataset.
- second_field: String. Name of field that uniquely identifies features in the second spatial dataset.
- first_kwargs: Dictionary, optional. Additional arguments, such as layer name, passed to fiona when opening the first spatial dataset.
- second_kwargs: Dictionary, optional. Additional arguments, such as layer name, passed to fiona when opening the second spatial dataset.
- dirpath: String, optional. Directory to save output files.
- cpus: Integer, default is multiprocessing.cpu_count(). Number of CPU cores to use when calculating. Use cpus=0 to avoid starting a multiprocessing pool.
- driver: String, default is GeoJSON. Fiona driver name to use when writing geospatial output file. Common values are GeoJSON or GPKG.
- compress: Boolean, default is True. Compress JSON output file.
- log_dir: String, optional.
Returns filepaths for two created files.
The first is a geospatial file that has the geometry of each possible intersection of spatial units from the two input files. The geometry type of this file will depend on the geometry type of the first input file, but will always be a multi geometry, i.e. one of MultiPoint, MultiLineString, MultiPolygon. This file will also always have the WGS 84 CRS. The output file has the following schema:
- id: Integer. Auto-increment field starting from zero.
- from_label: String. The value for the uniquely identifying field from the first input file.
- to_label: String. The value for the uniquely identifying field from the second input file.
- measure: Float. A measure of the intersected shape. For polygons, this is the area of the feature in square meters. For lines, this is the length in meters. For points, this is the number of points. Area and length calculations are made using the Mollweide projection.
The second file is an extract of some of the feature fields in the JSON data format. This is used by programs that don’t need to depend on GIS data libraries. The JSON format is:
{
    'metadata': {
        'first': {
            'field': 'name of uniquely identifying field',
            'path': 'path to first input file',
            'filename': 'name of first input file',
            'sha256': 'sha256 hash of input file'
        },
        'second': {
            'field': 'name of uniquely identifying field',
            'path': 'path to second input file',
            'filename': 'name of second input file',
            'sha256': 'sha256 hash of input file'
        },
        'when': 'datetime this calculation finished, ISO format'
    },
    'data': [
        [
            'identifying field for first file',
            'identifying field for second file',
            'measure value'
        ]
    ]
}
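Because the data section of the intersect JSON output is a plain list of [first label, second label, measure] rows, it can be consumed without any GIS library. The sketch below sums the intersected measure for each feature of the first file; the rows and the helper name measure_by_first_label are hypothetical and only illustrate the row layout.

```python
from collections import defaultdict

# Hypothetical contents of the 'data' section, after loading the JSON file
data = [
    ["A", "grid-1", 2.0],
    ["A", "grid-2", 3.0],
    ["B", "grid-2", 4.5],
]


def measure_by_first_label(rows):
    """Sum the measure of all intersections for each feature of the first file."""
    totals = defaultdict(float)
    for first_label, _second_label, measure in rows:
        totals[first_label] += measure
    return dict(totals)


totals = measure_by_first_label(data)
print(totals)
```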
pandarus.calculate.intersections_from_intersection(fp, metadata=None, dirpath=None)
Process an intersections spatial dataset to create two intersections data files.
fp is the file path of a vector dataset created by the intersect function. The intersection of two spatial scales (A, B) is a third spatial scale (C); this function creates intersection data files for (A, C) and (B, C).
As the intersections data file includes metadata on the input files, this function must have access to the intersections data file created at the same time as the intersections spatial dataset. If the metadata filepath is not provided, the metadata file is looked for in the same directory as fp.
Returns the file paths of the two new intersections data files.
pandarus.calculate.calculate_remaining(source_fp, source_field, intersection_fp, source_kwargs={}, dirpath=None, compress=True)
Calculate the remaining area/length/number of points left out of an intersections file generated by intersect.
Input parameters:
- source_fp: String. Filepath of the input spatial data which could have features outside of the intersection result.
- source_field: String. Name of field that uniquely identifies features in the input spatial dataset.
- intersection_fp: Filepath of the intersection spatial dataset generated by the intersect function.
- source_kwargs: Dictionary, optional. Additional arguments, such as layer name, passed to fiona when opening the input spatial dataset.
- dirpath: String, optional. Directory where the output file will be saved.
- compress: Boolean. Whether or not to compress the output file.
Warning: source_fp must be the first file provided to the intersect function, not the second!
Returns the filepath of the output file. The output file JSON format is:
{
    'metadata': {
        'source': {
            'field': 'name of uniquely identifying field',
            'path': 'path to the input file',
            'filename': 'name of the input file',
            'sha256': 'sha256 hash of the input file'
        },
        'intersections': {
            'field': 'name of uniquely identifying field (always `id`)',
            'path': 'path to intersections spatial dataset',
            'filename': 'name of intersections spatial dataset',
            'sha256': 'sha256 hash of intersection spatial dataset'
        },
        'when': 'datetime this calculation finished, ISO format'
    },
    'data': [
        [
            'identifying field for source file',
            'measure value'
        ]
    ]
}
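One possible use of the calculate_remaining output is to combine it with the intersect output to estimate how much of each source feature fell outside the second dataset. The sketch below is a consumer-side illustration only, not part of pandarus: the helper name share_outside and the rows are hypothetical, and it relies on the stated assumption that units in the second dataset do not overlap.

```python
def share_outside(intersect_rows, remaining_rows):
    """Fraction of each source feature's measure left outside the intersection.

    intersect_rows: [first_label, second_label, measure] rows from intersect.
    remaining_rows: [source_label, measure] rows from calculate_remaining.
    """
    inside = {}
    for label, _other, measure in intersect_rows:
        inside[label] = inside.get(label, 0.0) + measure
    shares = {}
    for label, remaining in remaining_rows:
        total = inside.get(label, 0.0) + remaining
        # Guard against features with no measure at all
        shares[label] = remaining / total if total else 0.0
    return shares


# Hypothetical rows taken from the two 'data' sections
intersect_rows = [["A", "x", 6.0], ["A", "y", 2.0], ["B", "x", 5.0]]
remaining_rows = [["A", 2.0], ["B", 0.0]]

shares = share_outside(intersect_rows, remaining_rows)
print(shares)
```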
conversion
pandarus.conversion.check_type(filepath)
Determine if a GIS dataset is raster or vector.
filepath is a filepath of a GIS dataset file.
Returns 'vector' or 'raster'. Raises a ValueError if the file can’t be opened with fiona or rasterio.
pandarus.conversion.convert_to_vector(filepath, dirpath=None, band=1)
Convert raster file at filepath to a vector file. Returns filepath of created vector file.
dirpath should be a writable directory. If dirpath is not specified, uses the appdirs library to find an appropriate directory.
band should be the integer index of the band; default is 1. Note that band indices start from 1, not 0.
The generated vector file will be in GeoJSON, and have the WGS84 CRS.
Because we are using GDAL polygonize, we can’t use 64 bit floats. This function will automatically convert rasters from 64 to 32 bit floats if necessary.
pandarus.conversion.clean_raster(fp, new_fp=None, band=1, nodata=None)
Clean raster data and metadata:
- Delete invalid block sizes, and remove tiling
- Set nodata to a reasonable value, if possible
- Convert to 32 bit floats, if currently 64 bit floats and such conversion is possible
Input parameters:
- fp: String. Filepath of the input raster file.
- new_fp: String, optional. Filepath of the raster to create. If not provided, the new raster will have the same name as the existing file, but will be created in a temporary directory.
- band: Integer, default is 1. Raster band to clean and create in new file. Each band of a multiband raster would have to be cleaned separately.
- nodata: Float, optional. Additional value to try when changing nodata value; must not be present in existing raster data.
Returns the filepath of the new file as a compressed GeoTIFF. Can also return None if no new raster was written due to failing preconditions.
pandarus.conversion.round_raster(in_fp, out_fp=None, band=1, sig_digits=3)
Round raster cell values to a certain number of significant digits in new raster file. For example, π rounded to 4 significant digits is 3.142.
Input parameters:
- in_fp: String. Filepath of raster input file.
- out_fp: String, optional. Filepath of new raster to be created. Should not currently exist. If not provided, the new raster will have the same name as the existing file, but will be created in a temporary directory.
- band: Int, default is 1. Band to round. Band indices start at 1.
- sig_digits: Int, default is 3. Number of significant digits to round to.
The created raster file will have the same dtype, shape, and CRS as the input file. It will be a compressed GeoTIFF.
Returns out_fp, the filepath of the created file.
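Rounding to significant digits differs from rounding to decimal places, so a scalar sketch may help. The helper round_to_sig_digits below is hypothetical and works on single Python floats; it is not the routine round_raster applies to raster arrays.

```python
import math


def round_to_sig_digits(value, sig_digits=3):
    """Round a single number to the given count of significant digits."""
    if value == 0 or not math.isfinite(value):
        # Zero, infinities and NaN have no meaningful order of magnitude
        return value
    # Position of the leading digit relative to the decimal point
    exponent = math.floor(math.log10(abs(value)))
    return round(value, sig_digits - 1 - exponent)


print(round_to_sig_digits(math.pi, 4))
print(round_to_sig_digits(12345.678, 3))
```

With this approach large values are rounded to the left of the decimal point, so 12345.678 keeps only its three leading digits.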
filesystem
pandarus.filesystem.get_appdirs_path(subdir)
Get path for an appdirs directory, with subdirectory subdir.
Returns the full directory path.
pandarus.filesystem.sha256(filepath, blocksize=65536)
Generate SHA 256 hash for file at filepath.
blocksize (default is 65536) is block size to feed to hasher.
Returns a str.
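Feeding a hasher in blocks keeps memory use constant for large GIS files. A minimal sketch of that pattern with the standard library follows; the name sha256_of_file and the choice of a hexadecimal digest as the returned string are assumptions, not a copy of the pandarus function.

```python
import hashlib
import os
import tempfile


def sha256_of_file(filepath, blocksize=65536):
    """Hash a file in fixed-size blocks and return the hex digest."""
    hasher = hashlib.sha256()
    with open(filepath, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes
        for block in iter(lambda: f.read(blocksize), b""):
            hasher.update(block)
    return hasher.hexdigest()


# Demonstrate on a small temporary file, using a tiny block size
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
digest = sha256_of_file(tmp.name, blocksize=4)
os.remove(tmp.name)
print(digest)
```

The digest does not depend on the block size, so blocksize only trades memory for the number of read calls.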
pandarus.filesystem.json_exporter(data, filepath, compress=True)
Export data to a JSON file. Compressed with bz2 if compress is True.
Returns the filepath of the JSON file. Returned filepath is not necessarily filepath, if compress is True.
pandarus.filesystem.json_importer(fp)
Load a JSON file. Can be compressed with bz2; if so, it should have the extension .bz2.
Returns the data in the JSON file.
geometry
pandarus.geometry.clean(geom)
Clean invalid geometries using the buffer(0) trick.
geom is a shapely geometry; returns a shapely geometry.
pandarus.geometry.recursive_geom_finder(geom, kind)
Return all elements of geom that are of kind. For example, return all linestrings in a geometry collection.
geom is a Shapely geometry.
kind should be one of ("line", "point", "polygon").
Returns either a MultiPoint, MultiLineString, or MultiPolygon. Returns None if no valid element is found.
pandarus.geometry.get_intersection(obj, kind, collection, indices, to_meters=True, return_geoms=True)
Return a dictionary describing the intersection of obj with collection[indices].
- obj is a Shapely geometry.
- kind is one of ("line", "point", "polygon"), the kind of object to be returned.
- collection is a Map.
- indices is an iterator of integers; indices into collection.
- projection_func is a function to project the results to a new CRS before taking area, etc. If falsey, no projection will take place.
- return_geoms: Return intersected geometries in addition to area, etc.
Assumes that the polygons in collection do not overlap.
Returns a dictionary of form:
{
    collection_index: {
        'measure': measure of area or length,
        'geom': intersected geometry  # if return_geoms
    }
}
The algorithm used for line and point intersections is incorrect: it will double count lines which lie along the border of two polygons, and points that lie on the border of two polygons. A more robust function would take substantially more development and computation time, and total error should be less than 10 percent.
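The double counting of border points can be reproduced with a toy example that needs no GIS library. The sketch below uses axis-aligned boxes and a closed containment test as a stand-in for a geometric intersection test; the boxes, points, and the helper touches_or_inside are all hypothetical.

```python
def touches_or_inside(point, box):
    """Closed containment test: points on the border count as intersecting."""
    x, y = point
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


# Two non-overlapping squares that share the border x = 1
boxes = {"left": (0, 0, 1, 1), "right": (1, 0, 2, 1)}
# The middle point lies exactly on the shared border
points = [(0.5, 0.5), (1.0, 0.5), (1.5, 0.5)]

counts = {
    name: sum(touches_or_inside(p, box) for p in points)
    for name, box in boxes.items()
}
total = sum(counts.values())
print(counts, total, len(points))
```

The per-polygon counts add up to more than the number of input points because the border point is attributed to both squares.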
pandarus.geometry.get_measure(geom, kind=None)
Get area, length, or number of points in geom.
- geom: A shapely geom.
- kind: Geometry type, optional. One of polygon, line, or point.
Kind will be guessed based on type of geom if not otherwise provided.
If kind is not one of the allowed types, raises ValueError.
Returns a float.
pandarus.geometry.get_remaining(original, geoms, to_meters=True)
Get the remaining area/length/number from original after subtracting the union of geoms.
- original: Shapely geom in WGS84 CRS.
- geoms: List of shapely geoms in WGS84 CRS.
- to_meters: Boolean. Return value calculated in Mollweide projection.
original and geoms should have the same geometry type, and geoms are components of original.
Returns a float.
map
class pandarus.maps.Map(filepath, identifying_field=None, **kwargs)
A wrapper around fiona open that provides some additional functionality.
Requires an absolute filepath.
Additional metadata can be provided in kwargs:
- layer specifies the shapefile layer
Warning: The Fiona field id is not used, as there are no real constraints on these values or value types (see Fiona manual), and real world data is often dirty and inconsistent. Instead, we use enumerate and integer indices.
- __init__(filepath, identifying_field=None, **kwargs)
- create_rtree_index(): Create rtree index for efficient spatial querying. Note: Bounds are given in lat/long, not in the native CRS.
- crs: Coordinate reference system, as defined by vector file.
- iter_latlong(indices=None): Iterate over dataset as Shapely geometries in WGS 84 CRS.
intersections
pandarus.intersections.intersection_dispatcher(from_map, to_map, from_objs=None, cpus=None, log_dir=None)
pandarus.intersections.intersection_worker(from_map, from_objs, to_map, worker_id=1)
Multiprocessing worker for map matching
projection
projection.project(geom, from_proj=None, to_proj=None)
Project a shapely geometry, and return a new geometry of the same type from the transformed coordinates.
Default input projection is WGS84, default output projection is Mollweide.
Inputs:
- geom: A shapely geometry.
- from_proj: A PROJ4 string. Optional.
- to_proj: A PROJ4 string. Optional.
Returns:
- A shapely geometry.
projection.wgs84(s)
Fix a missing CRS, or fiona giving an abbreviated WGS84 definition.
Returns WGS84 if s is falsey.