Technical Reference

calculate

pandarus.calculate.raster_statistics(vector_fp, identifying_field, raster, output=None, band=1, compress=True, fiona_kwargs={}, **kwargs)

Create statistics by matching raster against each spatial unit in vector_fp.

For each spatial unit in vector_fp, calculates the following statistics for values from raster: min, mean, max, and count. Count is the number of raster cells intersecting the vector spatial unit. No data values in the raster are not included in the generated statistics.

This function uses a fork of the rasterstats library that breaks each raster cell into 100 smaller cells, as a compromise approach to handle the fact that some raster cells are completely within a vector geometry, while others have only a small fraction of their cell area within the vector geometry. Each of the 100 small raster cells is weighted equally, and each is tested to make sure it intersects the vector geometry.

This function assumes that each smaller raster cell has the same area. This may change in the future.
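
The sub-cell weighting described above can be illustrated with a minimal pure-Python sketch. It is not the code of the rasterstats fork: the names cell_fraction and weighted_stats are hypothetical, the vector geometry is reduced to a contains(x, y) callable, and each sub-cell is tested at its centre point, which is an assumption of this sketch.

```python
def cell_fraction(bounds, contains, n=10):
    """Fraction of one raster cell covered by a geometry.

    bounds is (xmin, ymin, xmax, ymax); contains(x, y) is a stand-in for
    the vector geometry test. n=10 gives 100 equally weighted sub-cells.
    """
    xmin, ymin, xmax, ymax = bounds
    dx, dy = (xmax - xmin) / n, (ymax - ymin) / n
    hits = sum(
        1
        for i in range(n)
        for j in range(n)
        # Assumption: test the centre point of each sub-cell.
        if contains(xmin + (i + 0.5) * dx, ymin + (j + 0.5) * dy)
    )
    return hits / (n * n)


def weighted_stats(cells, contains, nodata=None):
    """Return min, mean, max, and fractional count for (bounds, value) cells."""
    weights, values = [], []
    for bounds, value in cells:
        if value == nodata:
            continue  # nodata cells are excluded from the statistics
        fraction = cell_fraction(bounds, contains)
        if fraction > 0:
            weights.append(fraction)
            values.append(value)
    count = sum(weights)
    if not count:
        return None
    mean = sum(w * v for w, v in zip(weights, values)) / count
    return {"min": min(values), "mean": mean, "max": max(values), "count": count}


# Hypothetical data: three unit cells in a row, plus one nodata cell.
cells = [
    ((0, 0, 1, 1), 2.0),
    ((1, 0, 2, 1), 4.0),
    ((2, 0, 3, 1), 10.0),
    ((0, 1, 1, 2), None),
]
# Stand-in geometry: the half-plane x < 1.5.
stats = weighted_stats(cells, lambda x, y: x < 1.5)
```

In this example the first cell is fully inside the geometry, the second is half inside, and the third is outside, so count is fractional (1.5) and the mean weights the second cell by one half.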

Input parameters:

  • vector_fp: str. Filepath of the vector dataset.
  • identifying_field: str. Name of the field in vector_fp that uniquely identifies each feature.
  • raster: str. Filepath of the raster dataset.
  • output: str, optional. Filepath of the output file. Will be deleted if it exists already.
  • band: int, optional. Raster band used for calculations. Default is 1.
  • compress: bool, optional. Compress JSON results file. Default is True.
  • fiona_kwargs: dict, optional. Additional arguments to pass to fiona when opening vector_fp.

Any additional kwargs are passed to gen_zonal_stats.

Output format:

Output is an optionally compressed JSON file with the following schema:

{
    'metadata': {
        'vector': {
            'field': 'name of uniquely identifying field',
            'path': 'path to vector input file',
            'sha256': 'sha256 hash of input file'
        },
        'raster': {
            'band': 'band used to calculate raster stats',
            'path': 'path to raster input file',
            'filename': 'name of raster file',
            'sha256': 'sha256 hash of input file'
        },
        'when': 'datetime this calculation finished, ISO format'
    },
    'data': [
        [
            'vector `identifying_field` value',
            {
                'count': 'number of raster cells included; a float because fractional intersections are considered',
                'min': 'minimum raster value in this vector feature',
                'mean': 'average raster value in this vector feature',
                'max': 'maximum raster value in this vector feature',
            }
        ]
    ]
}
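
As a sketch of how a consumer might use this structure, the following indexes the data list by identifying value and computes a count-weighted mean across features. The sample values and the variable names are made up for illustration; a real results file would first be read from disk (and decompressed if necessary).

```python
import json

# Hypothetical, uncompressed results following the schema above.
raw = """
{
  "metadata": {"vector": {"field": "name"}, "raster": {"band": 1}},
  "data": [
    ["A", {"count": 12.5, "min": 0.0, "mean": 1.5, "max": 4.0}],
    ["B", {"count": 3.0, "min": 2.0, "mean": 2.5, "max": 3.0}]
  ]
}
"""
results = json.loads(raw)

# Index the [identifier, statistics] pairs by identifier.
stats_by_feature = {label: stats for label, stats in results["data"]}

# Mean across all features, weighted by the fractional cell counts.
total = sum(s["count"] for s in stats_by_feature.values())
overall_mean = sum(s["mean"] * s["count"] for s in stats_by_feature.values()) / total
```
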
pandarus.calculate.intersect(first_fp, first_field, second_fp, second_field, first_kwargs={}, second_kwargs={}, dirpath=None, cpus=4, driver='GeoJSON', compress=True, log_dir=None)

Calculate the intersection of two vector spatial datasets.

The first spatial input file must have only one type of geometry, i.e. points, lines, or polygons, and excluding geometry collections. Any of the following are allowed: Point, MultiPoint, LineString, LinearRing, MultiLineString, Polygon, MultiPolygon.

The second spatial input file must have either Polygons or MultiPolygons. Although no checks are made, this and other functions make a strong assumption that the spatial units in the second spatial dataset do not overlap.

Input parameters:

  • first_fp: String. File path to the first spatial dataset.
  • first_field: String. Name of field that uniquely identifies features in the first spatial dataset.
  • second_fp: String. File path to the second spatial dataset.
  • second_field: String. Name of field that uniquely identifies features in the second spatial dataset.
  • first_kwargs: Dictionary, optional. Additional arguments, such as layer name, passed to fiona when opening the first spatial dataset.
  • second_kwargs: Dictionary, optional. Additional arguments, such as layer name, passed to fiona when opening the second spatial dataset.
  • dirpath: String, optional. Directory to save output files.
  • cpus: Integer, default is multiprocessing.cpu_count(). Number of CPU cores to use when calculating. Use cpus=0 to avoid starting a multiprocessing pool.
  • driver: String, default is GeoJSON. Fiona driver name to use when writing geospatial output file. Common values are GeoJSON or GPKG.
  • compress: Boolean, default is True. Compress JSON output file.
  • log_dir: String, optional.

Returns filepaths for two created files.

The first is a geospatial file that has the geometry of each possible intersection of spatial units from the two input files. The geometry type of this file will depend on the geometry type of the first input file, but will always be a multi geometry, i.e. one of MultiPoint, MultiLineString, MultiPolygon. This file will also always have the WGS 84 CRS. The output file has the following schema:

  • id: Integer. Auto-increment field starting from zero.
  • from_label: String. The value for the uniquely identifying field from the first input file.
  • to_label: String. The value for the uniquely identifying field from the second input file.
  • measure: Float. A measure of the intersected shape. For polygons, this is the area of the feature in square meters. For lines, this is the length in meters. For points, this is the number of points. Area and length calculations are made using the Mollweide projection.

The second file is an extract of some of the feature fields in the JSON data format. This is used by programs that don’t need to depend on GIS data libraries. The JSON format is:

{
    'metadata': {
        'first': {
            'field': 'name of uniquely identifying field',
            'path': 'path to first input file',
            'filename': 'name of first input file',
            'sha256': 'sha256 hash of input file'
        },
        'second': {
            'field': 'name of uniquely identifying field',
            'path': 'path to second input file',
            'filename': 'name of second input file',
            'sha256': 'sha256 hash of input file'
        },
        'when': 'datetime this calculation finished, ISO format'
    },
    'data': [
        [
            'identifying field for first file',
            'identifying field for second file',
            'measure value'
        ]
    ]
}
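
A minimal sketch of working with the data rows, assuming they have already been loaded from the JSON file: it computes, for each feature of the first dataset, the share of its intersected measure that falls in each feature of the second dataset. The identifiers and measures below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical rows in the [first id, second id, measure] form shown above.
data = [
    ["river-1", "DE", 1500.0],
    ["river-1", "FR", 500.0],
    ["river-2", "FR", 800.0],
]

# Total intersected measure per feature of the first dataset.
totals = defaultdict(float)
for first_id, _, measure in data:
    totals[first_id] += measure

# Share of each first-dataset feature that falls in each second-dataset feature.
shares = {
    (first_id, second_id): measure / totals[first_id]
    for first_id, second_id, measure in data
}
```

These shares are relative to the intersected total only; any part of a first-dataset feature that lies outside the second dataset does not appear in this file.
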
pandarus.calculate.intersections_from_intersection(fp, metadata=None, dirpath=None)

Process an intersections spatial dataset to create two intersections data files.

fp is the file path of a vector dataset created by the intersect function. The intersection of two spatial scales (A, B) is a third spatial scale (C); this function creates intersection data files for (A, C) and (B, C).

As the intersections data file includes metadata on the input files, this function must have access to the intersections data file created at the same time as the intersections spatial dataset. If the metadata filepath is not provided, the function looks for the metadata file in the same directory as fp.

Returns the file paths of the two new intersections data files.

pandarus.calculate.calculate_remaining(source_fp, source_field, intersection_fp, source_kwargs={}, dirpath=None, compress=True)

Calculate the remaining area/length/number of points left out of an intersections file generated by intersect.

Input parameters:

  • source_fp: String. Filepath of the input spatial data which could have features outside of the intersection result.
  • source_field: String. Name of field that uniquely identifies features in the input spatial dataset.
  • intersection_fp: String. Filepath of the intersection spatial dataset generated by the intersect function.
  • source_kwargs: Dictionary, optional. Additional arguments, such as layer name, passed to fiona when opening the input spatial dataset.
  • dirpath: String, optional. Directory where the output file will be saved.
  • compress: Boolean. Whether or not to compress the output file.

Warning

source_fp must be the first file provided to the intersect function, not the second!

Returns the filepath of the output file. The output file JSON format is:

{
    'metadata': {
        'source': {
            'field': 'name of uniquely identifying field',
            'path': 'path to the input file',
            'filename': 'name of the input file',
            'sha256': 'sha256 hash of the input file'
        },
        'intersections': {
            'field': 'name of uniquely identifying field (always `id`)',
            'path': 'path to intersections spatial dataset',
            'filename': 'name of intersections spatial dataset',
            'sha256': 'sha256 hash of intersection spatial dataset'
        },
        'when': 'datetime this calculation finished, ISO format'
    },
    'data': [
        [
            'identifying field for source file',
            'measure value'
        ]
    ]
}
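
As an illustration of how the remaining measures complement the intersection measures, the sketch below adds both to recover a total per source feature and the fraction of each feature covered by the second dataset. All rows are hypothetical, and the sketch does not assume that every source feature appears in the remaining data.

```python
# Hypothetical intersection rows: [first id, second id, measure].
intersections = [["a", "x", 30.0], ["a", "y", 50.0], ["b", "x", 10.0]]
# Hypothetical remaining rows: [source id, measure].
remaining = [["a", 20.0]]

# Intersected measure per source feature.
intersected = {}
for source_id, _, measure in intersections:
    intersected[source_id] = intersected.get(source_id, 0.0) + measure

# Total measure is the intersected part plus whatever was left out.
totals = dict(intersected)
for source_id, measure in remaining:
    totals[source_id] = totals.get(source_id, 0.0) + measure

# Fraction of each source feature covered by the second dataset.
covered = {
    source_id: intersected.get(source_id, 0.0) / total
    for source_id, total in totals.items()
}
```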

conversion

pandarus.conversion.check_type(filepath)

Determine if a GIS dataset is raster or vector.

filepath is the path to a GIS dataset file.

Returns 'vector' or 'raster'. Raises a ValueError if the file can’t be opened with fiona or rasterio.

pandarus.conversion.convert_to_vector(filepath, dirpath=None, band=1)

Convert raster file at filepath to a vector file. Returns filepath of created vector file.

dirpath should be a writable directory. If dirpath is not specified, uses the appdirs library to find an appropriate directory.

band should be the integer index of the band; default is 1. Note that band indices start from 1, not 0.

The generated vector file will be in GeoJSON, and have the WGS84 CRS.

Because we are using GDAL polygonize, we can’t use 64 bit floats. This function will automatically convert rasters from 64 to 32 bit floats if necessary.

pandarus.conversion.clean_raster(fp, new_fp=None, band=1, nodata=None)

Clean raster data and metadata:
  • Delete invalid block sizes, and remove tiling
  • Set nodata to a reasonable value, if possible
  • Convert to 32 bit floats, if currently 64 bit floats and such conversion is possible

fp: String. Filepath of the input raster file.

new_fp: String, optional. Filepath of the raster to create. If not provided, the new raster will have the same name as the existing file, but will be created in a temporary directory.

band: Integer, default is 1. Raster band to clean and create in new file. Each band of a multiband raster would have to be cleaned separately.

nodata: Float, optional. Additional value to try when changing nodata value; must not be present in existing raster data.

Returns the filepath of the new file as a compressed GeoTIFF. Can also return None if no new raster was written due to failing preconditions.

pandarus.conversion.round_raster(in_fp, out_fp=None, band=1, sig_digits=3)

Round raster cell values to a certain number of significant digits in a new raster file. For example, π rounded to 4 significant digits is 3.142.

  • in_fp: String. Filepath of raster input file.
  • out_fp: String, optional. Filepath of new raster to be created. Should not currently exist. If not provided, the new raster will have the same name as the existing file, but will be created in a temporary directory.
  • band: Int, default is 1. Band to round. Band indices start at 1.
  • sig_digits: Int, default is 3. Number of significant digits to round to.

The created raster file will have the same dtype, shape, and CRS as the input file. It will be a compressed GeoTIFF.

Returns out_fp, the filepath of the created file.
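
Rounding to significant digits differs from rounding to decimal places, because the number of decimals depends on the magnitude of each value. The following scalar sketch shows the arithmetic; the name round_sig is hypothetical, and this is not necessarily how the library rounds raster arrays.

```python
import math


def round_sig(value, sig_digits=3):
    """Round value to sig_digits significant digits."""
    if value == 0 or not math.isfinite(value):
        return value  # zero, inf, and nan have no meaningful magnitude
    # Position of the leading digit, e.g. 0 for 3.14 and 5 for 123456.
    magnitude = math.floor(math.log10(abs(value)))
    return round(value, sig_digits - 1 - magnitude)


rounded = [round_sig(v, 4) for v in (math.pi, 123456.0, 0.00123456)]
```

With 4 significant digits, the three sample values become 3.142, 123500.0, and approximately 0.001235.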

filesystem

pandarus.filesystem.get_appdirs_path(subdir)

Get path for an appdirs directory, with subdirectory subdir.

Returns the full directory path.

pandarus.filesystem.sha256(filepath, blocksize=65536)

Generate SHA 256 hash for file at filepath.

blocksize (default is 65536) is the block size to feed to the hasher.

Returns a str.
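
Hashing in blocks keeps memory use constant for large GIS files. The sketch below shows the general pattern with the standard library; the function name sha256_of_file is hypothetical, and returning the hexadecimal digest is an assumption (the text above only states that a str is returned).

```python
import hashlib
import os
import tempfile


def sha256_of_file(filepath, blocksize=65536):
    """Return the hex SHA-256 digest of a file, read in blocks."""
    hasher = hashlib.sha256()
    with open(filepath, "rb") as f:
        # iter() with a sentinel stops at the empty bytes object at EOF.
        for block in iter(lambda: f.read(blocksize), b""):
            hasher.update(block)
    return hasher.hexdigest()


# Demonstration on a temporary file larger than one block.
payload = b"pandarus" * 20000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
digest = sha256_of_file(tmp.name)
os.remove(tmp.name)
```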

pandarus.filesystem.json_exporter(data, filepath, compress=True)

Export a file to JSON. Compressed with bz2 if compress is True.

Returns the filepath of the JSON file. The returned filepath is not necessarily the same as filepath if compress is True.

pandarus.filesystem.json_importer(fp)

Load a JSON file. Can be compressed with bz2; if so, it should have the extension .bz2.

Returns the data in the JSON file.
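
A round trip through bz2-compressed JSON can be sketched with the standard library as follows. The functions export_json and import_json are hypothetical stand-ins, and appending .bz2 to the requested filepath is an assumption based on the behaviour described above, not a copy of the library code.

```python
import bz2
import json
import os
import tempfile


def export_json(data, filepath, compress=True):
    """Write data as JSON; append .bz2 and compress when requested."""
    if compress:
        filepath += ".bz2"  # assumption: compressed files gain this extension
        with bz2.open(filepath, "wt", encoding="utf-8") as f:
            json.dump(data, f)
    else:
        with open(filepath, "w", encoding="utf-8") as f:
            json.dump(data, f)
    return filepath


def import_json(filepath):
    """Read JSON, decompressing first if the name ends with .bz2."""
    opener = bz2.open if filepath.endswith(".bz2") else open
    with opener(filepath, "rt", encoding="utf-8") as f:
        return json.load(f)


original = {"data": [["A", 1.5]]}
target = os.path.join(tempfile.mkdtemp(), "example.json")
written = export_json(original, target)
restored = import_json(written)
```

Because the written path can differ from the requested one, callers should keep the returned value rather than reuse the path they passed in.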

geometry

pandarus.geometry.clean(geom)

Clean invalid geometries using the buffer(0) trick.

geom is a shapely geometry; returns a shapely geometry.

pandarus.geometry.recursive_geom_finder(geom, kind)

Return all elements of geom that are of kind. For example, return all linestrings in a geometry collection.

geom is a Shapely geometry.

kind should be one of ("line", "point", "polygon").

Returns either a MultiPoint, MultiLineString, or MultiPolygon. Returns None if no valid element is found.
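
The recursion can be sketched without any GIS dependency by using GeoJSON-like dictionaries in place of Shapely geometries. This is only an analogy for the behaviour described above: find_coordinates and find_kind are hypothetical names, and the real function operates on Shapely objects.

```python
KINDS = {
    "point": ("Point", "MultiPoint"),
    "line": ("LineString", "MultiLineString"),
    "polygon": ("Polygon", "MultiPolygon"),
}


def find_coordinates(geom, kind):
    """Collect coordinates of all parts of geom matching kind."""
    single, multi = KINDS[kind]
    if geom["type"] == single:
        return [geom["coordinates"]]
    if geom["type"] == multi:
        return list(geom["coordinates"])
    if geom["type"] == "GeometryCollection":
        found = []
        for part in geom["geometries"]:
            found.extend(find_coordinates(part, kind))  # recurse into nesting
        return found
    return []


def find_kind(geom, kind):
    """Return a GeoJSON-like multi geometry of the given kind, or None."""
    coords = find_coordinates(geom, kind)
    if not coords:
        return None
    return {"type": KINDS[kind][1], "coordinates": coords}


# Hypothetical input: a collection holding a point, a line, and a nested line.
collection = {
    "type": "GeometryCollection",
    "geometries": [
        {"type": "Point", "coordinates": [0, 0]},
        {"type": "LineString", "coordinates": [[0, 0], [1, 1]]},
        {
            "type": "GeometryCollection",
            "geometries": [
                {"type": "MultiLineString", "coordinates": [[[2, 2], [3, 3]]]},
            ],
        },
    ],
}
lines = find_kind(collection, "line")
```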

pandarus.geometry.get_intersection(obj, kind, collection, indices, to_meters=True, return_geoms=True)

Return a dictionary describing the intersection of obj with collection[indices].

  • obj: A Shapely geometry.
  • kind: One of ("line", "point", "polygon"). The kind of object to be returned.
  • collection: A Map.
  • indices: An iterator of integers; indices into collection.
  • projection_func: A function to project the results to a new CRS before taking area, etc. If falsey, no projection will take place.
  • return_geoms: Return intersected geometries in addition to area, etc.

Assumes that the polygons in collection do not overlap.

Returns a dictionary of form:

{
    collection_index: {
        'measure': measure of area or length,
        'geom': intersected geometry # if return_geoms
    }
}

The algorithm used for line and point intersections is incorrect: it will double count lines that lie along the border of two polygons, and points that lie on the border of two polygons. A more robust function would take substantially more development and computation time, and total error should be less than 10 percent.
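
The double counting of border points can be reproduced with a toy example. The sketch below uses axis-aligned boxes and a boundary-inclusive test as a stand-in for geometric intersection; it illustrates the effect only and is not the algorithm used by this function.

```python
def touches_or_within(point, box):
    """True if point lies inside box or on its boundary."""
    x, y = point
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


# Two non-overlapping unit squares sharing the edge x = 1.
boxes = {"left": (0, 0, 1, 1), "right": (1, 0, 2, 1)}
# The middle point lies exactly on the shared border.
points = [(0.5, 0.5), (1.0, 0.5), (1.5, 0.5)]

counts = {
    name: sum(touches_or_within(p, box) for p in points)
    for name, box in boxes.items()
}
total = sum(counts.values())
```

Three points are tested, but the summed counts come to four, because the border point is attributed to both squares.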

pandarus.geometry.get_measure(geom, kind=None)

Get area, length, or number of points in geom.

  • geom: A shapely geom.
  • kind: Geometry type, optional. One of polygon, line, or point.

Kind will be guessed based on type of geom if not otherwise provided.

If kind is not one of the allowed types, raises ValueError.

Returns a float.

pandarus.geometry.get_remaining(original, geoms, to_meters=True)

Get the remaining area/length/number from original after subtracting the union of geoms.

  • original: Shapely geom in WGS84 CRS.
  • geoms: List of shapely geoms in WGS84 CRS.
  • to_meters: Boolean. Return value calculated in Mollweide projection.

original and geoms should have the same geometry type, and geoms should be components of original.

Returns a float.

maps

class pandarus.maps.Map(filepath, identifying_field=None, **kwargs)

A wrapper around fiona open that provides some additional functionality.

Requires an absolute filepath.

Additional metadata can be provided in kwargs:
  • layer specifies the shapefile layer

Warning

The Fiona field id is not used, as there are no real constraints on these values or value types (see Fiona manual), and real world data is often dirty and inconsistent. Instead, we use enumerate and integer indices.

__init__(filepath, identifying_field=None, **kwargs)

create_rtree_index()

Create rtree index for efficient spatial querying.

Note: Bounds are given in lat/long, not in the native CRS.

crs

Coordinate reference system, as defined by vector file.

iter_latlong(indices=None)

Iterate over dataset as Shapely geometries in WGS 84 CRS.

intersections

pandarus.intersections.intersection_dispatcher(from_map, to_map, from_objs=None, cpus=None, log_dir=None)
pandarus.intersections.intersection_worker(from_map, from_objs, to_map, worker_id=1)

Multiprocessing worker for map matching.

projection

projection.project(geom, from_proj=None, to_proj=None)

Project a shapely geometry, and return a new geometry of the same type from the transformed coordinates.

Default input projection is WGS84, default output projection is Mollweide.

Inputs:

  • geom: A shapely geometry.
  • from_proj: A PROJ4 string. Optional.
  • to_proj: A PROJ4 string. Optional.

Returns a shapely geometry.
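
Mollweide is an equal-area projection, which is why it suits area measurements. As a rough illustration of the default output projection, the standard spherical Mollweide forward equations can be written in a few lines. This sketch is not the code used by project: the function name mollweide is hypothetical, the sphere radius is assumed to be the WGS 84 semi-major axis, and results from a projection library may differ slightly.

```python
import math

R = 6378137.0  # assumption: WGS 84 semi-major axis in metres, used as sphere radius


def mollweide(lon_deg, lat_deg, radius=R):
    """Project longitude/latitude in degrees to Mollweide x, y in metres."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    if abs(abs(phi) - math.pi / 2) < 1e-12:
        # At the poles the Newton step divides by zero; theta is known exactly.
        theta = math.copysign(math.pi / 2, phi)
    else:
        # Solve 2*theta + sin(2*theta) = pi*sin(phi) by Newton iteration.
        theta = phi
        for _ in range(100):
            delta = (
                2 * theta + math.sin(2 * theta) - math.pi * math.sin(phi)
            ) / (2 + 2 * math.cos(2 * theta))
            theta -= delta
            if abs(delta) < 1e-12:
                break
    x = 2 * math.sqrt(2) / math.pi * radius * lam * math.cos(theta)
    y = math.sqrt(2) * radius * math.sin(theta)
    return x, y


origin = mollweide(0, 0)
east_edge = mollweide(180, 0)
north_pole = mollweide(0, 90)
```

The projected world is an ellipse with semi-axes 2√2·R and √2·R, so its area is 4πR², the same as the sphere it represents.
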
projection.wgs84(s)

Fix a missing CRS, or fiona giving an abbreviated WGS84 definition.

Returns WGS84 if s is falsey.