Monday, May 18, 2015

Enumerating ESRI Shapefile DBF Attributes

CHALLENGE -- Cold-boot ingest and analysis of "mystery" ESRI shapefiles often requires analysis and reporting tools that do not exist in the ArcGIS/ArcMap GUI, or that are "buried" under an obscure workflow that is tricky to repeat, over and over again, with dozens of mouse clicks.  Lots of room for repeated failure.


SOLUTION -- Simple Python command line enumeration of ESRI Shapefile DBF attributes -- reporting column name, column length and data type -- the DBF "schema" -- inside a "naked" Win32 command line environment.  Suitable for Cut-n-Paste reporting and analysis.




APPROACH -- Python Script -- heavily modified from ESRI web site hints and kinks -- listed below.

# =====================================================================

#  Abstract:  Script to Enumerate ESRI Shapefile Attributes 
#             for Analysis and Reporting
#
#  Revised:   2010-11-18 -- New create from ESRI web hints 
#                           and examples

#  Version:   Update for ArcGIS 10.x and ArcPY 
#             "GeoProcessing" imports

#  Contact:   R.Marten
#             Q7 GeoComputing
#             24165 IH-10 West #217
#             SATX 78257
#             Email: Q7GeoComputing_at_WarpMail_dot_Net

#  From:
#   http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#\
#              //000v0000001p000000

# =====================================================================


# import arcgisscripting, os, sys   # -- ArcGIS 9.x imports
# gp = arcgisscripting.create(9.3)  # -- ArcGIS 9.x instantiation

from __future__ import print_function   # -- print() on Python 2.7 and 3.x
import sys
import arcpy   # -- ArcGIS 10.x imports (also checks for a valid license)

# For each field in the feature class (shapefile or GDB), 
#  print the field name, length, and type.
#
# ----- GDB data store Approach ----
# fieldList = arcpy.ListFields("C:/Data/Municipal.gdb/Hospitals")
#
# ----- Shapefile data store Approach ----
# fieldList = arcpy.ListFields(r"c:\gis\usgs\usgs_tx_24k_centroids_wgs84.shp")

# --- Check the number of command line arguments.
if len(sys.argv) < 2:
  print("Script requires a file argument -- feature class (Shp or GDB)")
  print("Try Local Dir: %s my_file.shp" % sys.argv[0])
  # raw string: keeps the Windows backslashes literal (no escape codes)
  print(r"Try Full Path: %s c:\gis\usgs\usgs_tx_24k_centroids_wgs84.shp" % sys.argv[0])
else:
  shp_file = sys.argv[1]
  # fieldList = gp.ListFields( shp_file ) # --- version 9.x and earlier
  fieldList = arcpy.ListFields(shp_file)
  print("Field Name  Len  Type")
  print("==========  ===  ===============================")
  for field in fieldList:
    # name left-aligned in 10 chars, length right-aligned in 3, then type
    print("{0:10}  {1:3}  {2:8}".format(field.name, field.length, field.type))

# === end script ===
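The next sketch is a different technique from the ArcPY method above. The same schema can be read straight from the .dbf header using only the Python standard library. That can help on a machine with no ArcGIS license at hand.

This is a minimal sketch under a few assumptions:

* It assumes the usual dBase III style layout: a 32-byte file header, then 32-byte field descriptors, then a 0x0D terminator byte.
* read_dbf_schema and print_schema are hypothetical names.
* The code is written for Python 3.
* It reports the raw dBase type codes (C, N, F, D, L, M) rather than ArcGIS field type names.

```python
import struct
import sys

# dBase field type codes (these are NOT the ArcGIS type names)
DBF_TYPES = {"C": "Character", "N": "Numeric", "F": "Float",
             "D": "Date", "L": "Logical", "M": "Memo"}


def read_dbf_schema(path):
    """Return (record_count, [(name, type_code, length, decimals), ...])."""
    with open(path, "rb") as f:
        # 32-byte file header, little-endian: version byte, YY MM DD,
        # uint32 record count, uint16 header length, uint16 record length,
        # then 20 reserved bytes.
        (_version, _yy, _mm, _dd,
         n_records, _header_len, _record_len) = struct.unpack(
             "<4BIHH20x", f.read(32))
        fields = []
        while True:
            desc = f.read(32)                  # one 32-byte field descriptor
            if len(desc) < 32 or desc[0] == 0x0D:
                break                          # 0x0D ends the descriptor array
            name = desc[:11].split(b"\x00", 1)[0].decode("ascii", "replace")
            type_code = chr(desc[11])
            length, decimals = desc[16], desc[17]
            fields.append((name, type_code, length, decimals))
    return n_records, fields


def print_schema(path):
    """Print the schema in a layout similar to the ArcPY script."""
    n_records, fields = read_dbf_schema(path)
    print("Field Name  Len  Dec  Type")
    print("==========  ===  ===  ==========")
    for name, code, length, decimals in fields:
        print("{0:10}  {1:3}  {2:3}  {3}".format(
            name, length, decimals, DBF_TYPES.get(code, code)))
    print("%d records" % n_records)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: %s my_file.dbf" % sys.argv[0])
    else:
        print_schema(sys.argv[1])
```

Point it at the .dbf that sits beside the .shp, for example Parcels_20130730_nad27.dbf. The Dec column shows the decimal count, which ListFields reports separately as scale.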

USE & PROCEDURE STEPS -- Rough invocation steps are:

(1) On a WinXP or Win7 platform with ArcGIS 10.x installed, create a "safe", easy-to-find working folder, something like "c:\Downloads\ArcPY_Examples".  This may be accomplished from the GUI / Windows "explorer", or from a command window thus:


 c:\where_ever_cmd_starts> mkdir c:\Downloads
 c:\where_ever_cmd_starts> mkdir c:\Downloads\ArcPY_Examples

(2) Cut and paste the above script into a plain ASCII text file, and save it in that folder as z_Enum_Shp_Fields_v10.py.


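(3) In a Win32 command window, change to the working folder and run the script against a shapefile (here, one that sits in the same folder):


 c:\where_ever_cmd_starts> cd c:\Downloads\ArcPY_Examples
 c:\Downloads\ArcPY_Examples> z_Enum_Shp_Fields_v10.py Parcels_20130730_nad27.shp

Running the .py file by name relies on the Windows .py file association pointing at the Python installed with ArcGIS.  If it does not, put the full path to that python.exe in front of the script name.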


RESULTS #1 -- Screen capture of a command line invocation of the above script, targeting an example suite of tax parcels:



RESULTS #2 -- Screen capture of command line invocation and script results -- note the bracketed output:


CHALLENGES and POTENTIAL FRICTION POINTS --

(a) The first time this script is invoked, especially on a freshly booted machine, the ArcGIS Python modules and libraries will not yet be "memory resident" (cached by the operating system).  Script startup may therefore take anywhere from 20 seconds to 2 minutes, depending on many system factors.  After the first invocation the script starts promptly for any other shapefile, since the necessary Python modules and libraries are already "handy" to the operating system.

(b) ArcGIS license?  Got one?  If the ArcGIS installation is unlicensed or "bootlegged", the script may fail to start, because importing the ArcPY modules also performs a license check.
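One option covers both friction points above. It times the import in a brand-new Python process, which shows the cold-start cost in (a). It also turns the failed import in (b) into a one-line message instead of a traceback.

This is a minimal sketch under a few assumptions:

* time_fresh_import is a hypothetical helper.
* arcpy is only the default module name.
* The outer script needs Python 3.7+ (for capture_output and text in subprocess.run).
* The child probe sticks to calls that also exist in Python 2.7, so another interpreter can be passed as python.
* The exact message a license failure produces is not assumed here; the sketch just reports the last line the child wrote to stderr.

```python
import subprocess
import sys

# Runs inside the child interpreter: time one import and print the seconds.
_PROBE = (
    "import importlib, sys, time\n"
    "t0 = time.time()\n"
    "importlib.import_module(sys.argv[1])\n"
    "print(time.time() - t0)\n"
)


def time_fresh_import(module_name, python=sys.executable):
    """Import module_name in a new interpreter; return (seconds, error_text)."""
    proc = subprocess.run([python, "-c", _PROBE, module_name],
                          capture_output=True, text=True)
    if proc.returncode != 0:
        # The last stderr line normally names the exception, e.g.
        # "ModuleNotFoundError: No module named 'arcpy'"
        lines = proc.stderr.strip().splitlines()
        return None, (lines[-1] if lines else "exit code %d" % proc.returncode)
    return float(proc.stdout.strip().splitlines()[-1]), None


if __name__ == "__main__":
    name = sys.argv[1] if len(sys.argv) > 1 else "arcpy"
    # Run 1 may be slow after a fresh boot; run 2 can reuse the OS file cache.
    for run in (1, 2):
        seconds, error = time_fresh_import(name)
        if error:
            print("run %d: import %s failed: %s" % (run, name, error))
            break
        print("run %d: import %s took %.2f s" % (run, name, seconds))
```

Each call starts a separate interpreter, so Python's own in-process module cache never hides the cost.  Any speed-up between run 1 and run 2 comes from the operating system.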

OTHER EXAMPLES -- The above is just one simple, low-impact example of using ArcPY at the "raw" command line to extract the DBF "database" schema.  Other examples developed in the past, as ideas for the problem-solving toolbox:

(a) Generating regular measurement and field sample grids via ArcGIS FISHNET, useful for GPS and seismic field layouts and cross-measurement checks.  Many oil & gas seismic layouts may be generated, or "re-created" and/or "repaired", from design parameters via ArcGIS FISHNET.

Typical Seismic Design Parameters

(b) Exporting shapefile "layers" based on categorical attribute data in a DBF table.  Many AutoCAD folks struggle to understand shapefile categorization via DBF attributes, and will ask for a shapefile "layer" that has "only the NO ENERGY", or "only the NO MULCH", or "only the NO FLY", or whatever.  When there are more than 3-5 attribute categories, a script is a more SURE FIRE (deterministic) and REPEATABLE method of exporting each category value as a shapefile layer.

Categorical Map vs Legend "Layers"

(c) Generation of XY "shotgun scatter" pseudo-random points for low-bias statistical sampling and "attribute pulls."  Basically, this means setting up a map with randomly scattered XY locations.  Those locations "pick up" a third / fourth / fifth / etc. attribute (Z1 / Z2 / Z3 / etc.), and summary stats on them identify "hot spots", "heat maps", and "peaks & pits".

XY Scatter Search for Peaks & Pits
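The per-category export described in (b) can be sketched as below.  It is not the original script.

This is a minimal sketch under a few assumptions:

* It assumes a text (string) attribute in a shapefile, with no NULL values.
* In a shapefile where clause, the field name is wrapped in double quotes and the value in single quotes.
* plan_exports and export_categories are hypothetical names, and the code is written for Python 3.
* The two arcpy calls used, arcpy.da.SearchCursor and arcpy.Select_analysis, exist in ArcGIS 10.1 and later.
* arcpy is imported only inside export_categories, so the file-name and where-clause planning can be checked without ArcGIS.

```python
import os
import re


def plan_exports(field, values, prefix=None):
    """Return sorted [(output .shp name, where clause)] for each distinct value."""
    prefix = prefix or field
    plan, used = [], set()
    for value in sorted(set(values)):
        text = str(value)
        # File-name-safe stem: runs of non-alphanumerics become "_"
        stem = re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_") or "blank"
        name, n = "%s_%s" % (prefix, stem), 2
        while name in used:  # e.g. "NO FLY" and "NO-FLY" both map to NO_FLY
            name, n = "%s_%s_%d" % (prefix, stem, n), n + 1
        used.add(name)
        # Shapefile SQL: "FIELD" = 'value', with embedded single quotes doubled
        where = "\"%s\" = '%s'" % (field, text.replace("'", "''"))
        plan.append((name + ".shp", where))
    return plan


def export_categories(in_shp, field, out_dir):
    """Write one shapefile per distinct value of field (requires ArcGIS)."""
    import arcpy  # imported here so plan_exports() works without ArcGIS
    with arcpy.da.SearchCursor(in_shp, [field]) as cursor:
        values = [row[0] for row in cursor]
    for out_name, where in plan_exports(field, values):
        arcpy.Select_analysis(in_shp, os.path.join(out_dir, out_name), where)
```

Planning the names and where clauses first, before any geoprocessing runs, keeps the export deterministic.  The same input always yields the same list of output layers.  That list can also be reviewed before any files are written.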

SUMMARY -- Sometimes there is no replacement for a deterministic, repeatable, and documentable script to capture a key data-mining workflow.
