Tuesday, October 7, 2014

The Role of LiDAR Attributes in Feature Extraction

Over the past few weeks I have noticed a number of questions in online discussion forums about how LiDAR point cloud attributes, such as classification and return number, can be used to help identify or automatically extract features. We have numerous other posts detailing our automated feature extraction workflow, specifically how we use an object-based approach to extract information from LiDAR, imagery, and other data sources. In this post I would like to turn the focus to LiDAR, specifically how the point cloud attributes can be used to highlight above-ground features such as buildings and tree canopy.

Most of the LiDAR data that we work with is acquired to the USGS specification, with an average of 1-4 points per square meter. As LiDAR datasets are typically acquired to support topographic mapping of the earth's surface, they are flown during leaf-off conditions. Because leaves reflect the laser signal, leaf-off acquisition increases the chance that the signal will reach the ground.

As long as your LiDAR data is in LAS format, each point contains a wealth of information beyond the elevation. The LiDAR point attributes we will be most concerned with in this post are the class and the number of returns. You can find out more about both of these attributes by reading up on the ASPRS LAS specification. The class is assigned to each point, typically by the contractor who processed the data, using a semi-automated approach. The most basic LAS classification splits the points into ground (class 2) and unclassified (class 0 or 1, covering everything else) points. The graphics below show an example of LiDAR data in LAS format, first symbolized by elevation and then by classification.
LiDAR point cloud. Each point is colored by its absolute elevation: blue represents the lowest elevations and red the highest.
LiDAR point cloud symbolized by class. Green is ground, magenta is overlap, cyan is water, and red is unclassified. Black areas are water bodies that contain no LiDAR points, as water absorbs the LiDAR signal.
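If you want to poke at these attributes yourself, here is a minimal sketch using Python and the open-source laspy package (the file name is hypothetical; this is not part of our production workflow):

```python
import laspy
import numpy as np

las = laspy.read("tile.las")  # hypothetical tile name

# Per-point attributes defined by the ASPRS LAS specification
elevation = np.asarray(las.z)
classes = np.asarray(las.classification)       # e.g., 2 = ground
n_returns = np.asarray(las.number_of_returns)  # returns per pulse

ground = classes == 2    # the most basic split: ground vs. everything else
single = n_returns == 1  # dense surfaces: buildings, open ground
multi = n_returns > 1    # mostly vegetation

print(f"{ground.sum()} ground points, {multi.sum()} multi-return points")
```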
The return information comes from the LiDAR sensor. Discrete return LiDAR data typically have up to four returns per pulse. The graphic below shows the same point cloud in which the points have been symbolized based on the number of returns. Dense surfaces, such as buildings and ground, produce a single return (red), but trees generally produce multiple returns (green, cyan, and blue). The less dense structure of trees (particularly deciduous trees that lack leaves) creates a return at the top of the tree, then additional returns off of subsequent branches, and finally a return from the ground.

LiDAR point cloud symbolized by return number.  Red indicates a single return, green - two returns, cyan - three returns, and blue - four returns.
These representations illustrate how point cloud information can provide insight into the type of feature. For example, trees and buildings are both tall and assigned to a class other than ground or water. When it comes to the number of returns, we see that buildings have a single return whereas trees typically have more than one. The process of using a combination of class and return number to differentiate between trees and buildings becomes clearer when we generate raster surface models from the LiDAR point cloud. A Normalized Digital Surface Model (nDSM) is a gridded dataset in which each pixel represents the height of features relative to the ground. It is created by using the ground points (LAS class 2) to create a raster Digital Elevation Model (DEM), using the first returns to create a raster Digital Surface Model (DSM), and then subtracting the DEM from the DSM. The example below shows the nDSM for the same area as the point cloud examples above. Buildings and trees show up as tall (red and yellow), whereas non-tall features on the landscape such as roads and grass show up as short (blue).

Normalized Digital Surface Model (nDSM).
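A rough sketch of that nDSM recipe in Python/numpy, continuing the laspy example above (in practice you would interpolate the empty cells and use proper gridding software):

```python
import laspy
import numpy as np

las = laspy.read("tile.las")  # hypothetical tile name
cell = 1.0                    # 1 m grid suits ~1-4 points per square meter

x, y, z = np.asarray(las.x), np.asarray(las.y), np.asarray(las.z)
cols = ((x - x.min()) / cell).astype(int)
rows = ((y.max() - y) / cell).astype(int)
shape = (rows.max() + 1, cols.max() + 1)

def grid_max(mask):
    """Rasterize the masked points, keeping the highest elevation per cell."""
    r = np.full(shape, -np.inf)
    np.maximum.at(r, (rows[mask], cols[mask]), z[mask])
    r[np.isinf(r)] = np.nan  # cells with no points remain empty
    return r

dem = grid_max(np.asarray(las.classification) == 2)  # ground points only
dsm = grid_max(np.asarray(las.return_number) == 1)   # first returns
ndsm = dsm - dem  # height above ground
```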
A similar approach is used to create a Normalized Digital Terrain Model (nDTM). A DTM is generated from the last returns, and the DEM is then subtracted from the DTM to create the nDTM. The nDTM is very effective at highlighting buildings while suppressing trees. For buildings (dense surfaces) the LiDAR signal does not penetrate the roof, so the last returns sit well above the ground. For trees the LiDAR signal penetrates the canopy in most cases, so the height difference between the DTM and DEM is often low.

Normalized Digital Terrain Model (nDTM).
Subtracting the nDTM from the nDSM highlights trees. This is because the heights of the first and last returns for buildings are often identical, whereas for trees the difference is typically much greater.
nDTM subtracted from the nDSM.
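Continuing the same sketch, the nDTM and the tree-highlighting difference are just a few more lines:

```python
# Last returns: the return number equals the total number of returns
last = np.asarray(las.return_number) == np.asarray(las.number_of_returns)
dtm = grid_max(last)
ndtm = dtm - dem           # high on buildings, low under tree canopy
tree_signal = ndsm - ndtm  # near zero on buildings, large for trees
```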
Although these LiDAR datasets are excellent sources by themselves for mapping features, they are imperfect for tree canopy extraction due to their leaf-off nature. To overcome this limitation we take an object-based approach in which we integrate the spectral information from imagery and use iterative expert systems that take context into account to reconstruct the tree canopy, filling in the gaps in the leaf-off LiDAR. The result is a highly accurate and realistic representation of tree canopy. In general, LiDAR gets us 80%-90% of the way there, and imagery takes us the rest of the way.
Tree canopy extracted using an object-based approach overlaid on a hillshade layer derived from the nDSM.
Leaf-on imagery.
For more information on how to create the surface models mentioned in this post, check out the Quick Terrain Modeler video tutorials. If you want to generate raster surface models in ArcGIS, this video will show you how.

Saturday, August 23, 2014

New Urban Tree Canopy (UTC) Assessment Project Map Portal

We have a new Urban Tree Canopy (UTC) Assessment Projects web mapping portal up. The web site lists all the UTC projects completed by the USDA Forest Service's UTC assessment team (hopefully down the road we can add others), along with key information about each project and the ability to download each project's report and high-resolution land cover data. Credit for the web map goes to the brilliant Matt Bansak, with database support from the SAL's Tayler Engel.

Monday, August 18, 2014

Generating road polygons from breaklines and centerlines

A number of years ago LiDAR was acquired for the entire state of Pennsylvania through the PAMAP program. The LiDAR data are currently available from PASDA, and are a great resource. In addition to point cloud and raster surface models, the deliverables also included breaklines. Breaklines are great, but they are just lines representing the edges of the roads. What if you want to calculate the actual road surface? Using the road breaklines in combination with existing county road centerline data, we developed an automated routine within eCognition to turn the breakline and centerline data into road polygons so that the actual paved road area can be computed. This is another example of how the term "Object-Based Image Analysis" or "OBIA" no longer fits the type of work that we are doing with eCognition.

Here is how we went about it; a rough Python analogue follows the steps below.
1) Turn the breaklines and centerlines into image objects.

2) Compute the Euclidean distance to the road centerlines.

3) Classify objects based on their relative border to the centerlines and breaklines, and the distance to centerlines.

4) Clean up the classification based on the spatial arrangement of the road polygons.

5) Vectorize the road objects and simplify the borders (yellow lines are the vector polygon edges, pink polygons are the original image objects).
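Our actual routine lives in eCognition, but here is a rough Python analogue of the same idea using geopandas and shapely (file names hypothetical): polygonize the breaklines, then keep the polygons the centerlines run through.

```python
import geopandas as gpd
from shapely.ops import polygonize, unary_union

breaklines = gpd.read_file("breaklines.shp")    # road edge lines
centerlines = gpd.read_file("centerlines.shp")  # county centerline data

# 1) Turn the breaklines into candidate polygons (objects)
candidates = list(polygonize(unary_union(breaklines.geometry)))

# 2-3) Classify: keep candidates that a centerline passes through
center_union = unary_union(centerlines.geometry)
roads = [p for p in candidates if p.intersects(center_union)]

# 4-5) Clean up, simplify the borders, and compute the paved area
roads_gdf = gpd.GeoDataFrame(geometry=roads, crs=breaklines.crs)
roads_gdf["geometry"] = roads_gdf.geometry.simplify(0.5)
print("paved road area:", roads_gdf.area.sum())
```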


Ecology of Prestige, exploring the evidence in NYC

We have a new paper in Environmental Management that uses the SAL's signature high-resolution, 7-class, LiDAR-derived land cover data (3 ft version available here for free). The current study replicates and extends a previous paper that used the SAL's land cover data in Baltimore. Between our work on mapping, assessing, and estimating the carbon abatement potential, and the effects of tree canopy on asthma and air quality, this dataset is getting quite a bit of use. We hope that because the data are freely available others will continue to use them.

After applying some fancy spatial statistics we are able to show that even after controlling for population density (available space for trees) and socioeconomic status (available resources for trees), there is still quite a bit of variation, much of which is explained by lifestyle characteristics.

We conclude: “To conserve and enhance tree canopy cover on private residential lands, municipal agencies, non-profit organizations, and private businesses may need to craft different approaches to residents in different market segments instead of a “one-size-fits-all” approach.  Different urban forestry practices may be more appealing to members of different market segments, and policy makers can use that knowledge to their advantage.  In this case, advocates may consider policies and plans that address differences among residential markets and their motivations, preferences, and capacities to conserve existing trees and/or plant new trees.  Targeting a more locally appealing message about the values of trees with a more appropriate messenger tailored to different lifestyle segments may improve program effectiveness for tree giveaways.  Ultimately, this coupling of theory and action may be essential to providing a critical basis for achieving urban sustainability associated with land management.”


The paper was co-written with Northern Research Station scientist J. Morgan Grove and Clark University doctoral student Dexter Locke.

Monday, July 21, 2014

Comparing tree canopy percentages

NeighborhoodNewspapers.com is citing a study in which the tree canopy in Sandy Springs increased by 3% between 2010 and 2013. This is a big jump for a 3-year period. According to its Wikipedia entry, Sandy Springs has 24,142 acres of land area. The report from the study states that the tree canopy was 59% in 2010 and 62% in 2013. Using these numbers we would conclude that Sandy Springs had 14,244 acres of tree canopy in 2010 and 14,968 acres of tree canopy in 2013, an increase of approximately 724 acres. This equates to over 547 football fields of new tree canopy! That is a lot of new growth in only three years. Furthermore, these calculations assume that there was no loss. A quick look at some of the imagery from the 2010-2013 time period (thanks Google!) shows that there was tree loss in the area due to development. In order to net a 3% gain, there must therefore have been even more than 724 acres of new tree canopy over this time period.
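For the curious, here is the back-of-the-envelope math in plain Python (the 1.32-acre football field, end zones included, is my own conversion):

```python
land_acres = 24142                # Sandy Springs land area (Wikipedia)
canopy_2010 = 0.59 * land_acres   # ~14,244 acres
canopy_2013 = 0.62 * land_acres   # ~14,968 acres
gain = canopy_2013 - canopy_2010  # ~724 acres
print(round(gain), "acres, or", round(gain / 1.32), "football fields")
```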
Imagery from 2010.  Trees in the red circle were removed over the 3-year period.
Imagery from 2012 in which tree loss is clearly visible.
Unfortunately, no good numbers exist for how much new tree canopy an urban forest can produce on an annual basis, but it is highly unlikely that there was enough new tree canopy to offset the loss in this 3-year period. Having completed dozens of tree canopy studies, we are of the opinion that mapping tree canopy from imagery alone gets one to within ~2% of the actual area of tree canopy. Registration errors between the imagery from each time period, along with mapping errors, contribute to this uncertainty. If the 2010 and 2013 estimates are each only accurate to +/- 2%, then a statement of 3% gain simply cannot be supported by the data.

Based on my quick assessment of the imagery, I am inclined to believe that it is more likely that Sandy Springs lost tree canopy over the 2010-2013 time period. In the short term it is far easier to document tree canopy loss than gain, as trees are easy to remove but slow to grow. The fact is that, based on the available data, we don't know whether Sandy Springs lost or gained tree canopy over these 3 years, and it is highly unlikely that 724 new acres of tree canopy appeared in such a short time. Although we have published methods that provide increased precision for mapping tree canopy change, it is very difficult to accurately quantify changes in tree canopy over a short time period, such as was done in this study, without LiDAR. My recommendation is that Sandy Springs wait 5 to 10 years between future tree canopy assessments and use the iTree protocols to analyze historical changes.

Monday, July 7, 2014

Data Fusion in eCognition Webinar


Today I gave a webinar showing some of the data fusion techniques we employ within eCognition to update features in vector layers. eCognition is often seen as an OBIA (object-based image analysis) platform, but more recently it has morphed into a data fusion platform in which one can work with LiDAR point clouds, raster imagery, and vector points, lines, and polygons within a single environment. The object data model breaks down the barriers that exist in working with these differing data types, and the rule-based approach within eCognition allows for streamlined workflows in which everything from point cloud rasterization to vector overlay operations can be done within a single software package.

Although I am still working on the webinar recording, I have posted the eCognition project, rule set, and associated data for download. This will allow you to walk step-by-step through the example. Please note that this project was designed to illustrate some techniques, not to be a perfect feature extraction workflow. Also note that although the LiDAR and imagery are for the Wicomico County, Maryland area, the vector data are fictional.

The data for this project consist of the following:

  • A LiDAR point cloud (LAS)
  • 4-band high-resolution imagery (raster)
  • Building polygons (vector shapefile)
  • Water polygons (vector shapefile)
  • Road centerlines (vector shapefile)
The scenario is one in which the water polygons and building polygons are out of date when compared to the imagery and LiDAR.  A rule-based expert system was developed in eCognition to identify missed water polygons, find false buildings that were either incorrectly digitized or removed, and add in new buildings that are present in the LiDAR.


The eCognition project contains a single LiDAR point cloud dataset in LAS format.
Point cloud tile used in the project
eCognition project settings

The rule set consists of three main processes:

  1. Reset - clears the image object levels and removes all loaded and generated data.
  2. Load data - adds the imagery and vector layers to the project.
  3. Feature extraction - classifies missing water features, identifies false buildings, and extracts new buildings.
The reset process simply clears the project so that the end user can start from scratch. In the Load Data portion of the rule set I use the create/modify project algorithm to add in the imagery and vector layers. You can do this when you create the project, but I like to keep my project setup simple. Plus, having the rule set load the data makes it easier for me to reuse the rule set.
Imagery is displayed after being loaded as part of the rule set

The first step of the feature extraction process focuses on classifying water that was missing from the vector layer. Water clearly stands out in the imagery, so all it took was some basic segmentation algorithms followed by a threshold classification using the NIR values. Once the missing water features were classified, they were converted to a vector layer within eCognition and smoothed to look more realistic using the new vector algorithms introduced in version 9.

Existing water in the vector layer (blue) and new water features classified from the imagery (yellow)
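The idea behind the water step, stripped of the segmentation, is simply that water is dark in the near infrared. A minimal raster sketch in Python with rasterio/numpy (not the actual eCognition rule set; file name and threshold are illustrative):

```python
import numpy as np
import rasterio

with rasterio.open("naip_4band.tif") as src:  # hypothetical 4-band image
    nir = src.read(4).astype(float)           # band 4 = near infrared

water = nir < 40  # water absorbs NIR; tune the threshold per scene
print(f"{water.mean():.1%} of pixels flagged as candidate water")
```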
To aid with the building extraction I used some of eCognition's LiDAR processing tools to take the point cloud and turn it into a Normalized Digital Surface Model (nDSM) in which each pixel represents the height above ground.  
LiDAR nDSM
I also used the roads layer to create a new raster in which each pixel represents the distance to the roads.  This distance to roads layer is used later in the rule set to remove features that are similar to buildings, such as large overhanging signs.
Roads overlaid on the new distance to roads raster
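Outside of eCognition, a distance-to-roads raster boils down to a distance transform. A sketch with scipy (the rasterized road mask here is a toy example):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

road_mask = np.zeros((500, 500), dtype=bool)
road_mask[:, 250] = True  # toy north-south road down the middle

# The EDT measures distance to the nearest zero cell, so invert the mask
cell_size = 1.0  # ground resolution in meters
dist_to_roads = distance_transform_edt(~road_mask) * cell_size
```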
By converting the existing building vectors to objects, we could use information in the LiDAR point cloud, specifically the average elevation of all points minus the average elevation of ground points, to identify false buildings. These locations are exported as a point file, along with the unique building ID, for use in GIS software.
Buildings in the vector layer (red) and false buildings (green)
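The false-building test described above, sketched in Python (the points_in_footprint mapping is a hypothetical stand-in; building the point-in-polygon lookup is left out):

```python
import numpy as np

def mean_height_above_ground(z, classification):
    """Average elevation of all points minus the average of ground points."""
    ground = classification == 2
    if not ground.any():
        return float("nan")
    return z.mean() - z[ground].mean()

# Toy stand-in: building ID -> (elevations, classes) of points in footprint
points_in_footprint = {
    "bldg_001": (np.array([10.1, 10.3, 10.0, 10.2]), np.array([2, 2, 1, 1])),
    "bldg_002": (np.array([10.0, 18.5, 18.7, 18.6]), np.array([2, 1, 1, 1])),
}

# A footprint with little height above ground is likely a false building
false_buildings = [
    bid for bid, (z, cls) in points_in_footprint.items()
    if mean_height_above_ground(z, cls) < 2.0  # illustrative 2 m cutoff
]
print(false_buildings)  # ['bldg_001']
```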
Getting at the missed buildings was a bit more complex. First I applied a simple threshold to separate out the tall features. Within the tall features I re-segmented the objects based on the imagery, then classified those objects with a low difference in elevation between the first and last returns (for buildings the two are nearly identical) and low NDVI as new buildings. Some refinement was done using the road distance layer along with area thresholding. Finally, a series of vector processing algorithms was used to clean up the appearance of the new buildings.
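Reduced to raster algebra, that classification step might look like the sketch below (all inputs are hypothetical, co-registered grids; thresholds are illustrative):

```python
import numpy as np

def classify_new_buildings(ndsm, first_minus_last, red, nir):
    """Tall, solid (first return ~= last return), and non-vegetated."""
    ndvi = (nir - red) / (nir + red + 1e-9)
    tall = ndsm > 2.5                # height above ground in meters
    solid = first_minus_last < 0.5   # trees show a much larger difference
    return tall & solid & (ndvi < 0.2)
```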


Friday, June 27, 2014

Statewide Tree Canopy Mapping

Statewide tree canopy mapping seems to be on the rise. A few months ago we completed the first statewide tree canopy map for Maryland. This was accomplished using high-resolution imagery and LiDAR. The imagery came from the National Agriculture Imagery Program (NAIP) and the LiDAR datasets were cobbled together from various county collects. LiDAR was crucial to our project, as the imagery was highly variable, having been acquired over a number of weeks under differing conditions; there were simply no good defining characteristics for tree canopy in the imagery alone. The short of it is that mapping tree canopy accurately from imagery alone, without LiDAR, would have been virtually impossible. (If you are interested in more details on our approach you can read our paper, which was published as part of the ASPRS 2014 conference in Louisville, here.)

My Google Alert just turned up another statewide tree canopy dataset, this one for Indiana, produced by a company called EarthDefine. Like us, EarthDefine used NAIP as the source for the imagery, along with LiDAR. Also like us, they used an object-based approach. Overall I was very impressed with what EarthDefine was able to accomplish, but I noticed some inconsistencies. One of the challenges in mapping tree canopy from a combination of imagery and LiDAR is that LiDAR is most often acquired leaf-off. NAIP, on the other hand, is acquired leaf-on. In deciduous forested areas trees grow tall and thin compared to their urban counterparts, meaning that they don't generate many LiDAR returns and thus don't produce a clear height signature. To illustrate this issue, let's first take a look at the NAIP data for one of the areas in Maryland we mapped.
Imagery (NAIP).

Although the area is urbanized, there are some sections that could definitely be characterized as forest patches. Now let's look at a LiDAR Normalized Digital Surface Model (nDSM). An nDSM is a representation of the height of features relative to the ground. In the example below, white is tall and black is ground.
LiDAR nDSM
The area highlighted with the yellow circle is clearly closed canopy in the imagery, but in the nDSM it appears to be only sparsely covered. This is due to a combination of leaf-off LiDAR, tall deciduous trees with thin branches, and only 1-2 LiDAR points per square meter. As a human you have no difficulty resolving the differences between the two datasets, and you would clearly classify the area as tree canopy; humans are very adept at handling these disparities. It gets much more complicated if you want to automate the extraction of tree canopy. In some cases it would be enough to simply say "if it looks like vegetation in the imagery and it is tall in the LiDAR, then call it tree canopy" (a sketch of such a rule follows the graphic below). This works for some of the street trees, which, due to their branching patterns, show up clearly in the LiDAR. For the forested areas we had to build a much more sophisticated routine, one that mimicked the way the human brain works, incorporating the spectral information from the imagery, the height information from the LiDAR, and the spatial relationships that exist. In short, we had separate routines for extracting individual urban trees and forested patches. It was not easy, but as you can see from the graphic below it worked quite well, yielding results that were on par with manual interpretation.
Tree canopy
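That naive rule, written out as a sketch (thresholds illustrative; ndvi and ndsm are assumed co-registered grids):

```python
import numpy as np

def simple_canopy_rule(ndvi, ndsm, ndvi_min=0.3, height_min=2.0):
    """'Looks like vegetation in the imagery and is tall in the LiDAR.'
    Works for many street trees, but under-maps leaf-off deciduous
    forest patches, where sparse returns depress the nDSM."""
    return (np.asarray(ndvi) > ndvi_min) & (np.asarray(ndsm) > height_min)
```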

EarthDefine put together a wonderful web map service that allowed me to see how their product fared in similar instances. I was very impressed with the urban tree detection in the EarthDefine product, but it appears their tree canopy layer (actually a canopy height model) underestimates tree canopy in deciduous forest patches.
Base Imagery (Google)
EarthDefine Tree Canopy Height Model
I have not had a chance to chat with EarthDefine about how they created this particular product, but on their web site they state that they use Classification and Regression Trees (CART) in their approach. CART is powerful, but it is a bit of a brute-force approach in that it lacks intelligence and cannot replicate the iterative processes that humans employ when interpreting disparate datasets. As a result you end up with situations, such as the one above, in which tree canopy is underestimated.

Overall, the EarthDefine tree canopy model for Indiana is impressive, and it should be valuable for a multitude of end users. I would, however, urge caution in using it to derive statistics, such as percent tree canopy, given that it underestimates tree canopy in forested patches.