Thursday, May 25, 2017

GIS Programming: Module 1

Module 1: Introduction to Python

The image above is a screenshot of the script that was run to create the file folders shown. This was our first script in action using PythonWin. The folders shown above will be used for file paths in upcoming assignments.

Our process summary is as follows:


1. Copied the CreateModuleFolders.py script from the R drive and pasted it into the R drive.
2. Right-clicked the CreateModuleFolders.py script and chose Edit with PythonWin.
3. Clicked Run within PythonWin to run the script, which created a folder named GIS Programming containing 12 module folders.
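I did not write CreateModuleFolders.py myself, so the sketch below is only a guess at how such a folder-creation script might look in Python; the root path and subfolder names are assumptions for illustration.

# Hedged sketch of a folder-creation script like CreateModuleFolders.py.
# The root path and subfolder names below are assumptions, not the real script.
import os

root = r"S:\GISProgramming"                   # assumed course root folder
subfolders = ["Data", "Results", "Scripts"]   # assumed structure inside each module

for module_num in range(1, 13):               # Module1 through Module12
    for sub in subfolders:
        path = os.path.join(root, "Module%d" % module_num, sub)
        if not os.path.exists(path):          # avoid an error if it already exists
            os.makedirs(path)
            print("Created " + path)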

Thursday, May 4, 2017

Cartographic Skills: Final Project

For my final project, I created a population growth map for the Dallas/Fort Worth Metro Area.
The Dallas/Fort Worth Metro Area is the fourth-largest metropolitan area in the United States and the largest inland metro area. With a population of 7.2 million people and population growth of 12.5%, where will all of these people live? The map I created, using current population estimates and future growth estimates, should give a clear picture of where people are moving within the Dallas/Fort Worth Metro.



Two thematic methods are used for this map. The first method uses graduated symbols to display 2017 population growth data. The data behind the population growth is derived from the North Central Texas Council of Governments, which provides a 2017 Annual Population Estimate covering the spring of 2016 to the spring of 2017. Each city's population is calculated separately for different housing types, and an estimate of the population in each type of housing unit is produced. Variation in household size and occupancy rates is taken into account when estimating the population of each housing unit type.
Using graduated symbols to display the 2017 Annual Population Estimate data allows you to see the amount of growth, as the size of each symbol is proportional to the growth value. The largest symbols therefore represent the highest growth, and smaller symbols represent little change. The population growth data was in Excel format and was joined to a city boundary polygon layer. Using the city names to relate the table to the city boundary layer places a geolocation at the centroid of each city for placement of the graduated symbols.
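The join was done interactively in ArcMap, but a minimal arcpy sketch of the same idea is shown below; the file paths and field names (such as CityName) are assumptions, not the actual data.

# Hypothetical arcpy sketch of joining the Excel population table to the city boundaries.
# Paths and field names are assumptions for illustration only.
import arcpy

arcpy.env.workspace = r"C:\CartographicSkills\FinalProject"   # assumed workspace

# Convert the Excel population estimates into a table ArcMap can join
arcpy.ExcelToTable_conversion("PopEstimates2017.xlsx", "pop_2017")

# Make a layer from the city boundary polygons and join the table on the city name
arcpy.MakeFeatureLayer_management("CityBoundaries.shp", "cities_lyr")
arcpy.AddJoin_management("cities_lyr", "CityName", "pop_2017", "CityName")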
The 2017 Annual Population Estimate data was standardized by calculating the percent of increase from 2016 to 2017 for each city's population estimate. Subtracting the 2016 value from the 2017 population value gives the amount of increase. Once the increase is calculated, the percentage is obtained by dividing the population increase by the 2016 value and multiplying by 100. Standardizing the data with a percent of increase highlights cities with high growth because it does not depend on the actual size of each city. For example, if you did not standardize the data and used only the 2017 population totals, you would not obtain good results: cities with higher populations would simply display larger values and give no sense of how much the population changed. Therefore, the data must be standardized as a percent increase for the graduated symbols to read properly.
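To make the arithmetic concrete, here is a small worked example of the percent-of-increase calculation with made-up numbers.

# Worked example of the percent-of-increase calculation (the values are made up).
pop_2016 = 40000.0
pop_2017 = 42500.0

increase = pop_2017 - pop_2016                  # 2,500 people
pct_increase = increase / pop_2016 * 100        # 2500 / 40000 * 100 = 6.25 percent
print(pct_increase)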
The 2017 Annual Population Estimate data is classified with the Natural Breaks classification using 5 classes. Most of the data falls near the beginning of the number line, tapers off in the middle, and has a few outliers far down the number line. These outliers matter: they represent the highest values and are necessary for seeing the high growth. Natural Breaks therefore lets the class breaks divide naturally at the points where the data goes flat along the number line while still breaking out the high outliers appropriately. Some further cleanup of the data was necessary in order to remove outliers from very small cities. These cities had populations below 5,000, and their extremely high percentages of growth skewed the data and did not compare well against the large-city populations, so cities with a population below 5,000 were removed.
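The small-city filter could be expressed as a simple rule like the one sketched below; the city names and population values are made up.

# Sketch of dropping small-city outliers before classifying percent of increase.
# The city names and population values are made-up example data.
pop_2016 = {"City A": 160000, "City B": 22000, "City C": 1200}
pop_2017 = {"City A": 172000, "City B": 25000, "City C": 1600}

growth = {}
for city, old in pop_2016.items():
    if old >= 5000:                             # exclude cities below 5,000 population
        growth[city] = (pop_2017[city] - old) / float(old) * 100

print(growth)                                   # the tiny city with inflated growth is excluded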

The second thematic method is a choropleth map forecasting population growth estimates for the year 2040. A dataset from the North Central Texas Council of Governments was used to display the population forecast. The dataset, the 2040 Demographic Forecast, uses long-range forecasting methods to give a view into where and how much growth will occur in the metro area. The forecasting uses a series of methods and formulas to calculate projections and accounts for many sources of error. The data displayed in the choropleth map is broken up into districts, and a percent value is used to show low to high growth. Using a choropleth map gives a good view of where the population will grow, and it also allows the graduated symbols to be placed over the layer so the reader can see both where and how much growth is occurring.
The 2040 Demographic Forecast data was standardized using the 2005 population statistics already present in the dataset. A percent of increase was calculated from the 2005 population statistics against the projected 2040 population statistics: the increase was divided by the original 2005 value and multiplied by 100. This dataset came in shapefile format, so the attribute (database) file was exported into Excel to calculate the percent of increase. It was then saved as an xls file and imported back into ArcMap, where it was joined back to the original 2040 Demographic Forecast data. Again, a percent of increase shows population growth better than raw total population counts, so it is best to standardize this data.
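The export and re-join round trip could also be scripted; the arcpy sketch below is hypothetical, and the file and field names (such as DistrictID) are assumptions.

# Hypothetical arcpy sketch of the export / calculate / re-join round trip.
# File names and the DistrictID field are assumptions for illustration only.
import arcpy

arcpy.env.workspace = r"C:\CartographicSkills\FinalProject"   # assumed workspace

# Export the 2040 forecast attribute table to Excel for the percent-of-increase calculation
arcpy.TableToExcel_conversion("Forecast2040.shp", "forecast_2040.xls")

# ...percent of increase calculated in Excel and saved as forecast_2040_pct.xls...

# Bring the Excel result back in and join it to the original forecast districts
arcpy.ExcelToTable_conversion("forecast_2040_pct.xls", "forecast_pct")
arcpy.MakeFeatureLayer_management("Forecast2040.shp", "forecast_lyr")
arcpy.AddJoin_management("forecast_lyr", "DistrictID", "forecast_pct", "DistrictID")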
Natural Breaks classification with 5 classes was also used for the percent of increase in the 2040 Demographic Forecast data. Most of the data clumps at the beginning of the number line, with major outliers falling at the middle and end. These outliers are the most important values, as they are the fastest-growing districts. Natural Breaks appropriately finds the minor breaks clumped at the beginning of the number line, creating two breaks below the mean value, and places three breaks above the mean value, which separates out the highest and most important values.
Other methods of data classification were also tried for each population layer. One such method is the Quantile (quintile) classification. This method comes close to Natural Breaks, with a similar break-out of classes, but it clumps four breaks too early at the beginning of the number line. Those four breaks fall below the mean value, leaving a single class to show all of the data above the mean. This classification does not display the data well, since the highest values fall flat into one class.
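To illustrate why the quantile scheme clumps its breaks, the short sketch below computes five quantile breaks on a made-up, skewed set of percent-of-increase values.

# Illustration of quantile (quintile) breaks on skewed growth data; the values are made up.
import numpy as np

pct_increase = np.array([1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 8, 10, 25, 60, 140])

# With five classes, a break falls at every 20th percentile, so four of the
# five breaks land in the low, tightly clumped values and the long tail of
# high-growth outliers is collapsed into the last class.
quintile_breaks = np.percentile(pct_increase, [20, 40, 60, 80, 100])
print(quintile_breaks)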