This is a Density-Based Spatial Clustering of Applications with Noise. Using a dataset of taxi trip records in New York City, in 2009, I identified the best waiting areas for vehicles. Since the taxi's are located in relatively small areas, I can simply use the Euclidean distance between the GPS coordinates.
GPSCoord which held the GPS coordinate of the taxi's starting location.
TripRecord holds information read from the CSV file: the date,the pickup location,the drop off location and the trip distance. I used the TripRecord as a sort of pointin my DBScan algorithm so we can consider TripRecords as part of clusters or Noise.
Cluster class holds the average GPS coordinate as well as the list of TripRecord held in a Cluster. These Clusters are used to calculate and held values that will be listed in a csv file.
Finally TaxiCluster holds all the the DBSCAN algorithm itself as well as a multitude of helper function to read the csv file,calculate distances and create csv files
This implmentation uses structs instead of classes and follows the same principles as the java but utilizes concurrency
I use the producer consumer pattern. The producer that produces jobs for each partitions and the consumer that consumes each one.
------------------- RUNING THE CODE -------------------
To compile and execute: go run map.go
EPISLON : 0.0003 MINPOINTS : 5
To change number of consumer threads you can change varaible "NumberOfConsumerThreads" on line 80
------------------- EXPERIMENTATION -------------------
N=2 and 4 consumer threads = Execution time: 40.465053292 of 198194 points
N=4 and 4 consumer threads = Execution time: 9.904227896 of 206102 points
N=4 and 10 consumer threads = Execution time: 10.876244625 of 206102 points
N=10 and 4 consumer threads = Execution time: 2.769759061 of 222488 points
N=10 and 10 consumer threads = Execution time: 2.397145154s of 222488 points
N=10 and 50 consumer threads = Execution time: 2.857153542s of 222488 points
N=20 and 10 consumer threads = Execution time: 2.232980004s of 255300 points
N=20 and 50 consumer threads = Execution time: 2.274099476s of 255300 points
N=20 and 200 consumer threads = Execution time: 2.582893006s of 255300 points
2 GHz Quad-Core Intel Core i5
Total : 8 Cores
These implementations are heavily based off of https://en.wikipedia.org/wiki/DBSCAN PseudoCode.