In the Hierarchical Cluster platform, Near-Neighbor Joining Cycles are used in the first phase of the Hybrid Ward method to reduce the size of the table that is passed to the hierarchical clustering routine. The Near-Neighbor Joining Cycles algorithm is controlled by the following options:
Hybrid Goal
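Specifies the maximum number of clusters that can remain when the algorithm stops. After the minimum number of cycles is reached, joining continues until the number of items is less than or equal to this value. The default value is 400.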
Hybrid Cycles
Specifies the minimum number of near-neighbor joining cycles that are performed before the algorithm is stopped. The default value is 30.
Hybrid Initial K
Specifies the initial number of neighbors used in the near-neighbor joining cycles. The default value is 10.
The Near-Neighbor Joining Cycles algorithm repeats the following steps:
1. A vantage-point (VP) tree is created to efficiently look up nearest neighbors.
2. For each item, the k nearest neighbors to that item are determined.
3. The near neighbor pairs are sorted by distance.
4. For the half of those pairs with the smallest distances, join the items in each pair if neither item has already been joined with another item in this cycle. Each joined pair becomes a single item in the next cycle.
5. Repeat step 1 through step 4 until the minimum number of cycles (Hybrid Cycles) is reached. Then check the number of items:
– If the number of items is less than or equal to the Hybrid Goal, stop.
– If the number of items is greater than the Hybrid Goal, continue repeating step 1 through step 4 until the number of items is less than or equal to the Hybrid Goal.
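The steps above can be sketched in Python as follows. This is only an illustrative sketch under several assumptions that the description does not confirm: a brute-force neighbor search stands in for the VP tree (which only speeds up the lookup), distances are Euclidean, a joined pair is replaced by its size-weighted centroid, and k is held fixed across cycles. The function names `k_nearest_pairs`, `joining_cycle`, and `near_neighbor_joining` are hypothetical.

```python
import math
import random


def k_nearest_pairs(points, k):
    """Steps 1-2: list (distance, i, j) for each item i and its k nearest items j.

    Assumption: brute-force search replaces the VP tree for brevity.
    """
    pairs = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        pairs.extend((d, i, j) for d, j in dists[:k])
    return pairs


def joining_cycle(points, sizes, k):
    """Run one cycle (steps 1-4) and return the reduced points and their sizes."""
    pairs = sorted(k_nearest_pairs(points, k))  # step 3: sort pairs by distance
    closest_half = pairs[: len(pairs) // 2]     # step 4: keep the closest half
    joined = set()
    new_points, new_sizes = [], []
    for _, i, j in closest_half:
        if i in joined or j in joined:
            continue  # an item can be joined only once per cycle
        joined.update((i, j))
        w = sizes[i] + sizes[j]
        # Assumption: the joined item is the size-weighted centroid of the pair.
        merged = tuple((sizes[i] * a + sizes[j] * b) / w
                       for a, b in zip(points[i], points[j]))
        new_points.append(merged)
        new_sizes.append(w)
    # Items that were not joined carry over unchanged to the next cycle.
    for i, p in enumerate(points):
        if i not in joined:
            new_points.append(p)
            new_sizes.append(sizes[i])
    return new_points, new_sizes


def near_neighbor_joining(points, goal=400, cycles=30, k=10):
    """Step 5: run at least `cycles` cycles, then continue until at most `goal` items."""
    points = [tuple(p) for p in points]
    sizes = [1] * len(points)
    done = 0
    while len(points) > 1 and (done < cycles or len(points) > goal):
        points, sizes = joining_cycle(points, sizes, k)
        done += 1
    return points, sizes


# Usage with small, non-default settings so the example runs quickly.
random.seed(0)
data = [(random.random(), random.random()) for _ in range(200)]
reduced, sizes = near_neighbor_joining(data, goal=50, cycles=2)
print(len(reduced), sum(sizes))  # item count is at most 50; sizes sum to 200
```

Tracking a size for each item keeps the total row count recoverable after joining, which a later Ward-style clustering step would need in order to weight the reduced items.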
In each cycle, if the number of pairs that are joined is small, the number of nearest neighbors, k, is increased for the next cycle. The value of k can decrease in a later cycle if a sufficient number of pairs are joined in the previous cycle. The value of k increases and decreases according to the following rules:
• If less than 20% of the pairs in step 4 are joined, the value of k is increased by 10.
• If less than 10% of the pairs in step 4 are joined, the value of k is increased by 20.
• If less than 5% of the pairs in step 4 are joined, the value of k is increased by 30.
• If more than 30% of the pairs in step 4 are joined, the value of k is decreased by 10.
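The rules above can be written as a small update function. The thresholds overlap as stated (a join rate below 5% is also below 10% and 20%), so this sketch assumes that only the rule with the most restrictive matching threshold is applied. It also assumes that k is unchanged when the join rate falls between 20% and 30%, and that k never drops below a floor; the `min_k` parameter and the function name `update_k` are hypothetical and not part of the description.

```python
def update_k(k, joined_fraction, min_k=1):
    """Return k for the next cycle.

    joined_fraction: fraction of the step 4 pairs that were joined (0.0 to 1.0).
    Assumption: tiers are exclusive, so only the most restrictive match applies.
    Assumption: k is clamped at min_k (a hypothetical floor).
    """
    if joined_fraction < 0.05:
        k += 30
    elif joined_fraction < 0.10:
        k += 20
    elif joined_fraction < 0.20:
        k += 10
    elif joined_fraction > 0.30:
        k -= 10
    # Between 20% and 30% (inclusive), k is left as it is.
    return max(k, min_k)


# Starting from the default Hybrid Initial K of 10:
print(update_k(10, 0.07))  # join rate below 10%: k grows by 20
print(update_k(30, 0.50))  # join rate above 30%: k shrinks by 10
```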