Initially, the plot_decision_boundary code used a hardcoded color scheme to represent each class. If the number of input classes was higher than the number of hardcoded colors, an error was raised. This may not be the best approach, so I started to think about how to define the RGB representation of each class in this specific use case. Here are the relevant facts and requirements:
Fact:
F1. Every data point has a probability associated with each class, and these probabilities sum to 1.

Requirements:
R1. If a point is 100% in class A, it should have a unique color that cannot be constructed as a mix of the other classes' colors.
R2. Each color represents a unique combination of class probabilities.

For R2, after some thinking, I found that it is impossible to avoid duplicates. As soon as the number of classes is larger than 4 (3 + 1), the space of class compositions has more dimensions than the 3D RGB space, so some different compositions must end up with the same color. So I ignore this requirement.

R1 can be satisfied as long as the RGB values of all classes form a strictly convex object within the RGB 3D space, that is, every class color is a corner of the shape spanned by all of them. A simple example: in an RG 2D space, if we use (0,0), (1,0), (0,1), (1,1) as the class colors, no class can be represented as a composition of the other classes (with coefficients that sum to 1). However, if we add a new class like (0.5, 1.0), it can be written as:

(0.5, 1.0) = 50% * (1, 1) + 50% * (0, 1)

This is because, with the extra point, the class color geometry is no longer strictly convex.

An easier solution is to define a sphere within the RGB 3D box and take colors on the surface of the sphere as the class colors. In this case, R1 should be satisfied. However, in practice, I found that when using just the sphere, the contrast between classes is smaller than when R1 is ignored. Here is the comparison between 20 colors and 100 colors for the sphere surface representation vs. the RGB 3D surface representation. Currently, I set it as the default in the plot_decision_boundary function.
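To make the sphere idea concrete, here is a minimal sketch in plain Python. It is not the actual plot_decision_boundary code: the function names sphere_class_colors and blend are hypothetical, and spreading the points with a Fibonacci lattice is my own assumption about how to place colors on the sphere. The sphere is centered at (0.5, 0.5, 0.5) with radius 0.5 so that every color stays inside the RGB box.

```python
import math


def sphere_class_colors(n_classes):
    """Hypothetical helper: pick n_classes RGB colors on a sphere inside the RGB box.

    Assumption: points are spread with a Fibonacci lattice; the original
    post does not say how the points on the sphere are chosen.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    colors = []
    for k in range(n_classes):
        # z runs from just below 1 down to just above -1 in even steps.
        z = 1.0 - 2.0 * (k + 0.5) / n_classes
        r = math.sqrt(1.0 - z * z)
        theta = golden_angle * k
        unit = (r * math.cos(theta), r * math.sin(theta), z)
        # Center (0.5, 0.5, 0.5) and radius 0.5 keep each channel in [0, 1].
        colors.append(tuple(0.5 + 0.5 * u for u in unit))
    return colors


def blend(probabilities, class_colors):
    """Mix class colors using class probabilities that sum to 1 (fact F1)."""
    return tuple(
        sum(p * color[channel] for p, color in zip(probabilities, class_colors))
        for channel in range(3)
    )


if __name__ == "__main__":
    colors = sphere_class_colors(20)
    # A 50/50 mix of two classes falls strictly inside the sphere,
    # so it cannot coincide with any pure class color (requirement R1).
    mixed = blend([0.5, 0.5] + [0.0] * 18, colors)
    print(math.dist(mixed, (0.5, 0.5, 0.5)) < 0.5)
```

Because every pure class color sits exactly on the sphere surface and any mix of different classes lands strictly inside it, R1 holds for any number of classes. The price is that the colors never reach the corners of the RGB box, which matches the lower contrast I saw in practice.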

Author: Data Magician
Archives: October 2017
