Until ShapeLogic 1.6, customizing rules or neural networks for either the Color Particle Analyzer or the letter matcher required some programming. Now you can do most of this by loading a categorizer file.
Here is a directory with example files: http://code.google.com/p/shapelogic/source/browse/trunk/src/test/resources/data/neuralnetwork/
You can change the CapitalLetterMatch / Stream Vectorizer from a letter matcher to a digit matcher by loading this file: shapelogic/src/test/resources/data/neuralnetwork/polygon_digit_recognizer_with_rules_print.txt
Here are the definitions of the digits 0 and 1 and a list of the geometric features that get written out:
==========
COMMENT
This categorizer setup file is doing the same as the old digit matcher, DigitStreamVectorizer_.
==========
def 0
holeCount == 1
tJunctionCount == 0
endPointCount == 0
multiLineCount == 1
curveArchCount > 0
hardPointCount == 0
softPointCount > 0
==========
def 1
holeCount == 0
tJunctionLeftCount == 0
tJunctionRightCount == 0
endpointBottomPointCount == 1
horizontalLineCount == 0
verticalLineCount == 1
endPointCount == 2
multiLineCount == 0
softPointCount == 0
aspectRatio < 0.4
==========
PRINTS
Category
Points
lineCount
holeCount
==========
The biggest problem is finding the right names for the geometric features.
Currently it only understands 3 relations: >, < and ==.
There is a simple one-to-one mapping between the rule definitions above and the Java rules below:
public void makeDigitStream() {
    rule("0", HOLE_COUNT, "==", 1.);
    rule("0", T_JUNCTION_POINT_COUNT, "==", 0.);
    rule("0", END_POINT_COUNT, "==", 0.);
    rule("0", MULTI_LINE_COUNT, "==", 1.);
    rule("0", CURVE_ARCH_COUNT, ">", 0.);
    rule("0", HARD_CORNER_COUNT, "==", 0.);
    rule("0", SOFT_POINT_COUNT, ">", 0.);

    rule("1", HOLE_COUNT, "==", 0.);
    rule("1", T_JUNCTION_LEFT_POINT_COUNT, "==", 0.);
    rule("1", T_JUNCTION_RIGHT_POINT_COUNT, "==", 0.);
    rule("1", END_POINT_BOTTOM_POINT_COUNT, "==", 1.);
    rule("1", HORIZONTAL_LINE_COUNT, "==", 0.);
    rule("1", VERTICAL_LINE_COUNT, "==", 1.);
    rule("1", END_POINT_COUNT, "==", 2.);
    rule("1", MULTI_LINE_COUNT, "==", 0.);
    rule("1", SOFT_POINT_COUNT, "==", 0.);
    rule("1", ASPECT_RATIO, "<", 0.4);
}
The grammar of a categorizer setup file, in EBNF:

BLOCK_START = "==========" ;
comment-section = "COMMENT" , {string} , BLOCK_START ;
feature-section = "FEATURES" , {string} , BLOCK_START ;
print-section = "PRINTS" , {string} , BLOCK_START ;
result-section = "RESULTS" , {string} , BLOCK_START ;
weights-definition = "WEIGHTS" , {number} , BLOCK_START ;
relation = "<" | ">" | "==" ;
rule-definition = "def" , string , { string , relation , number } , BLOCK_START ;
categorizer-definition = BLOCK_START , {comment-section | feature-section | print-section | result-section | weights-definition | rule-definition } ;
It is hard to cause syntax errors in this grammar, but note that a sequence of characters starting with several equals signs (===) will not be considered a string.
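As an illustration of the block structure, here is a minimal Java sketch (not part of ShapeLogic; the class name and the dispatch logic are just for this example) that splits a categorizer setup file on the ========== marker and reports what kind of block each part is:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Illustrative sketch only: split a categorizer setup file into its blocks. */
public class CategorizerFileSketch {

    static final String BLOCK_START = "==========";

    public static void main(String[] args) throws IOException {
        String text = new String(Files.readAllBytes(
                Paths.get("polygon_digit_recognizer_with_rules_print.txt")));
        // Blocks are delimited by a line of ten '=' characters.
        for (String block : text.split(BLOCK_START)) {
            String trimmed = block.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] lines = trimmed.split("\\r?\\n");
            String first = lines[0].trim();
            // Dispatch on the keyword that starts each block, as in the grammar above.
            if (first.startsWith("COMMENT")) {
                System.out.println("Comment block");
            } else if (first.startsWith("FEATURES") || first.startsWith("PRINTS")
                    || first.startsWith("RESULTS") || first.startsWith("WEIGHTS")) {
                System.out.println(first + " section with " + (lines.length - 1) + " entries");
            } else if (first.startsWith("def ")) {
                System.out.println("Rule definition for category " + first.substring(4).trim()
                        + " with " + (lines.length - 1) + " rules");
            }
        }
    }
}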
Here is a short description of what the different sections of a categorizer setup file do:
COMMENT: This is just a comment.
FEATURES: These are the feature data streams that are needed as input for the neural network.
The names can take 2 different forms:
The stream names usually use camel case and start with a lower-case letter, e.g. aspectRatio.
The headline names are often short and usually start with an upper-case letter.
This was chosen on purpose so the two kinds of names can be distinguished.
A neural network produces a list of numbers from its last layer.
If there is a match, only one of these numbers should be over 0.5, and that number is taken as the result of the neural network.
The RESULTS section assigns a name to each output in this final list. E.g. for a digit matcher there will be 10 output numbers, and they will be named 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9.
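As a small illustration, a sketch (not ShapeLogic's actual code) of picking the matching name from the output list and the RESULTS names could look like this:

/** Sketch: pick the category whose output value is over 0.5, if there is exactly one. */
public class ResultPicker {

    static String pickCategory(double[] outputs, String[] resultNames) {
        String match = null;
        for (int i = 0; i < outputs.length; i++) {
            if (outputs[i] > 0.5) {
                if (match != null) {
                    return null;                 // more than one candidate: no clear match
                }
                match = resultNames[i];
            }
        }
        return match;                            // null means no match
    }

    public static void main(String[] args) {
        double[] outputs = { 0.02, 0.97, 0.10, 0.05, 0.00, 0.01, 0.03, 0.00, 0.04, 0.02 };
        String[] names = { "0", "1", "2", "3", "4", "5", "6", "7", "8", "9" };
        System.out.println(pickCategory(outputs, names));    // prints 1
    }
}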
WEIGHTS: Here is an example of a neural network from the Java code that shows how the weights are organized:
/** Weights found using the Joone Neural Networks. */
public final static double[][] WEIGHTS_FOR_XOR = {
    {
        //Output       0 ,                 1 ,                2
        2.7388686085992333, 5.505721328606976, 4.235258932026585, //bias
        //Input
        -6.598582463774703, -3.678198637390036, -2.9604962169635076, // 0
        -6.59030690954159, -3.7790406961228347, -2.845930422442215 // 1
    },
    {
        //Output 0
        -5.27100082610628, //Bias for hidden second layer
        //Input
        -10.45330943056037, // 0
        6.582922049952558, // 1
        4.7139611039662945 // 2
    }
};
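Reading the layout above: each layer is stored as one flat array that starts with a bias for every output neuron, followed by one row of weights per input neuron. The following sketch (not ShapeLogic code; the sigmoid activation is an assumption, but it does reproduce XOR with these weights) shows how a forward pass could use such an array:

/** Sketch: forward pass over weights stored as [biases..., then one weight row per input]. */
public class ForwardPassSketch {

    /** Weights copied from the WEIGHTS_FOR_XOR example above. */
    static final double[][] WEIGHTS = {
        { 2.7388686085992333, 5.505721328606976, 4.235258932026585,     // biases for 3 hidden neurons
          -6.598582463774703, -3.678198637390036, -2.9604962169635076,  // weights from input 0
          -6.59030690954159, -3.7790406961228347, -2.845930422442215 }, // weights from input 1
        { -5.27100082610628,                                            // bias for the single output
          -10.45330943056037, 6.582922049952558, 4.7139611039662945 }   // weights from hidden 0, 1, 2
    };

    static double[] layer(double[] input, double[] weights, int outputCount) {
        double[] output = new double[outputCount];
        for (int out = 0; out < outputCount; out++) {
            double sum = weights[out];                       // bias comes first
            for (int in = 0; in < input.length; in++) {
                // then one row of outputCount weights per input neuron
                sum += input[in] * weights[outputCount * (in + 1) + out];
            }
            output[out] = 1.0 / (1.0 + Math.exp(-sum));      // assumed sigmoid activation
        }
        return output;
    }

    public static void main(String[] args) {
        double[] hidden = layer(new double[] { 1.0, 0.0 }, WEIGHTS[0], 3);
        double[] result = layer(hidden, WEIGHTS[1], 1);
        System.out.println("XOR(1, 0) ~ " + result[0]);      // prints a value close to 1
    }
}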
Rule definition (def): Here is an example of the rules defining the digit 0:
==========
def 0
holeCount == 1
tJunctionCount == 0
endPointCount == 0
multiLineCount == 1
curveArchCount > 0
hardPointCount == 0
softPointCount > 0
holeCount is the name of the data feature stream of hole counts. So the first rule line will filter away all polygons that do not have a hole count equal to 1.
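To make the filtering concrete, here is a rough sketch (the Rule class and the feature map are illustrative, not ShapeLogic's stream-based implementation) of how the rules of one def block could be checked against one polygon's feature values:

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch: evaluate the rules of one "def" block against one polygon. */
public class RuleEvaluationSketch {

    static class Rule {
        final String feature;
        final String relation;
        final double value;

        Rule(String feature, String relation, double value) {
            this.feature = feature;
            this.relation = relation;
            this.value = value;
        }

        boolean matches(double actual) {
            if ("==".equals(relation)) return actual == value;
            if (">".equals(relation)) return actual > value;
            if ("<".equals(relation)) return actual < value;
            throw new IllegalArgumentException("Unknown relation: " + relation);
        }
    }

    /** A polygon matches a category only if every rule for that category holds. */
    static boolean matchesCategory(Map<String, Double> features, List<Rule> rules) {
        for (Rule rule : rules) {
            Double actual = features.get(rule.feature);
            if (actual == null || !rule.matches(actual)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Rule> rulesForZero = Arrays.asList(
                new Rule("holeCount", "==", 1),
                new Rule("tJunctionCount", "==", 0),
                new Rule("endPointCount", "==", 0),
                new Rule("multiLineCount", "==", 1),
                new Rule("curveArchCount", ">", 0),
                new Rule("hardPointCount", "==", 0),
                new Rule("softPointCount", ">", 0));
        Map<String, Double> polygonFeatures = new HashMap<String, Double>();
        polygonFeatures.put("holeCount", 1.0);
        polygonFeatures.put("tJunctionCount", 0.0);
        polygonFeatures.put("endPointCount", 0.0);
        polygonFeatures.put("multiLineCount", 1.0);
        polygonFeatures.put("curveArchCount", 2.0);
        polygonFeatures.put("hardPointCount", 0.0);
        polygonFeatures.put("softPointCount", 4.0);
        System.out.println(matchesCategory(polygonFeatures, rulesForZero)); // prints true
    }
}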
A stream dictionary describing what the different streams are called and what they do is currently being worked on.
Finding the names of the streams takes a little digging. Here are a few places to start:
Look at the files where rules are defined. A good idea is to download the sources and open them in a modern IDE like Eclipse, IntelliJ IDEA, JBuilder or NetBeans; that makes it easier to navigate and see where things are defined.
Note that different streams are defined for the particle analyzer group and for the letter matcher group.
src/main/java/org/shapelogic/imageprocessing/StreamVectorizerIJ.java
src/main/java/org/shapelogic/imageprocessing/StreamVectorizer.java
src/main/java/org/shapelogic/imageprocessing/ColorParticleAnalyzerIJ.java
src/main/java/org/shapelogic/imageprocessing/ColorParticleAnalyzer.java
src/main/java/org/shapelogic/logic/CommonLogicExpressions.java
The directory src/main/java/org/shapelogic/streamlogic contains several interesting files: