The Letter Matcher is really a general categorizer for skeletonized lines. You can define your own rule set for your symbols or train a neural network for it.
It comes with handwritten rules that categorize the capital letters, but they are not very robust and are made to match letters in sans-serif typefaces, e.g. Arial or Helvetica.
Unit tests exist for all capital letters.
Matched images can be found in this directory:
shapelogic/src/test/resources/images/smallThinLetters
For more robust matching you should train a neural network. If you train a neural network and think that it might be useful for others, you can upload it to: http://code.google.com/p/shapelogicapps/downloads/list
The Letter Matcher started as a test application for letter matching. Its purpose is to show how to use the ShapeLogic framework on a simple, well-defined problem.
Currently 5 sets of rules have been defined:
This description can also be found on the Getting Started page.
Go to the shapelogic menu and select the macro CapitalLettersMatch, which does the following:
If you already have skeletonized lines you can also call Stream Vectorizer directly, which does the following:
When you press CapitalLettersMatch or Stream Vectorizer an input dialog will prompt you for input:
DisplayInputDialog: If you uncheck this, the input dialog will not show up the next time you run CapitalLettersMatch
DisplayResultTable: If this is checked you will see a result table with geometric features for each polygon
DisplayInternalInfo: Print out polygon, points and their annotations
UseNeuralNetwork: Use a neural network for classification; otherwise a rule set is used
ShowFileDialog: If this is checked the user will be prompted with a file dialog to select the categorizer setup file
CategorizerSetupFile: File path to the categorizer setup file containing neural networks, rules or column selection list
Dialog with results of match
Vectorized M with skeletonized letter in black and polygon in gray
Lines and points are annotated in order to match curves; circular arcs and Bezier curves are not used. Lines and points can be filtered both by their position in the bounding box of a polygon and by their point and line annotations.
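To make the idea of filtering by position and annotation concrete, here is a minimal Java sketch that counts the points carrying a given annotation in the bottom part of a polygon's bounding box. The class `PointFilterSketch`, the `AnnotatedPoint` type, the reduced `PointType` enum and the choice of "bottom third" as the threshold are all assumptions made for this illustration; they are not ShapeLogic classes or values.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; these types are hypothetical, not ShapeLogic classes. */
final class PointFilterSketch {

    /** Reduced set of point annotations used in this example. */
    enum PointType { END_POINT, T_JUNCTION, HARD_CORNER, SOFT_POINT }

    /** A point in image coordinates (y grows downward) with one annotation. */
    static final class AnnotatedPoint {
        final double x;
        final double y;
        final PointType type;

        AnnotatedPoint(double x, double y, PointType type) {
            this.x = x;
            this.y = y;
            this.type = type;
        }
    }

    /**
     * Counts the points that carry the given annotation and lie in the
     * bottom third of the bounding box spanned by all points.
     * The one-third threshold is an assumption for this sketch.
     */
    static long countInBottom(List<AnnotatedPoint> points, PointType type) {
        if (points.isEmpty()) {
            return 0;
        }
        double minY = points.stream().mapToDouble(p -> p.y).min().getAsDouble();
        double maxY = points.stream().mapToDouble(p -> p.y).max().getAsDouble();
        double threshold = maxY - (maxY - minY) / 3.0;
        return points.stream()
                .filter(p -> p.type == type && p.y >= threshold)
                .count();
    }

    /** Hand-made points roughly matching a skeletonized capital A. */
    static List<AnnotatedPoint> letterA() {
        return Arrays.asList(
                new AnnotatedPoint(5, 0, PointType.HARD_CORNER),   // apex
                new AnnotatedPoint(2.5, 6, PointType.T_JUNCTION),  // left end of the crossbar
                new AnnotatedPoint(7.5, 6, PointType.T_JUNCTION),  // right end of the crossbar
                new AnnotatedPoint(0, 12, PointType.END_POINT),    // left foot
                new AnnotatedPoint(10, 12, PointType.END_POINT));  // right foot
    }

    public static void main(String[] args) {
        System.out.println(countInBottom(letterA(), PointType.END_POINT));
    }
}
```

For the hand-made A, both feet are end points in the bottom part of the box, while the two T junctions sit on the crossbar and are filtered out by position.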
The easiest way to define rule sets is to use a categorizer setup file. Here is an example that contains a definition of the letter A:
========== def A
pointCount == 5
holeCount == 1
tJunctionLeftCount == 1
tJunctionRightCount == 1
endpointBottomPointCount == 2
horizontalLineCount == 1
verticalLineCount == 0
endPointCount == 2
softPointCount == 0
The meaning of the rules for A is:
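Read literally, each rule line has the form "feature operator value". The following Java sketch shows one way such a definition could be evaluated against a table of measured features, assuming that a polygon is categorized as A only when every rule holds. The class `RuleBlockSketch`, its methods and the feature values in `featuresOfA` are hypothetical and written for this illustration; this is not the ShapeLogic parser, and only the two operators that appear in this document (`==` and `>`) are handled.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; not the ShapeLogic setup file parser. */
final class RuleBlockSketch {

    /** The rule lines of the definition of A, without the header line. */
    static final List<String> RULES_FOR_A = Arrays.asList(
            "pointCount == 5",
            "holeCount == 1",
            "tJunctionLeftCount == 1",
            "tJunctionRightCount == 1",
            "endpointBottomPointCount == 2",
            "horizontalLineCount == 1",
            "verticalLineCount == 0",
            "endPointCount == 2",
            "softPointCount == 0");

    /** Returns true when a single rule "name op value" holds for the features. */
    static boolean holds(String rule, Map<String, Double> features) {
        String[] parts = rule.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 'name op value': " + rule);
        }
        Double actual = features.get(parts[0]);
        if (actual == null) {
            return false; // assumption: a feature that was not measured never matches
        }
        double expected = Double.parseDouble(parts[2]);
        switch (parts[1]) {
            case "==":
                return actual == expected; // counts are whole numbers, so exact comparison is safe
            case ">":
                return actual > expected;
            default:
                throw new IllegalArgumentException("Unsupported operator: " + parts[1]);
        }
    }

    /** Assumption: a definition matches only if all of its rules hold. */
    static boolean matches(List<String> rules, Map<String, Double> features) {
        for (String rule : rules) {
            if (!holds(rule, features)) {
                return false;
            }
        }
        return true;
    }

    /** Hypothetical feature values for a cleanly skeletonized A. */
    static Map<String, Double> featuresOfA() {
        Map<String, Double> f = new HashMap<>();
        f.put("pointCount", 5.0);
        f.put("holeCount", 1.0);
        f.put("tJunctionLeftCount", 1.0);
        f.put("tJunctionRightCount", 1.0);
        f.put("endpointBottomPointCount", 2.0);
        f.put("horizontalLineCount", 1.0);
        f.put("verticalLineCount", 0.0);
        f.put("endPointCount", 2.0);
        f.put("softPointCount", 0.0);
        return f;
    }

    public static void main(String[] args) {
        System.out.println(matches(RULES_FOR_A, featuresOfA()));
    }
}
```

Changing a single feature, for example setting `holeCount` to 0, makes the whole definition fail under this all-rules-must-hold reading.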
The same rule set for defining A would take this form in ShapeLogic 1.0. For a better description, see the logic language syntax.
rule("A", POINT_COUNT, "==", 5., letterFilter);
rule("A", HOLE_COUNT, "==", 1., letterFilter);
rule("A", T_JUNCTION_LEFT_POINT_COUNT, "==", 1., letterFilter);
rule("A", T_JUNCTION_RIGHT_POINT_COUNT, "==", 1., letterFilter);
rule("A", END_POINT_BOTTOM_POINT_COUNT, "==", 2., letterFilter);
rule("A", HORIZONTAL_LINE_COUNT, "==", 1., letterFilter);
rule("A", VERTICAL_LINE_COUNT, "==", 0., letterFilter);
rule("A", END_POINT_COUNT, "==", 2., letterFilter);
rule("A", SOFT_POINT_COUNT, "==", 0., letterFilter);
new NumericRule("C", HOLE_COUNT, polygon, VAR + HOLE_COUNT_EX, "==", 0.),
new NumericRule("C", T_JUNCTION_POINT_COUNT, polygon, filter(T_JUNCTION_POINT_COUNT_EX), "==", 0.),
new NumericRule("C", END_POINT_BOTTOM_POINT_COUNT, polygon, filter(END_POINT_BOTTOM_POINT_COUNT_EX), "==", 1.),
new NumericRule("C", MULTI_LINE_COUNT, polygon, size(MULTI_LINE_COUNT_EX), "==", 1.),
new NumericRule("C", VERTICAL_LINE_COUNT, polygon, size(VERTICAL_LINE_COUNT_EX), "==", 0.),
new NumericRule("C", END_POINT_COUNT, polygon, VAR + END_POINT_COUNT_EX, "==", 2.),
new NumericRule("C", INFLECTION_POINT_COUNT, polygon, size(INFLECTION_POINT_COUNT_EX), "==", 0.),
new NumericRule("C", CURVE_ARCH_COUNT, polygon, size(CURVE_ARCH_COUNT_ANN_EX), ">", 1.),
new NumericRule("C", HARD_POINT_COUNT, polygon, filter(HARD_POINT_COUNT_EX), "==", 0.),
new NumericRule("C", SOFT_POINT_COUNT, polygon, size(SOFT_POINT_COUNT_ANN_EX), ">", 0.),
The meaning of the rules for C is:
/** Before a type is determined */
UNKNOWN,
/** Not completely straight */
STRAIGHT, // only a start and an end point
/** only on one side of a line between the start and end point */
CURVE_ARCH,
/** On both sides of a line between the start and end point
 * this is the rest category for lines */
WAVE,
/** Means that this is an arch and that it is concave compared to 1 or both
 * neighbor points, meaning they are on the same side of the line as the
 * arch */
CONCAVE_ARCH,
/** This line contains an inflection point, meaning the end points have
 * different signed direction changes */
INFLECTION_POINT,
/** For multi lines, if 2 lines are added together */
STRAIGHT_MULTI_LINE,
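The comments on STRAIGHT, CURVE_ARCH and WAVE describe where the intermediate points lie relative to the line between the start and end point. A minimal Java sketch of that test, using the sign of a cross product to decide the side, could look as follows. The class `LineTypeSketch`, the reduced `LineType` enum and the `classify` method are hypothetical, and treating intermediate points that all lie exactly on the chord as STRAIGHT is an assumption of this sketch rather than documented ShapeLogic behavior.

```java
/** Illustrative sketch only; not the ShapeLogic line annotation code. */
final class LineTypeSketch {

    /** Reduced set of line annotations used in this example. */
    enum LineType { STRAIGHT, CURVE_ARCH, WAVE }

    /**
     * Classifies a polyline given as {x, y} pairs from start to end by
     * checking on which side of the start-to-end chord each intermediate
     * point lies.
     */
    static LineType classify(double[][] points) {
        if (points.length < 2) {
            throw new IllegalArgumentException("A line needs a start and an end point");
        }
        double[] start = points[0];
        double[] end = points[points.length - 1];
        boolean onPositiveSide = false;
        boolean onNegativeSide = false;
        for (int i = 1; i < points.length - 1; i++) {
            // Sign of the cross product (end - start) x (point - start)
            // tells on which side of the chord the point lies.
            double cross = (end[0] - start[0]) * (points[i][1] - start[1])
                    - (end[1] - start[1]) * (points[i][0] - start[0]);
            if (cross > 0) {
                onPositiveSide = true;
            } else if (cross < 0) {
                onNegativeSide = true;
            }
        }
        if (onPositiveSide && onNegativeSide) {
            return LineType.WAVE; // points on both sides of the chord
        }
        if (onPositiveSide || onNegativeSide) {
            return LineType.CURVE_ARCH; // points on one side only
        }
        // Only a start and an end point, or (assumption) all points on the chord.
        return LineType.STRAIGHT;
    }

    public static void main(String[] args) {
        System.out.println(classify(new double[][] {{0, 0}, {10, 0}}));
        System.out.println(classify(new double[][] {{0, 0}, {5, 3}, {10, 0}}));
        System.out.println(classify(new double[][] {{0, 0}, {3, 2}, {7, -2}, {10, 0}}));
    }
}
```

The sketch ignores noise: a real skeleton would likely need a distance tolerance before a point counts as being off the chord.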
/** Before a type is determined */
UNKNOWN,
/** If the point has a sharp angle; for now 30 degrees is set as the limit */
HARD_CORNER,
/** If it is not a hard point */
SOFT_POINT,
/** 3 or more lines meet, U is for unknown */
U_JUNCTION,
/** 3 lines meet, 2 are collinear and the last is somewhat orthogonal */
T_JUNCTION,
/** 3 lines meet, less than 180 degrees between all of them */
ARROW_JUNCTION,
/** 3 lines meet, not a T junction */
Y_JUNCTION,
/** point is an end point,
 * maybe later there should be a distinction between end points that have
 * nothing else close by and end points that have close neighbors */
END_POINT,
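As a rough illustration of how point annotations like these could be assigned, the Java sketch below classifies a point from the number of lines meeting at it and, for a point on a single path, from the change of direction at that point. The class `PointTypeSketch` and its methods are hypothetical. Reading the 30 degree limit as "a direction change of more than 30 degrees is a hard corner" is an assumption, and junctions are not refined into T, arrow or Y junctions here.

```java
/** Illustrative sketch only; not the ShapeLogic point annotation code. */
final class PointTypeSketch {

    /** Reduced set of point annotations used in this example. */
    enum PointType { END_POINT, HARD_CORNER, SOFT_POINT, U_JUNCTION }

    /** Assumed reading of the 30 degree limit mentioned for HARD_CORNER. */
    static final double HARD_CORNER_LIMIT_DEGREES = 30.0;

    /** Direction change in degrees (0 to 180) between prev->p and p->next. */
    static double directionChange(double[] prev, double[] p, double[] next) {
        double inAngle = Math.atan2(p[1] - prev[1], p[0] - prev[0]);
        double outAngle = Math.atan2(next[1] - p[1], next[0] - p[0]);
        double diff = Math.abs(Math.toDegrees(outAngle - inAngle));
        return diff > 180.0 ? 360.0 - diff : diff;
    }

    /**
     * Classifies point p from the neighbor points it is connected to:
     * one neighbor is an end point, two neighbors are a corner or soft
     * point depending on the direction change, three or more are a junction.
     */
    static PointType classify(double[] p, double[][] neighbors) {
        switch (neighbors.length) {
            case 0:
                throw new IllegalArgumentException("An isolated point has no type");
            case 1:
                return PointType.END_POINT;
            case 2:
                return directionChange(neighbors[0], p, neighbors[1]) > HARD_CORNER_LIMIT_DEGREES
                        ? PointType.HARD_CORNER
                        : PointType.SOFT_POINT;
            default:
                return PointType.U_JUNCTION; // junction of unknown kind
        }
    }

    public static void main(String[] args) {
        // Apex of a capital A: the path turns sharply.
        System.out.println(classify(new double[] {5, 0},
                new double[][] {{0, 12}, {10, 12}}));
        // Gentle bend on an almost straight path.
        System.out.println(classify(new double[] {10, 0},
                new double[][] {{0, 0}, {20, 2}}));
    }
}
```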