
The token set length is capped at an arbitrary upper limit of 5 when the source token array contains 5 or more tokens; otherwise the limit is the length of the array itself (4 in the example above). Beginning at the upper length limit and decreasing by one once every set of that length has been tried, and starting from the right-hand side and moving one token to the left each time, the token sets are joined with spaces and exact-matched against the reference string. This process iterates until all token sets of length one have been tried, and it records the matches in the order in which they were made. Continuing the example above, the space-joined source token sets would be, in the order in which they are tried:
#String1 String2 String3 String4 (token set length=4, first and only set)
#String2 String3 String4 (token set length=3, first set)
#String1 String2 String3 (token set length=3, second set)
#String3 String4 (token set length=2, first set)
#String2 String3 (token set length=2, second set)
#String1 String2 (token set length=2, third set)
#String4 (token set length=1, first set)
#String3 (token set length=1, second set)
#String2 (token set length=1, third set)
#String1 (token set length=1, fourth set)
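
The enumeration order described above can be sketched in Python. This is a minimal illustration, not the actual implementation: the function names <code>token_sets</code> and <code>exact_matches</code> and the <code>max_len</code> parameter are hypothetical, and "exact match" is assumed here to mean plain string equality with a single reference string.

```python
def token_sets(tokens, max_len=5):
    """Return space-joined token sets in the order they would be tried.

    Longest sets come first; within one length, the window starts at the
    right-hand side and moves one token to the left each time.
    """
    upper = min(len(tokens), max_len)  # cap the set length at max_len
    candidates = []
    for length in range(upper, 0, -1):
        # Rightmost window first, then shift left one token at a time.
        for start in range(len(tokens) - length, -1, -1):
            candidates.append(" ".join(tokens[start:start + length]))
    return candidates


def exact_matches(tokens, reference, max_len=5):
    """Record matches in the order they are made (assumes string equality)."""
    return [c for c in token_sets(tokens, max_len) if c == reference]


if __name__ == "__main__":
    for candidate in token_sets(["String1", "String2", "String3", "String4"]):
        print(candidate)
```

For the four-token example, this yields the ten token sets in the same order as the list above. With six or more source tokens, the longest set tried would contain only five tokens.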
===NGram and LCS Matching===