Efter lite testande så kom jag fram till två lämpliga funtioner och lite testdata:
Kod:
CREATE FUNCTION [dbo].[split](
@delimited NVARCHAR(MAX),
@delimiter NVARCHAR(100)
) RETURNS @t TABLE (id INT IDENTITY(1,1), val NVARCHAR(MAX))
AS
BEGIN
DECLARE @xml XML
SET @xml = N'<t>' + REPLACE(@delimited,@delimiter,'</t><t>') + '</t>'
INSERT INTO @t(val)
SELECT r.value('.','varchar(MAX)') as item
FROM @xml.nodes('/t') as records(r)
RETURN
END
Kod:
CREATE FUNCTION [dbo].[similarity]
(
@text1 NVARCHAR(MAX),
@text2 NVARCHAR(MAX)
) RETURNS real
AS
BEGIN
DECLARE @similarity real
DECLARE @similar real
DECLARE @total real
select @total=COUNT(*) from Split(@text2,' ')
select @similar=COUNT(*) from Split(@text1,' ')
WHERE val In (select val from Split(@text2,' '))
set @similarity = @similar / @total
RETURN @similarity
END
Lite testkörning:
Kod:
Declare @text1 nVarchar(Max) = 'Hello John Smith the first'
Declare @text2 nVarchar(Max) = 'Hello John Smith the second';
Declare @text3 nVarchar(Max) = 'Hello John Smith the second kalle olle ';
select dbo.similarity(@text1,@text2) -- 0,8 -> 80%
select dbo.similarity(@text1,@text3) -- 0,5 -> 50%