Java, calculating the difference between unique characters in strings
Let's say I have 2 strings and need to calculate the difference between their unique characters. It's simple:
String s1 = "abcd"; String s2 = "aaaacccbbf"; // answer: 1
The answer is 1, because there is no "f" in the s1 variable.
But what about characters like மா or 漢字, or other non-ASCII characters? If I loop through the strings, a single character such as கு is counted as 2-3 separate characters, which gives me the wrong answer:
String s1 = "ab"; String s2 = "aaaகுb"; // answer: 2 (wrong!)
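To make the wrong count concrete, here is a minimal sketch that lists the UTF-16 char values a char-by-char loop sees for கு. The class name CharUnits and the helper units are made up for illustration, and the literal is written with Unicode escapes so the source file encoding does not matter.

```java
import java.util.ArrayList;
import java.util.List;

public class CharUnits {
    // Lists every UTF-16 char of s as a "U+XXXX" label.
    static List<String> units(String s) {
        List<String> out = new ArrayList<>();
        for (char c : s.toCharArray()) {
            out.add(String.format("U+%04X", (int) c));
        }
        return out;
    }

    public static void main(String[] args) {
        String ku = "\u0B95\u0BC1"; // the single visible symbol கு
        System.out.println(ku.length()); // 2
        System.out.println(units(ku));   // [U+0B95, U+0BC1]
    }
}
```

Each of those two chars is absent from "ab", so a loop over toCharArray() reports 2 missing characters for one visible symbol.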
The code I tried:
import java.util.Scanner;

class Main {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        String s1 = sc.nextLine();
        String s2 = sc.nextLine();
        sc.close();
        String missingCharacters = "";
        // Collect each char of s2 that does not occur in s1, once.
        for (char c : s2.toCharArray()) {
            if (!missingCharacters.contains(c + "") && !s1.contains(c + ""))
                missingCharacters += c;
        }
        System.out.println(missingCharacters.length());
    }
}
Your symbol கு is a compound form in Tamil script (க் + உ) that is encoded as 2 Unicode chars: U+0B95 (the letter க) followed by U+0BC1 (the vowel sign for உ). If you plan to work with Tamil script, you have to find such characters with a pattern:
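import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s1 = "ab";
String s2 = "aaaகுb";
// A letter followed by any number of combining marks counts as one symbol.
Pattern pattern = Pattern.compile("\\p{L}\\p{M}*");
Matcher matcher = pattern.matcher(s2);
Set<String> missingCharacters = new TreeSet<>();
while (matcher.find()) {
    missingCharacters.add(matcher.group());
}
matcher = pattern.matcher(s1);
while (matcher.find()) {
    missingCharacters.remove(matcher.group());
}
System.out.println(missingCharacters.size()); // prints 1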
Regex source: How to match a single Unicode grapheme
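If the comparison is needed in more than one place, the same idea can be wrapped in a method. The following is only a sketch: the class name GraphemeDiff and the methods symbols and countMissing are hypothetical names, and it assumes that "character" means a letter plus its combining marks, so digits, spaces and punctuation are ignored by the pattern.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GraphemeDiff {
    // One letter followed by zero or more combining marks.
    private static final Pattern LETTER_WITH_MARKS =
            Pattern.compile("\\p{L}\\p{M}*");

    // Collects the distinct letter-plus-marks symbols found in s.
    static Set<String> symbols(String s) {
        Set<String> found = new HashSet<>();
        Matcher matcher = LETTER_WITH_MARKS.matcher(s);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    // Counts the distinct symbols of s2 that do not occur in s1.
    static int countMissing(String s1, String s2) {
        Set<String> missing = symbols(s2);
        missing.removeAll(symbols(s1));
        return missing.size();
    }

    public static void main(String[] args) {
        // "\u0B95\u0BC1" is கு, written with escapes to avoid encoding issues.
        System.out.println(countMissing("ab", "aaa\u0B95\u0BC1b")); // 1
        System.out.println(countMissing("abcd", "aaaacccbbf"));     // 1
    }
}
```

A HashSet is used here because only the size of the result matters; a TreeSet would work the same way if sorted output were wanted.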