java - How to replace special characters in the input string in a MapReduce program
I am able to replace special characters in a normal Java program.
This is my Java code:
    public class ReplaceSpecialChars {
        public static void main(String[] args) {
            String s = "this785($^#')\"";
            System.out.println(s);
            // Remove every character that is neither a word character nor whitespace
            s = s.replaceAll("[^\\w\\s]", "");
            System.out.println(s);
        }
    }
But when I try the same thing in a MapReduce program, it does not work:
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
            String s = value.toString().replaceAll("\\w+\\s+", "");
            String[] words = s.split(" ");
            for (String a : words) {
                output.collect(new Text(a), new IntWritable(1));
            }
        }
    }
Sample input to the MapReduce program:
    "this@#$ is$# word$%^ (count)"
    "this@#$ is$# word$%^ (count)"
Output of the MapReduce program:
    "this@#$	2
    (count)"	2
    is$#	2
    word$%^	2
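The tokens behind this output can be reproduced outside Hadoop. The sketch below applies the mapper's two string operations (replaceAll, then split on a single space) to one input line. OriginalMapperReplay and mapLine are hypothetical names and no Hadoop classes are involved. Assuming the reducer simply sums the 1s, each of these tokens would be counted twice because the line appears twice in the input.

```java
import java.util.Arrays;
import java.util.List;

// Replays the question's map step outside Hadoop so the emitted keys can be inspected.
// Hypothetical class and method names; this is not Hadoop API.
class OriginalMapperReplay {
    // Same two steps as the mapper: replaceAll("\\w+\\s+", "") followed by split(" ")
    static List<String> mapLine(String line) {
        String s = line.replaceAll("\\w+\\s+", "");
        return Arrays.asList(s.split(" "));
    }

    public static void main(String[] args) {
        String line = "\"this@#$ is$# word$%^ (count)\"";
        // Prints whether replaceAll left the line unchanged (it finds no match here)
        System.out.println(line.replaceAll("\\w+\\s+", "").equals(line));
        // Each token still carries its special characters
        System.out.println(mapLine(line));
    }
}
```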
What am I doing wrong? Please help me out!
Your regex has changed from [^\\w\\s] to \\w+\\s+.
This regex means: match one or more word characters (letters a-z/A-Z, digits 0-9, or underscore) followed by one or more whitespace characters (space, tab, newline, etc.), and replace each match with an empty string. In your string you have:
"this@#$ is$# word$%^ (count)"
Your input never satisfies that pattern, which explains the output. Every space is preceded by $, # or ^, never by an alphanumeric character, so nothing is replaced and each token keeps its special characters.
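A minimal sketch of the fix this answer points toward: use the standalone program's pattern [^\\w\\s] in the map step instead of \\w+\\s+. It runs without Hadoop; CleanWordCount, mapLine and countWords are hypothetical names, and the TreeMap is an assumed stand-in for Hadoop's sort and a summing reducer (the question does not show its reducer).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of a word count that strips special characters first.
// Hypothetical names; the TreeMap plays the role of sort + summing reducer.
class CleanWordCount {
    // Keys the mapper would emit for one line after cleaning
    static List<String> mapLine(String line) {
        // Drop every character that is neither a word character nor whitespace
        String cleaned = line.replaceAll("[^\\w\\s]", "");
        List<String> words = new ArrayList<>();
        // Split on runs of whitespace so removed characters cannot leave empty tokens
        for (String w : cleaned.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // Sums a 1 per emitted key, like a word-count reducer
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : mapLine(line)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "\"this@#$ is$# word$%^ (count)\"",
                "\"this@#$ is$# word$%^ (count)\"");
        countWords(input).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

In the Hadoop mapper itself, this corresponds to calling replaceAll("[^\\w\\s]", "") on value.toString() and splitting on "\\s+" rather than " ". The whitespace split matters because removing a character can leave two adjacent spaces: "a ( b" becomes "a  b", which split(" ") would turn into "a", "" and "b".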