Determine which of a set of images has been recompressed/resaved the least
I'm working on a system for fuzzy image deduplication.
Right now, I have a functional system that can do large-scale pHash-based fuzzy image searching and deduplication, using either DCT-based or gradient-based perceptual hashes.
However, while programmatically determining whether an image has been reduced in size is trivial, how can I determine which image is the parent of which?
Basically, if I have two images of the same resolution, and one is a resaved version of the other (either in a different format (JPG/PNG) or recompressed), how can I reliably determine which one is the original?
(Note: assume all metadata has been stripped from the images; otherwise this would be simple.)
Bonus points if the solution is easy to implement in Python.
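For context, here is a minimal sketch of the general DCT-based pHash recipe the question builds on, written against plain NumPy arrays. It shrinks a greyscale image to 32x32 by block averaging, takes a 2-D DCT, keeps the top-left 8x8 low-frequency coefficients, and thresholds them at their median. It is not the poster's system. The function names, the crude block-mean resize (real implementations usually resample with a proper filter), and the requirement that the input be at least 32 pixels on each side are all assumptions made for illustration.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix, so coeffs = D @ x @ D.T."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def downsample(gray: np.ndarray, size: int) -> np.ndarray:
    """Crude resize to size x size by averaging blocks (crops any remainder)."""
    h, w = gray.shape
    bh, bw = h // size, w // size
    if bh == 0 or bw == 0:
        raise ValueError("image must be at least size x size")
    cropped = np.asarray(gray, dtype=np.float64)[: bh * size, : bw * size]
    return cropped.reshape(size, bh, size, bw).mean(axis=(1, 3))


def phash(gray: np.ndarray, hash_size: int = 8, highfreq_factor: int = 4) -> np.ndarray:
    """Return a flat boolean array of hash_size**2 bits (64 by default)."""
    size = hash_size * highfreq_factor           # 32x32 working image
    small = downsample(gray, size)
    d = dct_matrix(size)
    coeffs = d @ small @ d.T                     # 2-D DCT
    low = coeffs[:hash_size, :hash_size]         # keep low frequencies only
    return (low > np.median(low)).ravel()        # 1 bit per coefficient


def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits between two hashes."""
    return int(np.count_nonzero(a != b))
```

Two images whose hashes differ in only a few bits are near-duplicate candidates. Because the hash keeps only low frequencies, it largely ignores the high-frequency detail that lossy recompression tends to alter, so a match says nothing about which image is the parent.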
This isn't a positive answer, but I spent a fair amount of time evaluating whether average per-pixel entropy is a useful metric for determining how compressed an image is.
I have a write-up here.
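The write-up's exact definition isn't reproduced here. As an assumption, a minimal sketch of "average entropy per pixel" could use the Shannon entropy of the 8-bit grey-level histogram, which comes out in bits per pixel: 0 for a flat image and 8 for one that uses all 256 levels equally. The helper name `mean_entropy_bits` is hypothetical.

```python
import numpy as np


def mean_entropy_bits(gray: np.ndarray) -> float:
    """Shannon entropy of an 8-bit image's grey-level histogram, in bits per pixel.

    Assumed definition: a global histogram entropy, not a local/windowed one.
    Expects integer values in 0..255.
    """
    counts = np.bincount(np.asarray(gray).ravel().astype(np.intp), minlength=256)
    p = counts[counts > 0] / counts.sum()   # probabilities of the levels present
    return float(-(p * np.log2(p)).sum())
```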
Some excerpts:
Figure: Variance in entropy across compression levels on images from the SIPI reference image database.
In retrospect, the x-axis should have been labeled "JPEG quality level"; higher numbers mean better quality.
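A sweep like the one in the figure can be roughly approximated without a JPEG library. The sketch below reproduces only the lossy core of baseline JPEG on a single 8-bit channel: an 8x8 block DCT, quantization with the standard luminance table scaled IJG-style by the quality setting, and the inverse transform. It then measures histogram entropy at each quality. This is an approximation, not the poster's setup. It ignores chroma subsampling and the lossless entropy-coding stage, and all helper names are hypothetical. With real files you would instead re-save through an actual encoder, such as Pillow's JPEG writer with its `quality` option.

```python
import numpy as np

# Standard JPEG luminance quantization table (ITU-T T.81, Annex K).
BASE_LUMA_Q = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
], dtype=np.float64)


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II matrix: coeffs = D @ x @ D.T, x = D.T @ coeffs @ D."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def quant_table(quality: int) -> np.ndarray:
    """Scale the base table IJG-style for a quality setting in 1..100."""
    quality = int(min(max(quality, 1), 100))
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    return np.clip(np.floor((BASE_LUMA_Q * scale + 50) / 100), 1, 255)


def simulate_jpeg_luma(gray: np.ndarray, quality: int) -> np.ndarray:
    """Apply only JPEG's lossy step to one 8-bit channel (cropped to multiples of 8)."""
    h, w = (gray.shape[0] // 8) * 8, (gray.shape[1] // 8) * 8
    x = gray[:h, :w].astype(np.float64) - 128.0                  # level shift
    blocks = x.reshape(h // 8, 8, w // 8, 8).transpose(0, 2, 1, 3)
    d, q = dct_matrix(8), quant_table(quality)
    coeffs = np.round((d @ blocks @ d.T) / q) * q                 # quantize + dequantize
    recon = (d.T @ coeffs @ d).transpose(0, 2, 1, 3).reshape(h, w) + 128.0
    return np.clip(np.round(recon), 0, 255).astype(np.uint8)


def histogram_entropy(gray: np.ndarray) -> float:
    """Shannon entropy of the 8-bit grey-level histogram, in bits per pixel."""
    counts = np.bincount(np.asarray(gray).ravel().astype(np.intp), minlength=256)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


def entropy_by_quality(gray: np.ndarray, qualities=range(5, 101, 5)) -> dict:
    """Map each simulated quality setting to the entropy of the result."""
    return {q: histogram_entropy(simulate_jpeg_luma(gray, q)) for q in qualities}


if __name__ == "__main__":
    ramp = np.tile(np.arange(256, dtype=np.uint8), (256, 1))   # smooth test image
    for q, e in entropy_by_quality(ramp).items():
        print(f"quality {q:3d}: {e:.3f} bits/pixel")
```

On a smooth horizontal ramp, quality 1 collapses every 8x8 block to a flat tile, leaving at most nine grey levels (under 3.2 bits). At quality 95, by contrast, the output still spreads across most of the 0 to 255 range.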
While per-pixel entropy declines sharply at extremely aggressive compression levels, it does not otherwise vary in a way that correlates with the compression level.
This means any attempt to compare two images by inspecting their entropy will run into trouble unless one knows exactly which compression level the image was resaved at.
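The naive comparison being warned against might look like the sketch below. It guesses that the higher-entropy image is the less-recompressed one and declines to answer when the gap is small. The `margin` threshold and the function names are hypothetical. Given the results above, treat the output as a weak hint at best, because a moderate quality drop may barely move the entropy.

```python
import numpy as np


def histogram_entropy(gray: np.ndarray) -> float:
    """Shannon entropy of the 8-bit grey-level histogram, in bits per pixel."""
    counts = np.bincount(np.asarray(gray).ravel().astype(np.intp), minlength=256)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


def guess_original_by_entropy(a: np.ndarray, b: np.ndarray, margin: float = 0.05):
    """Return 0 if `a` looks less compressed, 1 if `b` does, None if too close to call.

    Heuristic only: higher histogram entropy is taken as a sign of fewer lossy resaves.
    """
    ea, eb = histogram_entropy(a), histogram_entropy(b)
    if abs(ea - eb) <= margin:
        return None
    return 0 if ea > eb else 1
```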